Several scientific disciplines have been rocked by a crisis of reproducibility in recent years. Not long ago, Bayer researchers found that they were able to replicate only 25% of the important pharmaceutical papers they examined, and an MIT report on Machine Learning papers found similar results. Some fields have begun to emerge from their crises, but other fields, such as psychology, may have not yet hit bottom.
We might imagine that this is because many scientists are good at science but not so adept with statistics. We might even imagine that we Analytics practitioners should have fewer problems because we are good at statistics. In fact, we find ourselves with an equivalent issue: predictive models that underperform once deployed. We have a powerful tool for preventing underperforming models in Cross Validation (CV), but the ubiquity of CV in our modeling tools has led many Analysts to misunderstand how to properly use CV or appropriately create CV partitions, leading to lower-performing models.
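To make the partitioning step concrete, here is a minimal sketch (not from the article, and deliberately library-free) of how k-fold CV partitions are typically constructed: the data indices are shuffled once, then split into k disjoint folds so that every observation serves as validation data exactly once.

```python
# Illustrative sketch of k-fold cross-validation partitioning.
# The function name and defaults are our own, not from any library.
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Return k (train_idx, valid_idx) pairs covering all samples."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)           # shuffle once, up front
    fold_size, rem = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # Spread any remainder across the first `rem` folds.
        stop = start + fold_size + (1 if i < rem else 0)
        valid = idx[start:stop]
        train = idx[:start] + idx[stop:]
        folds.append((train, valid))
        start = stop
    return folds

folds = kfold_indices(10, k=3)
# Every sample appears in exactly one validation fold,
# and train/valid are disjoint within each fold.
```

Real modeling tools (e.g. scikit-learn's `KFold`) wrap exactly this bookkeeping; the misuse the article goes on to discuss lies in *how* the partitions are formed and used, not in the mechanics themselves.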
This article will address the proper use and partitioning of CV to help us avoid these crises of underperformance in our own projects.