In his Top 10 Data Mining Mistakes, John Elder shares lessons learned from more than 20 years of data science consulting experience. Avoiding these mistakes is a cornerstone of any successful analytics project. In this blog about Mistake #2 you will learn about the dangers of relying on a single technique and some of the benefits of employing a handful of good tools.
Several scientific disciplines have been rocked by a crisis of reproducibility in recent years. Not long ago, Bayer researchers found that they were only able to replicate 25% of the important pharmaceutical papers they examined, and an MIT report on Machine Learning papers found similar results. Some fields have begun to emerge from their crises, but other fields, such as psychology, may not yet have hit bottom.
We might imagine that this is because many scientists are good at science but not so adept with statistics. We might even imagine that we Analytics practitioners should have fewer problems because we are good at statistics. As a matter of fact, we find ourselves with an equivalent issue: predictive models that underperform once deployed. We have a powerful tool to prevent an underperforming model in Cross Validation (CV), but the ubiquity of CV in our modeling tools has led many Analysts to misunderstand how to properly use CV or appropriately create CV partitions, leading to lower-performing models.
This article will address the proper use and partitioning of CV to help us avoid these crises of under-performance in our own projects.
September 29, 2017
A recent client of ours found themselves in a sticky situation. They are a long-established manufacturer, with one of the largest market capitalizations in their industry. For decades, they have been a trusted vendor to their customers, in a field where reliability and quality have been paramount.
But lately, their environment was changing. “Disruptors” were entering their market who could sell competing products at a lower price, with lower quality and less functionality. These disruptors seemed to have better command of digital technologies and were adept at using data. Our client had plenty of data, but was only just starting to think strategically about how this data could be used to drive future growth and market opportunity. As profits started to slip, and sales shifted toward the cheaper, newer entrants, it became imperative for the client to get a handle on this digital disruption.
September 22, 2017
“It depends!” That was the phrase all of my former students hated, but knew was coming when they asked if they had a good model. I knew they wanted a “yes” or “no” answer, but I needed more information to answer that question adequately. I find that people like general rules that they can apply to problems to get simple answers. However, in data science you need to have perspective on the whole situation before deciding whether a model is good or bad. That is why you need a baseline against which to compare your performance!
September 15, 2017
This is the second in a series of blogs where Data Scientists Anna Godwin and Cory Everington discuss five Analytics Best Practices that are key to building a data-driven culture and delivering value from analytics. In this installment Anna discusses the benefits of using Agile Data Science as a framework for managing data science projects.
September 8, 2017
Data analytics has been called the most powerful decision-making tool of the 21st century. Even though it has come of age only within the past twenty years, thousands of businesses, governmental agencies, and nonprofit organizations have already used it to dramatically increase productivity, reduce waste and fraud, enhance quality, improve customer service, boost revenues, optimize strategies, combat crime and terrorism, and solve a host of other tough challenges. Elder Research CEO Gerhard Pilcher and Vice President of Operations Jeff Deal coauthored Mining Your Own Business to provide an easy-to-read overview of data mining and predictive analytics for organizational leaders who want to know more about these powerful tools and develop an analytic capability in their organization.
This blog, drawing from chapter 3 of this book, reviews the three most important keys to leading a successful data science initiative.
September 1, 2017
Often when people think of data scientists, they imagine a mythical person who knows how to do everything required for success: write sophisticated Python libraries, derive cutting-edge machine learning algorithms using a deep understanding of statistics, shepherd successful models through deployment, administer the database, create elegant visual dashboards, deeply understand the business, and drive corporate strategy. The list of required skills on many job postings for Data Scientists is long, overwhelming, and in many cases, completely out of reach for a single person.
August 25, 2017
In his Top 10 Data Mining Mistakes, John Elder shares lessons learned from more than 20 years of data science consulting experience. Avoiding these mistakes is a cornerstone of any successful analytics project. In this blog about Mistake #5 you will learn how easy it is to accept leaks from the future in your modeling results and the importance of scrutinizing any input variable that works too well.
August 18, 2017
This is the first in a series of blogs where Data Scientists Cory Everington and Anna Godwin discuss five Analytics Best Practices that are key to building a data-driven culture and delivering value from analytics. In this installment Cory discusses the benefits of having a shared framework of Analytics Best Practices to allow you to focus on what's most important—the results.
August 11, 2017
In his Top 10 Data Mining Mistakes, John Elder shares lessons learned from more than 20 years of data science consulting experience. Avoiding these mistakes is a cornerstone of any successful analytics project. In this blog about Mistake #4 you will learn that inducing models from data has the virtue of looking at the data afresh, not constrained by old hypotheses. But while “letting the data speak”, you must be careful not to tune out received wisdom, because often nothing inside the data will protect you from significant, but wrong, conclusions.