There are two problems with humans making decisions from data. We are biased (even experts are just as likely as anyone else to give inconsistent judgments), and we don’t always understand, or trust, the model. Although decision-makers could benefit from using data as a part of their decision making, raw machine learning results may not be meaningful enough. So how can we use data in a way that experts trust without diluting the machine learning process?
June 16, 2017
It’s like an irritating fly buzzing around your head – “Big Data”. Do I have to hear the term one more time? As a Data Scientist who interacts with hard data problems on a daily basis, the term, “Big Data”, has little meaning to me since it lacks any precise definition or structure, the normal comfort food for my mathematical mind. However, I appreciate that the term is “out there” and people who are trying to make sense of the mess of data flowing towards them are searching for a common language to discuss their specific challenges. I hope to help by explaining some of the key language circulating around Big Data concepts – the next level of detail in the Big Data discussion.
June 9, 2017
After a first pass of training and evaluating a model, you may find you need to improve its results. Here is a checklist, adapted from Chapter 13 of the Handbook of Statistical Analysis and Data Mining Applications, of ten practical actions that I’ve found usually help:
June 2, 2017
Software usage logs are a valuable data source that can reveal insights into customer experience, user configurations, workflows, and software stability. Parsing meaningful attributes from log data is a critical step to understanding user behavior and driving product improvements. However, log data comes in many formats, some of which are far from standardized. Here, we review nonstandard log formats and how log analytics can extract meaningful product usage information.
For predictive analytics to work, two different species must cooperate in harmony: the business leader and the quant. In order to function together, they each have to adapt. On the one hand, the quant needs to attain a business-oriented vantage. And on the other, the business leader must navigate a very alien world indeed. Deal and Pilcher’s new book, “Mining Your Own Business,” helps with that second bit.
When conversation in organizations turns to analytics, topics normally include the quality and accessibility of data, the infrastructure for storing and processing data, the necessary level of analytics sophistication, and the unique skillsets required to build a successful analytics program. Often overlooked is the importance of having an enterprise-level analytics governance strategy.
It will soon be rockfish season again, and even though most of us are better at talking about the fish we caught than we are at catching them, let’s take a moment to discuss a vital data science practice — avoiding reinforcement bias — that will help us improve our catch out on the water.
May 5, 2017
There is no doubt that a successful Data Scientist must be proficient in programming, modeling, and data munging (extracting, cleaning, and feature engineering data). However, there is another key skill that is often overlooked: the ability to communicate findings clearly and effectively. If you as a Data Scientist cannot motivate the business buy-in to effect change, your powerful model will collect dust on a shelf. Stakeholders will only trust your model if they understand the value it adds, what has been done to create it, and why it works. They should not be left to trust you and your “black box” blindly. The solution is data storytelling: using the power of narrative to communicate your findings in a way that resonates with your stakeholders. Doing this combines your data science expertise with intuitive visualizations and—most importantly—a story to connect the dots.
April 28, 2017
The key question that our clients ALWAYS ask is “Can we guarantee value from the analytics project?” They want to know whether the return on their investment will be positive. But they want this guarantee at the proposal stage, before we have even seen the data, the technical infrastructure, or other relevant details about the analytics environment. Though a prior guarantee is impossible, there are key factors that one can assess early to estimate project viability and value – to get at the expected cost and return of the proposed investment.
April 21, 2017
Data models can distill powerful insight from raw data. Yet, this insight is only valuable when it is acted on, and it’s only acted on if it’s understood. Visualization plays a vital role in revealing key aspects of data within its overall context, enabling understanding for technical and non-technical decision makers alike.