Statistical & Cognitive Biases in Data Science: What is Bias?

Will Goodrum, Ph.D.

October 6, 2017

This is the first in a series of short blog posts where we explore common varieties of bias that can beset analytics projects. Bias has serious ramifications for the success of analytics in any organization. Understanding the nature of bias is crucial for understanding the extent of a model’s accuracy. In this first post, we discuss what bias is, why it occurs, and why it matters (a lot).

What is Bias?

Bias has several definitions, and its common usage is decidedly negative. We typically use it to mean systematic favoritism of a group. The word “bias” is thought to derive from an ancient Greek word describing an oblique line (i.e., a deviation from the horizontal). In Data Science, bias is a deviation from expectation in the data. More fundamentally, bias refers to an error in the data, though the error is often subtle or goes unnoticed. So, why does bias occur in the first place?

Over the next posts in this series, we will briefly define and describe common statistical and cognitive biases, as listed below:

  • Selection (or sample) Bias
  • Seasonal Bias
  • Linearity Bias
  • Confirmation Bias
  • Recall Bias
  • Survivor Bias
  • Observer Bias
  • Reinforcement Bias

We will also describe why each of these biases poses unique Data Science challenges.

Why does Bias Occur?

Bias occurs because of sampling and estimation. If we could know everything about all the entities in our data (e.g., customers, insurance claims, software sessions), and could store information on all possible entities, our data would have no bias. Additionally, humans are poor intuitive statisticians, and their estimates are often inaccurate[1]. These problems are so pernicious that they are commonly found even in carefully constructed, controlled statistical experiments.

But Data Science is not conducted in carefully controlled conditions. It must work with “found data”, that is, data collected for a purpose other than modeling. That data is very likely to have biases.
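To make the sampling point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the simulated customer spend figures, the `inclusion_probability` function, and the idea that higher spenders are more likely to end up in the records. It compares the average from a designed random sample with the average from a much larger “found” data set.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly spend for 100,000 customers.
population = [random.gauss(100, 30) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Designed sample: every customer is equally likely to be chosen.
random_sample = random.sample(population, 1_000)

def inclusion_probability(spend):
    """Assumed collection mechanism: higher spenders are more likely to be recorded."""
    return min(1.0, max(0.0, (spend - 50) / 100))

# "Found" sample: whatever happened to be recorded under that mechanism.
found_sample = [s for s in population if random.random() < inclusion_probability(s)]

# Signed error of each sample mean relative to the population mean.
random_error = statistics.fmean(random_sample) - true_mean
found_error = statistics.fmean(found_sample) - true_mean

print(f"population mean:     {true_mean:.1f}")
print(f"random sample error: {random_error:+.1f} (n={len(random_sample)})")
print(f"found sample error:  {found_error:+.1f} (n={len(found_sample)})")
```

In this sketch the found data set is far larger than the random sample, yet its average lands further from the truth. More rows do not correct for a skewed collection process.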

Why does Bias Matter?

Predictive models only “see” the world through the data used for training. In fact, they “know” of no other reality. When those data are biased, model accuracy and fidelity are compromised. Biased models can limit credibility with important stakeholders. At worst, biased models will actively discriminate against certain groups of people. Being aware of these risks allows a Data Scientist to better eliminate bias. The resulting higher-quality models improve analytics adoption and enhance value from analytics investment.
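As a rough illustration of a model that “knows” only its training data, the sketch below fits a straight line to simulated data from light users only, then checks it against heavy users it never saw. The quadratic `true_cost` relationship, the usage ranges, and the helper names are all hypothetical and chosen purely to show the effect.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

def true_cost(usage):
    # Hypothetical ground truth: cost grows with the square of usage.
    return usage ** 2

def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(slope, intercept, xs):
    """Average absolute gap between the line's prediction and the true cost."""
    return sum(abs(true_cost(x) - (slope * x + intercept)) for x in xs) / len(xs)

# Training data covers only light users (usage between 0 and 3).
train_x = [random.uniform(0, 3) for _ in range(500)]
train_y = [true_cost(x) + random.gauss(0, 0.5) for x in train_x]
slope, intercept = fit_line(train_x, train_y)

# Evaluate on the range the model saw and on heavy users it never saw.
seen_x = [random.uniform(0, 3) for _ in range(500)]
unseen_x = [random.uniform(3, 10) for _ in range(500)]
mae_seen = mean_abs_error(slope, intercept, seen_x)
mae_unseen = mean_abs_error(slope, intercept, unseen_x)

print(f"mean absolute error, light users (seen):   {mae_seen:.2f}")
print(f"mean absolute error, heavy users (unseen): {mae_unseen:.2f}")
```

The line looks accurate on the kind of data it was trained on and fails badly on the group that was missing. Nothing in the training data could have warned the model that the missing group behaves differently.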

In the next installment, we will take a brief look at selection bias, and how your data may (or more likely, may not) represent what you think it does.


[1] Kahneman, D. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, NY (2011), p. 112.



About the Author

Will Goodrum, Ph.D.

Dr. William Goodrum has nearly a decade of experience in the management and delivery of projects and products that embed Data Science and numerical methods in software. At Elder Research, Dr. Goodrum leads a team of six Data Scientists who deliver custom Data Science training and create advanced analytical solutions and strategy for private sector clients around the globe. Dr. Goodrum has experience consulting across different industries, including logistics, software, and philanthropic development. Additionally, Dr. Goodrum has acted as PI on a NASA Phase II STTR program that implemented validated models of corrosion behavior for gas turbine engine rotors. Prior to Elder Research, Dr. Goodrum worked at a global engineering software firm where he supported customers in the Aerospace & Defense, manufacturing, and automotive industries. Dr. Goodrum’s Ph.D. research estimated lifetime highway maintenance costs for the government of New South Wales, Australia.