This post is the third in a series of short posts in which we explore common biases that can impair analytics projects.
What Is Linearity Bias and Why Does It Occur?
Linearity Bias is the assumption that a change in one quantity produces a proportional change in another. Unlike Selection Bias, Linearity Bias is a cognitive bias; it’s produced not through some statistical process, but instead through how we mistakenly perceive the world around us.
For example, imagine the answer to this question:
What is the relationship between fuel efficiency and engine displacement (i.e., engine size) of automobiles?
Naturally, as engines become larger, their fuel efficiency decreases. Those of us who have driven large trucks, muscle cars, and tractors know this relationship all too well (and have paid the price at the gas pump).
Most people assume that this downward trend between fuel efficiency and engine displacement follows a straight line. It does at first, but real data tell a more complex story across the full range of engine sizes, as shown in Figure 1:
Figure 1. Highway fuel efficiency versus engine displacement (from R for Data Science)
Putting all vehicle classes together, the trend is nearly linear for engine sizes under 4.5 liters, but then highway fuel economy plateaus below 20 miles per gallon (MPG) and even reverses (though the data for the largest engines is sparse). So the relationship between fuel efficiency and engine size is (unexpectedly) nonlinear. Part of the reason is that very different kinds of vehicles have large-displacement engines. For example, both high-horsepower, heavy-duty pickup trucks and lightweight, high-performance supercars tend to have engines over 4.5L but are used in completely different ways. When we assume the relationship between fuel economy and displacement is linear, we neglect these important details, and our inherent linearity bias leaves us surprised by the result.
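One rough way to check whether a straight line is adequate is to fit one and look at where it misses. The sketch below is a minimal illustration in plain Python. The numbers are made up to mimic a decline followed by a plateau (they are not the R for Data Science data), and the function names and band cutoffs are our own assumptions.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_residual_by_band(xs, ys, cutoffs):
    """Average residual (actual minus fitted) within each band of x values."""
    slope, intercept = fit_line(xs, ys)
    edges = [float("-inf"), *cutoffs, float("inf")]
    averages = []
    for lo, hi in zip(edges, edges[1:]):
        residuals = [
            y - (intercept + slope * x)
            for x, y in zip(xs, ys)
            if lo <= x < hi
        ]
        averages.append(mean(residuals) if residuals else None)
    return averages


# Made-up illustrative values, NOT the real mpg data:
# a steady decline up to 4.5 L, then a plateau and a slight upturn.
displacement = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
highway_mpg = [36, 33, 30, 27, 24, 21, 18, 18, 18, 18, 19, 20]

slope, intercept = fit_line(displacement, highway_mpg)
bands = mean_residual_by_band(displacement, highway_mpg, cutoffs=[3.0, 5.5])
print(f"slope = {slope:.2f} MPG per liter")
print("mean residual for small, medium, large engines:", bands)
```

For these illustrative values, the straight line underestimates fuel economy for the smallest and largest engines and overestimates it in the middle. Residuals that change sign from band to band like this are a hint that a single straight line is systematically wrong, even when the overall downward slope looks convincing.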
Why Does Linearity Bias Matter?
Many real-world business processes are highly nonlinear. For example, time series (such as sales or demand over time) are usually cyclical. This nonlinearity can have serious consequences or tangible benefits, depending upon the context:
- Nonlinearity impacts fundamental business decisions: For example, de Langhe, Puntoni, and Larrick showed that the relationships behind some very common business decisions are often strongly nonlinear. Consider their example on price discounting:
“. . . three main factors affect profits: costs, volume, and price. A change in one often requires action on the others to maintain profits. For example, rising costs must be offset by an increase in either price or volume. And if you cut price, lower costs or higher volumes are needed to prevent profits from dipping. . .”
In other words, say a company decides to reduce the price of a product by 15% instead of 10%. Because costs, volume, and price interact, the extra five percentage points of discount require more than a 5% increase in units sold to make up for the profit given up by the discount. This nonlinear relationship must be taken into account when pricing is set. Other common nonlinear business calculations include interest rate calculations and cost-benefit analyses. Assuming linearity for any of these can lead to very inaccurate estimates of the potential value (or risk) of those decisions.
- Nonlinearity is surprising (and can be used advantageously): Linearity bias is a primary reason that stakeholders are often surprised by actual data. Even experienced experts in a field may assume a business process is linear (like the units sold versus discount rate) when in fact it is not. So getting it right can provide an advantage.
Recently, Elder Research created models that prioritized a non-profit client’s outreach to potential donors. The plot in Figure 2 compares several predictive models with two existing process baselines, in terms of the percent of positive responders reached versus the percent of people contacted. The diagonal line shows the expected result if donors were contacted at random (i.e., if we contact 10% of all available prospects, we would expect to reach 10% of those who would respond positively). Note that this is the only straight line on the figure!
Figure 2. Example of nonlinear trend in model and baseline process performance for a non-profit outreach program.
The client’s baseline processes and our new predictive models all improved the efficiency of outreach compared to a random approach (i.e., we reach more than 10% of the positive responders in the first 10% of people contacted). By using our models, our client would reach about 40% of all positive responders in the first 10% of people contacted. This is a weaker version of the “80/20 rule” in action: here, approximately 20% of donors bring in about 60% of all donations. Although the gains begin to diminish as a greater percentage are contacted, our client was very pleasantly surprised to learn that using Machine Learning would double the improvement over their current processes! (The top model lines are twice as high as the top baseline lines.) This is a classic example of how analytics can improve efficiency through workload prioritization.
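The kind of curve shown in a chart like Figure 2 can be computed in a few lines. The following is a minimal sketch, not the models or data from the client project: the scores and outcomes are hypothetical, and `cumulative_gains` is a name we chose for illustration.

```python
def cumulative_gains(scores, responded, fractions):
    """Share of all positive responders reached when contacting the
    top-scoring fraction of prospects, for each requested fraction."""
    # Rank outcomes from the highest model score to the lowest.
    ranked = [
        outcome
        for _, outcome in sorted(
            zip(scores, responded), key=lambda pair: pair[0], reverse=True
        )
    ]
    total_positive = sum(ranked)
    gains = []
    for fraction in fractions:
        n_contacted = round(fraction * len(ranked))
        gains.append(sum(ranked[:n_contacted]) / total_positive)
    return gains


# Hypothetical model scores and outcomes (1 = responded positively).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

fractions = [0.1, 0.2, 0.5, 1.0]
gains = cumulative_gains(scores, responded, fractions)
print(gains)  # [0.25, 0.5, 0.75, 1.0]
```

Contacting prospects at random would be expected to reach a share of responders equal to the share contacted (0.1, 0.2, 0.5, 1.0), which is the diagonal line. In this toy example the ranked list reaches 25% of responders in the first 10% of contacts, and the advantage shrinks as more people are contacted, which is the same diminishing-gains shape described above.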
How Can Linearity Bias Be Mitigated?
Given the powerful force that nonlinearity presents, how can linearity bias be mitigated (when the risks are nonlinear) or leveraged (when the opportunities are nonlinear)?
- Plot it: The most straightforward (and powerful) method of exposing the effects of linearity bias is to make a graph. The visual results may surprise those whose intuition says the quantities should be linear, but a graph is hard to argue with: it is evidence of the real, underlying relationship found in the data. If you are struggling to communicate the benefits of a Machine Learning algorithm, a well-designed visualization can show a nonlinear increase in performance to your stakeholders. The design of the visualization is key, since outliers and other graphing artifacts may give the appearance of linearity (or nonlinearity) when no such pattern exists in the underlying data.
Note the good example in Figure 1: the original data points are shown, along with a smooth estimate of the mean. Figure 3 below shows the same data as Figure 1, but broken down by vehicle class. When grouped in this fashion, the relationship between fuel economy and engine displacement is much more linear.
Figure 3. Highway fuel economy versus engine displacement segmented by vehicle class. Note that most vehicle classes do have a linear trend when grouped in this way.
- Describe the actual trend: Once you have plotted the data, the graph will enable you to assess any trends. Does the trend begin to fall off, as was the case for positive responses to donor outreach? Or does it accelerate, as for the discount-revenue trade-off? Understanding the trend will help decision-makers make informed decisions on trade-offs based on the data rather than on flawed intuition.
- Clearly communicate the business impact of the nonlinearity: While the monetary impact of discount and deferred revenue is easily calculated, the broader strategic impacts of decisions based on linear assumptions should be assessed. Does the short-term benefit of a planned discount on a product line still make sense, or will it defer too much revenue in the long-term? Is the investment in Machine Learning worth the effort if we’re unsure about how many additional donors we may be able to reach?
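The discount arithmetic can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the calculation from the Harvard Business Review article: we assume a constant cost per unit, no change in fixed costs, and a hypothetical 30% contribution margin. The helper name `breakeven_volume_increase` is our own.

```python
def breakeven_volume_increase(discount, margin):
    """Fractional increase in units sold needed to keep total profit
    unchanged after a price cut.

    discount: price cut as a fraction of the original price (0.10 = 10%)
    margin:   original contribution margin, (price - unit cost) / price

    Derivation (assumes constant unit cost):
      profit per unit before the cut = price * margin
      profit per unit after the cut  = price * (margin - discount)
      so new_volume / old_volume     = margin / (margin - discount)
      and the required increase      = discount / (margin - discount)
    """
    if not 0 <= discount < margin:
        raise ValueError("discount must be non-negative and less than margin")
    return discount / (margin - discount)


ASSUMED_MARGIN = 0.30  # hypothetical value, chosen only for illustration

for discount in (0.05, 0.10, 0.15, 0.20):
    extra = breakeven_volume_increase(discount, ASSUMED_MARGIN)
    print(f"{discount:.0%} discount -> sell {extra:.0%} more units to break even")
```

Under these assumptions, a 10% discount requires about 50% more units to break even, while a 15% discount requires about 100% more. The required volume grows much faster than the discount itself, and it grows without bound as the discount approaches the margin, which is why a linear rule of thumb can be badly misleading here.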
Linearity is an assumption of convenience, and it is deceptively familiar. Strategic decisions regarding trade-offs that are made based on intuition (and therefore incorporate linearity bias) will feel natural and comfortable to experts and non-experts alike. Those decisions may also be surprisingly wrong.
Reference: de Langhe, Puntoni, and Larrick, “Linear Thinking in a Nonlinear World,” Harvard Business Review, May-June 2017 (https://hbr.org/2017/05/linear-thinking-in-a-nonlinear-world).