Avoid Reinforcement Bias When Fishing in the Same Pond

Nathan French

May 12, 2017


It will soon be rockfish season again, and even though most of us are better at talking about the fish we caught than we are at catching them, let’s take a moment to discuss a vital data science practice — avoiding reinforcement bias — that will help us improve our catch out on the water.

Imagine that a group of marine biologists asked you to survey a population of fish for a given disease. Rowing out to the center of the lake, you start catching fish and recording the presence or absence of the disease in each before throwing it back. Before long, though, you notice a problem: many of the fish are ones you have previously caught. You hypothesize several possible causes for this: Perhaps the disease makes them easier to catch? Maybe this spot on the lake attracts fish with the disease? Could there be fewer fish in the lake than we originally thought? Regardless, you must ask yourself if fishing in this one spot gives you an adequate picture of the overall fish population, or if you should move your boat to other locations. Also, since bait is finite, you need to make your casts count. Where should you focus your efforts in order to best find fish with the disease?

Using predictive analytics, you can first build a model from previous samples to highlight locations on the lake with the highest probability of having sick fish. With the model results in hand, you then row from spot to spot and repeat the same process — catch, identify and release — except now your results from each spot can provide new input for your next model. This feedback loop increases the model’s predictive power.

Understanding Reinforcement Bias

Soon your original problem manifests itself, but now on a larger scale: your model repeatedly sends you to the locations that have been paying off, so you keep sampling an increasingly static fish population. Is your model pointing you toward these spots for any reason other than past success? The feedback loop in your model, while critical to model accuracy, has introduced a reinforcement bias, a name borrowed from reinforcement learning. Models in feedback loops are especially vulnerable to this hidden bias.

A feedback-enabled predictive model typically starts with a small sample of the population of interest. Outputs are labeled as successes or failures for model training, and some type of actionable result (feedback) serves as an input for future model runs. Meanwhile, of course, a model will not “know” why a given subject is of interest (that is, good/bad, positive/negative, and so on)—it simply fits the subject to past “interesting” cases as best it can. This cycle of 1) model prediction, 2) taking action on cases of interest, and 3) learning from feedback, repeats throughout the model’s lifecycle.

[Figure: Reinforcement Bias Feedback Loop]

Feedback-enabled models will often show great efficacy in identifying high-interest results, but they have disadvantages. The model can only make predictions based on what it has seen, meaning other interesting cases can exist without detection. In our fishing example, there may exist prime casting spots we didn’t visit simply because our model didn’t have enough information to adequately describe them. Often, the model will make a prediction based solely on the outcome from a previous cycle, leading to repeated scrutiny. This over-scrutiny of some cases means that other cases are neglected, leading to an ever-growing blind spot in the predictive results.
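To make the blind spot concrete, here is a minimal Python sketch of a purely greedy feedback loop. Everything in it is an illustrative assumption rather than anything from a real project: the function name run_greedy_loop, the made-up disease rates, and the choice to score never-visited spots as 0 (standing in for a model that lacks the information to describe them).

```python
import random

def run_greedy_loop(true_rates, seed_spots, cycles=50, casts_per_cycle=3, rng=None):
    """Simulate a purely greedy feedback loop over fishing spots.

    true_rates: hypothetical per-spot probability that a caught fish is sick.
    seed_spots: spots sampled once up front to build the first "model".
    Returns the number of casts made at each spot.
    """
    rng = rng or random.Random(0)
    n = len(true_rates)
    casts = [0] * n  # how often each spot was fished
    sick = [0] * n   # how many sick fish each spot yielded

    def cast(spot):
        casts[spot] += 1
        if rng.random() < true_rates[spot]:
            sick[spot] += 1

    for spot in seed_spots:
        cast(spot)

    for _ in range(cycles):
        # 1) "Model": observed sick rate per spot; unseen spots score 0 (assumption).
        scores = [sick[i] / casts[i] if casts[i] else 0.0 for i in range(n)]
        # 2) Act only on the highest-scoring spots.
        ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for spot in ranked[:casts_per_cycle]:
            # 3) Each outcome feeds the next cycle's scores.
            cast(spot)
    return casts

rates = [0.1] * 20
rates[15] = 0.9  # the richest spot, which the seed sample missed
casts = run_greedy_loop(rates, seed_spots=range(10), cycles=50, casts_per_cycle=3)
print(casts[15])  # 0: the richest spot is never fished
```

In this toy setup the loop spends all 150 of its later casts on the ten spots it sampled first, so the best spot on the lake stays invisible no matter how many cycles run.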

Managing Reinforcement Bias

Reinforcement from a feedback loop is effective at first, but its payoff peters out as an area becomes “over-fished.” To better manage reinforcement bias, employ the following techniques over the model’s lifecycle:

Understand What You’re Looking For

A model does not understand why a given subject is, or is not, of interest (that is, why it is labeled a 1 or a 0). This is often a big advantage, in that a model can induce new patterns that a human expert had not considered. On the other hand, the expert can hypothesize new ideas (or types of “interestingness”) to look for that haven’t yet shown up often enough in past data for the model to pick up on. The domain expert and data scientist should spend time brainstorming such cases before the model is deployed. Subsequent examination of the results may lead to further refinement and expansion of the model’s inputs.

Random Selection During Model Runs

To allow for the chance of finding new fish, spend at least some time casting in new places, even though they appear, at first, unpromising. Random selection provides the best opportunity to find unpredicted (but interesting) cases. Therefore, policy should dictate that during every model cycle, some randomly selected subset of the population (say 10%) must receive post-modeling scrutiny, preferably from a domain expert. Results of this review, whether interesting or not, will enhance the model’s predictive power.
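One way to build random selection into each cycle is sketched below in Python. It is only a sketch under stated assumptions: the function name select_with_exploration is hypothetical, and the 10% is applied to the review budget for the cycle, which is just one reading of the “say 10%” guideline above.

```python
import random

def select_with_exploration(scores, budget, explore_fraction=0.10, rng=None):
    """Pick `budget` case indices: mostly top-scored, plus a random slice.

    scores: model scores, one per case (higher = more interesting).
    explore_fraction: share of the budget reserved for random picks
        (0.10 is an assumed default, not a recommendation).
    Returns (exploit, explore) as two lists of case indices.
    """
    rng = rng or random.Random()
    # Reserve at least one random pick whenever there is any budget at all.
    n_explore = min(budget, max(1, round(budget * explore_fraction)))
    n_exploit = budget - n_explore

    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    exploit = ranked[:n_exploit]

    # Draw the random slice only from cases the model did not select,
    # so every unselected case has an equal chance of review.
    remaining = ranked[n_exploit:]
    explore = rng.sample(remaining, min(n_explore, len(remaining)))
    return exploit, explore

scores = [i / 100 for i in range(100)]  # made-up scores for 100 cases
exploit, explore = select_with_exploration(scores, budget=20, rng=random.Random(1))
print(len(exploit), len(explore))  # 18 2
```

The outcomes of the random picks should go back into the training data alongside the model-selected ones. That is what lets unpromising-looking places correct the model.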

Heuristic Subject Selection

Don’t exclude any region too long. Subjects that have gone a certain number of cycles without review should receive human review at some point. Look for extremes along any dimension (variable) for potential examination. Reinforcement bias is most common in cyclical (recurring) review processes. Pay attention to vetting sequences and consider augmenting a predictive model with a heuristic that helps target unvetted cases. This will help balance the need to maximize both model accuracy and search coverage.
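A heuristic of this kind can be very simple. The Python sketch below flags subjects that have gone too many cycles without review. The function name overdue_for_review, the max_gap threshold, and the sample data are all hypothetical choices made for illustration.

```python
def overdue_for_review(last_reviewed, current_cycle, max_gap=5):
    """Return subjects not reviewed in the last `max_gap` cycles, longest wait first.

    last_reviewed: dict mapping subject id -> cycle of its last review,
        or None if the subject has never been reviewed.
    """
    def gap(subject):
        last = last_reviewed[subject]
        # Never-reviewed subjects are treated as waiting indefinitely.
        return float("inf") if last is None else current_cycle - last

    overdue = [s for s in last_reviewed if gap(s) >= max_gap]
    return sorted(overdue, key=gap, reverse=True)

# Hypothetical lake regions and the cycle in which each was last fished.
last_fished = {"north": 9, "south": 2, "east": None, "west": 6}
print(overdue_for_review(last_fished, current_cycle=10, max_gap=4))
# ['east', 'south', 'west']
```

The flagged subjects can then be added to the next cycle’s review list alongside the model’s picks, so coverage does not depend on the model’s scores alone.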


Repeated sampling in search of interesting cases, and the resulting refinement of the guiding predictive models, leads to a positive feedback loop in the modeling. That loop is powerful, but it may “peter out” as the pond becomes over-fished in the initially promising area. Data scientists must stay vigilant about this feedback bias and intentionally make what appear to be sub-optimal choices in order to introduce fresh information into the loop. Work closely with domain experts and front-line analysts to understand the details of the problem they are working on and produce a model, and modeling process, that will prove operationally effective for many cycles of use.

Elder Research has deep experience fielding applications — from product cross-selling to insider threat detection — impacted by the issues described here.  Contact us for a consultation.



About the Author

Analyst Nathan French uses his financial analysis experience on projects in the financial risk domain. Previously, he spent eleven years as a 401(k) consultant to approximately 50 small firms specializing in retirement plan design and financial reporting. Nathan received his Bachelor of Science in Mathematics from Cal Poly, San Luis Obispo and a Master of Science in Bioinformatics from the University of Maryland. His graduate work has included motif finding (a specialization of text mining), prediction via Hidden Markov Models and the measure of gene-gene versus gene-environment interactions to locate quantitative traits. His work at Elder Research has focused on the improvement of statistical models, such as the management of selection bias on both the conceptual and implementation levels.