Goodhart’s Law, Evolving Threats, and Model Monitoring

Stuart Price, Ph.D.

March 9, 2018


In 1975 Charles Goodhart, then an economic advisor to the Bank of England, observed that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” The idea, now most often quoted as “When a measure becomes a target, it ceases to be a good measure,” came to be known as Goodhart’s Law and is recognized as a risk associated with key performance indicators (KPIs) and with implementing analytics. Any metric applied to a competitive or adversarial system will change the behavior of the people in that system once they perceive that the metric drives decisions affecting them. If your adversary has a good chance of figuring out your metric, how can you keep your system from being gamed?

Whether it is schools vying for a better ranking or criminals trying to commit fraud, people respond to incentives. Take, for example, the rankings of colleges and universities that various publications produce to identify the “best”. These rankings influence a school’s ability to draw the best students. One metric that is often used is the institution’s acceptance rate, which reflects the institution’s ability to attract students who are responding to quality signals about it. Some institutions, knowing they are being judged, have instituted a two-stage application process: the first stage requires little effort on the student’s part, and the second is a more complete application with essays and letters of recommendation. By lowering the barrier to apply, these schools get more students to participate in the first stage, and that stage is then reported as the total number of applications. Inflating the number of applications artificially decreases the acceptance rate without the school having to become any better. At these institutions the underlying admissions system has been altered, changing the process by which the data is generated; thus, a new model is required in order not to be fooled.
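The arithmetic behind this trick is simple. The sketch below uses made-up admission and application counts (hypothetical, not taken from any real school) to show that adding low-effort first-stage applicants to the denominator halves the reported acceptance rate while the admitted class stays exactly the same.

```python
def acceptance_rate(admitted: int, applications: int) -> float:
    """Share of applicants who were admitted."""
    return admitted / applications

# Hypothetical counts for illustration only.
admitted = 2_000
full_applications = 10_000   # students who complete the full application
first_stage_only = 10_000    # extra applicants who only fill in the low-effort first stage

before = acceptance_rate(admitted, full_applications)
after = acceptance_rate(admitted, full_applications + first_stage_only)
print(f"before: {before:.0%}, after: {after:.0%}")  # before: 20%, after: 10%
```

Nothing about the school or its admitted students changed; only the reporting did, which is exactly the kind of shift in the data-generating process that a fixed metric cannot detect.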

Machine learning offers an improvement over traditional KPIs when dealing with Goodhart’s law for several reasons, including its ability to integrate multiple data sources and capture more complex interactions. These strengths, however, can create new vulnerabilities to sophisticated adversaries if you are not careful. Important questions to explore include:

  • What are machine learning’s advantages over traditional KPIs?
  • What can we do during model deployment to ensure resilience to adversaries?
  • How can agile data science and model monitoring keep a model fresh?

Machine Learning for Metrics

The biggest advantage of machine learning when generating metrics is its ability to discover the relationships among many different features. A single metric is easy to game, in part because it is easy to identify, whereas multiple interacting metrics are much harder to game. When we consider inputs such as SAT scores and acceptance rate, we might find that more selective schools tend to do better overall, but that a school becoming more selective without improving its average SAT scores is a weaker signal. By using more information, we can paint a more accurate picture of the school’s quality, capture the interactions between the features, and reduce the benefit of manipulating any one input.
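A minimal simulation can illustrate the last point, under assumptions that are ours rather than the article’s: a hypothetical latent “quality” drives both a noisy selectivity signal and a cleaner SAT signal, and plain least-squares models (with no interaction terms) predict quality from selectivity alone or from both inputs. A school then “games” the ranking by raising selectivity one unit without changing SAT scores.

```python
import numpy as np

# Hypothetical synthetic data: a latent "quality" drives both observed inputs.
rng = np.random.default_rng(seed=0)
n = 5_000
quality = rng.normal(size=n)
selectivity = quality + rng.normal(scale=1.0, size=n)  # noisier signal
sat = quality + rng.normal(scale=0.5, size=n)          # cleaner signal

def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns [intercept, w1, w2, ...]."""
    X = np.column_stack([np.ones(len(target)), *features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

single = fit_linear([selectivity], quality)      # selectivity only
multi = fit_linear([selectivity, sat], quality)  # selectivity and SAT

# A school "games" the ranking: selectivity rises by one unit, SAT does not change.
delta = 1.0
gain_single = single[1] * delta
gain_multi = multi[1] * delta
print(f"predicted-quality gain, selectivity-only model: {gain_single:.2f}")
print(f"predicted-quality gain, two-feature model:      {gain_multi:.2f}")
```

Under these assumptions the selectivity-only model should reward the manipulation with a gain near 0.5, while the two-feature model gives a gain near 1/6, because the cleaner SAT signal absorbs most of the weight. Real ranking models would add interaction terms, but the direction of the effect is the same.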

Though a more sophisticated model may reduce the sort of competitive gamesmanship we see in college rankings, it can also create new vulnerabilities. These are not as intuitively obvious as a gameable individual metric, but an adversary probing for weaknesses could discover them. For example, someone trying to get an email past a spam filter could automate the search for vulnerabilities by trying combinations of vocabulary and domain names. Probing is low risk and low cost for the spammer, allowing them to search exhaustively. They may discover that an email for a “miracle cure” passes if it is sent from an ‘org’ domain (which includes nonprofits such as churches and hospitals), but not from an ‘edu’ domain (which includes colleges and universities). These weaknesses often occur where there are fewer observations in the training data, or where a non-representative or biased sample was used to train the model.
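The sketch below shows how cheap this kind of black-box probing is. The scorer is a hypothetical stand-in for a trained spam model: the word weights, domain weights, and threshold are invented for illustration, and the negative ‘org’ weight mimics a poorly sampled region of training data. A defender can run the same exhaustive search against their own model before deployment.

```python
from itertools import product

# Hypothetical toy spam scorer standing in for a trained black-box model.
# The negative 'org' weight mimics a poorly sampled region of the training data.
WORD_WEIGHTS = {"miracle": 1.2, "cure": 0.9, "free": 0.7, "remedy": 0.4}
DOMAIN_WEIGHTS = {"com": 0.3, "edu": 0.5, "org": -0.8}
THRESHOLD = 1.5  # scores at or above this are flagged as spam

def spam_score(words, tld):
    """Sum of word weights plus the weight of the sender's top-level domain."""
    return sum(WORD_WEIGHTS.get(w, 0.0) for w in words) + DOMAIN_WEIGHTS.get(tld, 0.0)

def probe(vocabulary, tlds, phrase_len=2):
    """Exhaustively try word/domain combinations; return those that evade the filter."""
    evasions = []
    for words in product(vocabulary, repeat=phrase_len):
        for tld in tlds:
            if spam_score(words, tld) < THRESHOLD:
                evasions.append((words, tld))
    return evasions

hits = probe(["miracle", "cure", "free"], ["com", "edu", "org"])
print((("miracle", "cure"), "org") in hits)  # True: 1.2 + 0.9 - 0.8 = 1.3
print((("miracle", "cure"), "edu") in hits)  # False: 1.2 + 0.9 + 0.5 = 2.6
```

Each probe costs the spammer almost nothing, which is why even a small blind spot like the ‘org’ weight is likely to be found.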

Model Sensitivity and Regularization

Under adversarial conditions, how do we ensure the efficacy of our models and interventions? Before deploying a model, it is important to verify that it is not overly sensitive to small changes in its inputs. This is particularly true for combinations of inputs for which you have seen few examples. Reducing sensitivity to small perturbations that can change the output improves the model’s resilience [1].
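One crude pre-deployment check, sketched here under our own assumptions, is to jitter each input slightly and count how often the predicted label flips. The function name, the uniform noise, and the default `epsilon` are hypothetical choices. Random noise tends to understate the worst-case, deliberately chosen perturbations discussed in [1], so treat a low flip rate as necessary rather than sufficient. Running the check separately on rows from rarely seen input combinations targets the regions of concern above.

```python
import numpy as np

def flip_rate(predict, X, epsilon=0.05, n_trials=20, seed=0):
    """Fraction of rows whose predicted label changes under at least one small random perturbation.

    Each feature is jittered uniformly within [-epsilon, epsilon] on every trial.
    """
    rng = np.random.default_rng(seed)
    base = predict(X)
    flipped = np.zeros(len(X), dtype=bool)
    for _ in range(n_trials):
        noise = rng.uniform(-epsilon, epsilon, size=X.shape)
        flipped |= predict(X + noise) != base
    return float(flipped.mean())

# Hypothetical linear classifier used only to exercise the check.
w = np.array([1.0, 1.0])

def predict(X):
    return (X @ w > 0).astype(int)

far = np.array([[5.0, 5.0]])   # deep inside one class
near = np.array([[0.0, 0.0]])  # on the decision boundary
print(flip_rate(predict, far))  # 0.0
print(flip_rate(predict, near))
```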

One way to reduce sensitivity and improve model robustness is to simplify the solution using regularization [2]. Regularization adds a penalty for model complexity to the training objective, so that parameters take on large values only when they significantly improve model performance. Reducing complexity in this way tends to smooth the response surface and improve generalization.
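The article does not name a specific regularizer. The sketch below uses L2 (ridge) regularization as one common example, on a hypothetical small dataset with more features than are actually useful. As the penalty `lam` grows, the weight vector shrinks. For a linear model, a smaller weight norm directly limits how far a small input change can move the output. An L1 (lasso) penalty would instead drive some weights to exactly zero.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: minimize ||y - Xw||^2 + lam * ||w||^2 (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical small, noisy dataset: few rows relative to the number of features.
rng = np.random.default_rng(seed=1)
X = rng.normal(size=(30, 10))
true_w = np.zeros(10)
true_w[:2] = [1.0, -1.0]  # only two features actually matter
y = X @ true_w + rng.normal(scale=0.5, size=30)

norms = []
for lam in [0.0, 1.0, 10.0]:
    w = ridge_fit(X, y, lam)
    norms.append(np.linalg.norm(w))
    # For a linear model, |f(x + d) - f(x)| <= ||w|| * ||d||, so a smaller ||w||
    # bounds how far any small input perturbation d can move the output.
    print(f"lambda={lam:>4}: ||w|| = {norms[-1]:.3f}")
```

In practice the penalty strength would be chosen by cross-validation rather than fixed by hand as it is here.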

Agile Deployment and Model Monitoring

Once a model is in production, continue to monitor its predictive performance; this is important for all models, but it is critical to success in adversarial conditions. All models age. The underlying distributions change, processes and technologies evolve, and many other factors cause a model’s accuracy to decrease over time; under adversarial conditions these changes can be swift and deliberately directed to circumvent the model. The rate at which these changes occur, and to which you must be prepared to respond, is context dependent. Spammers can recode their spambots in hours, while individuals filing fraudulent insurance claims would likely take much longer to adapt. Continually tracking the distributions of the inputs and the accuracy of the model allows managers to refresh or re-create it at the right interval. An agile development cycle is particularly well suited to responding to evolving threats. One paradigm for keeping a model fresh is a Champion-Challenger system, in which challenger models are built to try to surpass the performance of the champion model, the one currently in production and being used to make decisions.
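Both monitoring ideas can be sketched in a few lines. The population stability index (PSI) shown here is one widely used drift measure for a single input, not necessarily the one the author uses. It compares the share of recent data falling into bins cut at the baseline’s quantiles. The 0.1 and 0.25 levels often quoted for “moderate” and “significant” drift are industry rules of thumb, not a standard. The `choose_model` rule, including its `min_gain` margin, is a hypothetical minimal form of a Champion-Challenger promotion decision.

```python
import numpy as np

def psi(baseline, recent, n_bins=10, eps=1e-6):
    """Population stability index of one input feature, using baseline quantile bins."""
    # Interior cut points from the baseline; the two outer bins are open-ended.
    cuts = np.quantile(baseline, np.linspace(0, 1, n_bins + 1)[1:-1])
    base_idx = np.searchsorted(cuts, baseline, side="right")
    recent_idx = np.searchsorted(cuts, recent, side="right")
    base_frac = np.bincount(base_idx, minlength=n_bins) / len(baseline)
    recent_frac = np.bincount(recent_idx, minlength=n_bins) / len(recent)
    base_frac = np.clip(base_frac, eps, None)  # avoid log(0) for empty bins
    recent_frac = np.clip(recent_frac, eps, None)
    return float(np.sum((recent_frac - base_frac) * np.log(recent_frac / base_frac)))

def choose_model(champion_score, challenger_scores, min_gain=0.01):
    """Promote the best challenger only if it beats the champion by min_gain on recent labeled data."""
    name, score = max(challenger_scores.items(), key=lambda kv: kv[1])
    return name if score >= champion_score + min_gain else "champion"

rng = np.random.default_rng(seed=2)
baseline = rng.normal(size=10_000)          # inputs seen at training time
stable = rng.normal(size=10_000)            # same distribution
drifted = rng.normal(loc=1.0, size=10_000)  # adversaries have shifted their behavior
print(f"PSI stable:  {psi(baseline, stable):.3f}")
print(f"PSI drifted: {psi(baseline, drifted):.3f}")
print(choose_model(0.91, {"challenger_a": 0.90, "challenger_b": 0.93}))  # challenger_b
```

Requiring a margin before promotion keeps the production model from flapping between near-identical candidates; how large that margin should be, and how often to run the comparison, depends on how quickly the adversary in question can adapt.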


Predictive analytics and forecasting have become invaluable tools for helping businesses make the best decisions, but care and maintenance are required to keep these tools in optimal working condition. By ensuring that your model uses a variety of inputs, is not excessively sensitive, and is monitored for continued performance, you will be well positioned to get the most out of it.

Detecting fraud is a great example of an adversarial system in analytics, where detection techniques must evolve to keep ahead of ever-changing threats. Watch our on-demand webinar to learn more about Best Practices for Deploying a Fraud Analytics Solution.

[1] McDaniel, Patrick, Nicolas Papernot, and Z. Berkay Celik. "Machine learning in adversarial settings." IEEE Security & Privacy 14.3 (2016): 68-72.

[2] Barreno, Marco, et al. "Can machine learning be secure?" Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security. ACM, 2006.

About the Author

Stuart Price, Ph.D., Data Scientist

Dr. Stuart Price has experience applying machine learning, optimization, and simulation to problems in healthcare. He enjoys working in new fields and learning from subject matter experts. His research interests include predictive confidence and applying machine learning and optimization to build decision support tools for healthcare. Stuart earned a BA in Physics and Mathematics from Hendrix College, and an MS in Applied Mathematics and a PhD in Operations Management from the University of Maryland, College Park.