Fraud, Anomaly Detection, and the Interplay of Supervised and Unsupervised Learning

Peter Bruce

February 8, 2019


Mike Thurber, Lead Data Scientist and fraud specialist at Elder Research, presented Elder Research's fraud detection methodology at Predictive Analytics World for Government last year. Consider the scenario of detecting fraudulent insurance claims, such as the audacious "accidental" death scheme in the 1944 noir film Double Indemnity.

A long-established firm, like the fictional Pacific All Risk Insurance Company of "Double Indemnity" fame, with a long-running life insurance product, will probably have a set of known past fraudulent claims. The characteristics of these frauds can then be used to train statistical models to predict whether a given future claim is likely to be fraudulent.

In many cases, though, an organization does not have a large or well-organized set of "labeled" fraud cases, for example when the firm or the product is new. This is where Mike's expertise comes in. The first stage of developing a fraud detection capability is anomaly detection: if you do not have known fraud cases as a base, you can at least start by identifying cases that differ from the others (outliers) and merit further investigation.
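As an illustrative sketch of this first stage (not Elder Research's actual pipeline), an off-the-shelf outlier detector such as scikit-learn's Isolation Forest can flag unusual claims. The features and numbers below are entirely synthetic, invented here for demonstration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical claim features: [claim amount, days since policy start].
normal = rng.normal(loc=[5_000, 180], scale=[1_000, 30], size=(500, 2))
outliers = np.array([[95_000, 3], [80_000, 5]])  # implausibly large, very early claims
claims = np.vstack([normal, outliers])

# Flag roughly the most unusual 1% of claims for human investigation.
detector = IsolationForest(contamination=0.01, random_state=0)
labels = detector.fit_predict(claims)  # -1 = anomaly, 1 = typical
flagged = np.where(labels == -1)[0]
```

No fraud labels are needed at this point; the detector only says which claims look different from the rest, and investigators decide what those differences mean.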

Enter Supervised Learning

As the organization gains maturity, the investigations of the anomalous cases yield labels - cases are confirmed as fraudulent or not. Domain expertise is used to refine the feature set that describes all cases. These newly labeled cases, and the improved feature set, can then start to be used in statistical models to predict whether a new case is fraudulent or not.
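Once investigations have produced confirmed labels, the transition to supervised learning can be sketched as follows. This is a minimal, hypothetical example using logistic regression on synthetic labeled data, not the firm's actual model or features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical investigated history: [claim amount, days since policy start].
X_legit = rng.normal([5_000, 180], [1_000, 30], size=(200, 2))
X_fraud = rng.normal([40_000, 20], [8_000, 10], size=(40, 2))
X = np.vstack([X_legit, X_fraud])
y = np.array([0] * 200 + [1] * 40)  # 1 = investigated and confirmed fraud

model = LogisticRegression(max_iter=1000).fit(X, y)

# Score a new incoming claim: estimated probability that it is fraudulent.
new_claim = np.array([[60_000, 7]])
p_fraud = model.predict_proba(new_claim)[0, 1]
```

The key difference from the anomaly-detection stage is that the model now learns what confirmed fraud looks like, rather than merely what is unusual.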

The investigations also yield "confirmed not fraud" labels, which are just as valuable for supervised learning as the confirmed frauds. So we end up with three categories:

  1. Investigated and confirmed fraud
  2. Investigated and confirmed not fraud
  3. Not investigated

The ones that were not investigated did not qualify sufficiently as outliers to merit investigation. At this point, the organization might refine its anomaly detection model to bring more cases into the labeling process, as more is learned about which features matter.
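Widening the net can be as simple as loosening the anomaly-score threshold so that more borderline cases are routed to investigators. A small sketch, again on synthetic data with a hypothetical cutoff choice:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Hypothetical uninvestigated claims: [claim amount, days since policy start].
claims = rng.normal([5_000, 180], [1_000, 30], size=(300, 2))

detector = IsolationForest(random_state=0).fit(claims)
scores = detector.score_samples(claims)  # lower score = more anomalous

# Strict threshold: investigate only the most extreme ~1% of claims.
strict = int(np.sum(scores < np.quantile(scores, 0.01)))
# Loosened threshold: widen to ~5% to grow the labeled set faster.
loose = int(np.sum(scores < np.quantile(scores, 0.05)))
```

Each round of additional investigations converts "not investigated" cases into labeled ground truth, which in turn improves the supervised model on the next iteration.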

This is an excellent example of a holistic data science approach that combines feature engineering, input from domain experts, unsupervised and supervised learning working together, and iterated model improvement informed by a growing body of ground truth.

Need help getting started with analytics? Elder Research's on-site half-day Analytics Executive Strategy session delivers strategies for using analytics to improve organizational decision-making and recommendations on how to grow your analytics capabilities. Learn more. 




About the Author

Peter Bruce is Founder and President of The Institute for Statistics Education. Previously he taught statistics at the University of Maryland and served in the U.S. Foreign Service. He is a co-author of Data Mining for Business Analytics, with Galit Shmueli and Nitin R. Patel (Wiley, 3rd ed. 2016; also JMP version 2017, R version 2018, Python version 2019; plus translations into Korean and Chinese), Introductory Statistics and Analytics (Wiley, 2015), and Practical Statistics for Data Scientists, with Andrew Bruce (O'Reilly, 2016). His blogs on statistics are featured regularly in Scientific American online.