Data Science consultants build models to address a client’s specific business need, such as a neural network to detect fraudulent applications for government aid. But the data relationships a model captures may represent only a snapshot in time, and the quality of its predictions can degrade as the data changes: the underlying population generating the inputs shifts, new categories emerge in existing fields, or macro-economic and regulatory conditions change. To keep the most accurate version of a model in deployment, the ML Engineer employs a variety of strategies to counter obstacles such as data drift.
The challenge of drift highlights the advantages the ML Ops framework provides through continuous model retraining and deployment. Drift is a change from the baseline: a model’s quality degrades over time, usually because of a divergence between the model’s training data and its serving data.
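One common way to catch this divergence, even before labeled outcomes are available, is to compare the distribution of an input feature at training time against the same feature in recent serving traffic. The sketch below uses a two-sample Kolmogorov–Smirnov test on simulated transaction amounts; the feature, data, and significance threshold are illustrative assumptions, not part of the fraud example discussed here.

```python
# A minimal sketch of detecting data drift on a single input feature,
# using a two-sample Kolmogorov-Smirnov test. The data is simulated.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Feature values the model saw at training time vs. in recent serving traffic.
training_amounts = rng.lognormal(mean=3.0, sigma=1.0, size=5000)
serving_amounts = rng.lognormal(mean=3.4, sigma=1.2, size=5000)  # shifted distribution

statistic, p_value = ks_2samp(training_amounts, serving_amounts)
if p_value < 0.01:  # illustrative significance threshold
    print(f"Possible data drift detected (KS statistic={statistic:.3f}, p={p_value:.2e})")
```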
Let’s look at an example of a model trained to predict fraud. To dramatize drift, we’ll assume it’s a new (hard) problem where very little labeled data is available, so the model will need more monitoring and maintenance than most. The figure below depicts the quality of this deployed model over time, where the x-axis is time and the y-axis is the F-1 score, a measure of prediction accuracy that equally balances the two types of errors: false alarms and false dismissals.
Fraud model. Source: https://ml-ops.org/content/mlops-principles#monitoring
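As a quick, self-contained illustration of the F-1 score, here is a small sketch with made-up labels (they are hypothetical, not drawn from the fraud data):

```python
# A small sketch of how the F-1 score balances the two error types.
# 1 = fraud, 0 = not fraud; the labels are made up for illustration.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]  # ground truth
y_pred = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0]  # model predictions

precision = precision_score(y_true, y_pred)  # hurt by false alarms (false positives)
recall = recall_score(y_true, y_pred)        # hurt by false dismissals (false negatives)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
# precision=0.75, recall=0.75, F1=0.75
```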
The model is trained on early known fraud cases (and the vastly more common non-fraud cases), and is initially quite accurate on new, out-of-sample data. But its performance decays rapidly, for any of several possible reasons: data drift, fraudsters coming up with new ways to trick the system, or transactions arriving from companies the model has never seen, triggering security errors.
The ML Operations framework anticipates drift as a fact of life and prepares for it: engineers set a performance threshold τ (the green line in the graph), and if the score dips below that line, the model retraining process is triggered, using updated data that includes the transactions learned about during the months the earlier model was deployed.
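Below is a minimal, self-contained sketch of this trigger. It uses synthetic data and a simple logistic-regression model in place of the real fraud model, and the threshold value of 0.80 for τ is purely illustrative.

```python
# A minimal sketch of threshold-triggered retraining.
# The data, model, and threshold value are illustrative assumptions,
# not the fraud model described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
F1_THRESHOLD = 0.80  # the performance threshold tau from the graph (illustrative value)

# "Old" data the deployed model was trained on.
X_old = rng.normal(size=(1000, 5))
y_old = (X_old[:, 0] > 0.5).astype(int)
model = LogisticRegression().fit(X_old, y_old)

# "New" serving data whose distribution has shifted (simulated drift);
# the fraud pattern itself has also changed.
X_new = rng.normal(loc=0.8, size=(1000, 5))
y_new = (X_new[:, 1] > 1.0).astype(int)

current_f1 = f1_score(y_new, model.predict(X_new))
if current_f1 < F1_THRESHOLD:
    # Score dipped below tau: retrain on data that includes the newly labeled transactions.
    model = LogisticRegression().fit(
        np.vstack([X_old, X_new]), np.concatenate([y_old, y_new])
    )
```

In practice the retraining step would run as part of the pipeline rather than inline, but the logic is the same: evaluate on freshly labeled data, compare against τ, and retrain on data that includes what was observed while the earlier model was deployed.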
Each organization has its own criteria for setting this performance threshold. Redeploying models can be resource-intensive and divert the Data Science team’s time from other worthwhile tasks. Yet this fraud model could be the centerpiece of a project, and an obsolete model means an obsolete solution to the problem the project is trying to address. What’s at stake? Depending on your task, it could be protecting private citizens’ assets by preventing bank fraud, or recovering millions (or more) in government money by identifying individuals who submitted fraudulent pandemic-related loan applications.
The ML Ops pipeline incorporates additional data and model steps into the loop.
The ML Ops framework treats a Machine Learning model as a first-class citizen worthy of constant testing, monitoring, and periodic re-deployment. The framework allows ML-forward organizations to rely on their Data Science products to conduct business by ensuring accurate results on data arriving at a fast pace and in large volumes.
Anti-fraud modeling is a widely adopted use case, but ML Ops can address a vast set of business challenges, including unemployment insurance, predictive maintenance, public health monitoring, sales forecasting, and e-commerce attribution.