Model Drift Explained¶
Introduction¶
Model drift is the degradation of a model's predictive performance after deployment. It differs from data drift: input data can shift without any immediate evidence of worse accuracy, whereas model drift refers specifically to a measurable drop in prediction quality.
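The distinction can be illustrated with a toy sketch. The threshold model, the two production windows, and every value below are hypothetical, chosen only to show that an input shift and an accuracy drop can occur independently of each other.

```python
from statistics import mean

def predict(x: float) -> int:
    # Hypothetical fixed model: predict 1 when the feature exceeds 5.0
    return 1 if x > 5.0 else 0

def accuracy(xs, ys):
    # Fraction of examples where the prediction matches the label
    return mean(1.0 if predict(x) == y else 0.0 for x, y in zip(xs, ys))

# Reference window, similar to what the model was validated on
ref_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ref_y = [0, 0, 0, 1, 1, 1]

# Window A: inputs shifted upward, but labels still follow the same rule
a_x = [3.0, 4.0, 4.5, 8.0, 9.0, 10.0]
a_y = [0, 0, 0, 1, 1, 1]

# Window B: same inputs as the reference, but x = 3.0 is now a positive
b_x = list(ref_x)
b_y = [0, 0, 1, 1, 1, 1]

for name, xs, ys in [("A", a_x, a_y), ("B", b_x, b_y)]:
    input_shift = mean(xs) - mean(ref_x)  # crude data drift signal
    acc = accuracy(xs, ys)                # model quality signal
    print(name, round(input_shift, 2), round(acc, 2))
```

Window A shows a shifted input mean with unchanged accuracy (data drift without proven model drift). Window B shows the reverse: identical inputs, lower accuracy. A difference in means is only a crude stand-in for a real drift test.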
What Can Go Wrong in Production¶
Ground-truth labels may arrive days or weeks after a prediction is served, so degradation is often detected late. By the time the team sees degraded recall or precision, users may already be affected.
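One way to make label delay visible is to track how many predictions can already be scored and how long labels took to arrive. The sketch below is a minimal illustration. The `served` and `labeled` mappings, their IDs, and their dates are assumptions, not part of any specific logging system.

```python
from datetime import date

# Hypothetical prediction log: prediction ID -> date the prediction was served
served = {
    "p1": date(2024, 1, 1),
    "p2": date(2024, 1, 2),
    "p3": date(2024, 1, 3),
    "p4": date(2024, 1, 4),
}

# Hypothetical label feed: prediction ID -> date the ground truth arrived
labeled = {
    "p1": date(2024, 1, 8),
    "p2": date(2024, 1, 20),
}

# Lag in days for every prediction that already has a label
lags = [(labeled[pid] - ts).days for pid, ts in served.items() if pid in labeled]

coverage = len(lags) / len(served)  # share of predictions that can be scored
max_lag = max(lags)                 # worst observed delay so far

print({"label_coverage": coverage, "max_lag_days": max_lag})
```

Low coverage means any quality metric computed today describes only the part of traffic that already has labels.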
Key Metrics to Monitor¶
Monitor delayed accuracy, precision, recall, F1, business KPIs, prediction distribution, calibration, segment-level performance, and manual override rates.
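Several of these metrics can be computed from the same joined records. The sketch below uses plain Python so each formula is visible. The scores, the 0.5 threshold, and the use of the Brier score as a rough calibration signal are assumptions for illustration. `sklearn.metrics` provides equivalents such as `precision_score`, `recall_score`, `f1_score`, and `brier_score_loss`.

```python
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]  # hypothetical model scores
threshold = 0.5                           # assumed decision threshold

y_pred = [1 if p >= threshold else 0 for p in y_prob]

# Confusion counts for the positive class
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Prediction distribution: share of positive predictions
positive_rate = sum(y_pred) / len(y_pred)

# Brier score: mean squared error of the scores (lower is better)
brier = sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)

print({
    "precision": round(precision, 3),
    "recall": round(recall, 3),
    "f1": round(f1, 3),
    "positive_rate": round(positive_rate, 3),
    "brier": round(brier, 3),
})
```

The positive prediction rate needs no labels, so it can be tracked immediately, while the other values become available only as ground truth arrives.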
Practical Example¶
When labels arrive, evaluate production predictions by model version:
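```python
from sklearn.metrics import f1_score

# Ground-truth labels that arrived after the predictions were served
y_true = [1, 0, 1, 1, 0, 0]
# Predictions logged in production for the same records
y_pred = [1, 0, 0, 1, 1, 0]

# float() keeps the printed value a plain number even if a NumPy scalar is returned
print({"model_version": "churn:17", "f1": round(float(f1_score(y_true, y_pred)), 3)})
```

Output:

```
{'model_version': 'churn:17', 'f1': 0.667}
```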
Detection Strategy¶
Join predictions with later labels, compare to the approved baseline, and retrain only when evidence justifies it. For critical models, evaluate by segment.
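The join-and-compare step can be sketched as follows. This is a minimal illustration in plain Python, not a prescribed pipeline. The record layout, the segment names, the baseline accuracy of 0.80, and the tolerance of 0.10 are all assumptions.

```python
from collections import defaultdict

# Hypothetical prediction log (one row per served prediction)
predictions = [
    {"prediction_id": "a1", "segment": "web", "y_pred": 1},
    {"prediction_id": "a2", "segment": "web", "y_pred": 0},
    {"prediction_id": "a3", "segment": "web", "y_pred": 1},
    {"prediction_id": "a4", "segment": "web", "y_pred": 0},
    {"prediction_id": "b1", "segment": "mobile", "y_pred": 1},
    {"prediction_id": "b2", "segment": "mobile", "y_pred": 1},
    {"prediction_id": "b3", "segment": "mobile", "y_pred": 0},
    {"prediction_id": "b4", "segment": "mobile", "y_pred": 0},
    {"prediction_id": "b5", "segment": "mobile", "y_pred": 1},  # no label yet
]

# Hypothetical labels that arrived later, keyed by prediction ID
labels = {"a1": 1, "a2": 0, "a3": 1, "a4": 0, "b1": 0, "b2": 1, "b3": 1, "b4": 0}

BASELINE_ACCURACY = 0.80  # assumed value from the approved baseline
TOLERANCE = 0.10          # assumed allowed drop before flagging

# Inner join: keep only predictions whose label has arrived
joined = [
    (row["segment"], row["y_pred"], labels[row["prediction_id"]])
    for row in predictions
    if row["prediction_id"] in labels
]

# Collect hit/miss indicators overall and per segment
hits = defaultdict(list)
for segment, y_pred, y_true in joined:
    hit = 1 if y_pred == y_true else 0
    hits["overall"].append(hit)
    hits[segment].append(hit)

report = {}
for name, values in hits.items():
    acc = sum(values) / len(values)
    report[name] = {
        "n": len(values),
        "accuracy": acc,
        "degraded": acc < BASELINE_ACCURACY - TOLERANCE,
    }

for name, stats in report.items():
    print(name, stats)
```

With this toy data the overall accuracy stays within tolerance while the `mobile` segment is flagged, which is why segment-level evaluation matters for critical models. In practice each segment needs enough labeled rows for the comparison to be meaningful.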
Common Mistakes¶
- Confusing data drift with proven model drift.
- Waiting for overall accuracy to drop while a key segment is already degrading.
- Retraining automatically without checking data quality.
- Not storing prediction IDs needed to join labels later.
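The last mistake is usually the cheapest to avoid: write an ID and the model version with every prediction. A minimal sketch follows. The function name `build_prediction_record`, the field names, and the example features are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone

def build_prediction_record(model_version, features, prediction):
    # Every record carries a unique ID so labels can be joined later
    return {
        "prediction_id": str(uuid.uuid4()),
        "model_version": model_version,
        "served_at": datetime.now(timezone.utc).isoformat(),
        "features": features,
        "prediction": prediction,
    }

record = build_prediction_record(
    "churn:17",
    {"tenure_months": 14, "plan": "basic"},  # hypothetical features
    1,
)

# One JSON object per line is easy to append to a log and to parse later
line = json.dumps(record)
print(line)
```

The label source then only has to carry the same `prediction_id` for the join to work.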
Quick Checklist¶
- Are predictions logged with IDs and model version?
- Can labels be joined later?
- Are metrics compared to baseline?
- Are segment-level metrics available?
- Is retraining reviewed and tested?
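The checklist can be encoded as a simple gate that runs before any retraining job is proposed. The sketch below is one possible arrangement, not a recommended policy. The function name, the returned action strings, and all thresholds are assumptions.

```python
def retraining_decision(current_f1, baseline_f1, n_labeled, null_rate,
                        max_drop=0.05, min_labeled=500, max_null_rate=0.02):
    # Data quality first: a broken feed can look like model drift
    if null_rate > max_null_rate:
        return "fix_data_quality"
    # Too few labels: the metric is not yet trustworthy
    if n_labeled < min_labeled:
        return "wait_for_labels"
    # Enough evidence of degradation relative to the approved baseline
    if baseline_f1 - current_f1 > max_drop:
        return "propose_retraining"
    return "no_action"

print(retraining_decision(0.60, 0.70, n_labeled=1000, null_rate=0.20))
print(retraining_decision(0.60, 0.70, n_labeled=100, null_rate=0.0))
print(retraining_decision(0.60, 0.70, n_labeled=1000, null_rate=0.0))
print(retraining_decision(0.69, 0.70, n_labeled=1000, null_rate=0.0))
```

The gate returns `"propose_retraining"` rather than starting a job, so that review and testing remain a separate step.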
Summary¶
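Model performance can degrade over time after deployment. Logging predictions with IDs and model versions, joining them with delayed labels, comparing metrics to an approved baseline, and retraining only when the evidence justifies it help manage model drift.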