Experiment Tracking in MLOps Explained¶
Introduction¶
Experiment tracking records what was trained, with which data, using which code and parameters, and what result it produced. Without tracking, model selection becomes guesswork.
Why This Matters¶
Engineers need experiment records to answer production questions: which run produced the deployed model, which dataset trained it, why it was approved, and what changed from the previous version.
Core Concepts¶
For every run, track the parameters, the metrics, the code version (for example a Git commit), the dataset version, the output artifacts, the environment, and the creation time.
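The items listed above can be gathered into a single record type. The following is a minimal sketch, not a prescribed schema: the `RunRecord` class, its field names, and the `capture_environment` helper are hypothetical names chosen for illustration, and the environment snapshot is deliberately small.

```python
import platform
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def capture_environment() -> dict:
    """Return a small, illustrative snapshot of the runtime environment."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }


@dataclass
class RunRecord:
    # Hypothetical schema covering the items listed above.
    params: dict
    metrics: dict
    code_version: str      # e.g. a Git commit hash
    dataset_version: str   # e.g. a snapshot date or content hash
    artifacts: list
    environment: dict = field(default_factory=capture_environment)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


record = RunRecord(
    params={"max_depth": 8},
    metrics={"f1": 0.842},
    code_version="abc1234",  # placeholder value, not a real commit
    dataset_version="2026-05-30",
    artifacts=["models/churn-2026-05-30.pkl"],
)
record_dict = asdict(record)  # plain dict, ready for json.dumps
```

Using default factories means the environment and timestamp are captured when the record is created, so they cannot be forgotten by the caller.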
Practical Example¶
Even a simple JSON run record is better than no tracking:
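import json
import subprocess
import time
from pathlib import Path

# Collect everything needed to trace the model back to its inputs.
run = {
    "git_commit": subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode().strip(),
    "data_version": "2026-05-30",
    "params": {"max_depth": 8, "n_estimators": 200},
    "metrics": {"f1": 0.842, "precision": 0.81, "recall": 0.875},
    "artifact": "models/churn-2026-05-30.pkl",
    "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # UTC timestamp
}

# Create runs/ if it does not exist, then write the record and close the file.
run_path = Path("runs/churn-2026-05-30.json")
run_path.parent.mkdir(parents=True, exist_ok=True)
run_path.write_text(json.dumps(run, indent=2))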
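Once records like the one above accumulate in a `runs/` directory, a question such as "which run produced the deployed model" becomes a lookup. The sketch below assumes one JSON file per run with the same `artifact` and `metrics` keys used above; the function names `find_run_for_artifact` and `best_run` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def find_run_for_artifact(runs_dir: str, artifact: str) -> Optional[dict]:
    """Return the first run record whose "artifact" field matches, else None."""
    for path in sorted(Path(runs_dir).glob("*.json")):
        record = json.loads(path.read_text())
        if record.get("artifact") == artifact:
            return record
    return None


def best_run(runs_dir: str, metric: str) -> Optional[dict]:
    """Return the run record with the highest value for the given metric."""
    records = [json.loads(p.read_text()) for p in Path(runs_dir).glob("*.json")]
    # Ignore runs that never logged this metric.
    scored = [r for r in records if metric in r.get("metrics", {})]
    return max(scored, key=lambda r: r["metrics"][metric], default=None)
```

For example, `find_run_for_artifact("runs", "models/churn-2026-05-30.pkl")` would return the record written above. Ranking runs by a metric is only meaningful when the runs were evaluated on the same validation data.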
How This Fits in a Production Workflow¶
Tracking should run automatically inside training, not as a manual note after the fact. CI jobs and training pipelines should fail if metrics or artifacts are missing.
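One way to enforce this is a validation step that raises before the run is accepted. This is a sketch under stated assumptions: the required field names mirror the JSON example in this guide, and `IncompleteRunError`, `REQUIRED_FIELDS`, and `validate_run` are hypothetical names, not part of any tracking library.

```python
import os


class IncompleteRunError(Exception):
    """Raised when a run record is missing required tracking data."""


# Assumed field names, matching the JSON run record used in this guide.
REQUIRED_FIELDS = ("git_commit", "data_version", "params", "metrics", "artifact")


def validate_run(run: dict, check_artifact_exists: bool = True) -> None:
    """Raise IncompleteRunError if a required field is missing or empty."""
    missing = [name for name in REQUIRED_FIELDS if not run.get(name)]
    if missing:
        raise IncompleteRunError(f"missing or empty fields: {', '.join(missing)}")
    # Optionally confirm the artifact file was actually written to disk.
    if check_artifact_exists and not os.path.isfile(run["artifact"]):
        raise IncompleteRunError(f"artifact not found: {run['artifact']}")
```

Calling `validate_run(run)` at the end of a training script means an incomplete run raises an uncaught exception, the Python process exits with a non-zero status, and the surrounding job is marked as failed.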
Common Mistakes¶
- Comparing experiments that were not evaluated on the same validation split.
- Logging metrics but not data version.
- Saving a model artifact without the run that produced it.
- Letting notebook output be the only experiment record.
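The first two mistakes can be guarded against in code. The sketch below is illustrative only: it assumes the dataset is a single file that can be hashed to produce a data version, and that each run record stores an identifier for its validation split under a hypothetical `validation_split` key. The function names are also hypothetical.

```python
import hashlib


def file_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def comparable(run_a: dict, run_b: dict) -> bool:
    """True only if both runs recorded the same validation split identifier."""
    split_a = run_a.get("validation_split")
    split_b = run_b.get("validation_split")
    return split_a is not None and split_a == split_b
```

A content hash changes whenever the file changes, so it can serve as the `data_version` when no dataset versioning tool is in place. Refusing to compare runs that lack a recorded split keeps untracked evaluations out of model selection.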
Quick Checklist¶
- Are parameters logged?
- Are metrics logged with validation details?
- Is the Git commit recorded?
- Is the model artifact linked to the run?
- Can another engineer reproduce the run?
Summary¶
Track parameters, metrics, code version, dataset version, and artifacts for every ML experiment so that training runs can be compared, reproduced, and promoted safely.