Production ML Project Checklist
Archive-backed checklist for a production ML portfolio project that proves reproducible training, tracked experiments, registry handoff, deployment, monitoring, and rollback criteria.
Definition
A production ML project proves that a model can move through a repeatable lifecycle. The portfolio should show training and evaluation, artifact promotion and deployment, and logging and monitoring, along with a rule for rollback or retraining.
Use this checklist with Machine Learning Portfolio Projects when the target role is MLOps, ML platforms, or machine learning engineering.
Simon Stiebellehner gives the core lifecycle in Building Production ML Platforms. He moves from training and evaluation at 21:03 to experiment tracking at 29:41. At 30:32, he adds the model registry.
At 31:15, he separates batch and online deployment. At 42:48 and 54:15, metadata and lineage appear with prediction APIs and logging.
Common Definition
Guests treat production ML as an operating loop, not a model score. The loop starts with a business decision and a baseline. It records code, data, parameters, and dependencies.
The loop then promotes an artifact. After deployment, it watches service health and input quality, plus prediction behavior and downstream outcomes.
Maria Vechtomova gives the pragmatic standard in Pragmatic and Standardized MLOps. At 18:41 and 22:23, she puts Git and CI/CD in the essential stack, together with artifact storage and registries. She also stresses documentation, reproducibility, code quality, and testing. At 33:24, she moves notebook logic into packages and CI/CD.
Raphael Hoogvliets adds the reproducibility layer in MLOps at Scale. At 39:06, he covers CI, repository structure, parameterization, and tests. At 42:31, he covers data versioning, traceability, and experiment capture.
Guest Differences
For Simon, the platform path turns an experiment into a registered artifact and then into a served model.
Maria’s standardization view supports a lightweight portfolio stack when the lifecycle is clear. The project shouldn’t hide weak delivery behind a large tool list.
Danny Leybzon starts with monitoring. In MLOps Architect Guide, he connects model monitoring to upstream ETL and data pipeline causes at 27:35.
Lina Weichbrodt starts with business value and incident readiness. In Human-Centered MLOps and Model Monitoring, she covers business KPIs at 4:50, incident prep at 24:34, postmortems at 27:14, live test sets at 29:23, and feature drift at 46:28.
Reproducible Training
The project should record the code version, parameters, dependency versions, data reference, saved artifact, and run command. Keep configuration separate from code.
That evidence connects to Reproducibility. A simple data snapshot or hash can be enough, and a manifest also works when it lets another person rerun the training job.
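The manifest idea above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed format: the function names `file_sha256`, `dependency_versions`, and `build_manifest`, and the manifest field names, are all assumptions made for the example.

```python
import hashlib
import platform
from importlib import metadata
from pathlib import Path


def file_sha256(path):
    """Hash a data file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dependency_versions(package_names):
    """Look up installed versions for the packages the training job uses."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def build_manifest(data_path, params, code_version, run_command,
                   artifact_path, package_names=()):
    """Collect the evidence another person needs to rerun the training job.

    All field names here are hypothetical; any stable layout works.
    """
    return {
        "code_version": code_version,      # e.g. a Git commit hash
        "params": params,                  # loaded from config, not hard-coded
        "python_version": platform.python_version(),
        "dependencies": dependency_versions(package_names),
        "data": {"path": str(data_path), "sha256": file_sha256(data_path)},
        "artifact_path": str(artifact_path),
        "run_command": run_command,
    }
```

Writing the returned dictionary next to the saved artifact with `json.dump` gives a reviewer one file that ties the model to its code, data, and environment.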
Registry And Deployment
Track at least one baseline run and one improved run. Each run should store the dataset reference, parameters, metric values, and artifact path. Keep failure notes too. Then promote one artifact with a registry record.
Nemanja Radojkovic makes the lightweight version acceptable in MLOps in Finance. At 35:57, he describes a simple interim registry. The record still needs the model version, data version, environment, evaluation result, approval state, and deployment target.
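A lightweight registry record with those six fields could look like the sketch below. It is an illustration only and does not reproduce the registry described in the episode: the `RegistryRecord` class, the state names, and the JSON Lines file are assumptions chosen for the example.

```python
import json
from dataclasses import asdict, dataclass, replace
from pathlib import Path

# Hypothetical approval states; a real team may use different ones.
APPROVAL_STATES = ("candidate", "approved", "rejected")


@dataclass(frozen=True)
class RegistryRecord:
    model_version: str
    data_version: str
    environment: str        # e.g. a lock-file hash or image tag
    evaluation: dict        # metric name -> value
    approval_state: str
    deployment_target: str  # e.g. "batch" or "online"

    def __post_init__(self):
        if self.approval_state not in APPROVAL_STATES:
            raise ValueError(f"unknown approval state: {self.approval_state}")


def approve(record):
    """Return a copy of the record marked as approved."""
    return replace(record, approval_state="approved")


def register(record, registry_file):
    """Append an approved record to a JSON Lines file acting as the registry."""
    if record.approval_state != "approved":
        raise ValueError("only approved records can be registered")
    with Path(registry_file).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(record)) + "\n")
```

An append-only file keeps the promotion history visible, which is usually enough for a portfolio project before a full registry tool is justified.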
Show either batch scoring or online serving. Batch scoring can write predictions to a table, while online serving can be a small API. The project should include input validation, output schema, logs, and one fallback rule.
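The four serving requirements can be shown together in one small function. The sketch below is a hedged example, not a reference implementation: the feature names, ranges, fallback score, and response fields are hypothetical, and `model` stands for any callable that returns a number.

```python
import logging

logger = logging.getLogger("scoring_service")

# Hypothetical input contract: feature name -> (min, max) allowed range.
FEATURE_RANGES = {"age": (0, 120), "monthly_spend": (0.0, 100000.0)}
FALLBACK_SCORE = 0.0   # hypothetical safe default returned when scoring fails
MODEL_VERSION = "v1"   # hypothetical version label


def validate_input(payload):
    """Return a list of validation errors; an empty list means valid input."""
    errors = []
    for name, (low, high) in FEATURE_RANGES.items():
        value = payload.get(name)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: missing or not numeric")
        elif not low <= value <= high:
            errors.append(f"{name}: {value} outside [{low}, {high}]")
    return errors


def _response(score, fallback, errors):
    """Fixed output schema so callers can always rely on the same keys."""
    return {"score": score, "model_version": MODEL_VERSION,
            "fallback": fallback, "errors": errors}


def predict(payload, model):
    """Validate, score, log, and fall back to a default on any failure."""
    errors = validate_input(payload)
    if errors:
        logger.warning("fallback used, invalid input: %s", errors)
        return _response(FALLBACK_SCORE, True, errors)
    try:
        score = float(model(payload))
    except Exception:
        logger.exception("fallback used, model call failed")
        return _response(FALLBACK_SCORE, True, ["model error"])
    logger.info("scored payload=%s score=%s", payload, score)
    return _response(score, False, [])
```

The same `predict` function can sit behind a small API for online serving or be called row by row in a batch job, so the validation and fallback logic is tested once.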
Ben Wilson argues for simple, maintainable systems in Practical Machine Learning Engineering for Production. At 6:50 and 8:49, he emphasizes modular and testable code. At 57:38, he describes production ML capstones with tests, monitoring, A/B testing, and CI/CD.
Monitoring And Features
Monitoring should cover service health, input quality, and prediction distributions. It should also cover business outcomes and name upstream causes that could break the model.
Danny links production failures to upstream data pipelines and profiling at 27:35-31:50. At 46:28-49:28, Lina adds input shifts and unit changes, along with feature drift, logging, and reproducibility. Use Evaluation for metric choices.
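One common way to watch a feature or prediction distribution is the population stability index (PSI). The guests do not prescribe it; it is swapped in here as an illustrative technique. The function name `population_stability_index`, the bin count, and the smoothing constant are assumptions.

```python
import math


def population_stability_index(reference, live, bins=10, eps=1e-6):
    """Compare two samples with PSI using equal-width bins from the reference.

    PSI = sum((live_share - ref_share) * ln(live_share / ref_share)) over bins.
    Empty bins are floored at `eps` to keep the logarithm defined.
    """
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # avoid zero width for constant data

    def shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            # Clamp so live values outside the reference range land in edge bins.
            counts[min(max(index, 0), bins - 1)] += 1
        return [max(count / len(values), eps) for count in counts]

    ref_shares = shares(reference)
    live_shares = shares(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_shares, live_shares))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating, but the threshold should be tuned per feature. A unit change upstream, such as a value switching from meters to centimeters, typically produces a very large PSI, which makes this a cheap first alarm before deeper profiling.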
Feature-heavy projects should address training-serving consistency, feature validation, ownership, drift, and served-feature logs. Willem Pienaar grounds that in Feature Stores, where he covers feature responsibilities, validation, ownership, and governance.
Related Pages
Use these pages to follow the lifecycle pieces.