2024–2025

Machine Learning & MLOps

Tools, practices, and challenges in ML engineering and operations.

How many ML models do you currently have in production?

45% have 2–5 models in production; 25% have none, 18% have one, and 12% have 5+. Many teams were still early in their production journey in 2024–2025.

Which tools do you use for deploying ML models?

38% don't deploy models. Among those who do, Kubernetes and SageMaker (27% each) lead; Google AI Platform (22%) and Azure ML (18%) follow. TensorFlow Serving (10%) is next.
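
As an illustration of what "deploying a model" often looks like in practice, here is a minimal sketch of an HTTP inference service of the kind teams typically containerize and run on Kubernetes; the model file model.pkl and the "features" field are hypothetical, and managed options such as SageMaker or TensorFlow Serving work differently.

    # Minimal sketch (not from any respondent's setup): a Flask inference
    # endpoint of the kind typically containerized and run on Kubernetes.
    # The model file "model.pkl" and the "features" field are hypothetical.
    import pickle

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    with open("model.pkl", "rb") as f:  # assumes a pickled scikit-learn model
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()
        prediction = model.predict([payload["features"]])
        return jsonify({"prediction": prediction.tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)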

Do you use any tools to monitor ML models in production?

58% don't monitor models. Prometheus and Grafana (21%), custom scripts (11%), and ELK (9%) are the most used. Monitoring was a clear gap.
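
For the Prometheus and Grafana answers, monitoring usually means exposing metrics a scraper can pick up. A minimal sketch with the prometheus_client package is below; the metric names and the stubbed inference loop are illustrative rather than taken from any respondent's setup.

    # Minimal sketch, assuming the prometheus_client package: expose basic
    # serving metrics that Prometheus can scrape and Grafana can chart.
    # Metric names and the stubbed inference loop are illustrative only.
    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    PREDICTIONS = Counter("model_predictions_total", "Predictions served")
    LATENCY = Histogram("model_prediction_latency_seconds", "Prediction latency")

    def predict(features):
        with LATENCY.time():  # records how long each prediction takes
            time.sleep(random.random() / 100)  # stand-in for real inference
            PREDICTIONS.inc()
            return 0.5

    if __name__ == "__main__":
        start_http_server(8000)  # metrics served at :8000/metrics
        while True:
            predict([1.0, 2.0])
            time.sleep(1)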

Which tools do you use for model training and experimentation?

55% don't use dedicated tools; MLflow (34%) leads among those who do. W&B (13%) and TensorBoard (10%) follow. Many relied on notebooks or scripts.
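
Experiment tracking with MLflow, the leading answer here, typically amounts to logging parameters and metrics per run. A minimal sketch follows; the experiment name, parameter values, and metric values are made up.

    # Minimal sketch, assuming a local MLflow file store or tracking server:
    # log parameters and metrics for one training run. The experiment name,
    # parameter values, and metric values are made up.
    import mlflow

    mlflow.set_experiment("churn-model")

    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.1)
        mlflow.log_param("n_estimators", 200)
        # ... train the model here ...
        mlflow.log_metric("auc", 0.87)
        mlflow.log_metric("accuracy", 0.91)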

Which tools do you use for model or data versioning?

58% don't use versioning tools. MLflow (32%) leads; W&B (11%) and DVC (11%) have smaller shares. Versioning was under-adopted.
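
Model versioning with MLflow usually goes through its model registry. The sketch below assumes a registry-backed tracking server and uses a throwaway scikit-learn model; the registered name "churn-model" is hypothetical.

    # Minimal sketch of model versioning via the MLflow model registry,
    # assuming a registry-backed tracking server. The toy model and the
    # registered name "churn-model" are hypothetical.
    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

    with mlflow.start_run():
        mlflow.sklearn.log_model(
            model,
            artifact_path="model",
            registered_model_name="churn-model",  # each call adds a new version
        )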

Which workflow orchestration tools do you use for ML pipelines?

54% don't use orchestration. Airflow (34%) dominates; Step Functions (8%), Kubeflow (7%), and Prefect (6%) follow. Orchestration was not yet widespread.
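
For context on the Airflow answers, a minimal daily training DAG might look like the sketch below (Airflow 2.x, classic PythonOperator); the task bodies are stubs, not a real pipeline.

    # Minimal sketch of a daily training DAG, assuming Airflow 2.x and the
    # classic PythonOperator. Task bodies are stubs, not a real pipeline.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull training data")

    def train():
        print("fit the model")

    def evaluate():
        print("check metrics before promotion")

    with DAG(
        dag_id="ml_training_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_train = PythonOperator(task_id="train", python_callable=train)
        t_evaluate = PythonOperator(task_id="evaluate", python_callable=evaluate)
        t_extract >> t_train >> t_evaluate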

Which CI/CD tools do you use for ML workflows?

50% don't use CI/CD for ML. GitLab CI/CD (27%) and Jenkins (15%) lead. Traditional DevOps tools dominated over ML-native pipelines.
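
Since most respondents reuse general-purpose CI/CD, ML-specific checks often take the form of ordinary tests. The sketch below shows a pytest-style quality gate a GitLab CI/CD or Jenkins job could run before promoting a model; the evaluate_model() helper and the 0.85 threshold are hypothetical.

    # Minimal sketch of a quality gate a GitLab CI/CD or Jenkins job could run
    # with pytest before promoting a model. evaluate_model() and the 0.85
    # threshold are hypothetical stand-ins for a real evaluation step.
    def evaluate_model() -> float:
        # placeholder: load the candidate model and a held-out set, return AUC
        return 0.88

    def test_model_meets_auc_threshold():
        assert evaluate_model() >= 0.85, "candidate model below deployment bar"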

Do you use any feature stores?

75% don't use feature stores. SageMaker (12%), Databricks (11%), and Vertex AI (8%) lead among adopters. Feature stores were not mainstream.
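
As a rough idea of what feature-store usage looks like on the leading platform, the sketch below reads one record from an existing SageMaker feature group via boto3; the feature group name, record identifier, and feature names are hypothetical.

    # Minimal sketch, assuming boto3 and an already-created SageMaker feature
    # group. Group name, record identifier, and feature names are hypothetical.
    import boto3

    client = boto3.client("sagemaker-featurestore-runtime")

    record = client.get_record(
        FeatureGroupName="customer-features",
        RecordIdentifierValueAsString="customer-123",
        FeatureNames=["tenure_months", "avg_monthly_spend"],
    )
    print(record.get("Record", []))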

How often do you retrain your models in production?

44% don't retrain; 29% retrain when needed and 23% on a schedule. Only 3% do continuous learning. Retraining was mostly reactive.
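
"Retrain when needed" usually means some signal decides when retraining is worth it. The sketch below uses a simple population stability index (PSI) check on one feature as that signal; the 0.2 threshold, the sampled distributions, and the retrain() stub are illustrative only.

    # Minimal sketch of "retrain when needed": a drift check on one feature
    # decides whether to trigger retraining. The PSI threshold, the sampled
    # distributions, and the retrain() stub are illustrative only.
    import numpy as np

    def population_stability_index(expected, actual, bins=10):
        """Rough PSI between a reference and a recent feature distribution."""
        edges = np.histogram_bin_edges(expected, bins=bins)
        e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
        a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
        return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

    def retrain():
        print("kick off the training pipeline")

    reference = np.random.normal(0.0, 1.0, 5000)  # training-time distribution
    recent = np.random.normal(0.3, 1.0, 5000)     # recent production traffic

    if population_stability_index(reference, recent) > 0.2:  # common rule of thumb
        retrain()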

Where do you run your ML workloads?

AWS (40%), Azure (29%), and GCP (21%) lead; 21% run on-premise and 11% hybrid. Cloud dominated, but on-premise and hybrid setups were still common.

How many people are in your ML team(s)?

45% have 1–5 people and 35% have 6–10; 10% have no dedicated ML team at all. Small teams were the norm, and few reported 11+ or 51+ people.

Do you have a centralized MLOps team?

81% don't have a centralized MLOps team, and only 19% do; ML operations were mostly distributed.

How would you describe your MLOps maturity?

35% have some production models but rely on mostly manual processes; 30% are at the experiments-only stage. 28% have standardized deployment and monitoring, and 7% describe their practice as advanced. Maturity was spread across the whole range.

For the ML/MLOps tools you use, how would you describe their role?

39% say their tools are experimental only, 31% use them regularly but not for anything critical, and 30% call them mission-critical. Adoption was split fairly evenly across exploration, regular use, and production dependence.

Which ML or MLOps tools do you plan to adopt or expand in the next 12 months?

MLflow (23%) and Apache Airflow (18%) lead adoption plans; Kubernetes / Docker (13%), feature stores (10%), model monitoring (9%), and Prefect / Dagster (8%) follow. Plans were diverse.

What are your biggest challenges in ML engineering and MLOps?

Deployment complexity (60%) and lack of skills or expertise (50%) are the top two challenges. Monitoring and observability (42%), data quality (35%), and scaling ML pipelines (33%) follow, with integration with existing systems (30%), compliance / governance / ethics (25%), and cost / infrastructure constraints (23%) behind them. The same pain points persist today.