Machine Learning & MLOps
Tools, practices, and challenges in ML engineering and operations.
How many ML models do you currently have in production?
45% have 2–5 models and 21% have 5+, so about two-thirds have multiple models in production. 17% have none and 17% have just one; plenty of teams are still in early or experimental mode.
Which tools do you use for deploying ML models?
Azure ML (34%) and Kubernetes (31%) lead, with AWS SageMaker (28%) close behind. 17% don't deploy models at all. Cloud platforms and K8s are the go-to; TensorFlow Serving, MLflow, and Databricks show up but at lower rates.
Do you use any tools to monitor ML models in production?
37% use Prometheus and Grafana, the classic observability stack. 30% don't monitor models at all, which is risky. Custom scripts (22%) and ELK (15%) are common; Evidently and WhyLabs are used by a smaller slice.
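For teams starting from zero here, instrumenting the prediction path is the usual first step. A minimal sketch using the Prometheus Python client; the metric names are made up and a placeholder stands in for the real model call:

```python
from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names; align them with your own naming convention.
PREDICTIONS = Counter("model_predictions_total", "Total predictions served")
LATENCY = Histogram("model_prediction_latency_seconds", "Prediction latency in seconds")

def predict(features):
    """Wrap the model call so every prediction updates the metrics."""
    with LATENCY.time():
        PREDICTIONS.inc()
        return sum(features)  # placeholder for a real model.predict() call

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    predict([0.1, 0.2, 0.3])
```

Grafana then reads those series straight from Prometheus; drift and data-quality checks are where tools like Evidently and WhyLabs come in.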
Which tools do you use for model training and experimentation?
MLflow dominates at 61%; it's the default for experiment tracking. 32% don't use dedicated tools (notebooks and scripts instead). TensorBoard, W&B, Kubeflow, and framework-specific setups show up at lower percentages.
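Given how dominant MLflow is, a minimal tracking sketch for reference; the experiment name, parameters, and metric value are all made up:

```python
import mlflow

mlflow.set_experiment("churn-model")  # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("n_estimators", 200)
    mlflow.log_param("max_depth", 8)
    # ... fit and evaluate the model here ...
    mlflow.log_metric("val_auc", 0.91)  # placeholder metric value
```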
Which tools do you use for model or data versioning?
MLflow again leads at 65%; it's the standard for model and experiment versioning. 35% don't use versioning tools. Git and DVC are used by a small share; versioning is still under-adopted compared to training tools.
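For the 35% not versioning at all, MLflow's model registry is the shortest path from the tracking setup above. A sketch assuming a scikit-learn model and a made-up registry name; each register_model call on the same name creates a new numbered version:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy model standing in for whatever you actually train.
X, y = make_classification(n_samples=200, random_state=0)
model = RandomForestClassifier().fit(X, y)

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, artifact_path="model")

# Registers the logged artifact as a new version of "churn-classifier".
mlflow.register_model(f"runs:/{run.info.run_id}/model", "churn-classifier")
```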
Which workflow orchestration tools do you use for ML pipelines?
Airflow is on top at 58%; it's the default for pipeline orchestration. 23% don't use orchestration tools. Prefect (15%) and Dagster (12%) are next; Kestra, Kubeflow, and AWS Step Functions also appear.
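Since Airflow is the clear default, here's roughly what a training pipeline looks like as a DAG; the DAG id, schedule, and task bodies are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_features():
    print("pulling training data")  # placeholder for real feature extraction

def train_model():
    print("fitting the model")  # placeholder for real training code

with DAG(
    dag_id="weekly_retraining",      # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",
    catchup=False,
):
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    extract >> train  # run feature extraction before training
```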
Which CI/CD tools do you use for ML workflows?
GitLab CI/CD leads at 50%, with MLflow (32%) often used in the ML loop. 25% don't use CI/CD for ML. GitHub Actions (14%) and Jenkins (7%) are the other common options; ML CI/CD is still catching on.
Do you use any feature stores?
63% don't use feature stores; they're not mainstream yet. Among those who do, AWS SageMaker (17%), Databricks (13%), and Vertex AI (13%) lead. Custom builds and Feast show up at low percentages.
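For the minority that do use one, the online read path is the core of the workflow. A sketch with Feast, assuming an already-configured feature repo; the feature view, field names, and entity are hypothetical:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes a Feast repo in the current directory

online = store.get_online_features(
    features=[
        "user_stats:orders_last_30d",  # hypothetical feature view and fields
        "user_stats:avg_order_value",
    ],
    entity_rows=[{"user_id": 1001}],
).to_dict()

print(online)
```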
How often do you retrain your models in production?
48% don't retrain; models are often deployed and left as-is. 28% retrain when performance drops and 20% retrain on a schedule (weekly or monthly). Only 4% do continuous/online learning. Retraining is a clear gap for many teams.
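The 28% who retrain on performance drops are essentially running a threshold check against a baseline. A rough sketch of that trigger logic; the metric, baseline, and tolerance are arbitrary:

```python
def should_retrain(recent_auc: float, baseline_auc: float, tolerance: float = 0.05) -> bool:
    """Flag a retrain only when live performance falls below the accepted baseline."""
    return recent_auc < baseline_auc - tolerance

# A drop from 0.91 to 0.84 exceeds the 0.05 tolerance, so this returns True.
print(should_retrain(recent_auc=0.84, baseline_auc=0.91))
```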
Where do you run your ML workloads?
AWS (46%) and Azure (38%) are the top clouds; 38% also use on-prem. GCP is at 19%. Many use a mix of cloud and on-prem; hybrid and multi-cloud setups are common for ML.
How many people are in your ML team(s)?
Most teams are small: 48% have 1–5 people and 30% have 6–10. 7% have no dedicated ML team. Larger teams (21–50, 51+) are a minority; ML is often owned by small, focused groups.
Do you have a centralized MLOps team?
68% don't have a dedicated MLOps team; ML and MLOps work is usually embedded in product or data teams. The 32% with a centralized team are often bigger orgs that have invested in MLOps as a function.
How would you describe your MLOps maturity?
33% have standardized deployment and monitoring; 30% have some production models, and 30% are mostly manual or experiments-only. Only 7% report advanced MLOps (CI/CD, automated retraining, clear ownership). Maturity is spread out; no single stage dominates.
For the ML/MLOps tools you use, how would you describe their role?
36% say experimental/pilot only; 32% use them regularly but not critically, and 32% say they're mission-critical. It's an even split; tools are either critical or still in exploration for most teams.
Which ML or MLOps tools do you plan to adopt or expand in the next 12 months?
Plans are fragmented; each option sits at roughly 8% (only 12 respondents answered). MLflow, Airflow, Prefect, Kestra, Feast, Kubeflow, Azure ML, W&B, and CI/CD tooling all come up. People are still exploring; no single tool dominates the roadmap.
What are your biggest challenges in ML engineering and MLOps?
Deployment complexity (69%) and lack of skills (54%) are the top two. Monitoring (46%), data quality (35%), and scaling pipelines (35%) follow. Integration (31%), compliance (27%), and cost (23%) round it out. Getting models live and keeping them healthy is the main pain.