Building Production ML Platforms: Infrastructure, Workflows, Teams & Governance That Scale
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How do you design an ML platform that reliably deploys models, tracks experiments, and meets regulatory constraints? In this episode, Simon Stiebellehner, Lead MLOps Engineer at Transaction Monitoring Netherlands and university lecturer in Data Mining & Data Warehousing, walks through practical MLOps platform design grounded in real-world deployment challenges.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 1:14 - Episode Introduction: MLOps & ML platform conversation with Simon
- 2:00 - Career & Transition: Research to industry, early platform work and management
- 4:42 - MLOps Definition: People, processes, and technology
- 6:55 - Deployment Challenges: Early blockers that launched MLOps work
- 8:11 - Core Platform Skills: Cloud infrastructure, Kubernetes, Terraform
- 10:47 - User-Centric Platform Design: Understanding data science workflows and notebooks
- 13:25 - Engineering Fundamentals: Software engineering for ML platforms
- 13:50 - Team Composition: Specialist vs generalist skill balance
- 15:34 - Team Size & On-Call: Staffing and operational considerations
- 16:52 - Build vs Buy Decision: When to consider building an ML platform
- 17:14 - Platform Triggers: Signs you need standardization across teams
- 20:04 - Single-Team Value: SaaS components and incremental platform adoption
- 21:03 - Data Science Workflow: Exploration to training and evaluation
- 28:20 - Self-Service Compute: Notebooks, BigQuery, Databricks provisioning
- 29:41 - Experiment Tracking: Low-hanging fruit for reproducibility and collaboration
- 30:32 - Model Registry: Persisting models for downstream consumption
- 31:15 - Deployment Patterns: Batch inference versus online serving
- 31:51 - Orchestration Choices: Airflow, pipelines, and production workflows
- 34:01 - Tool Integration: Stitching SaaS and open-source into a coherent platform
- 35:26 - LLMs & Emerging Needs: Platform implications and vendor updates
- 38:40 - Developer Experience: Thin abstraction layers over cloud providers
- 39:54 - Regulatory Constraints: Fintech, security, and compliance impact
- 42:48 - Metadata & Lineage: Reproducibility, artifact logging, and tracking
- 45:50 - Data Governance: GDPR implications of logging and dataset storage
- 47:08 - Business-First Strategy: Models before heavy platform investment
- 49:19 - Parallelization Strategy: Building minimal platform pieces alongside use
- 51:41 - MLOps Skill Focus: When platform engineers should learn model internals
- 54:15 - API Design & Logging: Unified prediction schemas for monitoring and analytics
- 57:32 - Learning Resources: Books, practical projects, and MLOps training