Building Production ML Platforms: Infrastructure, Workflows, Teams & Governance That Scale

S14E8

Tags: MLOps, machine learning, leadership, career growth

Original Episode

Use these links for the canonical episode and media sources.

Open the original DataTalks.Club podcast page
Watch on YouTube
Listen on Spotify
Listen on Apple Podcasts

Episode Overview

How do you design an ML platform that reliably deploys models, tracks experiments, and meets regulatory constraints? In this episode, Simon Stiebellehner — Lead MLOps Engineer at Transaction Monitoring Netherlands and university lecturer in Data Mining & Data Warehousing — walks through practical MLOps platform design grounded in real-world deployment challenges.

People

Follow this link for notes on the episode's guest.

Simon Stiebellehner

Chapter Summary

Use these checkpoints to decide whether to open the source transcript.

1:14 - Episode Introduction: MLOps & ML platform conversation with Simon
2:00 - Career & Transition: Research to industry, early platform work and management
4:42 - MLOps Definition: People, processes, and technology
6:55 - Deployment Challenges: Early blockers that launched MLOps work
8:11 - Core Platform Skills: Cloud infrastructure, Kubernetes, Terraform
10:47 - User-Centric Platform Design: Understanding data science workflows and notebooks
13:25 - Engineering Fundamentals: Software engineering for ML platforms
13:50 - Team Composition: Specialist vs generalist skill balance
15:34 - Team Size & On-Call: Staffing and operational considerations
16:52 - Build vs Buy Decision: When to consider building an ML platform
17:14 - Platform Triggers: Signs you need standardization across teams
20:04 - Single-Team Value: SaaS components and incremental platform adoption
21:03 - Data Science Workflow: Exploration to training and evaluation
28:20 - Self-Service Compute: Notebooks, BigQuery, Databricks provisioning
29:41 - Experiment Tracking: Low-hanging fruit for reproducibility and collaboration
30:32 - Model Registry: Persisting models for downstream consumption
31:15 - Deployment Patterns: Batch inference versus online serving
31:51 - Orchestration Choices: Airflow, pipelines, and production workflows
34:01 - Tool Integration: Stitching SaaS and open-source into a coherent platform
35:26 - LLMs & Emerging Needs: Platform implications and vendor updates
38:40 - Developer Experience: Thin abstraction layers over cloud providers
39:54 - Regulatory Constraints: Fintech, security, and compliance impact
42:48 - Metadata & Lineage: Reproducibility, artifact logging, and tracking
45:50 - Data Governance: GDPR implications of logging and dataset storage
47:08 - Business-First Strategy: Models before heavy platform investment
49:19 - Parallelization Strategy: Building minimal platform pieces alongside use
51:41 - MLOps Skill Focus: When platform engineers should learn model internals
54:15 - API Design & Logging: Unified prediction schemas for monitoring and analytics
57:32 - Learning Resources: Books, practical projects, and MLOps training