Modern Data Pipeline Architecture: Ingestion, Orchestration, Transformation & MLOps Systems
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How do you build a modern data pipeline that reliably moves raw events through ingestion, dbt transformations, and Airflow orchestration into production ML and analytics? In this episode, Santona Tuli, a former CERN researcher turned ML and data engineering lead at Upsolver, walks through practical patterns and trade-offs for end-to-end pipelines. Drawing on experience that ranges from particle-physics event analysis to NLP and workflow authoring with Airflow, Santona explains where ingestion engines and declarative SQL...
People
Use these links to connect the episode to guest notes.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:00 - Episode Introduction
- 1:30 - Career journey: CERN researcher → NLP, ML engineering, Python, Astronomer,
- 7:08 - Transition to workflow authoring and orchestration (Airflow, Astronomer)
- 10:48 - Upsolver vs dbt: pipeline authoring, execution engine, and ingestion focus
- 13:25 - Comparing ML pipelines and analytics data pipelines
- 18:44 - MLOps vs DataOps: operationalizing models vs business data
- 24:57 - Analytics engineering and dbt’s role in the modern data workflow
- 26:43 - Tooling landscape: orchestrators, Spark, Kafka/Kinesis, feature stores, vector
- 29:16 - Modern data stack choices: Upsolver, Snowflake, Databricks, build vs buy
- 32:57 - Data staging and lakehouse patterns; managed ingestion hiding the stage
- 37:10 - Ingestion pre-processing: deduplication, ordering guarantees, PII masking
- 39:23 - Transformation and data modeling: entities, foreign keys, and business mappings
- 43:05 - Marts, dashboards and translating business questions into metrics
- 44:57 - ML pipeline specifics: feature engineering, model training, and serving
- 47:57 - Translating academic data/physics skills to industry pipelines
- 52:54 - Persona-driven pipeline design and real use-case examples
- 55:56 - Career advice: value of being a generalist and closing skill gaps
- 56:49 - Learning strategy: vetting sources, networking, and engineering blogs
- 59:16 - Recommended resources: Fundamentals of Data Engineering, Airflow guides,