Modern Data Engineering: Iceberg, Delta Lake & AI-Powered Pipelines
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How can engineering teams build reliable, scalable lakehouse pipelines that combine transactional table formats with AI-driven automation? In this episode, Adrian Brudaru, an economics-trained analyst turned freelance data practitioner and co-founder of a data company focused on open-source tooling, joins us to explore the realities of modern data engineering.
People
Use these links to connect the episode to guest notes.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:01 - Episode opening & guest introduction
- 2:23 - Perspective on evolving data engineering challenges
- 3:10 - Career journey: startups, freelancing, founding DLT
- 4:03 - DLT as a Python-based ingestion standard and market impact
- 7:45 - DLT Plus vision and partnership outreach for freelancers
- 11:03 - Industry shift toward specialization: governance, data quality, streaming
- 12:37 - Early-career opportunities: AI projects and startup hiring
- 14:32 - Modern data stack critique and open-source “postmodern” alternatives
- 16:40 - 2025 trends: AI integration in data engineering and Apache Iceberg adoption
- 18:17 - Apache Iceberg explained: table format, Parquet storage, vendor lock-in reduction
- 21:27 - Database layers and catalog role: storage, compute, access, metadata & lineage
- 23:41 - Metadata and catalog tooling overview (AWS Glue and peers)
- 25:58 - DuckDB impact: embeddable local OLAP and portable query engine
- 27:40 - Cost-efficient pipelines: DuckDB with GitHub Actions and headless table formats
- 30:31 - Headless table formats and DLT support for Delta Lake and Iceberg
- 31:29 - dbt’s influence on engineering workflows and alternatives like SQLMesh
- 35:37 - Workflow orchestration options in 2025: Airflow, Prefect, Dagster, GitHub
- 38:02 - AI engineering convergence: data engineers building AI agents
- 41:06 - Beginner roadmap: SQL, Python, capturing business requirements, building
- 44:42 - Tool selection guidance and vendor caution for modern data stacks
- 45:56 - Transition paths: senior backend engineers moving into data engineering
- 48:04 - Job market outlook: senior vs junior data engineering opportunities
- 49:42 - Table format comparisons: Delta, Hudi, and Iceberg differences
- 51:19 - Streaming architectures and tools: micro-batching, Kafka, SQS, Flink
- 56:15 - AI-driven commoditization and code generation in data engineering
- 59:42 - DLT roadmap: DLT Plus and a marketplace for reusable data products
- 1:01:19 - Episode wrap-up and key takeaways