ETL vs ELT & Data Lake vs Warehouse: Airbyte, dbt, CDC for Modern Data Engineering
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How do you decide between ETL and ELT, or between a data lake and a warehouse, and where do tools and techniques like Airbyte, dbt, and CDC fit into a modern data stack? In this episode, Natalie Kwong, Growth Product Manager at Airbyte with prior analytics and ops roles at Harness, KeepTruckin, and AppDynamics, draws on hands-on experience scaling analytics teams and systems to unpack these trade-offs.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:00 - Podcast Introduction
- 1:34 - Episode Overview: Decoding Data Engineering Acronyms
- 1:58 - Guest Career Journey: From Marketing Ops to Analytics & Growth
- 3:19 - Airbyte Overview: ELT Focus and Connector Purpose
- 3:46 - ETL Explained: Extract, Transform, Load (Traditional Model)
- 6:37 - ETL Use Case: Calculating Customer Acquisition Cost
- 7:57 - ELT Advantages: Flexibility, Speed, and Analyst Autonomy
- 10:00 - Transformations in Practice: From Type Casting to Complex SQL Joins
- 12:39 - Analytics Engineer Emergence: Empowering Analysts with dbt & SQL
- 15:30 - Data Marts vs. Warehouses: Purpose, Layers, and Consumption
- 17:55 - Ingestion Layer: Raw Data Storage, Sanity, and Guardrails
- 18:47 - Bringing Transforms Into the Warehouse: ELT vs Legacy Workflows
- 19:50 - Data Lakes: Unstructured Storage for Files, Logs, and Media
- 21:22 - Data Quality: Preventing Data Swamps Through Governance
- 24:24 - Warehouse Ingestion vs. Data Lake: Trade-offs and Convergence
- 27:39 - Architecture Decision: When to Maintain a Lake, a Warehouse, or Both
- 30:59 - Orchestration: Airflow’s Role in Scheduling and Running Pipelines
- 31:31 - Airbyte’s Role in the Stack: Reliable Extract-Load and dbt Integration
- 33:45 - Modern Analytics Stack: Best-of-Breed Tools and Typical Components
- 35:42 - Operational Reverse Data Flows: Pushing Warehouse Tables Back to Sources
- 39:06 - Low-Code/No-Code Tools: Evolving Data Engineering Roles, Not Replacing Them
- 41:30 - ETL’s Continued Relevance: Large Enterprises and Complex Staging Needs
- 43:02 - Managing Unused Data: Team Ownership and Regular Cleanup Practices
- 43:45 - Open Source Strategy: Why Airbyte Is Open and the Cloud Offering Model
- 45:59 - CDC Explained: Capturing and Syncing Only Row-Level Changes
- 48:26 - Open-Source Risks: Competition and Licensing (Elasticsearch Example)
- 48:58 - Schema Evolution: Handling Slowly Changing Attributes
- 49:32 - Licensing Considerations: MIT, Cloud Products, and Future Choices