ELT
Podcast-grounded guide to ELT as a load-first data pipeline approach for warehouses, dbt transformations, analytics engineering, orchestration, CDC, quality checks, and governed data marts.
ELT means extract, load, transform. A team extracts data from source systems and loads it into analytical storage. Then it transforms the data inside the warehouse, lakehouse, or adjacent SQL engine. In the DataTalks.Club archive, ELT usually sits inside the modern data stack.
An ingestion tool loads source data, and a data warehouse keeps raw and modeled tables. dbt or plain SQL turns those tables into business-facing models.
Natalie Kwong gives the archive’s clearest definition in ETL vs ELT and the Modern Data Stack. At 7:57-14:54, she explains why teams load first when business logic changes often. Source detail stays available. Analysts can write new SQL transformations. Data engineers don’t need to re-extract a source every time a new field or question appears.
See ETL vs ELT for the full reference comparison and ETL for the transform-before-load side. The shorter ETL vs ELT decision guide is the better starting point when you only need to choose between the two.
Definition and Scope
ELT changes where business meaning gets created. In ETL, the pipeline applies business logic before it writes to the destination. In ELT, the destination receives raw or lightly processed data first. The team then builds typed, joined, cleaned, and documented tables from that stored data. Aggregations come from the same stored layer.
Kwong describes this as splitting the E-L work from the T work. Airbyte handles extraction and loading, while transformations happen after data arrives in the warehouse. Her examples range from simple type casting to joining AdWords and Salesforce data into a final business model (modern stack episode at 10:00-12:39). That puts ELT near data pipelines because the pipeline still has to move data and transform it. The team also has to publish the result and keep it reliable.
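The split between loading and transforming can be sketched with Python's built-in sqlite3 module standing in for a warehouse. This is a minimal illustration, not Kwong's setup: the table names, columns, and sample rows are hypothetical, and a real stack would use an ingestion tool and a warehouse engine instead of in-memory SQLite.

```python
import sqlite3

# Hypothetical raw rows as an ingestion tool might land them: every value is text.
raw_ad_spend = [
    ("acme", "2024-01-01", "120.50"),
    ("acme", "2024-01-02", "80.00"),
    ("globex", "2024-01-01", "40.25"),
]
raw_deals = [("acme", "5000"), ("globex", "1200")]

conn = sqlite3.connect(":memory:")

# E-L step: store the source data unchanged.
conn.execute("CREATE TABLE raw_ad_spend (account TEXT, day TEXT, cost TEXT)")
conn.execute("CREATE TABLE raw_deals (account TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_ad_spend VALUES (?, ?, ?)", raw_ad_spend)
conn.executemany("INSERT INTO raw_deals VALUES (?, ?)", raw_deals)

# T step: cast types and join the two sources inside the engine.
conn.execute("""
    CREATE VIEW account_roi AS
    SELECT s.account, s.total_cost, d.total_amount
    FROM (SELECT account, SUM(CAST(cost AS REAL)) AS total_cost
          FROM raw_ad_spend GROUP BY account) AS s
    JOIN (SELECT account, SUM(CAST(amount AS REAL)) AS total_amount
          FROM raw_deals GROUP BY account) AS d
      ON d.account = s.account
""")

account_roi = conn.execute(
    "SELECT account, total_cost, total_amount FROM account_roi ORDER BY account"
).fetchall()
```

Because the raw tables stay in place, a new business question only needs a new view or model, not a new extraction.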
The archive doesn’t treat ELT as “load everything and forget about it.” Kwong separates raw ingestion from data marts at 15:30-18:47. Rahul Jain describes the same move at platform scale in Data Engineering Leadership. His team moved from tightly coupled ETL models to ELT. They could then load data first, transform it later, and keep the model resilient as use cases grew (30:50-33:15).
Warehouse Layers
ELT works when the warehouse has clear layers. Kwong describes an ingestion database as the rawest form of data from a connector such as Airbyte. Teams may then build a common layer that several groups can reuse, followed by data marts for business consumers. Those marts may serve marketing, sales, finance, or product teams. After transformation, business users can pull metrics from a mart because the team has added guardrails and consistent definitions (modern stack episode at 15:30-18:47).
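One way to picture the raw, common, and mart layers is as stacked SQL views. The sketch below assumes hypothetical table and view names and uses in-memory SQLite only so that it runs anywhere. It is an illustration of the layering idea, not a description of any guest's warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw layer: rows as landed by the connector (hypothetical table and columns).
CREATE TABLE raw_orders (id TEXT, region TEXT, channel TEXT, amount TEXT);
INSERT INTO raw_orders VALUES
  ('1', ' EU ', 'ads',   '10.0'),
  ('2', 'eu',   'email', '5.5'),
  ('3', 'US',   'ads',   '20.0');

-- Common layer: one cleaned, typed definition that several teams can reuse.
CREATE VIEW common_orders AS
SELECT CAST(id AS INTEGER)   AS order_id,
       UPPER(TRIM(region))   AS region,
       channel,
       CAST(amount AS REAL)  AS amount
FROM raw_orders;

-- Marts: business-facing aggregates built only from the common layer.
CREATE VIEW mart_sales_by_region AS
SELECT region, SUM(amount) AS revenue FROM common_orders GROUP BY region;

CREATE VIEW mart_marketing_by_channel AS
SELECT channel, SUM(amount) AS revenue FROM common_orders GROUP BY channel;
""")

sales = conn.execute(
    "SELECT region, revenue FROM mart_sales_by_region ORDER BY region"
).fetchall()
marketing = conn.execute(
    "SELECT channel, revenue FROM mart_marketing_by_channel ORDER BY channel"
).fetchall()
```

Both marts inherit the same region cleanup, which is the kind of consistent definition the common layer is meant to provide.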
Santona Tuli adds a more pipeline-oriented version in Modern Data Pipeline Architecture. At 32:57-39:23, she describes staging as a holding area between source systems and the warehouse or lakehouse. Some tools hide that stage, but the boundary still matters. Data may be staged and checked. The ingestion tool may also deduplicate records, enforce ordering, and mask fields before human-facing SQL work begins.
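The pre-load steps Tuli lists (deduplication, ordering, masking) could look roughly like the function below. The field names event_id, seq, and email are assumptions made for the example, and hashing with SHA-256 is just one possible masking choice, not something the episode prescribes.

```python
import hashlib


def prepare_for_load(events):
    """Deduplicate by event_id, order by seq, and mask the email field."""
    unique = {}
    for event in events:
        # Keep the first occurrence of each event_id; later copies are duplicates.
        unique.setdefault(event["event_id"], event)
    ordered = sorted(unique.values(), key=lambda e: e["seq"])
    # Replace the raw email with a one-way hash before anything is loaded.
    return [
        {**e, "email": hashlib.sha256(e["email"].encode("utf-8")).hexdigest()}
        for e in ordered
    ]
```

None of these steps encodes business meaning, which is why they can sit near ingestion without turning the pipeline back into ETL.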
The modeled layer is where ELT becomes useful to the business. Tuli frames this as mapping keys, entities, and business questions after data arrives in the warehouse or lakehouse (39:23-43:05).
Nikola Maksimovic shows the analytics version in From Marketing to Analytics Engineering. His team used dbt to model a domain and migrate transformation work. They also made decisions about wide and narrow tables plus incremental strategies (18:34-33:46).
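An incremental strategy can be sketched as "only process raw rows newer than what the model already holds." The example below is a simplified high-water-mark approach with hypothetical table names. It does not reproduce dbt's incremental materialization or the strategy Maksimovic's team chose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (id INTEGER, loaded_at TEXT, amount REAL);
CREATE TABLE model_events (id INTEGER PRIMARY KEY, loaded_at TEXT, amount REAL);
""")


def run_incremental(conn):
    """Append raw rows newer than the latest modeled row; return the model's row count."""
    conn.execute("""
        INSERT INTO model_events (id, loaded_at, amount)
        SELECT id, loaded_at, amount
        FROM raw_events
        WHERE loaded_at > (SELECT COALESCE(MAX(loaded_at), '') FROM model_events)
    """)
    return conn.execute("SELECT COUNT(*) FROM model_events").fetchone()[0]
```

A strict greater-than comparison on a timestamp can miss late-arriving rows that share the latest timestamp, so real models often add a lookback window or merge on a key instead.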
Tools and Boundaries
In ELT, Airbyte, dbt, and Airflow do different jobs. Kwong places Airbyte at the extract-load step and connects it with dbt after warehouse load (modern stack episode at 31:31-33:45). That makes Airbyte an ingestion tool in this page’s vocabulary, not the owner of the warehouse transformation layer.
Victoria Perez Mola explains the transformation side in Analytics Engineer Skills and Tools. At 4:05-10:04, she describes dbt as the place where analytics engineers write SQL models, documentation, and tests. dbt also tracks model dependencies. Snowflake runs the queries in her example stack, while Looker consumes the modeled result. This is the archive’s clearest link between ELT and analytics engineering.
Airflow belongs at the scheduling and dependency boundary. Kwong says Airflow is an orchestrator that can run Airbyte jobs. It isn’t the transformation layer (modern stack episode at 30:59-31:31).
Tuli draws the same boundary from the workflow-authoring side. Airflow, Prefect, or another orchestrator may coordinate work, but ingestion engines, warehouses, dbt, and modeling tools still own the work they run (pipeline architecture episode at 7:08-10:48 and 26:43-29:16).
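The orchestration boundary can be shown with a toy scheduler built on Python's standard graphlib module. The task names below are hypothetical, and the runner callables stand in for whatever actually triggers an ingestion sync or a dbt run. This is not the Airflow or Prefect API.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key runs after the tasks in its set.
dependencies = {
    "airbyte_sync": set(),
    "dbt_run": {"airbyte_sync"},
    "dbt_test": {"dbt_run"},
    "refresh_dashboard": {"dbt_test"},
}


def run_pipeline(dependencies, runners):
    """Call each task's runner in dependency order and return that order.

    The scheduler only sequences work; each runner owns what it actually does.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for task in order:
        runners[task]()
    return order
```

The scheduler knows nothing about SQL or connectors, which mirrors the point that the orchestrator coordinates tools without becoming the transformation layer.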
Adrian Brudaru widens the tool choice in Modern Data Engineering Trends. He places dbt next to newer workflow options and open table formats. He also places it next to catalogs, metadata, and lineage. His point for ELT is that teams still need SQL and Python. They also need requirements work and tool judgment even when the stack uses newer lakehouse or AI-assisted components (21:27-35:37 and 41:06-44:42).
Schema, Quality, and Governance
ELT preserves source detail, but it also creates governance work. Kwong’s Salesforce example shows why teams load first. A new checkbox or picklist field can be ingested and modeled later. The team doesn’t need a full extraction redesign (modern stack episode at 7:57-12:39 and 48:58-49:32). That same flexibility can create unused raw data, unclear ownership, and inconsistent definitions if teams don’t maintain the warehouse layers.
CDC is one ingestion technique that supports ELT. Kwong describes change data capture as syncing only changed records after an initial load. Each sync carries the changed or deleted rows instead of copying the whole source table again (modern stack episode at 45:59-48:26). CDC helps keep the loaded layer fresh, but the team still has to decide how those changes affect staged tables, modeled dimensions, and downstream marts.
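A minimal sketch of applying change events to a loaded table follows. The event shape (op, id, row) is an assumption for illustration. Real CDC tools emit their own formats and usually include ordering metadata such as log positions.

```python
def apply_changes(table, changes):
    """Apply ordered change events to a dict keyed by primary key.

    `table` maps primary key to row; `changes` is a list of hypothetical events
    shaped like {"op": "insert" | "update" | "delete", "id": ..., "row": {...}}.
    """
    for change in changes:
        key = change["id"]
        if change["op"] == "delete":
            # Deleting a key that is already gone is treated as a no-op.
            table.pop(key, None)
        else:
            # Inserts and updates are both handled as upserts.
            table[key] = change["row"]
    return table
```

Only the touched keys move, so the cost of a sync follows the volume of changes rather than the size of the source table.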
Quality checks belong both before and after loading. Tuli says ingestion tools may deduplicate, enforce ordering, and apply PII masking before data reaches Snowflake or another destination (pipeline architecture episode at 37:10-39:23). Perez Mola then shows the warehouse-side checks. dbt tests can query for nulls, range violations, and duplicate records before dependent models build. The same test layer can catch bad source data (analytics engineering episode at 36:44-40:42).
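Warehouse-side checks for nulls, range violations, and duplicates can be written as queries that return offending rows. The sketch below follows that idea with plain SQL on in-memory SQLite. It does not use dbt itself, and the table, column, and check names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_orders (order_id INTEGER, amount REAL);
INSERT INTO stg_orders VALUES (1, 10.0), (2, NULL), (2, -5.0), (3, 7.5);
""")

# Each check is a query that returns the offending rows; zero rows means pass.
checks = {
    "amount_not_null": "SELECT order_id FROM stg_orders WHERE amount IS NULL",
    "amount_non_negative": "SELECT order_id FROM stg_orders WHERE amount < 0",
    "order_id_unique": (
        "SELECT order_id FROM stg_orders GROUP BY order_id HAVING COUNT(*) > 1"
    ),
}


def run_checks(conn, checks):
    """Return the number of failing rows for each named check."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in checks.items()}


failures = run_checks(conn, checks)
```

A pipeline could refuse to build dependent models whenever any count is above zero, which is the gating behavior described above.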
Jain’s platform episode adds controls. His team tracked data quality metrics, reconciled source counts against warehouse or lake targets, and used dynamic data masking with role-based access. They also maintained lineage when raw and modeled data changed (data engineering leadership episode at 25:04-33:15).
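Two of these controls, count reconciliation and role-based masking, can be sketched in a few lines. The function names, the role list, and the masking placeholder are assumptions for illustration. Warehouse platforms typically provide masking policies natively, and this is not Jain's implementation.

```python
def reconcile_counts(source_counts, target_counts):
    """Return tables whose target row count differs from the source count."""
    mismatches = {}
    for table, expected in source_counts.items():
        actual = target_counts.get(table, 0)  # a missing table counts as zero rows
        if actual != expected:
            mismatches[table] = {"source": expected, "target": actual}
    return mismatches


def mask_for_role(row, role, sensitive=("email",), allowed_roles=("admin",)):
    """Return a copy of the row with sensitive fields hidden from other roles."""
    if role in allowed_roles:
        return dict(row)
    return {k: ("***" if k in sensitive else v) for k, v in row.items()}
```

Reconciliation catches rows lost between source and target, while masking limits who can read what has been loaded.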
That connects ELT to DataOps because teams need versioned code, tests, and lineage. They also need observability and repeatable runs, not only a load-first diagram.
Operating the ELT Stack
ELT shifts some work from data engineers to analytics engineers and analysts, but it doesn’t remove engineering work. Kwong says ELT gives analytics teams more autonomy. Many transformations can be written in SQL after data is already in the warehouse (modern stack episode at 12:39-14:54). Perez Mola’s daily work requires data modeling, pipeline awareness, and data quality. It also requires Looker work, dbt tests, and collaboration with backend and data engineering teams (analytics engineering episode at 4:05-14:34 and 33:02-40:42).
Maksimovic’s episode keeps the role from becoming tool worship. He says dbt influenced analytics engineering, but the deeper skill is understanding data model architecture, business domains, and KPIs. Table design and incrementalization choices matter too (marketing-to-analytics episode at 28:40-33:46). That’s why an ELT stack can use dbt, a homegrown SQL runner, or another warehouse modeling layer and still face the same modeling questions.
Gloria Quiceno connects the career and project side in her data engineering job story. Her discussion ties pipelines to Docker and Airflow. It also covers AWS runs, warehouse-specific SQL, clean data, and quality checks. That episode is less about the ELT acronym. It focuses on the operational skills that make a pipeline reproducible and useful to business analysts (21:25-36:20 and 50:15-53:34).
Guest Differences
Kwong’s modern analytics argument is to load source detail, keep the warehouse flexible, and let analytics teams transform with SQL and dbt. She still warns that raw layers need governance, cleanup, and data mart boundaries (modern stack episode at 7:57-18:47 and 43:02-45:59).
Perez Mola argues from the analytics engineer’s desk. She focuses less on the ELT acronym and more on models, tests, and DAGs. She also emphasizes bad data, schema changes, and raw-input limits (analytics engineering episode at 36:44-48:36).
Tuli draws a sharper boundary around ingestion, and she doesn’t treat every pre-load action as business transformation. Deduplication, ordering guarantees, and masking can happen near ingestion. Business modeling happens later with warehouse entities and use cases (pipeline architecture episode at 37:10-43:05).
Jain argues from scale and leadership. Fixed target models became too tightly coupled as use cases grew, so his team moved from ETL to ELT while keeping traditional and flat models. They also used a data lake, lineage, and consumer-facing exposure paths (data engineering leadership episode at 30:50-33:15 and 57:29-57:56).
Brudaru updates the stack boundary. For him, ELT now sits alongside open table formats, catalogs, and metadata. Orchestration choices and requirements-led tool selection also matter (modern data engineering trends at 18:17-35:37 and 41:06-44:42). That makes ELT a durable workflow structure rather than a fixed list of vendors.