ELT

ELT is a load-first pipeline pattern for warehouses. This page covers dbt transformations, analytics engineering, CDC, quality checks, and governed marts.

Related Wiki Pages

ETL, Modern Data Stack, Data Warehouse, Data Pipelines, dbt, Analytics Engineering, CDC, DataOps

ELT means extract, load, transform. A team extracts data from source systems and loads it into analytical storage. Transformation then runs inside a data warehouse, lakehouse, or adjacent SQL engine.
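The extract, load, transform order can be sketched in a few lines. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for the warehouse; the table names (`raw_orders`, `orders`) and the sample records are hypothetical and do not come from any particular tool.

```python
import sqlite3

# Extract: hypothetical records as they might arrive from a source system.
# Everything is still text because no typing or cleaning has happened yet.
source_rows = [
    {"id": "1", "amount": "19.90", "status": "paid"},
    {"id": "2", "amount": "5.00", "status": "refunded"},
    {"id": "3", "amount": "42.10", "status": "paid"},
]

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse

# Load: store the source detail as-is, with no business logic applied.
conn.execute("CREATE TABLE raw_orders (id TEXT, amount TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders (id, amount, status) VALUES (:id, :amount, :status)",
    source_rows,
)

# Transform: runs inside the "warehouse", after the data has landed.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(id AS INTEGER)  AS order_id,
           CAST(amount AS REAL) AS amount,
           status
    FROM raw_orders
    """
)

raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
paid_total = conn.execute(
    "SELECT ROUND(SUM(amount), 2) FROM orders WHERE status = 'paid'"
).fetchone()[0]
```

Because `raw_orders` is kept, a new question (for example, refunds by status) only needs a new SQL statement, not a new extraction.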

ELT usually sits inside the modern data stack. In that stack, an ingestion tool writes raw source data to storage. dbt or plain SQL builds models. BI tools consume governed tables.

Loading first suits teams whose business logic changes often. Source detail stays available in the warehouse, so analysts can write new SQL transformations, and data engineers don’t need to re-extract a source every time a new field or question appears. ^[1]

ELT covers load-first data movement and warehouse-side transformation after data is loaded. ETL covers transform-before-load work, and the ETL vs ELT comparison owns the choice between the two patterns. Data Pipelines covers the full data flow and operating lifecycle.

Load-First Model

ELT changes where business meaning gets created because the destination receives raw or lightly prepared data first. The team then builds typed, joined, cleaned, and documented tables from that stored data. Aggregations come from the same stored layer.

ETL vs ELT covers the tradeoff with transform-before-load. This hub follows the load-first model after that choice is made.

The modern stack splits E-L from T, with Airbyte handling extraction and loading. Transformations happen after data arrives in the warehouse. They range from simple type casting to final business models that join AdWords and Salesforce data. ^[1] ELT is still a data pipelines topic because the pipeline has to move, transform, publish, and operate data reliably.

ELT isn’t “load everything and forget about it.” Raw ingestion and data marts are separate layers.^[1] At platform scale, teams can move from tightly coupled ETL models to ELT. They can load data first, transform it later, and keep the model resilient as use cases grow.^[2]

Warehouse Layers and Marts

ELT works when the warehouse has clear layers. An ingestion database keeps the rawest form of data from a connector such as Airbyte. Teams may then build a common layer that several groups can reuse, followed by data marts for business consumers. Those marts may serve marketing, sales, finance, or product teams. After transformation, business users can pull metrics from a mart because the team has added guardrails and consistent definitions. ^[1]
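The raw, common, and mart layers described above could look roughly like this. It is a sketch under stated assumptions: SQLite stands in for the warehouse, views stand in for modeled tables, and every name (`raw_leads`, `common_leads`, `mart_marketing_leads_by_source`) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.executescript(
    """
    -- Ingestion layer: rawest form of the data, as a connector would write it.
    CREATE TABLE raw_leads (lead_id TEXT, source TEXT, created_at TEXT);
    INSERT INTO raw_leads VALUES
        ('1', ' Google ',   '2024-01-03'),
        ('2', 'google',     '2024-01-05'),
        ('3', 'Newsletter', '2024-01-07');

    -- Common layer: typed and cleaned once, reusable by several teams.
    CREATE VIEW common_leads AS
    SELECT CAST(lead_id AS INTEGER) AS lead_id,
           LOWER(TRIM(source))      AS source,
           DATE(created_at)         AS created_date
    FROM raw_leads;

    -- Mart layer: one consistent definition for marketing consumers.
    CREATE VIEW mart_marketing_leads_by_source AS
    SELECT source, COUNT(*) AS lead_count
    FROM common_leads
    GROUP BY source;
    """
)

mart_rows = conn.execute(
    "SELECT source, lead_count FROM mart_marketing_leads_by_source ORDER BY source"
).fetchall()
```

The mart reads only from the common layer, so the cleanup rule for `source` lives in one place and every consumer sees the same counts.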

Staging gives the pipeline a holding area between source systems and the warehouse or lakehouse. Some tools hide that stage, but the boundary still matters. Data may be staged and checked. The ingestion tool may also deduplicate records, enforce ordering, and mask fields before human-facing SQL work begins.^[3]

ELT becomes useful to the business when teams map keys, entities, and business questions after data arrives in the warehouse or lakehouse. ^[3]

Analytics teams use dbt to model a domain and migrate transformation work. The same modeling work also includes decisions about wide and narrow tables plus incremental strategies.^[4]
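One common incremental strategy is a high-water mark: each run appends only raw rows newer than what the model already holds. The sketch below is an assumption-laden illustration in plain Python and SQLite, not dbt's own incremental materialization; `raw_events`, `fct_events`, and `build_incremental` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.executescript(
    """
    CREATE TABLE raw_events (event_id INTEGER, loaded_at TEXT);
    CREATE TABLE fct_events (event_id INTEGER, loaded_at TEXT);
    INSERT INTO raw_events VALUES
        (1, '2024-01-01T00:00:00'),
        (2, '2024-01-02T00:00:00');
    """
)


def build_incremental(conn: sqlite3.Connection) -> int:
    """Append raw rows newer than the model's high-water mark; return rows added."""
    before = conn.execute("SELECT COUNT(*) FROM fct_events").fetchone()[0]
    # ISO-8601 timestamps compare correctly as text; '' sorts before all of them.
    high_water = conn.execute(
        "SELECT COALESCE(MAX(loaded_at), '') FROM fct_events"
    ).fetchone()[0]
    conn.execute(
        """
        INSERT INTO fct_events (event_id, loaded_at)
        SELECT event_id, loaded_at FROM raw_events WHERE loaded_at > ?
        """,
        (high_water,),
    )
    after = conn.execute("SELECT COUNT(*) FROM fct_events").fetchone()[0]
    return after - before


first_run = build_incremental(conn)   # model is empty, so all raw rows qualify
conn.execute("INSERT INTO raw_events VALUES (3, '2024-01-03T00:00:00')")
second_run = build_incremental(conn)  # only the newly loaded row qualifies
third_run = build_incremental(conn)   # nothing new, so nothing is appended
```

The strict `>` comparison keeps reruns idempotent, but it would skip late-arriving rows that share the current maximum timestamp. Whether that tradeoff is acceptable is one of the modeling decisions the paragraph above refers to.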

At platform scale, fixed target models can become too tightly coupled as use cases grow. Teams may keep traditional and flat models alongside lineage, a data lake, and consumer-facing exposure paths. ^[2]

Tool Boundaries

In ELT, Airbyte, dbt, and Airflow do different jobs. Airbyte sits at the extract-load step and connects with dbt after warehouse load. ^[1] Airbyte is an ingestion tool in this page’s vocabulary, while the warehouse transformation layer belongs to SQL, dbt, or another modeling system.

On the transformation side, analytics engineers write SQL models, documentation, and tests in dbt. dbt also tracks model dependencies. Snowflake runs the queries in one example stack, while Looker consumes the modeled result. This connects ELT directly to analytics engineering. ^[5]

Airflow belongs at the scheduling and dependency boundary. Airflow can run Airbyte jobs, but it isn’t the transformation layer. ^[1]

Airflow, Prefect, or another orchestrator may coordinate the work. Ingestion engines, warehouses, dbt, and modeling tools still own the work they run. ^[3]
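The split between coordinating work and doing work can be shown with the standard library alone. This sketch assumes Python 3.9+ for `graphlib`; it is not Airflow or Prefect code, and the task names and log messages are hypothetical placeholders for calls into real tools.

```python
from graphlib import TopologicalSorter

run_log = []

# Each task is a placeholder for work owned by another tool.
# The orchestrator only decides when each one runs.
tasks = {
    "extract_load": lambda: run_log.append("ingestion tool syncs raw tables"),
    "transform": lambda: run_log.append("modeling tool builds SQL models"),
    "test": lambda: run_log.append("modeling tool runs data tests"),
    "publish_marts": lambda: run_log.append("marts exposed to BI"),
}

# Maps each task to the set of tasks that must finish before it starts.
dependencies = {
    "extract_load": set(),
    "transform": {"extract_load"},
    "test": {"transform"},
    "publish_marts": {"test"},
}

# Resolve a valid execution order, then run the tasks in that order.
execution_order = list(TopologicalSorter(dependencies).static_order())
for name in execution_order:
    tasks[name]()
```

The scheduler code contains no SQL and no connector logic, which mirrors the boundary described above: the orchestrator sequences jobs, and the ingestion and modeling tools still own what each job does.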

The tool boundary widens when dbt sits next to newer workflow options, open table formats such as Delta Lake, catalogs, metadata, and lineage. ^[6]

Teams still need SQL and Python. They also need requirements work and tool judgment even when the stack uses newer lakehouse or AI-assisted components. ^[6] ELT is a durable workflow structure, not a fixed vendor list.

Schema, Quality, and Governance

ELT preserves source detail, but it also creates governance work. A Salesforce checkbox or picklist field can be ingested and modeled later without a full extraction redesign. ^[1] The same flexibility can create unused raw data, unclear ownership, and inconsistent definitions when teams don’t maintain the warehouse layers.

CDC is one ingestion technique that supports ELT. After an initial load, change data capture syncs only the rows that changed or were deleted, instead of a fresh copy of the whole source table. ^[1]

CDC keeps the loaded layer fresh. The team still has to decide how changes affect staged tables, modeled dimensions, and downstream marts.
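Applying a batch of change records to an already-loaded table can be sketched as follows. The `Change` record shape and the `apply_changes` helper are hypothetical; real CDC tools emit their own event formats, and this example only illustrates the upsert-or-delete idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical change record: an upsert carries a row, a delete does not."""
    op: str  # "upsert" or "delete"
    key: int
    row: Optional[dict] = None


def apply_changes(table: dict, changes: list) -> dict:
    """Return a new table state after applying the changes in order."""
    result = dict(table)
    for change in changes:
        if change.op == "upsert":
            result[change.key] = change.row
        elif change.op == "delete":
            result.pop(change.key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return result


# State after the initial full load.
loaded = {
    1: {"name": "Ada", "plan": "free"},
    2: {"name": "Lin", "plan": "pro"},
}

# Later sync: only the rows that changed or were deleted arrive.
batch = [
    Change("upsert", 1, {"name": "Ada", "plan": "pro"}),
    Change("delete", 2),
    Change("upsert", 3, {"name": "Sam", "plan": "free"}),
]

current = apply_changes(loaded, batch)
```

This version overwrites rows in place. A team that needs history in modeled dimensions would keep the change records instead of discarding them, which is the kind of decision the paragraph above describes.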

Quality checks belong both before and after loading. Ingestion tools may deduplicate, enforce ordering, and apply PII masking before data reaches Snowflake or another destination.^[3]

Teams separate ingestion hygiene from business transformation. Deduplication, ordering guarantees, and masking can happen near ingestion. Business modeling happens later with warehouse entities and use cases. ^[3]
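The ingestion-side hygiene steps can be illustrated with a small batch cleaner. This is a sketch only: the record fields (`id`, `email`, `updated_at`) and both helper names are hypothetical, and an unsalted truncated hash is used purely to show where masking sits, not as a recommended PII control.

```python
import hashlib


def mask_value(value: str) -> str:
    """Replace a sensitive value with a short, stable token (illustrative only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def clean_batch(records: list) -> list:
    """Deduplicate by id (latest wins), order by update time, mask emails."""
    latest = {}
    # ISO-8601 timestamps sort correctly as strings, so later versions overwrite.
    for record in sorted(records, key=lambda r: r["updated_at"]):
        latest[record["id"]] = record
    ordered = sorted(latest.values(), key=lambda r: r["updated_at"])
    return [{**record, "email": mask_value(record["email"])} for record in ordered]


batch = [
    {"id": 1, "email": "a@example.com", "updated_at": "2024-01-01T10:00:00"},
    {"id": 2, "email": "b@example.com", "updated_at": "2024-01-01T09:00:00"},
    {"id": 1, "email": "a.new@example.com", "updated_at": "2024-01-02T08:00:00"},
]

cleaned = clean_batch(batch)
```

Nothing here encodes business meaning. The function does not know what a customer or a lead is, which is why this work can sit near ingestion while modeling waits for the warehouse.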

Warehouse-side dbt tests can query for nulls and duplicate records. They can also check ranges before dependent models build. The same test layer can catch bad source data. ^[5] Analytics engineering work also has to handle bad data, schema changes, and raw-input limits.^[5]
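Null, duplicate, and range checks can each be written as a query that returns the offending rows, so a check passes when its query returns nothing. The sketch below follows that idea in plain Python and SQLite rather than dbt itself; the `stg_orders` table, its sample rows, and the check names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
conn.executescript(
    """
    CREATE TABLE stg_orders (order_id INTEGER, amount REAL);
    INSERT INTO stg_orders VALUES (1, 10.0), (2, NULL), (2, 5.0), (3, -4.0);
    """
)

# Each check selects the rows that violate the rule.
checks = {
    "order_id_not_null": "SELECT * FROM stg_orders WHERE order_id IS NULL",
    "amount_not_null": "SELECT * FROM stg_orders WHERE amount IS NULL",
    "order_id_unique": (
        "SELECT order_id FROM stg_orders GROUP BY order_id HAVING COUNT(*) > 1"
    ),
    "amount_non_negative": "SELECT * FROM stg_orders WHERE amount < 0",
}

# Count failing rows per check; any non-zero count marks the check as failed.
failure_counts = {
    name: len(conn.execute(sql).fetchall()) for name, sql in checks.items()
}
failed_checks = sorted(name for name, count in failure_counts.items() if count > 0)

# A runner would stop here instead of building dependent models.
can_build_downstream = not failed_checks
```

With the sample rows, three checks fail, so `can_build_downstream` is `False` and bad source data stops before it reaches a mart.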

Platform teams may track data quality metrics, reconcile source counts against warehouse or lake targets, and use dynamic data masking with role-based access. They also maintain lineage when raw and modeled data changes. ^[2]
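Count reconciliation can be as simple as comparing per-table row counts from the source with counts from the target. The helper below is a hypothetical sketch; the table names and counts are made up, and in practice each dictionary would come from a `COUNT(*)` query against the respective system.

```python
def reconcile_counts(source_counts: dict, target_counts: dict) -> dict:
    """Return {table: source_count - target_count} for every table that differs."""
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        # A table missing on one side counts as zero rows on that side.
        difference = source_counts.get(table, 0) - target_counts.get(table, 0)
        if difference != 0:
            mismatches[table] = difference
    return mismatches


source_counts = {"customers": 1200, "orders": 5400, "refunds": 85}
target_counts = {"customers": 1200, "orders": 5397}

mismatches = reconcile_counts(source_counts, target_counts)
```

A positive difference means rows are missing in the target, and a negative one means the target holds extra rows, for example duplicates.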

These controls put ELT close to DataOps because teams need versioned code, tests, lineage, and observability. They also need repeatable runs, not only a load-first diagram.

Operating the ELT Stack

ELT shifts some work from data engineers to analytics engineers and analysts. It doesn’t remove engineering work. Analytics teams gain autonomy because many transformations can be written in SQL after data is already in the warehouse. ^[1]

Daily analytics engineering work still requires data modeling, pipeline awareness, data quality work, and Looker modeling. dbt tests and collaboration with backend and data engineering teams matter too. ^[5]

dbt influenced analytics engineering, but the role extends beyond tool work. Analytics engineers also need to understand data model architecture and business domains. They need to connect the models to KPIs. Table design and incrementalization choices matter too. ^[4]

The same modeling questions apply whether teams use dbt, a homegrown SQL runner, or another warehouse modeling layer.

Career projects expose the same operating skills because pipelines may need Docker, Airflow, and AWS runs. They may also need warehouse-specific SQL, clean data, and quality checks. Reproducible work matters more than the ELT acronym when a pipeline has to be useful to business analysts.^[7]

Modern analytics teams load source detail and keep the warehouse flexible. They can transform with SQL and dbt, but they still need governance, cleanup, and data mart boundaries.^[1]

A load-first stack becomes useful through daily analytics engineering work: maintaining models and tests, coordinating DAGs, and collaborating across teams. The acronym alone doesn’t make it useful. ^[5]

DataTalks.Club
