dbt
How DataTalks.Club guests describe dbt as warehouse-side SQL transformation plus an engineering workflow for analytics models, tests, documentation, DAGs, and reviewed changes.
dbt is the archive’s clearest example of analytics engineering in practice. Teams write SQL models, run them in a data warehouse, and treat the transformation layer as reviewed, tested code. The important shift isn’t only from ETL to ELT. It’s from isolated SQL queries to a maintained project with dependencies, version control, tests, documentation, and macros.
Victoria Perez Mola gives the most direct explanation in Analytics Engineer Skills and Tools. She describes dbt as the tool her team uses for modeling data after it arrives in Snowflake, alongside Looker and ingestion tooling. In her walkthrough, dbt keeps SQL files in a code workflow and manages model dependencies. It also builds a DAG, runs tests, and exposes documentation. That makes it a practical bridge between analytics work and software engineering habits.
Warehouse-Side Transformation
dbt belongs most naturally to warehouse-side transformation in the ETL vs ELT discussion. Natalie Kwong explains that move in ETL vs ELT and the Modern Data Stack. She contrasts transforming before load with loading first and transforming in analytical storage. That approach gives analysts and analytics engineers more room to adjust business logic after raw data is available.
dbt fits that ELT flow because it doesn’t replace the warehouse. It compiles and runs transformation logic against systems such as Snowflake, BigQuery, or Redshift. Kwong places dbt alongside ingestion tools, warehouses, data marts, orchestration, and CDC. Reverse data flows belong to the same stack.
The same boundary appears in Modern Data Pipeline Architecture, where Santona Tuli separates dbt from ingestion-focused pipeline engines. In her Upsolver comparison, dbt can author SQL transformations, but another system still loads data, handles streaming or ordering concerns, and may provide the execution engine.
Engineering Workflow
Perez Mola centers dbt on workflow because SQL models live in files. Teams can review changes in Git and see how a model changed over time. dbt also resolves dependencies between models. It renders the DAG so teams can see what a change will affect before it reaches dashboards or downstream tables.
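The dependency resolution described above can be pictured with a small sketch. It assumes four hypothetical models that reference each other with dbt-style `ref()` calls, pulls the references out with a simplified regular expression (not how dbt itself parses projects), and uses Python's standard `graphlib` to derive a build order and the set of models affected by a change.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text using dbt-style ref() calls.
MODELS = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "fct_orders": (
        "select o.*, c.city from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
    "rpt_revenue": (
        "select city, sum(amount) as revenue "
        "from {{ ref('fct_orders') }} group by city"
    ),
}

# Simplified pattern for {{ ref('model_name') }}; an assumption for this sketch.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def parents(sql):
    """Return the model names a SQL string references via ref()."""
    return set(REF_PATTERN.findall(sql))


def build_order(models):
    """Sort models so that each one comes after all of its parents."""
    graph = {name: parents(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def downstream(models, changed):
    """Return every model that depends, directly or indirectly, on `changed`."""
    graph = {name: parents(sql) for name, sql in models.items()}
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for name, deps in graph.items():
            if current in deps and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


order = build_order(MODELS)
impact = downstream(MODELS, "stg_customers")
```

Here `impact` contains `fct_orders` and `rpt_revenue`, which is the kind of answer a reviewer wants before approving a change to a staging model.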
Nikola Maksimovic shows the implementation side in From Marketing to Analytics Engineering. His transition from marketing into BI and analytics engineering included a dbt migration, data modeling, Looker work, product analytics, and A/B testing support. The dbt project wasn’t a side tool. It was where reusable business logic moved out of scattered reports and into modeled transformation layers.
Juan Manuel Perafan pushes the same point in Foundations of the Analytics Engineer Role. He treats dbt as one way to put analytics engineering into practice, not as the definition of the job. The craft is still translating business reality into clean data systems. dbt helps when the team needs those systems to be tested, reviewed, and repeatable.
Tests and Quality
dbt tests turn data quality checks into project code. Perez Mola describes standard checks such as non-null and uniqueness tests, plus custom SQL tests. She also explains that a dbt test is a query: if the query returns failing rows, dbt can warn or error. Her team checks sources before building dependent models so bad source data doesn’t silently flow into the modeled layer.
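The "a test is a query" idea can be illustrated without dbt at all. The sketch below uses Python's built-in `sqlite3` as a stand-in warehouse; the `orders` table, its rows, and the test names are hypothetical, and the queries only approximate the shape of non-null and uniqueness checks. A test passes when its query returns no rows.

```python
import sqlite3


def run_tests(conn, tests):
    """Run each test query; a test fails if the query returns any rows."""
    results = {}
    for name, sql in tests.items():
        failing_rows = conn.execute(sql).fetchall()
        results[name] = "fail" if failing_rows else "pass"
    return results


# In-memory database standing in for a warehouse (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, customer_id integer)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, 10), (2, None), (2, 11)],  # one null customer, one duplicate order_id
)

# Hypothetical test names; each query selects the rows that violate the rule.
TESTS = {
    "not_null_orders_customer_id": (
        "select * from orders where customer_id is null"
    ),
    "unique_orders_order_id": (
        "select order_id from orders group by order_id having count(*) > 1"
    ),
    "not_null_orders_order_id": (
        "select * from orders where order_id is null"
    ),
}

results = run_tests(conn, TESTS)
```

With this sample data the first two tests fail and the third passes. Whether a failure should warn or stop the run is a separate policy decision, which matches Perez Mola's description of dbt warning or erroring.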
This is where dbt connects to data quality and observability. dbt can check assumptions about required fields, duplicate identifiers, accepted ranges, valid city names, and relationships. It can’t prove that every business number is correct. Perez Mola is explicit that data quality is ongoing work and that teams add tests after they learn from mismatches.
Perafan broadens the testing conversation. He contrasts manual dashboard checking with automated tests for SQL logic and data assumptions. He describes generic tests, singular SQL tests, and unit-test style checks as ways to make analytics work safer. His argument lines up with CI/CD and DataOps: tests should run before bad changes reach consumers, not only after a stakeholder reports a broken metric.
Documentation and Lineage
dbt documentation is part of the workflow, not an afterthought. Perez Mola describes schema YAML files where teams document models, fields, tags, and custom metadata. dbt docs can show model code and generated documentation. They also show dependencies, so an analytics engineer can look at what depends on a table before changing it.
Perez Mola also marks a limit by distinguishing documentation from data profiling. dbt can document models and expose lineage, but it isn’t the main tool for deep profiling or full observability. She mentions profiling and observability tools such as Datafold and Monte Carlo as adjacent options. That places dbt inside the modern data stack rather than above it.
Macros and Reuse
Macros let teams reuse transformation logic instead of copying SQL across models. Perez Mola compares dbt macros to user-defined functions in SQL systems. Her example is practical: standardizing city names or similar repeated cleanup logic across tables.
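dbt macros are written in Jinja, but the reuse idea can be sketched in plain Python: one function returns a SQL expression, and several queries embed it instead of repeating the cleanup logic. The function name, alias mapping, and tables below are hypothetical, and `sqlite3` again stands in for a warehouse.

```python
import sqlite3


def standardize_city(column):
    """Return a SQL expression that trims, lowercases, and de-aliases a city column."""
    cleaned = f"lower(trim({column}))"
    return (
        f"case {cleaned} "
        "when 'nyc' then 'new york' "
        "when 'sf' then 'san francisco' "
        f"else {cleaned} end"
    )


conn = sqlite3.connect(":memory:")
conn.execute("create table customers (id integer, city text)")
conn.execute("create table stores (id integer, store_city text)")
conn.executemany(
    "insert into customers values (?, ?)", [(1, " NYC "), (2, "Berlin")]
)
conn.executemany("insert into stores values (?, ?)", [(1, "SF")])

# The same cleanup expression is reused across two different tables.
customer_cities = [
    row[0]
    for row in conn.execute(
        f"select {standardize_city('city')} from customers order by id"
    )
]
store_cities = [
    row[0]
    for row in conn.execute(
        f"select {standardize_city('store_city')} from stores order by id"
    )
]
```

Changing the alias list in one place updates every query that uses the helper, which is the maintenance benefit the macro comparison points at.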
Macros remove repeated transformation code, but they don’t eliminate the need for clear modeling. Maksimovic’s dbt migration story keeps the focus on table design and transformation layers. He also discusses the choice between wide and narrow models. In other words, dbt can package reusable logic, but the team still has to decide the model grain, business entities, and metrics.
Orchestration Boundary
dbt builds a model DAG, but that doesn’t make it the whole orchestrator for a data platform. Perez Mola notes that dbt Cloud can schedule runs. Kwong places Airflow around the broader flow of ingestion and transformation.
Tuli brings the same boundary from her Airflow and pipeline background. Modern data pipelines still need orchestration, ingestion, staging, ordering guarantees, and recovery outside the transformation project.
See Apache Airflow for the orchestration-specific tool discussion. In a typical warehouse stack, Airflow or another orchestrator coordinates extract-load jobs, dbt runs, checks, and downstream syncs. dbt owns the transformation graph and model tests. The orchestrator owns when jobs run, how retries happen, and how the end-to-end data pipeline recovers.
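The division of ownership can be made concrete with a toy orchestrator. This is a minimal sketch, not Airflow and not dbt: the step names and the flaky extract-load function are hypothetical, and in a real stack the transformation steps would typically invoke dbt rather than run Python stubs. The loop owns ordering and retries, while each step owns its own work.

```python
def run_pipeline(steps, max_retries=2):
    """Run steps in order, retrying a failed step up to `max_retries` times.

    Returns a log of (step_name, attempt, status). If a step exhausts its
    retries, the pipeline stops and later steps are not run.
    """
    log = []
    for name, step in steps:
        for attempt in range(1, max_retries + 2):
            try:
                step()
                log.append((name, attempt, "ok"))
                break
            except Exception:
                log.append((name, attempt, "failed"))
        else:
            # Every attempt failed: stop before downstream steps run.
            return log
    return log


# Hypothetical steps. The first one fails once to simulate a transient error.
calls = {"extract_load": 0}


def extract_load():
    calls["extract_load"] += 1
    if calls["extract_load"] == 1:
        raise RuntimeError("transient source error")


def transform_models():
    pass  # stand-in for running the dbt project


def test_models():
    pass  # stand-in for running the project's tests


log = run_pipeline(
    [
        ("extract_load", extract_load),
        ("transform_models", transform_models),
        ("test_models", test_models),
    ]
)
```

The log shows one failed and one successful attempt for `extract_load`, followed by single successful attempts for the transformation and test steps. The retry policy lives in `run_pipeline`, outside anything the transformation project defines.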
Guest Differences
Guests agree that dbt made SQL transformation more engineerable, but they don’t treat it as the whole discipline. Perez Mola links dbt closely to the rise of analytics engineering. She presents it as the everyday tool for modeling, tests, DAGs, and docs.
Maksimovic shows how learning dbt can anchor a career move from business or marketing work into analytics engineering. That path also needs SQL, BI, product context, and modeling practice. See Marketing to Analytics Engineering for that transition path.
Perafan is more careful about tool identity. dbt helps teams practice analytics engineering, but dbt alone doesn’t make someone an analytics engineer. Tuli separates dbt from ingestion and execution-engine concerns. Kwong situates it inside ELT and the modern stack.
Christopher Bergh connects dbt tests to a broader DataOps operating model in Mastering DataOps: tests should be automated, version-controlled, kept close to the data, and run during development.
Adrian Brudaru adds the 2025 tooling perspective in Modern Data Engineering Trends. He credits dbt with changing how people think about data engineering by reducing boilerplate and improving project quality, while also noting alternatives such as SQLMesh. His broader advice is requirements-led tool selection. dbt is influential, but it’s one choice in a changing ecosystem of warehouses, lakehouse formats, catalogs, orchestrators, and local-first tools.
Practical Takeaways
Use dbt when the main problem is maintaining warehouse-side SQL models with clear dependencies and repeatable runs. It’s a strong fit for analytics engineering teams that need reviews, tests, documentation, trusted marts, shared metrics, and BI-ready tables. dbt helps transformation logic survive beyond one analyst’s query.
Don’t expect dbt to solve every data-platform problem. Teams still need separate design choices for ingestion, orchestration, deep profiling, observability, streaming, source ownership, and warehouse cost control. The archive’s strongest dbt episodes all make the same distinction: dbt improves the transformation layer, but reliable analytics still depends on the surrounding data platform and team practices.