DataOps

Podcast-grounded reference page for DataOps as the operating discipline for reliable data pipelines, analytics workflows, and data platforms.

DataOps is the operating discipline for data pipelines, analytics workflows, and data platforms. It makes data delivery reliable enough to change. Teams use version control and tests to review changes. They use CI/CD and orchestration to release them. They use observability, reproducibility, deployment checks, and recovery paths when a pipeline has to be rerun or repaired.

In Mastering DataOps, Christopher Bergh ties the practice to error reduction, shorter deployment cycles, and team productivity around 6:42. He connects observability to production errors around 7:22, then connects version control and tests to healthier pipelines around 33:47. Around 34:37, he adds CI/CD and runbook automation.

In DataOps 101, Lars Albertsson applies the same operating idea to scalable data platforms. He focuses on immutable pipelines, reproducibility, and self-service. He also covers workflow engines and quality automation.

Use MLOps for production machine learning systems. Use MLOps vs DataOps when the question is about the boundary between model lifecycle work and data delivery work. Use DataOps Platforms when the question is how DataOps becomes shared platform infrastructure.

Core Interviews

The strongest DataOps interviews in the archive are DataOps 101, Mastering DataOps, DataOps for Data Engineering, and DataOps and GitOps for Data Teams. They cover the operating model from two sides: platform architecture and team delivery practice.

Adjacent interviews fill in the systems DataOps has to operate. In Data Engineering Tools and Modern Data Stack, Natalie Kwong explains ELT and dbt-style transformations, along with orchestration, CDC, schema evolution, warehouses, and lake architecture. In Data Observability Explained, Barr Moses defines the monitoring signals DataOps teams use when pipelines silently produce bad data.

Common Definition

Across the archive, DataOps means making data delivery repeatable and recoverable. Bergh’s two DataOps interviews define that through delivery practice. Mastering DataOps connects DataOps to error reduction, deployment cycle time, and productivity. It also connects DataOps to monitoring, tests, CI/CD, and automated playbooks.

DataOps for Data Engineering adds on-call readiness at 26:13, regression tests and realistic test data around 30:55, deployment automation at 42:39, and production monitoring at 50:29.
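As an illustration of regression tests over realistic test data, here is a minimal sketch in Python. The transformation daily_revenue, the fixture rows, and the expected totals are all hypothetical and are not taken from the interview. The idea is only that a small fixture with production-like edge cases (a cancelled order, a zero-amount order) pins down current behavior so a later change cannot alter it silently.

```python
from collections import defaultdict


def daily_revenue(orders):
    """Hypothetical transformation: sum order amounts per day, skipping cancelled orders."""
    totals = defaultdict(float)
    for order in orders:
        if order["status"] == "cancelled":
            continue
        totals[order["day"]] += order["amount"]
    return dict(totals)


# Hypothetical fixture that mimics awkward production rows:
# a cancelled order and a zero-amount order on its own day.
FIXTURE = [
    {"day": "2024-01-01", "amount": 10.0, "status": "paid"},
    {"day": "2024-01-01", "amount": 5.5, "status": "paid"},
    {"day": "2024-01-01", "amount": 99.0, "status": "cancelled"},
    {"day": "2024-01-02", "amount": 0.0, "status": "paid"},
]

# Output recorded from a reviewed, known-good run of the transformation.
EXPECTED = {"2024-01-01": 15.5, "2024-01-02": 0.0}


def test_daily_revenue_regression():
    """Fail if a code change alters the output for the recorded fixture."""
    assert daily_revenue(FIXTURE) == EXPECTED
```

A test like this can run under a test runner such as pytest in CI, or be called directly before a deployment.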

Data work fails in ways application uptime checks may miss. Sources and schemas change. Files arrive late. Transformations break. Dashboards or models may consume the result before the producing team notices.

Moses describes this failure mode in Data Observability Explained. Data teams may have “good pipelines” that still deliver bad data. They need freshness, volume, distribution, schema, and lineage signals, plus clear ownership. SLAs and runbooks make those signals actionable.
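Two of these signals, freshness and volume, can be sketched in a few lines of Python. This is an illustrative sketch, not a description of any particular observability product: the TableStats fields, the 24-hour freshness limit, and the 50% volume tolerance are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableStats:
    """Hypothetical per-table metadata collected after each load."""
    name: str
    last_loaded_at: datetime
    row_count: int
    expected_rows: int  # assumed baseline, e.g. a trailing average


def check_table(stats, now, max_age=timedelta(hours=24), volume_tolerance=0.5):
    """Return the names of the signals that look unhealthy for one table."""
    alerts = []
    # Freshness: the table has not been loaded recently enough.
    if now - stats.last_loaded_at > max_age:
        alerts.append("freshness")
    # Volume: the row count deviates too far from the expected baseline.
    if stats.expected_rows > 0:
        deviation = abs(stats.row_count - stats.expected_rows) / stats.expected_rows
        if deviation > volume_tolerance:
            alerts.append("volume")
    return alerts


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
stale = TableStats("orders", datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), 100, 1000)
print(check_table(stale, now))  # ['freshness', 'volume']
```

The job that produced this table could have finished successfully, which is the case the checks are meant to catch: the pipeline looks fine while the data is late and mostly missing.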

Platform and Delivery Practice

DataOps can start from platform design or from day-to-day delivery practice. Albertsson starts from the platform side in DataOps 101. He discusses workflow engines around 10:48, immutable and functional pipeline architecture around 16:42, and reproducibility problems around 20:12. Around 30:34, he breaks the platform into storage, compute, and workflow engines. He then covers batch-versus-streaming tradeoffs around 41:53 and quality or schema automation around 46:52.

In that framing, teams need self-service data access without losing reproducibility, ownership, or quality.
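One common reading of immutable, functional pipelines is that each output is a pure function of its inputs and is written once to a location derived from those inputs. The Python sketch below illustrates that reading under stated assumptions. The transform, the key layout, and the in-memory store are hypothetical and are not Albertsson's implementation.

```python
import hashlib
import json


def transform(rows):
    """Pure function: the same input rows always give the same output rows."""
    cleaned = (
        {"user": r["user"].strip().lower(), "clicks": int(r["clicks"])} for r in rows
    )
    return sorted(cleaned, key=lambda r: r["user"])


def output_key(dataset, partition, code_version, rows):
    """Derive an output location from everything that affects the result."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"{dataset}/date={partition}/code={code_version}/input={digest}"


def run(store, dataset, partition, code_version, rows):
    """Write the output once; a rerun with identical inputs is a no-op."""
    key = output_key(dataset, partition, code_version, rows)
    if key not in store:  # existing outputs are never overwritten in place
        store[key] = transform(rows)
    return key


store = {}  # stand-in for object storage (assumption for the example)
rows = [{"user": " Ada ", "clicks": "3"}, {"user": "bob", "clicks": "1"}]
first = run(store, "clicks", "2024-01-01", "v1", rows)
again = run(store, "clicks", "2024-01-01", "v1", rows)
print(first == again, len(store))  # True 1
```

Because the key includes the code version and an input digest, a fixed transformation writes to a new location instead of mutating the old output, so earlier results remain available for comparison or rollback.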

Bergh starts from team delivery. In Mastering DataOps, he emphasizes version control, tests, and CI/CD. He also discusses observability, automated playbooks, and replaceability. He connects those practices to handoffs and end-to-end versioning.

A large data platform may begin with shared infrastructure and conventions. A smaller data team may begin with Git and tests. It may then add a scheduler, basic monitors, and a recovery runbook. Both paths are DataOps when they make data changes reviewable, testable, observable, and recoverable.

Pipeline Scope

DataOps applies to ingestion jobs and transformations. It also applies to datasets, dashboards, data products, and analytics workflows. Kwong’s Data Engineering Tools and Modern Data Stack shows the range DataOps has to cover. The modern stack spreads responsibility across raw ingestion, ELT, warehouse transformations with dbt, orchestration, CDC, schema evolution, and warehouse analytics.

DataOps is the operating discipline that keeps that path inspectable and repairable, not a substitute name for any one tool.

This is where Orchestration, CI/CD, and Data Engineering Platforms connect to DataOps. Another person should be able to review a data change, test it with realistic data, and rerun it after failure. They should also be able to observe its outputs and fix it without reverse-engineering the whole pipeline.

GitOps and Team Enablement

Tomasz Hinc adds the infrastructure side of DataOps in DataOps and GitOps for Data Teams. He describes DataOps as making data work faster and less scary around 18:59, then shows how infrastructure practice supports that goal. From about 20:56 through 26:21, he discusses SQL, secrets, GitOps, and Infrastructure as Code.

He then connects that workflow to Terraform, Terragrunt, and Atlantis. Teams use merge requests, dry runs, and applies to make infrastructure changes reviewable.
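The shape of that review loop (propose a change, inspect a dry run, apply only after approval) can be sketched in plain Python. This is a toy model of the workflow, not the Terraform, Terragrunt, or Atlantis API. The resource dictionaries and the function names plan and apply_plan are hypothetical.

```python
def plan(current, desired):
    """Dry run: list the changes needed to move from current to desired state."""
    changes = []
    for name in sorted(set(current) | set(desired)):
        if name not in current:
            changes.append(("create", name))
        elif name not in desired:
            changes.append(("delete", name))
        elif current[name] != desired[name]:
            changes.append(("update", name))
    return changes


def apply_plan(current, desired, approved):
    """Apply only after a reviewer has approved the dry-run output."""
    if not approved:
        raise PermissionError("plan must be reviewed and approved before apply")
    return dict(desired)


# Hypothetical infrastructure state, keyed by resource name.
current = {"bucket": {"region": "eu"}, "old_role": {}}
desired = {"bucket": {"region": "us"}, "new_topic": {}}

print(plan(current, desired))
# [('update', 'bucket'), ('create', 'new_topic'), ('delete', 'old_role')]
```

The point of the split is that the plan output is what reviewers read in the merge request, so the destructive step never runs on a change nobody has seen.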

That episode keeps DataOps connected to enablement rather than management. Hinc discusses platform teams reviewing changes at 13:07 and onboarding friction for data scientists at 27:34. He also covers proactive support, cross-team education, and minimal operational skills: Git, command line use, IAM, and password managers.

Operational practice has to fit the people who change the pipelines, not only the infrastructure team that owns the platform.

Observability and Recovery

DataOps depends on observability, but it’s broader than observability. Moses defines data observability through freshness, volume, and distribution at 16:38 in Data Observability Explained. She includes schema and lineage in the same framework. She then connects those signals to root-cause analysis, ownership, and data SLAs. Later sections of the episode cover threshold automation, operational runbooks, and false-positive reduction.
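Threshold automation can be illustrated with a simple statistical rule: derive alert bounds from recent history instead of hand-setting them per table. The Python sketch below uses a mean plus or minus k standard deviations rule. That rule, the value k = 3, and the sample history are assumptions for illustration and are not the method described in the episode.

```python
import statistics


def learned_bounds(history, k=3.0):
    """Derive alert bounds from past values; needs at least two data points."""
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    return mean - k * std, mean + k * std


def is_anomalous(value, history, k=3.0):
    """Flag a value that falls outside the bounds learned from history."""
    low, high = learned_bounds(history, k)
    return value < low or value > high


# Hypothetical daily row counts for one table.
history = [100, 102, 98, 101, 99]

print(is_anomalous(103, history))  # False
print(is_anomalous(150, history))  # True
```

A wider k trades missed incidents for fewer false positives, which is the tuning problem that false-positive reduction has to address.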

DataOps uses those signals inside the larger delivery system. Bergh’s DataOps episodes connect monitoring to CI/CD, regression tests, and deployment confidence. They also connect monitoring to automated playbooks and on-call readiness.

Use Data Quality and Observability and Data Observability for the monitoring layer. Use this DataOps page for the operating model around tests, releases, alerts, recovery steps, and ownership.

Relation to MLOps

DataOps and MLOps overlap because production ML depends on production data, but they operate different objects. DataOps teams operate upstream pipelines, datasets, transformations, and features. They also handle metadata, quality checks, and recovery paths. MLOps teams add model artifacts, training jobs, and inference paths. They also handle model registries, retraining decisions, and model behavior.

Bergh makes the shared DevOps inheritance explicit in Mastering DataOps around 50:42, while Albertsson separates shared principles from ML-specific requirements in DataOps 101 around 53:31.

The boundary matters during incidents. In MLOps Architect Guide, Danny Leybzon connects model monitoring to ETL, data pipelines, and upstream root causes around 27:35.

A model alert may come from model drift. It may also come from a late table, a changed schema, a broken feature pipeline, or a missing label. DataOps teams handle the data delivery investigation. MLOps teams handle the model lifecycle investigation.

Use these pages for adjacent DataOps topics and the MLOps boundary: