Wiki
DataOps Platforms
How DataTalks.Club podcast guests discuss DataOps platforms as the operating layer for reliable pipelines, CI/CD, observability, governance, and self-service data delivery.
Related Wiki Pages
DataOps platforms combine shared data infrastructure with the practices that make pipelines reliable. Guests in the DataTalks.Club archive don’t define a DataOps platform as one vendor product. They describe a shared layer where data pipelines and orchestration run under tests and CI/CD. Lineage, ownership, and observability make that layer operable.
Lars Albertsson gives the clearest platform framing in DataOps 101 for Scaling Data Platforms. At 30:34, he names storage, compute, and workflow engines as core platform components. At 16:42-20:12, he explains why immutable pipelines and reproducibility matter. At 46:52 and 1:04:18, he connects quality automation with lineage and versioning.
Use DataOps for the operating discipline and Data Engineering Platforms for the broader platform concept. Use this page for the overlap: shared platform capabilities that make DataOps usable across teams.
Link Map
Start with these adjacent pages:
- DataOps
- Data Engineering Platforms
- Self-Service Data Platforms
- DataOps Tools
- Data Quality and Observability
- Data Observability
- Orchestration
- CI/CD
- Data Governance
- Modern Data Stack
- MLOps vs DataOps
The main podcast anchors are:
- DataOps 101 for Scaling Data Platforms with Lars Albertsson for immutable pipelines and workflow engines, plus self-service, quality automation, and lineage.
- Mastering DataOps and DataOps for Data Engineering with Christopher Bergh for version control, tests, and CI/CD. The episodes also cover observability, runbooks, and deployment automation.
- DataOps and GitOps for Data Teams with Tomasz Hinc for infrastructure as code, secrets, onboarding, and reproducible dependencies.
- Scaling Data Engineering Teams and Self-Service Platforms with Mehdi OUAZZA for self-service enablement and Airflow conventions, plus Kafka schemas and data contracts.
- Data Observability Explained with Barr Moses for freshness and volume, with schema and lineage as the next pillars. The episode also covers SLAs, runbooks, and false-positive reduction.
Common Definition
A DataOps platform gives teams a shared way to review and test data changes. It also helps teams deploy, observe, and repair those changes. The platform may use a warehouse or lakehouse, with an orchestrator and CI/CD system beside it. Test suites and catalogs often sit in the same layer. Observability and access control belong there too.
The tool list isn’t the useful part. The team can change data work without relying on one person’s memory.
Albertsson frames that capability from architecture. In DataOps 101 for Scaling Data Platforms, he moves from immutable pipeline architecture at 16:42 to reproducibility problems at 20:12. At 30:34, he separates storage, compute, and workflow engines. At 46:52, he adds test-certified practices and schema automation. Those practices turn DataOps from process advice into reusable infrastructure.
Bergh frames the same platform from delivery practice. In Mastering DataOps, he names error reduction and deployment cycle time at 6:42. Team productivity is part of the same goal. At 33:47, he connects version control and tests with CI/CD. At 34:37, he moves runbooks toward automated playbooks.
In DataOps for Data Engineering, he adds regression tests and realistic test data at 30:55. At 42:39, he connects deployment automation to version control and tests.
The shared definition is practical. A DataOps platform helps teams ship and repair data changes. It isn’t useful when it only adds another console beside the warehouse, scheduler, and dashboard tools.
Guest Differences
Guests start from different failure modes. Albertsson starts with platform primitives and reproducible data flows, so his discussion covers storage and compute. Workflow engines and lineage also belong there. He uses the same frame for immutable pipelines and the batch-versus-streaming boundary (DataOps 101, 16:42-1:04:18).
Bergh starts with the cost of fragile delivery. His platform needs version control and tests, while CI/CD and observability become the next layer. Teams also need automated playbooks because manual checks don’t scale (Mastering DataOps, 6:42-38:01).
Hinc starts from infrastructure enablement. In DataOps and GitOps for Data Teams, he contrasts asking a platform team for infrastructure with making a merge request at 12:40. At 23:04-26:21, he describes Terraform and Terragrunt. He also describes Atlantis and the branch-to-apply flow. Reviewable infrastructure becomes part of the DataOps platform.
OUAZZA starts from scale-up pressure. In Scaling Data Engineering Teams and Self-Service Platforms, he frames the data platform role around self-service and onboarding at 12:30. At 17:22, he describes Airflow conventions and playbooks. At 23:26, he adds Kafka schemas and schema registries. Data contracts make the interface explicit.
His platform helps more teams build data work without losing standards.
Pipeline and Platform Capabilities
A DataOps platform has to cover the path from source change to trusted output. Natalie Kwong maps that path from the modern data stack side in ETL vs ELT and Modern Data Engineering. She covers raw ingestion and guardrails at 17:55. She then covers warehouse transformations at 18:47 and orchestration at 30:59.
At 31:31, she connects Airbyte with dbt. She covers CDC and schema evolution at 45:59 and 48:58.
The platform needs more than scheduling. It needs versioned pipeline code and repeatable transforms, plus dependency management and schema checks. It also needs a way to handle changed source data. Use ETL, ELT, and ETL vs ELT when the main question is where transformation happens.
Storage and compute also belong in the platform. Albertsson discusses raw data lakes and warehouses at 21:29-30:34 in DataOps 101. Object storage, governance, and self-service SQL sit in that discussion too.
Adrian Brudaru updates that layer in Modern Data Engineering Trends. At 18:17-23:41, he explains Iceberg plus the storage and compute split. Access and metadata sit in the same layer, and lineage belongs there too. At 35:37, he compares Airflow and Prefect. He also discusses Dagster and GitHub Actions.
These interviews show a platform boundary. The platform coordinates storage and compute, along with workflow, metadata, and quality. It shouldn’t force every team into a heavy stack when a smaller batch system or managed service fits the use case.
CI/CD and Release Paths
DataOps platforms make data changes reviewable before they affect consumers. That starts with Git, but it has to extend into deployment. Bergh’s newer DataOps for Data Engineering episode connects CI/CD pipelines, regression tests, and test data at 30:55. At 42:39, he ties deployment automation to version control and tests. At 54:05, he returns to immutability and versioning.
Hinc adds the infrastructure version. In DataOps and GitOps for Data Teams, he discusses SQL and secrets at 20:56. GitOps and developer enablement sit in the same section.
At 23:42, he defines infrastructure as code through declarative configuration and reproducibility. At 26:21, he describes a branch and merge request. The Atlantis dry run and apply flow complete that release path.
This makes CI/CD part of the platform, not a separate concern for infrastructure engineers. Data teams need release paths for SQL models and orchestrator definitions. They also need release paths for access rules, infrastructure, and governance changes. Otherwise the warehouse may be modern while the operating model still depends on manual coordination.
Observability and Recovery
A DataOps platform must tell teams when the data is wrong, not only when the job failed. Barr Moses makes that distinction in Data Observability Explained. At 13:40, she describes silent failures. At 16:38, she names freshness and volume as observability pillars. Distribution, schema, and lineage belong in the same framework.
At 24:31-26:04, she separates detection from diagnosis. Monitoring and correlation support that diagnosis, while logs and lineage help the team find the source of the failure.
The operating side matters too. Moses covers ownership and communication at 29:00, data SLAs at 35:24, and runbooks at 41:03. She covers platform integration at 47:00 and auto-lineage at 58:51. At 1:00:27, she returns to false-positive reduction. Those are platform requirements because alerts without ownership and runbooks don’t produce reliable recovery.
Bergh ties observability back to the delivery loop. In DataOps for Data Engineering, he recommends starting from production monitoring at 50:29 because real failures show which process gaps matter. DataOps platforms should connect tests and monitors, while lineage, owners, and recovery paths complete the loop.
Self-Service and Governance
Self-service is useful only when the platform path is safe. OUAZZA describes the data platform role as enablement for self-service and onboarding at 12:30 in Scaling Data Engineering Teams and Self-Service Platforms. Scalability is part of the same argument. At 17:22, he explains why an Airflow cluster alone isn’t a platform. Teams also need conventions and playbooks.
At 23:26, Kafka schemas and data contracts make streaming interfaces explicit.
Rahul Jain adds the leadership and governance lens in Data Engineering Leadership and Modern Data Platforms. At 25:04, he discusses data culture and data quality metrics. At 28:04-30:50, he covers reconciliation, GDPR strategies, and dynamic masking. He also covers role-based access control, ELT modernization, data lakes, and lineage.
That’s why DataOps platforms sit between Self-Service Data Platforms and Data Governance. The platform should make routine work easier while preserving ownership, privacy, quality checks, and lineage.
Tool Stack or Platform
The archive doesn’t require a dedicated DataOps vendor before a team can practice DataOps. Bergh mentions dbt tests, Great Expectations, and SQL tests at 48:25 in Mastering DataOps. At 56:32, he describes a platform that orchestrates environments, tests, and observability. That supports two paths: a team may assemble a stack from existing tools, or it may adopt a platform that integrates them.
The decision depends on coordination cost. A small team can start with Git and CI, then add dbt tests, a scheduler, and a basic runbook. A larger platform team may need standard templates and environment orchestration. It may also need centralized observability, lineage, and access workflows.
DataOps Tools covers tool categories. Use this page for the platform structure that supports many pipelines, users, and repeated changes.
Related Pages
Use these pages for adjacent platform, delivery, and governance topics: