Data Engineering Zoomcamp

A comprehensive free 9-week course covering the fundamentals of data engineering, from data ingestion to transformation and analytics. Build robust data pipelines, work with cloud platforms, and process big data efficiently.



How to Enroll

2025 Cohort

Self-Paced Learning

All course materials are freely available for independent study. Follow these steps:

  1. Watch the course videos
  2. Join the Slack community
  3. Refer to the FAQ document for guidance

Course Overview

This course teaches the practical skills you need to work as a data engineer. You’ll learn to build end-to-end data pipelines, work with cloud platforms, and process big data using industry-standard tools and best practices.

Prerequisites

To get the most out of this course, you should have:

  • Basic coding experience
  • Familiarity with SQL
  • Experience with Python (helpful but not required)

No prior data engineering experience is necessary.

What You’ll Learn

  • Containerization: Docker and Docker Compose for consistent environments
  • Infrastructure as Code: Terraform for cloud resource management
  • Cloud Platforms: Hands-on experience with Google Cloud Platform
  • Data Ingestion: Building scalable data ingestion pipelines
  • Data Warehousing: Modern data warehouse design with BigQuery
  • Stream Processing: Real-time data processing with Apache Kafka
  • Big Data: Distributed processing with Apache Spark
  • Workflow Orchestration: Pipeline automation with Kestra
  • Analytics: SQL and data transformation techniques

Course Modules

The course is structured into hands-on modules, workshops, and a final project:

Module 1: Containerization and Infrastructure as Code

  • Topics: Introduction to GCP, Docker fundamentals, Infrastructure as Code
  • Tools: Docker, Docker Compose, Terraform, PostgreSQL
  • Project: Set up development environment and provision cloud infrastructure

Module 2: Workflow Orchestration

  • Topics: Data Lakes and Workflow Orchestration
  • Tools: Kestra for workflow orchestration
  • Project: Build automated workflow pipelines
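
As a rough illustration of what workflow orchestration means, the sketch below runs tasks in dependency order using only the Python standard library. It is not Kestra's API or flow syntax; the task names and the `run_pipeline` helper are hypothetical and chosen only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "load_to_lake": {"extract"},
    "load_to_warehouse": {"load_to_lake"},
    "report": {"load_to_warehouse"},
}


def run_pipeline(deps, tasks):
    """Run each task once, only after all of its dependencies have run."""
    executed = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()  # call the task's function
        executed.append(name)
    return executed


# Each toy task just records its own name in a shared log.
log = []
tasks = {name: (lambda n=name: log.append(n)) for name in dependencies}
order = run_pipeline(dependencies, tasks)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering step; the sketch covers only the dependency resolution.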

Workshop 1: Data Ingestion

  • Topics: API reading, pipeline scalability, data normalization
  • Tools: dlt (data load tool)
  • Project: Build scalable data ingestion pipeline with incremental loading
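
The idea behind incremental loading can be sketched in plain Python: keep a cursor recording the last record loaded, and on each run fetch only records past that cursor. This is a minimal sketch under stated assumptions, not the dlt API; `fetch_page`, `incremental_load`, and the in-memory `api_records` list are hypothetical stand-ins for a paginated API and a destination table.

```python
def fetch_page(source, cursor, page_size):
    """Stand-in for an API call: return records with id > cursor.

    Assumes `source` is already sorted by ascending id.
    """
    newer = [r for r in source if r["id"] > cursor]
    return newer[:page_size]


def incremental_load(source, destination, state, page_size=2):
    """Append only records newer than the stored cursor, then advance it."""
    loaded = 0
    while True:
        page = fetch_page(source, state["last_id"], page_size)
        if not page:
            break
        destination.extend(page)
        state["last_id"] = page[-1]["id"]  # persist progress after each page
        loaded += len(page)
    return loaded


api_records = [{"id": i, "value": i * 10} for i in range(1, 6)]
warehouse = []
state = {"last_id": 0}

first_run = incremental_load(api_records, warehouse, state)
api_records.append({"id": 6, "value": 60})  # a new record arrives upstream
second_run = incremental_load(api_records, warehouse, state)
```

The second run picks up only the record that arrived after the first run, which is what keeps repeated loads cheap as the source grows.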

Module 3: Data Warehousing

  • Topics: Data warehousing concepts, BigQuery optimization
  • Tools: BigQuery, Google Cloud Storage
  • Project: Design and implement a cloud data warehouse with partitioning and clustering
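
To show why partitioning matters, the toy sketch below groups rows by a date column so that a query filtered on that column scans one partition instead of the whole table. It does not use BigQuery; the column names, sample rows, and helper functions are hypothetical, and clustering (ordering rows within a partition) is not modeled.

```python
from collections import defaultdict


def build_partitions(rows, key):
    """Group rows by the value of the partition column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return parts


def query_partitioned(parts, partition_value, predicate):
    """Scan only the matching partition; return matches and rows scanned."""
    scanned = parts.get(partition_value, [])
    matches = [r for r in scanned if predicate(r)]
    return matches, len(scanned)


# Made-up sample rows for illustration only.
rows = [
    {"pickup_date": "2024-01-01", "vendor": "A", "fare": 10.0},
    {"pickup_date": "2024-01-01", "vendor": "B", "fare": 7.5},
    {"pickup_date": "2024-01-02", "vendor": "A", "fare": 12.0},
    {"pickup_date": "2024-01-03", "vendor": "A", "fare": 9.0},
]

parts = build_partitions(rows, "pickup_date")
matches, scanned = query_partitioned(
    parts, "2024-01-01", lambda r: r["vendor"] == "A"
)
```

Here the query touches two of the four rows; in a warehouse, scanning fewer rows generally translates into lower cost and latency.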

Module 4: Analytics Engineering

  • Topics: Data transformation, modeling, testing, documentation
  • Tools: dbt (data build tool), BigQuery, PostgreSQL, Metabase
  • Project: Transform raw data into analytics-ready models with visualization

Module 5: Batch Processing

  • Topics: Distributed computing, big data processing
  • Tools: Apache Spark, Google Dataproc
  • Project: Process large datasets with Spark DataFrames and SQL

Module 6: Streaming

  • Topics: Real-time data processing, stream analytics
  • Tools: Apache Kafka, Kafka Streams, KSQL, Avro
  • Project: Build real-time data streaming pipeline with schema management

Final Project

  • Topics: End-to-end project implementation
  • Goal: Apply all learned concepts in a comprehensive real-world scenario
  • Process: Peer review and feedback

Technologies We’ll Use

Core Tools

  • Docker - Containerization and environment consistency
  • Terraform - Infrastructure as Code for cloud resources
  • Google Cloud Platform - Cloud computing and managed services
  • PostgreSQL - Relational database for structured data
  • BigQuery - Cloud-native data warehouse

Data Processing

  • Apache Spark - Distributed big data processing
  • Apache Kafka - Real-time data streaming
  • Kestra - Workflow orchestration and pipeline automation
  • dbt - Data transformation and modeling
  • dlt - Data loading and ingestion

Development

  • Python - Primary programming language
  • SQL - Database querying and data manipulation
  • Jupyter - Interactive development and analysis
  • Git - Version control and collaboration

Community & Support

Getting Help

Join the #course-data-engineering channel on DataTalks.Club Slack for:

  • Course discussions and Q&A
  • Troubleshooting help
  • Networking with peers and instructors
  • Career advice and opportunities

Course Format & Timeline

Time Commitment

  • Live Sessions: 2 hours/week
  • Homework: 3-4 hours/week
  • Total Duration: 9 weeks
  • Certificate: Available upon completion

Learning Format

  • Live Sessions: Weekly lectures with hands-on demonstrations
  • Hands-on Projects: Practical assignments to reinforce learning
  • Community Support: Active Slack community for questions and discussions
  • Homework: Weekly assignments to practice concepts
  • Capstone Project: End-to-end data engineering project

Sponsors & Supporters

We’re grateful to our sponsors who make this free course possible:

Course Sponsors

  • Kestra - Workflow Orchestration Platform
  • dltHub - Data loading tools

Interested in supporting our community? Reach out to alexey@datatalks.club.

Success Stories

This course has helped thousands of professionals transition into data engineering roles or advance their careers. Join our community to connect with alumni and current students who have successfully completed the program.

How to Get Started

Quick Start Guide

  1. Register: Sign up for the 2025 cohort or start self-paced learning
  2. Join Community: Connect with fellow students on Slack
  3. Set Up Environment: Follow the setup instructions in Module 1
  4. Start Learning: Begin with Module 1 (Docker and Terraform)

About DataTalks.Club

DataTalks.Club is a global online community of data enthusiasts. It’s a place to discuss data, learn, share knowledge, ask and answer questions, and support each other.

Important: This course is completely free and designed by industry practitioners. All materials are open-source, and the community is here to support your learning journey.

