GitHub Repository
The GitHub repository is the most important resource for this course. Use it to navigate through all course materials.
https://github.com/DataTalksClub/data-engineering-zoomcamp

How to Use the Repository
- Start in the module folder you’re working on
- Read the README in that folder for an overview
- Follow the links to video lectures
- Complete homework assignments
- Check the cohort folder for any cohort-specific materials
Each module README links directly to the videos and resources for that module, so you rarely need to look for materials elsewhere.
Repository Structure
Each module has its own folder with everything you need:
- 01-docker-terraform/ - Docker, PostgreSQL, Terraform, Google Cloud setup
- 02-workflow-orchestration/ - Workflow orchestration with Kestra
- 03-data-warehouse/ - Data warehousing with BigQuery
- 04-analytics-engineering/ - Analytics engineering with dbt
- 05-batch/ - Batch processing with PySpark
- 06-streaming/ - Stream processing with Kafka and Flink
- projects/ - Final projects
Each module folder contains:
- Course notes and examples
- Homework assignments
- Links to relevant video lectures
- Code samples and notebooks
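The module layout above can be sketched as a small lookup table that turns a folder name into a browsable GitHub link. This is only an illustration: the helper name `module_url` is hypothetical, and the branch name `main` is an assumption about the repository's default branch, so adjust it if the links do not resolve.

```python
# Minimal sketch: map the module folders listed above to GitHub URLs.
REPO_URL = "https://github.com/DataTalksClub/data-engineering-zoomcamp"
BRANCH = "main"  # assumption: the default branch is named "main"

# Folder names and topics as listed in the repository structure.
MODULES = {
    "01-docker-terraform": "Docker, PostgreSQL, Terraform, Google Cloud setup",
    "02-workflow-orchestration": "Workflow orchestration with Kestra",
    "03-data-warehouse": "Data warehousing with BigQuery",
    "04-analytics-engineering": "Analytics engineering with dbt",
    "05-batch": "Batch processing with PySpark",
    "06-streaming": "Stream processing with Kafka and Flink",
    "projects": "Final projects",
}


def module_url(folder: str, branch: str = BRANCH) -> str:
    """Return the GitHub URL for a module folder (hypothetical helper)."""
    if folder not in MODULES:
        raise KeyError(f"Unknown module folder: {folder}")
    return f"{REPO_URL}/tree/{branch}/{folder}"


if __name__ == "__main__":
    # Print one line per module: folder, topic, and link.
    for folder, topic in MODULES.items():
        print(f"{folder}: {topic} -> {module_url(folder)}")
```

Opening the printed link for the module you are working on takes you to its README, which is the suggested starting point.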
Cohorts
The cohorts/ folder contains materials specific to each edition of the course:
- cohorts/2026/ - 2026 cohort materials
- cohorts/2025/ - 2025 cohort materials
- cohorts/2024/ - 2024 cohort materials
- cohorts/2023/ - 2023 cohort materials
- cohorts/2022/ - 2022 cohort materials
2026 Cohort
The 2026 cohort folder contains:
- 01-docker-terraform/ - Module 1 materials for 2026
- 02-workflow-orchestration/ - Module 2 materials for 2026
- 03-data-warehouse/ - Module 3 materials for 2026
- 04-analytics-engineering/ - Module 4 materials for 2026
- 05-batch/ - Module 5 materials for 2026
- 06-streaming/ - Module 6 materials for 2026
- README.md - 2026 cohort information
- project.md - Final project guidelines for 2026
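If you keep a local clone of the repository, a short script can confirm that the 2026 cohort folder contains the entries listed above. This is a sketch under assumptions: the function name `missing_cohort_entries` is hypothetical, and `repo_root` is whatever path you cloned the repository to.

```python
from pathlib import Path

# Entries the text above lists for the 2026 cohort folder.
EXPECTED_2026 = [
    "01-docker-terraform",
    "02-workflow-orchestration",
    "03-data-warehouse",
    "04-analytics-engineering",
    "05-batch",
    "06-streaming",
    "README.md",
    "project.md",
]


def missing_cohort_entries(repo_root, year=2026, expected=EXPECTED_2026):
    """Return the expected entries that are absent from cohorts/<year>/.

    repo_root is the path to a local clone (an assumption of this sketch).
    """
    cohort_dir = Path(repo_root) / "cohorts" / str(year)
    return [name for name in expected if not (cohort_dir / name).exists()]


if __name__ == "__main__":
    # "." assumes the script is run from the root of the clone.
    missing = missing_cohort_entries(".")
    if missing:
        print("Missing from cohorts/2026/:", ", ".join(missing))
    else:
        print("All listed 2026 cohort entries are present.")
```

An empty result means the clone matches the listing; anything reported as missing may simply not have been published yet for that cohort.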