Data Engineering Zoomcamp
A comprehensive free 9-week course covering the fundamentals of data engineering, from data ingestion to transformation and analytics. Build robust data pipelines, work with cloud platforms, and process big data efficiently.
How to Enroll
2025 Cohort
- Start Date: January 13, 2025
- Register Here: Sign up
- Access Cohort Materials: 2025 Cohort Folder
Self-Paced Learning
All course materials are freely available for independent study. Follow these steps:
- Watch the course videos
- Join the Slack community
- Refer to the FAQ document for guidance
Course Overview
This course teaches practical skills to become a data engineer. You’ll learn to build end-to-end data pipelines, work with cloud platforms, and handle big data processing using industry-standard tools and best practices.
Prerequisites
To get the most out of this course, you should have:
- Basic coding experience
- Familiarity with SQL
- Experience with Python (helpful but not required)
No prior data engineering experience is necessary.
What You’ll Learn
- Containerization: Docker and Docker Compose for consistent environments
- Infrastructure as Code: Terraform for cloud resource management
- Cloud Platforms: Hands-on experience with Google Cloud Platform
- Data Ingestion: Building scalable data ingestion pipelines
- Data Warehousing: Modern data warehouse design with BigQuery
- Stream Processing: Real-time data processing with Apache Kafka
- Big Data: Distributed processing with Apache Spark
- Workflow Orchestration: Pipeline automation with Kestra
- Analytics: SQL and data transformation techniques
Course Modules
The course is structured into hands-on modules, workshops, and a final project:
Module 1: Containerization and Infrastructure as Code
- Topics: Introduction to GCP, Docker fundamentals, Infrastructure as Code
- Tools: Docker, Docker Compose, Terraform, PostgreSQL
- Project: Set up a development environment and provision cloud infrastructure
Module 2: Workflow Orchestration
- Topics: Data Lakes and Workflow Orchestration
- Tools: Kestra for workflow orchestration
- Project: Build automated workflow pipelines
Workshop 1: Data Ingestion
- Topics: API reading, pipeline scalability, data normalization
- Tools: dlt (data load tool)
- Project: Build a scalable data ingestion pipeline with incremental loading
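To give a feel for what "incremental loading" means in the workshop above, here is a minimal sketch in plain Python. It is not the dlt API and not course material: the function `load_incrementally`, the `state` dictionary, and the `updated_at` cursor field are hypothetical names chosen for illustration. The idea is to remember the highest cursor value already loaded and, on the next run, append only records newer than it.

```python
from typing import Iterable


def load_incrementally(
    records: Iterable[dict],
    destination: list,
    state: dict,
    cursor_field: str = "updated_at",
) -> int:
    """Append only records newer than the stored cursor; return the count loaded."""
    last_seen = state.get("last_value")  # None on the very first run
    new_rows = [
        row for row in records
        if last_seen is None or row[cursor_field] > last_seen
    ]
    destination.extend(new_rows)
    if new_rows:
        # Advance the cursor so the next run skips everything loaded so far.
        state["last_value"] = max(row[cursor_field] for row in new_rows)
    return len(new_rows)


# Hypothetical source data; ISO-8601 strings in one format sort chronologically.
source = [
    {"id": 1, "updated_at": "2025-01-01T00:00:00"},
    {"id": 2, "updated_at": "2025-01-02T00:00:00"},
]
warehouse = []  # stands in for the destination table
state = {}      # stands in for persisted pipeline state

first_run = load_incrementally(source, warehouse, state)   # loads both rows
source.append({"id": 3, "updated_at": "2025-01-03T00:00:00"})
second_run = load_incrementally(source, warehouse, state)  # loads only the new row
```

In a real pipeline the state would be persisted between runs and the source would be an API or database rather than an in-memory list; the sketch only shows the cursor logic.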
Module 3: Data Warehousing
- Topics: Data warehousing concepts, BigQuery optimization
- Tools: BigQuery, Google Cloud Storage
- Project: Design and implement a cloud data warehouse with partitioning and clustering
Module 4: Analytics Engineering
- Topics: Data transformation, modeling, testing, documentation
- Tools: dbt (data build tool), BigQuery, PostgreSQL, Metabase
- Project: Transform raw data into analytics-ready models with visualization
Module 5: Batch Processing
- Topics: Distributed computing, big data processing
- Tools: Apache Spark, Google Dataproc
- Project: Process large datasets with Spark DataFrames and SQL
Module 6: Streaming
- Topics: Real-time data processing, stream analytics
- Tools: Apache Kafka, Kafka Streams, KSQL, Avro
- Project: Build a real-time data streaming pipeline with schema management
Final Project
- Topics: End-to-end project implementation
- Goal: Apply all learned concepts in a comprehensive real-world scenario
- Process: Peer review and feedback
Technologies We’ll Use
Core Tools
- Docker - Containerization and environment consistency
- Terraform - Infrastructure as Code for cloud resources
- Google Cloud Platform - Cloud computing and managed services
- PostgreSQL - Relational database for structured data
- BigQuery - Cloud-native data warehouse
Data Processing
- Apache Spark - Distributed big data processing
- Apache Kafka - Real-time data streaming
- Kestra - Workflow orchestration and pipeline automation
- dbt - Data transformation and modeling
- dlt - Data loading and ingestion
Development
- Python - Primary programming language
- SQL - Database querying and data manipulation
- Jupyter - Interactive development and analysis
- Git - Version control and collaboration
Community & Support
Getting Help
Join the #course-data-engineering channel on DataTalks.Club Slack for:
- Course discussions and Q&A
- Troubleshooting help
- Networking with peers and instructors
- Career advice and opportunities
Community Resources
- Slack Community: Join DataTalks.Club Slack
- Telegram: Course Announcements
- FAQ: Comprehensive FAQ Document
- Guidelines: Asking Questions Guide
Meet the Instructors
Current Instructors
- Victoria Perez Mola - Lead Instructor
- Alexey Grigorev - Course Creator & Instructor
- Michael Shoemaker - Data Engineering Expert
- Zach Wilson - Analytics Engineering
- Will Russell - Cloud & Infrastructure
- Anna Geller - Workflow Orchestration
Past Contributors
- Ankush Khanna - Former Instructor
- Sejal Vaidya - Former Instructor
- Irem Erturk - Former Instructor
- Luis Oliveira - Former Instructor
Course Format & Timeline
Time Commitment
- Live Sessions: 2 hours/week
- Homework: 3-4 hours/week
- Total Duration: 9 weeks
- Certificate: Available upon completion
Learning Format
- Live Sessions: Weekly lectures with hands-on demonstrations
- Hands-on Projects: Practical assignments to reinforce learning
- Community Support: Active Slack community for questions and discussions
- Homework: Weekly assignments to practice concepts
- Capstone Project: End-to-end data engineering project
Sponsors & Supporters
We’re grateful to the sponsors who make this free course possible.
Interested in supporting our community? Reach out to alexey@datatalks.club.
Success Stories
This course has helped thousands of professionals transition into data engineering roles or advance their careers. Join our community to connect with alumni and current students who have successfully completed the program.
How to Get Started
Quick Start Guide
- Register: Sign up for the 2025 cohort or start self-paced learning
- Join Community: Connect with fellow students on Slack
- Set Up Environment: Follow the setup instructions in Module 1
- Start Learning: Begin with Module 1 (Containerization and Infrastructure as Code)
Course Resources
- Course Materials: All content is open-source and freely available
- Video Lectures: Complete playlist on YouTube
- Code Repository: Example code and solutions on GitHub
- Community: Active support on Slack and GitHub Discussions
About DataTalks.Club
DataTalks.Club is a global online community of data enthusiasts. It’s a place to discuss data, learn, share knowledge, ask and answer questions, and support each other.
Connect With Us
- Website: datatalks.club
- Slack: Join our community
- Newsletter: Subscribe for updates
- Events: Upcoming events
- YouTube: DataTalks.Club Channel
- GitHub: Open source projects
Important: This course is completely free and designed by industry practitioners. All materials are open-source, and the community is here to support your learning journey.