
Complete Data Engineering Zoomcamp curriculum: from infrastructure setup to stream processing
Want to become a data engineer or enhance your data engineering skills?
Data Engineering Zoomcamp by DataTalks.Club is a free, comprehensive 9-week course that takes you from fundamentals to production-ready data pipelines. Whether you’re a beginner or an experienced developer, this bootcamp will help you master the essential tools and practices used by professional data engineers.
What You’ll Learn in This Guide
- What makes Data Engineering Zoomcamp different
- Course curriculum
- Course assignments and scoring
- Building Your Portfolio
- Community and Resources
- Quick Start Guide
- Frequently Asked Questions
What Makes Data Engineering Zoomcamp Different
- Free and Comprehensive: Complete data engineering curriculum without cost barriers
- Practical Focus: Learn by building real data pipelines and systems
- Industry-Standard Tools: Master tools like Python, SQL, dbt, Kafka, and Spark
- Certification Path: Earn a certificate by completing projects and peer reviews
- Active Community: Join DataTalks.Club’s vibrant community of data professionals
Read more about all free courses at DataTalks.Club.
Who the Course is For and Prerequisites
- Skilled in coding
- Comfortable with the command line
- Basic SQL
We’ll mainly use Python, but if you’re not familiar with it yet, it’s not a problem. If you’re already programming in another language, you’ll have no trouble picking up Python here.
Course Curriculum
The course curriculum is structured to guide you through the essential elements of data engineering, beginning with foundational concepts and advancing to complex topics.

Course overview: A complete journey through modern data engineering tools and technologies
Course Structure
The curriculum follows a logical progression from infrastructure setup to advanced data processing, culminating in an end-to-end project. Here’s what you’ll learn each week:
Core Technologies (Weeks 1-6)
Week 1: Infrastructure & Prerequisites
- Set up development environment with Docker and PostgreSQL
- Learn cloud basics with GCP and infrastructure-as-code using Terraform
- Hands-on practice with containerization and cloud resource management
Week 2: Workflow Orchestration
- Master data pipeline orchestration with Mage.AI
- Implement and manage Data Lakes using Google Cloud Storage
- Build automated, reproducible data workflows
Week 3: Data Warehouse
- Deep dive into BigQuery for enterprise data warehousing
- Learn optimization techniques: partitioning and clustering
- Implement best practices for data storage and retrieval
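To build intuition for why partitioning matters before you reach BigQuery, here is a minimal, library-free Python sketch (this is not BigQuery code, and the table rows and column names are made up for illustration): when a table is partitioned by a date column, a query that filters on that column only scans the matching partition instead of the whole table.

```python
from collections import defaultdict
from datetime import date

# Toy "table" of trip records, bucketed by pickup date -- a stand-in for
# a warehouse table partitioned on a date column.
partitions = defaultdict(list)

rows = [
    {"pickup_date": date(2024, 1, 1), "fare": 12.5},
    {"pickup_date": date(2024, 1, 1), "fare": 8.0},
    {"pickup_date": date(2024, 1, 2), "fare": 20.0},
    {"pickup_date": date(2024, 1, 3), "fare": 5.5},
]
for row in rows:
    partitions[row["pickup_date"]].append(row)

def total_fare_on(day):
    """A filter on the partition column reads one partition,
    not the whole table -- the essence of partition pruning."""
    scanned = partitions.get(day, [])  # only this partition is touched
    return sum(r["fare"] for r in scanned), len(scanned)

fare, rows_scanned = total_fare_on(date(2024, 1, 1))
print(fare, rows_scanned)  # 20.5 2 -- scanned 2 rows, not all 4
```

In the course you express the same idea declaratively in BigQuery DDL; the win is the same: queries that filter on the partition column scan (and bill for) far less data.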
Week 4: Analytics Engineering
- Transform raw data into analytics-ready models using dbt
- Develop testing and documentation strategies
- Create impactful visualizations with modern BI tools
Week 5: Batch Processing
- Process large-scale data with Apache Spark
- Master Spark SQL and DataFrame operations
- Optimize batch processing workflows
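As a taste of the batch-aggregation pattern you'll write in Spark, here is a hedged plain-Python sketch (not actual PySpark, and the records are invented): group records by a key and reduce each group, the same shape as a DataFrame `groupBy` followed by an aggregate.

```python
from itertools import groupby
from operator import itemgetter

# Batch-job sketch: group trip records by zone and compute the average fare --
# the same logical shape as a Spark groupBy/agg over a DataFrame.
records = [
    {"zone": "A", "fare": 10.0},
    {"zone": "B", "fare": 7.0},
    {"zone": "A", "fare": 14.0},
    {"zone": "B", "fare": 9.0},
]

def avg_fare_by_zone(rows):
    rows = sorted(rows, key=itemgetter("zone"))  # groupby needs sorted input
    return {
        zone: sum(r["fare"] for r in grp) / len(grp)
        for zone, grp in (
            (z, list(g)) for z, g in groupby(rows, key=itemgetter("zone"))
        )
    }

print(avg_fare_by_zone(records))  # {'A': 12.0, 'B': 8.0}
```

The difference in Spark is scale: the same group-and-reduce logic runs in parallel across a cluster, with the framework handling partitioning and shuffling for you.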
Week 6: Stream Processing
- Build real-time data pipelines with Kafka
- Develop streaming applications using KSQL and Faust
- Implement stream processing patterns and best practices
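The core pattern behind these streaming tools can be sketched in a few lines of plain Python (a stand-in only: a real application would consume from a Kafka topic via a client library, and the event names here are invented). A stream processor reads one event at a time and updates state incrementally, rather than waiting for a complete batch.

```python
from collections import Counter

# Stream-processing sketch: a consumer loop keeping a running count per key,
# the idea behind a Kafka + Faust/KSQL count-style application.
def process_stream(events):
    counts = Counter()
    for event in events:           # each event would arrive from a topic
        counts[event["key"]] += 1  # state is updated per record, as it arrives
        yield event["key"], counts[event["key"]]

stream = [{"key": "ride_started"}, {"key": "ride_started"}, {"key": "ride_ended"}]
for key, running_total in process_stream(stream):
    print(key, running_total)
```

Real stream processors add the hard parts on top of this loop: fault-tolerant state, windowing, and exactly-once delivery, which is what Week 6 covers.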
Project Phase (Weeks 7-9)
The final three weeks are dedicated to applying your knowledge in a real-world project:
- Project Requirements:
  - Select and process a dataset that interests you
  - Build end-to-end data pipelines (batch or streaming)
  - Implement both data lake and warehouse solutions
  - Create analytical dashboards
- Deliverables:
  - Production-ready data pipeline
  - Documented data models
  - Interactive dashboard
  - Project presentation
- Evaluation:
  - Peer review of at least three other projects
  - Technical implementation quality
  - Documentation completeness
  - Solution architecture design
Course Assignments and Scoring
Homework and Getting Feedback
To reinforce your learning, you can submit a homework assignment at the end of each week. It’s reviewed and scored by course instructors. Your scores are added to an anonymous leaderboard, creating friendly competition among course members and motivating you to do your best.

Course leaderboard displaying student progress and achievements anonymously
For support, we have an FAQ section with quick answers to common questions. If you need more help, our Slack community is always available for technical questions, clarifications, or guidance. Additionally, we host live Q&A sessions called “office hours” where you can interact with instructors and get immediate answers to your questions.

FAQ section providing quick answers to common course-related questions
Learning in Public
A unique feature is our “learning in public” approach, inspired by Shawn @swyx Wang’s article. We believe that everyone has something valuable to contribute, regardless of their expertise level.

An extract from Shawn @swyx Wang's article about learning in public
Throughout the course, we actively encourage and incentivize learning in public. By sharing your progress, insights, and projects online, you earn additional points for your homework and projects.

Previous cohort's leaderboard highlighting bonus points earned through learning in public activities
This not only demonstrates your knowledge but also builds a portfolio of valuable content. Sharing your work online also helps you get noticed by social media algorithms, reaching a broader audience and creating opportunities to connect with individuals and organizations you may not have encountered otherwise.
Building Your Portfolio with Data Engineering Projects
If you’ve participated in data engineering interviews or researched the field, you know the importance of having real-world projects in your portfolio. This is especially crucial if you’re transitioning into data engineering or seeking your first role in the field.
In the final three weeks of the course, you’ll build an end-to-end data pipeline project that showcases everything you’ve learned. Your project will include:
- Selecting and processing a dataset of your choice
- Building data pipelines (batch or streaming)
- Setting up a data warehouse
- Creating analytical dashboards
- Documenting your solution
You’ll use modern tools like:
- Cloud platforms (GCP, AWS, or Azure)
- Infrastructure as Code (Terraform)
- Data processing frameworks (Spark, Kafka)
- Analytics tools (dbt, BigQuery)
Your project will be peer-reviewed by other participants, giving you valuable feedback and the opportunity to learn from others’ solutions.
Want to make your project stand out? Consider adding extra features like automated testing, CI/CD pipelines, or advanced visualizations. These additions can make your portfolio more impressive to potential employers.
DataTalks.Club Community
DataTalks.Club has a supportive community of like-minded individuals in our Slack. It is the perfect place to enhance your skills, deepen your knowledge, and connect with peers who share your passion. These connections can lead to lasting friendships, potential collaborations in future projects, and exciting career prospects.

Active discussions and peer support in our dedicated Slack community channel
Quick Start Guide for Data Engineering Zoomcamp
Data Engineering Zoomcamp offers a practical path to becoming a data engineer. In just 9 weeks, you’ll gain:
- Hands-on experience with the modern data engineering stack (Docker, Terraform, dbt, Spark, Kafka)
- Practice building production-ready data pipelines, from ingestion to visualization
- A place in a vibrant community of data professionals and fellow learners
The next cohort starts in January 2026! Take the first step toward your data engineering career. Register for the course and start your learning journey today!
Frequently Asked Questions
When does the next cohort start?
The next cohort starts in January 2026. Register here before the course starts: https://airtable.com/appzbS8Pkg9PL254a/shr6oVXeQvSI5HuWD
What are the prerequisites?
To get the most out of this course, you should have basic coding experience and familiarity with SQL. Python experience is helpful but not required. No prior data engineering experience is needed.
How much time should I expect to spend?
Expect to spend 5 to 15 hours per week, depending on your familiarity with the tools and concepts. This includes watching videos, completing homework, and working on projects; more time may be needed during the project weeks.
Can I take the course in self-paced mode?
Yes! All course materials and recordings remain available after the course ends, and you can continue working on homework and projects at your own pace. You'll also keep access to the Slack community for support, where you can search previous discussions or ask @ZoomcampQABot for help.
Where can I find the course videos?
Our videos are available in several playlists, with the "Data Engineering Zoomcamp" playlist serving as the primary reference. We also maintain year-specific playlists for office hours and updates.
What are the certification requirements?
To earn your Data Engineering certification, you need to complete the project requirements by building an end-to-end data pipeline. You must also participate in peer learning by reviewing at least 3 other projects, submitting your reviews by the deadline, and providing constructive feedback.
Quick Links
Ready to begin your data engineering journey? Here’s everything you need: