Breaking into data engineering takes real, hands-on experience with production tools, but most courses stop at theory.
The Data Engineering Zoomcamp changes that. It’s a free data engineering course that teaches you how to build production-grade data pipelines from start to finish. You’ll work with Docker, Terraform, BigQuery, dbt, Spark, and Kafka, and graduate with a portfolio project and a certificate.
Complete Data Engineering Zoomcamp curriculum: from infrastructure setup to stream processing
It’s ideal for beginners and career switchers preparing for junior data engineer roles, as well as experienced professionals who want to refresh their knowledge, expand their network, or try mentoring less experienced learners.
Unlike most courses, DE Zoomcamp helps you build your public portfolio, share your work confidently, and connect with a global community for feedback, mentorship, and career opportunities.
TL;DR: Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here.
What is the Data Engineering Zoomcamp?
Data Engineering Zoomcamp is a 9-week program that follows a clear progression: infrastructure setup, workflow orchestration, data warehousing, analytics engineering, batch processing, streaming, and a final capstone project.
Data Engineering Zoomcamp GitHub repository showing the course materials
What makes it different is the community. You’ll join an active Slack workspace where thousands of learners troubleshoot together, share progress, and connect for jobs and collaborations. The course encourages learning in public: sharing your work earns bonus points and builds your online presence.
The final three weeks focus on a capstone project that is peer-reviewed, and you graduate with a polished GitHub portfolio that shows you can ship real data systems.
Why Learn Data Engineering?
Data teams typically include data analysts, data scientists, and data engineers. Among these roles, data engineers are the ones who connect all the pieces of the data ecosystem within a company or institution.
But why would you want to become one? Here are some of the main reasons data engineering roles are rewarding and valuable:
- You become the builder behind every data product that keeps information flowing.
- You increase your earning potential by joining a smaller, high-value pool of professionals.
- You develop transferable skills that are valuable across industries and provide long-term career flexibility. These skills are also foundational for roles in machine learning and MLOps.
- You future-proof your career by building a mindset that will stay essential even as tools change and processes get automated.
Who Can Learn Data Engineering (and What Roles It Leads To)
The great thing about data engineering is that you don’t need to be a computer science graduate or have years of experience in data to start.
If you understand basic programming and are curious about how data systems work, you can learn data engineering.
If you tick most of the following boxes, data engineering might be a good fit for you:
- Enjoy solving technical, logical problems and building things that work reliably.
- Like Python, SQL, or scripting and want to use those skills for something impactful.
- Want to understand how data moves inside organizations — from raw sources to analysis and AI (including LLM applications).
- Prefer practical, system-level work over purely theoretical or statistical modeling.
- Appreciate clear structure and step-by-step learning (which is how DE Zoomcamp is built).
Course Prerequisites
As mentioned above, learning data engineering doesn’t require prior data engineering experience.
The course is designed to be accessible to beginners.
The only requirements are:
- Comfort with the command line (basic navigation and file operations)
- Basic SQL knowledge (SELECT, WHERE, JOIN statements)
- Python experience is helpful but not required
If you’re completely new to programming, consider spending a few weeks learning the basics before starting the course.
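The SQL bar is modest. As a rough self-check (our own illustration, not official course material), if you can follow the query below, which combines SELECT, JOIN, and WHERE, you are likely ready. The `trips` and `zones` tables and their rows are hypothetical sample data, and the snippet uses Python's built-in `sqlite3` module so it runs without installing a database server.

```python
import sqlite3

# Hypothetical sample data: a tiny trips table and a zones lookup table,
# held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zones (zone_id INTEGER PRIMARY KEY, zone_name TEXT);
    CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, pickup_zone_id INTEGER, fare REAL);
    INSERT INTO zones VALUES (1, 'Airport'), (2, 'Downtown');
    INSERT INTO trips VALUES (10, 1, 52.0), (11, 2, 9.5), (12, 1, 48.0), (13, 2, 31.0);
""")

# SELECT + JOIN + WHERE: trips with a fare above 30, labelled with their pickup zone.
query = """
    SELECT t.trip_id, z.zone_name, t.fare
    FROM trips AS t
    JOIN zones AS z ON t.pickup_zone_id = z.zone_id
    WHERE t.fare > 30
    ORDER BY t.trip_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# Expected: (10, 'Airport', 52.0), (12, 'Airport', 48.0), (13, 'Downtown', 31.0)
conn.close()
```

If the JOIN condition and the WHERE filter both make sense to you, your SQL is at the level the course assumes; anything beyond this is taught along the way.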
Course Curriculum: What You’ll Learn in the Data Engineering Zoomcamp
The course follows a logical progression from infrastructure setup to advanced data processing, culminating in an end-to-end project.
| Module | What You'll Learn | Tools & Technologies |
|---|---|---|
| 1. Infrastructure & Prerequisites | • Set up your development environment with Docker and PostgreSQL • Learn cloud basics with GCP • Master infrastructure-as-code using Terraform | Docker, PostgreSQL, GCP, Terraform |
| 2. Workflow Orchestration | • Master data pipeline orchestration with Mage.AI • Implement and manage Data Lakes using Google Cloud Storage • Build automated, reproducible workflows | Mage.AI, Google Cloud Storage |
| 3. Data Warehouse | • Deep dive into BigQuery for enterprise data warehousing • Learn optimization techniques like partitioning and clustering • Implement best practices for data storage and retrieval | BigQuery |
| 4. Analytics Engineering | • Transform raw data into analytics-ready models using dbt • Develop testing and documentation strategies • Create impactful visualizations with modern BI tools | dbt, BI tools |
| 5. Batch Processing | • Process large-scale data with Apache Spark • Master Spark SQL and DataFrame operations • Optimize batch processing workflows | Apache Spark, Spark SQL |
| 6. Stream Processing | • Build real-time data pipelines with Kafka • Develop streaming applications using KSQL and Faust • Implement stream processing patterns | Kafka, KSQL, Faust |
| Final Project | • Build an end-to-end data pipeline from ingestion to visualization • Apply all learned concepts in a real-world project • Create a portfolio-ready project with documentation | Cloud platforms (GCP/AWS/Azure), Terraform, Spark, Kafka, dbt, BigQuery |
Capstone Project
The final three weeks are dedicated to applying your knowledge in a real-world project that showcases everything you’ve learned throughout the course.
Data Engineering Zoomcamp capstone project of one of the course graduates, Maddie Zheng, showing project architecture: extract, load, transform, and visualize data. Source: Maddie's project
You’ll build an end-to-end data pipeline using a dataset of your choice, implementing both data lake and warehouse solutions with proper documentation.
Your project will be peer-reviewed by fellow participants, and you’ll peer-review at least three other projects.
| Project Requirements | Deliverables | Evaluation Criteria |
|---|---|---|
| • Select and process a dataset that interests you • Build end-to-end data pipelines (batch or streaming) • Implement both data lake and warehouse solutions • Create analytical dashboards | • Production-ready data pipeline • Documented data models • Interactive dashboard • Project presentation | • Peer review of at least three other projects • Technical implementation quality • Documentation completeness • Solution architecture design |
How the Data Engineering Zoomcamp Works
The course runs for 9 weeks in cohort format, providing structure and community support throughout your learning journey.
While you can access all course materials at your own pace without joining a cohort, participating in the structured program offers significant advantages: graded homework assignments, project submission and evaluation, peer interaction, and the opportunity to earn a certificate.
Below, we list the key features of the course and how they work.
GitHub Repository: The Central Hub of the Course
All course materials live in the GitHub repository. The lectures are pre-recorded and available on YouTube, so you can watch at your own pace.
Data Engineering Zoomcamp YouTube playlist with pre-recorded lectures
Homework Assignments
To reinforce your learning, we release a homework assignment for each week of the course, which you submit at the end of that week.
Data Engineering Zoomcamp schedule showing the course schedule and submission deadlines
Homework doesn’t count toward your certificate, but it helps you practice, and your scores appear on an optional anonymous leaderboard that creates friendly competition among course members and motivates you to do your best.
Course leaderboard displaying student progress and achievements anonymously
You can earn bonus points by learning in public: sharing your work on blogs, YouTube, or social media. The next section covers this in more detail.
Learning in Public: Build Your Online Presence
A unique feature is our “learning in public” approach, inspired by Shawn @swyx Wang’s article. We believe that everyone has something valuable to contribute, regardless of their expertise level.
An extract from Shawn @swyx Wang's article about learning in public
Throughout the course, we actively encourage and incentivize learning in public. By sharing your progress, insights, and projects online, you earn additional points for your homework and projects.
Previous cohort's leaderboard highlighting bonus points earned through learning in public activities
This not only demonstrates your knowledge but also builds a portfolio of valuable content. Sharing your work online also helps you get noticed by social media algorithms, reaching a broader audience and creating opportunities to connect with individuals and organizations you may not have encountered otherwise.
Many of our graduates have shared that their social media presence has helped them attract job offers and collaborations.
How to Get a Certificate
Data Engineering Zoomcamp certificate showing the certificate requirements
To earn your certificate, you must complete the course with a live cohort and fulfill three key requirements:
- Build a capstone project: Create an end-to-end data pipeline that demonstrates your mastery of the course concepts
- Submit on time: Meet the project submission deadline to qualify for certification
- Peer review: Evaluate and provide feedback on 3 fellow students’ projects during the peer review process
What is the DataTalks.Club Community? A Place to Connect and Learn with Other Data Professionals
Active discussions and peer support in our dedicated Slack community channel
DataTalks.Club is a global community of 80,000+ data professionals who connect on Slack to share knowledge, ask career questions, and discuss everything from analytics and visualization to machine learning and data engineering. As one of the largest online communities dedicated to data, it’s where you’ll find data scientists, ML engineers, data analysts, and enthusiasts at all career stages.
When you join a cohort, the dedicated course channel becomes your home base. Here, you’ll troubleshoot problems with peers working through the same challenges and share your progress and insights.
DataTalks.Club FAQ repository showing common questions and technical issues
Beyond peer support, there are two ways to get help: our FAQ repository covers common questions and technical issues, and the @ZoomcampQABot in Slack provides quick answers when you need them.
DataTalks.Club Zoomcamp QABot in Slack providing quick answers when you need them
Zoomcamp vs. Bootcamp: What’s the Difference?
We often get asked what the difference is between the Data Engineering Zoomcamp and paid bootcamps.
Below, we list the key features of the Data Engineering Zoomcamp and how they compare to paid bootcamps.
| Feature | DE Zoomcamp (Cohort) | DE Zoomcamp (Self-paced) | Paid Bootcamps |
|---|---|---|---|
| Cost | Free | Free | $2,000–$10,000+ |
| Format | 9-week cohort with fixed schedule | Learn anytime at your own pace | Fixed schedule, instructor-led |
| Homework | Weekly scored assignments | Available but no scoring | Weekly with instructor feedback |
| Projects | Capstone project with peer review and scoring | Build on your own, no evaluation and scoring | Instructor-reviewed projects |
| Certificate | Yes, after completing project + peer reviews | No certificate | Certificate of completion |
| Community Support | Active Slack + optional live Q&A sessions | Slack community only | Instructor-led, 1:1 or group mentorship |
| Learning in Public | Encouraged with bonus points | Optional | Rarely emphasized |
| Timeline | 9 weeks (Jan–Mar 2026) | Flexible, self-paced | Typically 12–24 weeks |
| Best For | Career switchers or experienced data engineers wanting community and accountability | Self-motivated learners exploring data engineering | Those needing intensive structured mentorship and guidance |
How to Get Started with the Data Engineering Zoomcamp
To get started with the Zoomcamp, you can either join a live cohort or learn at your own pace.
Learn at Your Own Pace
In self-paced mode, you get access to all course materials and the community. You can start learning immediately and move through the modules on your own schedule.
All you need to do is go to the Data Engineering Zoomcamp GitHub repository and start learning. It serves as the central hub for navigating the course materials. All the lectures are pre-recorded and available on YouTube in our official Data Engineering Zoomcamp YouTube Playlist, so you can watch them whenever it suits you.
You can also join the DataTalks.Club Slack community to get help and support in the #course-data-engineering channel.
Remember, self-paced learning does not include homework submissions, project evaluations, or the ability to earn a certificate. To receive certification, you need to join an active cohort.
Join a Live Cohort
When you join a live cohort, you’ll work through the same materials as self-paced learners, but with the added structure of a published schedule and the energy of hundreds of peers progressing alongside you.
Each module typically spans one week: you watch the lectures, complete hands-on exercises, and submit a homework assignment. Your submissions get scored and appear on an anonymous leaderboard.
After completing all six modules, the capstone project phase begins. You’ll build your own end-to-end pipeline, submit it through our form, and peer-review at least three other students’ projects while yours gets reviewed by your peers. This reciprocal process gives you valuable feedback on your work and exposes you to different approaches and solutions you might not have considered.
Ready to join DE Zoomcamp? Here’s how it works:
- Register for the course; you’ll be automatically accepted into the next cohort
- Join the DataTalks.Club Slack community and the #course-data-engineering channel for updates, questions, and peer support
- (Optional) Get a head start by exploring the GitHub repository and watching lectures before the cohort officially begins
- When the cohort starts, you’ll receive an email with the full schedule and submission deadlines
- Follow the weekly rhythm: watch lectures, complete exercises, submit homework
- During the final three weeks, build and submit your capstone project, then peer-review three other projects
- Receive your certificate once your project passes peer review
The entire journey takes 9 weeks from start to certificate, and you’ll be part of a global cohort tackling the same challenges at the same time.
Frequently Asked Questions
The Data Engineering Zoomcamp is a free, community-driven program by DataTalks.Club that teaches core data engineering skills through hands-on projects.
This data engineering course has a 9-week curriculum and all the materials are open and available any time on the Data Engineering Zoomcamp GitHub repo. You’ll work with an industry-standard stack including Docker, Terraform, dbt, Spark, Kafka, and BigQuery and earn a certificate.
Yes, the Data Engineering Zoomcamp is completely free. There are no hidden costs, no tuition fees, and no paid tiers. All course materials, videos, homework assignments, and access to the community are provided at no cost. Unlike traditional bootcamps that charge $10,000-$20,000+, this course is entirely community-driven and open source.
The Data Engineering Zoomcamp differs significantly from traditional data engineering bootcamps in several key ways.
- Cost: While bootcamps typically cost $10,000-$20,000+, the Data Engineering Zoomcamp has no tuition fees whatsoever.
- Community: The Data Engineering Zoomcamp is community-driven and open source. All materials remain freely available on GitHub, unlike bootcamps that lock content behind paywalls.
- Flexibility: You can continue at your own pace even after the cohort ends, whereas bootcamps usually have rigid schedules and limited access periods.
To earn your Data Engineering Zoomcamp certificate, you need to complete the project requirements by building an end-to-end data pipeline.
After submitting your project, you must also participate in peer learning by reviewing at least 3 other projects, submitting reviews by the deadline, and providing constructive feedback.
The next cohort of the DE Zoomcamp starts in January 2026. Register before the course starts: https://airtable.com/appzbS8Pkg9PL254a/shr6oVXeQvSI5HuWD
The Data Engineering Zoomcamp is run by DataTalks.Club, a global online community of data professionals and learners. While the initial idea and most of the content were created by Alexey Grigorev, members of the DataTalks.Club community contribute to the course as instructors and maintainers of the course materials.
DataTalks.Club is often referred to as “the DataTalks Club”, “data talks club”, or “datatalks club”.
No prior data engineering experience is needed. You should be comfortable with the command line and have basic SQL knowledge. Python experience is helpful but not required.
You should expect to spend between 5 and 15 hours per week, depending on your background. This includes watching videos, completing homework, and working on projects. More time might be needed during project weeks. The time commitment varies based on your familiarity with the tools and concepts.
Yes! All course materials, videos, and recordings remain available after the cohort ends, and you can learn at your own pace. You’ll have access to the Slack community for support, where you can search previous discussions or ask @ZoomcampQABot for help. However, please note that self-paced learning does not include homework submissions, project evaluations, or the ability to earn a certificate. To receive a certificate, you need to join an active cohort.
You have multiple support channels available. Join the DataTalks.Club Slack community where you can ask questions and get help from instructors and fellow students. We also have an FAQ repository with answers to common questions, a @ZoomcampQABot in Slack for quick help, and regular office hours where you can interact directly with instructors.
The Data Engineering Zoomcamp GitHub repository is https://github.com/DataTalksClub/data-engineering-zoomcamp.
The course videos are available in our official Data Engineering Zoomcamp YouTube Playlist. But please refer to the Data Engineering Zoomcamp GitHub repository for easier navigation through the course materials.
We also maintain year-specific playlists for office hours and updates.
Yes, people use these names interchangeably. Throughout this page we’ll use “Data Engineering Zoomcamp” as the canonical name.