Prerequisites
The Data Engineering Zoomcamp does not require prior data engineering experience. It does assume basic technical literacy.
For general expectations about zoomcamp time commitment, see Before You Start.
Required skills
- Command line basics. Comfortable running commands in a terminal (Linux, Mac, Git Bash, or WSL).
- Basic programming concepts. Variables, functions, loops, conditionals.
- Python knowledge. Basic Python or the ability to pick it up quickly. The course uses Python heavily for ingestion scripts.
- SQL fundamentals. SELECT, WHERE, JOIN, GROUP BY. You will write more complex SQL during the course (window functions, CTEs, partitioning queries).
- Git basics. Cloning repositories, making commits, pushing.
If you are weak on any of these, the course is still doable, but you will spend extra time on the fundamentals in the first weeks. The course is beginner-friendly for data engineering, not for programming.
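As a rough self-check of the Python and SQL level described above, the sketch below runs a SELECT with WHERE, JOIN, and GROUP BY from a short Python script. It uses the standard-library sqlite3 module only to stay self-contained. The table names, columns, and rows are hypothetical and are not taken from the course material.

```python
import sqlite3

# Hypothetical in-memory data, used only for this self-check.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zones (zone_id INTEGER PRIMARY KEY, borough TEXT);
    CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, zone_id INTEGER, fare REAL);
    INSERT INTO zones VALUES (1, 'Manhattan'), (2, 'Queens');
    INSERT INTO trips VALUES (1, 1, 12.5), (2, 1, 7.5), (3, 2, 30.0), (4, 2, 0.0);
""")


def fares_by_borough(connection):
    """Return (borough, trip_count, total_fare) for trips with a positive fare."""
    query = """
        SELECT z.borough, COUNT(*) AS trip_count, SUM(t.fare) AS total_fare
        FROM trips AS t
        JOIN zones AS z ON z.zone_id = t.zone_id
        WHERE t.fare > 0
        GROUP BY z.borough
        ORDER BY z.borough
    """
    return connection.execute(query).fetchall()


rows = fares_by_borough(conn)
for borough, trip_count, total_fare in rows:
    print(borough, trip_count, total_fare)
```

If you can read this script and predict what it prints before running it, you are probably at the level the list above describes.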
Tools you will install
- Docker (Docker Desktop on Mac/Windows, Docker Engine on Linux).
- Python (3.10+ recommended).
- Terraform.
- A text editor or IDE (VS Code is the most common, but any IDE works).
- The Google Cloud SDK (gcloud CLI).
For the setup choices, see Environment Setup.
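Once the tools are installed, a quick way to confirm they are reachable from your terminal is to look them up on PATH. The following is a minimal sketch, not an official course script. The executable names are assumptions about a typical install; on Windows, for example, the interpreter may be called python rather than python3.

```python
import shutil

# Executable names are assumptions about a typical install; adjust for your system.
TOOLS = ["docker", "python3", "terraform", "gcloud", "git"]


def find_tools(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(TOOLS).items():
    print(f"{name:10} {path or 'NOT FOUND'}")
```

A NOT FOUND entry only means the executable is not on PATH in the current shell; the tool may still be installed somewhere else.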
Time commitment
- Two weeks for Module 1 (longer to allow for environment setup).
- One week each for modules 2 to 6.
- Two to three weeks for the final project.
Plan for 10 to 15 hours per week. Past cohorts’ actual time data is visible on the 2024 and 2025 dashboards.
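Adding up the figures above gives a rough overall budget. The sketch below assumes the 10 to 15 hours applies evenly to every week, including the project weeks, and that no breaks or extra modules are added; both are simplifying assumptions, not statements from the course.

```python
# Figures taken from the list above.
MODULE_1_WEEKS = 2
OTHER_MODULES = 5          # modules 2 to 6, one week each
PROJECT_WEEKS = (2, 3)     # low and high estimate
HOURS_PER_WEEK = (10, 15)  # low and high estimate


def total_range():
    """Return (module_weeks, (min_weeks, max_weeks), (min_hours, max_hours))."""
    module_weeks = MODULE_1_WEEKS + OTHER_MODULES * 1
    weeks = (module_weeks + PROJECT_WEEKS[0], module_weeks + PROJECT_WEEKS[1])
    hours = (weeks[0] * HOURS_PER_WEEK[0], weeks[1] * HOURS_PER_WEEK[1])
    return module_weeks, weeks, hours


module_weeks, weeks, hours = total_range()
print(f"Modules: {module_weeks} weeks")
print(f"Total: {weeks[0]} to {weeks[1]} weeks, {hours[0]} to {hours[1]} hours")
```

Under these assumptions the modules take 7 weeks, and the whole course including the project takes 9 to 10 weeks, or roughly 90 to 150 hours.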
You do not need
- A degree.
- Prior data engineering jobs.
- Prior cloud experience (you will pick up GCP basics as you go).
- Prior experience with any specific tool the course teaches (Docker, Terraform, Spark, Kafka, and dbt are all introduced from scratch).
If you are stronger than the prerequisites
You can move faster, work ahead, and use the time for a more ambitious project. Some past participants have completed all modules in 4 to 5 weeks instead of the 7 weeks scheduled for them.