Miscellaneous FAQ

This guide covers various common issues and their solutions for the Data Engineering Zoomcamp, including database queries, environment setup, development tools, and general troubleshooting.

Table of contents

  1. Database & SQL Issues
    1. Column not found error with uppercase names
  2. Network & Connectivity Issues
    1. curl host resolution error
    2. SSH connection failures
    3. VS Code SSH permission denied
  3. Python & Environment Setup
    1. pip command not recognized
    2. Generating pip requirements from Anaconda
  4. Docker Issues
    1. Address already in use error
    2. Docker permission denied errors
    3. Docker build context errors
    4. Docker Compose environment variable formatting
  5. Development Tools
    1. Jupyter Notebook installation and usage
    2. Alternative notebook conversion with jupytext

Database & SQL Issues

Column not found error with uppercase names

Problem:

SELECT * FROM zones_taxi WHERE Zone='Astoria Zone';
-- Error: Column Zone doesn't exist

Root Cause: PostgreSQL requires quotes around column names that contain uppercase letters.

Solutions:

Option 1: Use quotes around column names

SELECT * FROM zones AS z WHERE z."Zone" = 'Astoria';

Option 2: Convert columns to lowercase during data loading

df = pd.read_csv('taxi_zone_lookup.csv')
df.columns = df.columns.str.lower()

Note: In the actual dataset, the zone is named ‘Astoria’ not ‘Astoria Zone’.


Network & Connectivity Issues

curl host resolution error

Problem:

curl: (6) Could not resolve host: output.csv

Root Cause: Incorrect curl syntax, particularly common on macOS.

Solution: Use the correct curl syntax:

os.system(f"curl {url} --output {csv_name}")

SSH connection failures

Problem:

ssh: Could not resolve hostname linux: Name or service not known

Root Cause: SSH can’t find your host configuration file.

Solution: Ensure your SSH config file is properly located at:

  • Linux/macOS: ~/.ssh/config
  • Windows: C:\Users\Username\.ssh\config

VS Code SSH permission denied

Problem:

Could not establish connection to "de-zoomcamp": Permission denied (publickey).

Root Cause: SSH key path issues on Windows when using VS Code.

Solution for Windows users:

  1. Copy SSH folder: Copy the .ssh folder from Linux path to Windows
  2. Update config file: Use Windows path format:
    IdentityFile C:\Users\<username>\.ssh\gcp
    

    Instead of:

    IdentityFile ~/.ssh/gcp
    
  3. Check key format: Ensure the private key file has an extra newline at the end

Python & Environment Setup

pip command not recognized

Problem:

'pip' is not recognized as an internal or external command, operable program or batch file.

Root Cause: Anaconda’s Python is not in your system PATH.

Solutions by Operating System:

Linux and macOS:

# Find Anaconda path (typically ~/anaconda3 or ~/opt/anaconda3)
export PATH="/path/to/anaconda3/bin:$PATH"

# Make permanent by adding to shell profile
echo 'export PATH="/path/to/anaconda3/bin:$PATH"' >> ~/.bashrc  # Linux
echo 'export PATH="/path/to/anaconda3/bin:$PATH"' >> ~/.bash_profile  # macOS

Windows with Git Bash:

# Add to PATH (adjust username as needed)
export PATH="/c/Users/[YourUsername]/Anaconda3/:/c/Users/[YourUsername]/Anaconda3/Scripts/:$PATH"

# Add to .bashrc and refresh
echo 'export PATH="/c/Users/[YourUsername]/Anaconda3/:/c/Users/[YourUsername]/Anaconda3/Scripts/:$PATH"' >> ~/.bashrc
source ~/.bashrc

Windows (System Settings):

  1. Right-click ‘This PC’ → ‘Properties’
  2. Click ‘Advanced system settings’
  3. Click ‘Environment Variables’
  4. Edit ‘Path’ in ‘System variables’
  5. Add these paths:
    • C:\Users\[YourUsername]\Anaconda3
    • C:\Users\[YourUsername]\Anaconda3\Scripts
  6. Click ‘OK’ to apply changes

Generating pip requirements from Anaconda

Problem: Need to create a requirements.txt file from Anaconda environment.

Solution:

# Install pip in conda environment
conda install pip

# Generate requirements file
pip list --format=freeze > requirements.txt

Note:

  • Don’t use conda list -e > requirements.txt (won’t work with pip)
  • Don’t use pip freeze > requirements.txt (may include incorrect paths)

Docker Issues

Address already in use error

Problem:

Error: error starting userland proxy: listen tcp4 0.0.0.0:8080: bind: address already in use

Root Cause: Port 8080 is already being used by another process.

Solution: Kill the process using the port:

sudo kill -9 `sudo lsof -t -i:8080`

General format for any port:

sudo kill -9 `sudo lsof -t -i:<port_number>`

Docker permission denied errors

Problem:

Error: error response from daemon: cannot stop container: 
permission denied

Root Cause: Docker service permissions issue.

Solution: Restart Docker services:

sudo systemctl restart docker.socket docker.service

Docker build context errors

Problem:

Error checking context: 'can't stat '<path-to-file>'

Root Cause: Docker build context includes files with permission issues.

Solutions:

Option 1: Use .dockerignore Create a .dockerignore file to exclude problematic files:

# .dockerignore
*.log
temp/
.git/

Option 2: Reorganize files

  • Move Dockerfile and related scripts to a subfolder
  • Run docker build from that subfolder

Docker Compose environment variable formatting

Problem: Environment variables not working in docker-compose.yml.

Root Cause: Incorrect formatting with spaces around equals sign.

Incorrect formats:

environment:
  - PGADMIN_DEFAULT_EMAIL = admin@admin.com  # Spaces around =
  - PGADMIN_DEFAULT_PASSWORD = root
environment:
  - PGADMIN_DEFAULT_EMAIL= admin@admin.com  # Space after =
  - PGADMIN_DEFAULT_PASSWORD=root

Correct format:

environment:
  - PGADMIN_DEFAULT_EMAIL=admin@admin.com
  - PGADMIN_DEFAULT_PASSWORD=root

Development Tools

Jupyter Notebook installation and usage

Problem: Need to install and use Jupyter Notebook.

Solution:

Install and launch:

pip install jupyter
python3 -m notebook

Convert notebook to Python script:

pip install nbconvert --upgrade
python3 -m jupyter nbconvert --to=script upload-data.ipynb

Alternative notebook conversion with jupytext

Problem: nbconvert giving errors during notebook conversion.

Solution: Use jupytext as an alternative:

Install:

pip install jupytext

Convert:

jupytext --to py <your_notebook.ipynb>

Advantages of jupytext:

  • More reliable conversion
  • Better handling of markdown cells
  • Preserves notebook structure in comments