From Radio Astronomy to Applied ML: MEERKAT Data Pipelines, Multi-Wavelength Cross-Matching & Production-Grade ML Systems
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How do you transform raw radio astronomy observations into reliable, production-grade machine learning systems that enable multi-wavelength science? In this episode we talk with Daniel Egbo, an astrophysicist turned machine learning engineer, AI ambassador (Arize, Tavily), and PhD candidate at the University of Cape Town, about bridging radio astronomy and applied ML. Daniel explains the challenges of working with MEERKAT data pipelines, strategies for multi-wavelength cross-matching, and the engineering behind production-grade ML systems.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:00 - Podcast Introduction & Lunar Eclipse Anecdote
- 1:13 - Career Overview: From Nigeria to PhD in Cape Town
- 4:12 - MEERKAT and SKA: Radio Telescope Project Overview
- 4:49 - Electromagnetic Spectrum: Radio to Gamma Explained
- 6:19 - Research Goal: Identifying Radio-Emitting Stars in MEERKAT Data
- 6:45 - Telescope Types and Observing Constraints (Optical, Infrared, X-ray)
- 8:00 - Radio Telescope Site Requirements and Space-based X-ray Observatories
- 10:39 - Data Workflow: Detecting Point Sources in Radio Images
- 11:50 - Cross-matching Multi-wavelength Catalogs and Positional Astronomy
- 13:35 - Positional Uncertainty: 2D Projection, Foreground/Background Confusion
- 15:30 - Physics-based Verification: Using Prior Observations to Confirm Sources
- 16:35 - Radio Stars Rarity and Sensitivity Improvements with New Telescopes
- 17:54 - Building Curated Datasets as Foundation for Future Machine Learning
- 21:31 - Early ML Journey: Dataset Scale, Cloud Needs, and Inspiration
- 24:33 - Python Astronomy Tooling: Astropy, NumPy, SciPy for Big Data
- 25:47 - Cloud Computing Practices: JupyterHub and Remote Analysis
- 26:58 - ML ZoomCamp Impact: Transitioning to Reusable Code and Production Practices
- 31:26 - Edge Deployment Internship: Testing Models on Intel Hardware
- 33:38 - LLM Exploration: LangChain, Hugging Face, RAG and Vector Databases
- 42:48 - Course Projects: Orchestration with Kestra, Airflow, MinIO and Spark
- 44:08 - Airflow 3.0 Setup Experience and Astronomer CLI Learnings
- 45:15 - End-to-End Pipeline Example: MySQL → MinIO → Spark → Warehouse (dbt next)
- 47:39 - AI Training Ecosystem: LangChain Academy, Arize, NVIDIA Deep Learning Institute
- 50:20 - Student Benefits: Free NVIDIA Courses and Deploying on GPUs (A100/H100)
- 52:01 - BRICS Astronomy Bootcamp: Beginner-Friendly Data Analytics Program
- 55:12 - Sharing Projects: Colab Notebooks, Public Portfolios and GitHub Visibility
- 57:59 - Career Advice: Learn Python, Do Structured Projects, Leverage Domain Knowledge
- 1:00:21 - Tools & Sponsors: Data Load Tool for Pipelines and Community Support
- 1:01:09 - Learning Resources: Astropy Tutorials, Course GitHub and YouTube Archive
- 1:02:22 - Closing Remarks: Encouragement to Share Progress and Course Availability
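Illustrative sketch
The chapters at 11:50 and 13:35 cover positional cross-matching between catalogs. As a rough illustration of the idea only, not the guest's actual pipeline, the sketch below matches each radio source to its nearest optical counterpart within a search radius. The catalog entries, the source names, and the 2 arcsecond radius are all hypothetical values chosen for the example, and the brute-force loop is an assumption made for readability. In practice a library such as Astropy would normally handle this step.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions.

    Inputs are right ascension and declination in degrees.
    Uses the haversine formula, which stays accurate at small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(radio, optical, radius_arcsec=2.0):
    """Return {radio_name: (optical_name, separation_arcsec)}.

    Each radio source gets its nearest optical source inside the radius,
    or (None, None) when nothing falls within it. This is a brute-force
    O(N * M) search, fine for a small example but not for full catalogs.
    """
    matches = {}
    for r_name, (r_ra, r_dec) in radio.items():
        best_name, best_sep = None, None
        for o_name, (o_ra, o_dec) in optical.items():
            sep = angular_separation_deg(r_ra, r_dec, o_ra, o_dec) * 3600.0
            if sep <= radius_arcsec and (best_sep is None or sep < best_sep):
                best_name, best_sep = o_name, sep
        matches[r_name] = (best_name, best_sep)
    return matches


# Hypothetical toy catalogs: name -> (RA in degrees, Dec in degrees).
radio = {"R1": (150.0000, -30.0000), "R2": (150.1000, -30.1000)}
optical = {"O1": (150.0002, -30.0001), "O2": (151.0000, -31.0000)}

matches = cross_match(radio, optical)
for name, (counterpart, sep) in matches.items():
    print(name, counterpart, sep)
```

A positional match alone does not confirm a physical association: as the 13:35 chapter notes, the sky is a 2D projection, so a foreground or background object can fall inside the search radius by chance. That is why the episode pairs cross-matching with physics-based verification.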