DataTalks.Club Podcast Wiki
Explore DataTalks.Club podcast episodes by topic, guest, transcript segment, and podcast-backed content.
Wiki
A/A Testing
How the podcast archive uses A/A testing to validate experiment assignment, tracking, and statistical interpretation before A/B tests are trusted.
A/B Testing
How the podcast archive explains A/B testing as randomized product evaluation, with assignment, metrics, noise, power, and rollout decisions.
AI
How DataTalks.Club podcast guests define AI across machine learning, generative AI, agents, production systems, evaluation, infrastructure, and governance.
AI Agents
What DataTalks.Club guests have said about AI agents: autonomy, tool use, memory, RAG boundaries, evaluation, governance, and infrastructure.
AI Engineer Role
Archive-backed guide to the AI engineer role, including definition, responsibilities, disagreements, skills, boundaries, and podcast examples.
AI Engineering
Archive-backed guide to AI engineering as the discipline of shipping LLM applications, RAG systems, agents, evaluations, and production AI products.
AI Engineering Roadmap
A podcast-backed roadmap for learning AI engineering through software foundations, LLM applications, RAG, evaluation, agents, LLMOps, and production ownership.
AI Infrastructure
Podcast-grounded reference page for compute, GPUs, orchestration, model serving, cost, and operations behind production AI systems.
AI Red Teaming
How DataTalks.Club podcast guests frame AI red teaming as adversarial testing for prompt injection, data exfiltration, unsafe outputs, and agent abuse.
AI Tooling
How DataTalks.Club podcast guests choose and operate AI tooling for model APIs, open-source LLMs, RAG, prompts, agents, evaluation, observability, and deployment.
Academia
How the podcast archive connects academic research, PhDs, postdocs, open science, research software, and transitions into data and AI industry roles.
Academic Researcher to Data Science
Podcast-backed transition notes for academic researchers, PhDs, postdocs, and research software practitioners moving into data science through skills translation, production practice, portfolio evidence, and interview framing.
Agent Engineering
How DataTalks.Club guests define AI agents and engineer them through workflow design, tools, retrieval, evaluation, guardrails, and production constraints.
Analytics Engineering
How DataTalks.Club episodes describe analytics engineering as the discipline of building trusted analytical models, transformations, tests, documentation, and BI-ready data products.
Analytics Engineering Portfolio Projects
Archive-backed guidance for analytics engineering portfolio projects that prove SQL modeling, metric ownership, dbt-style tests, documentation, BI readiness, and stakeholder judgment.
Analytics Engineering Roadmap
A podcast-backed roadmap for analytics engineering: SQL modeling, dbt-style workflows, metric ownership, stakeholder trust, and the move from dashboards to governed analytical products.
Apache Iceberg
How DataTalks.Club podcast guests place Apache Iceberg inside lakehouse architecture, open table formats, catalogs, Parquet storage, Delta Lake and Hudi comparisons, DLT support, and data engineering platform design.
Applied Research
How DataTalks.Club guests describe applied research as hypothesis-driven work that turns uncertain ML ideas into products, reusable systems, and production-ready evidence.
Batch vs Streaming
A podcast-backed comparison of batch and streaming data processing through latency, operations, contracts, cost, ML serving, and product-decision tradeoffs.
Business Skills for Data Professionals
How DataTalks.Club guests connect analytics impact to stakeholder trust, metric definitions, business literacy, prioritization, and communication.
CDC
Change data capture in the DataTalks.Club archive: when to capture row-level database changes, how CDC compares with batch dumps and streaming, and what teams must operate around schema changes, deletes, and replay.
CI/CD
Podcast-grounded reference page for CI/CD in data, ML, and AI systems.
CV Screening
Archive-backed guide to how data CVs and resumes are screened: responsibilities, keywords, project evidence, recruiter calls, bias reduction, and ATS myths.
Caching
How DataTalks.Club guests discuss caching, prompt caching, context reuse, and model efficiency in production AI systems.
Career Development
Archive-backed guide to compounding skills, public proof, interview readiness, internal growth, transitions, and personal brand in data and AI careers.
Career Growth
How the podcast archive frames growth after entering data and AI roles: depth, breadth, visibility, communication, leadership, and senior impact.
Career Transition
Archive-backed patterns for moving into data, ML, AI engineering, analytics engineering, data engineering, product, and freelance roles.
Career Transitions in Data
Archive-backed patterns for moving into data science, analytics engineering, data engineering, ML, AI engineering, and freelance data work.
Causal Inference
How the podcast archive explains causal inference as the discipline for reasoning about interventions, counterfactuals, treatment effects, and policy decisions.
Communication
How DataTalks.Club podcast guests treat communication as a core data and ML skill: stakeholder translation, interviews, writing, consulting, portfolio narratives, and business context.
Community
How DataTalks.Club podcast guests use community as a practical layer for learning, feedback, contribution, visibility, safety, and technical adoption.
Community Building
Podcast-backed patterns for launching, growing, moderating, and sustaining technical communities around data, MLOps, open source, and learning.
Computer Vision
Archive-backed guide to computer vision as applied perception, from images and sensors to labeling, deployment constraints, multimodal retrieval, and career project work.
Contributing
Podcast-backed guidance on useful contribution paths: reproducible issues, docs fixes, examples, tests, pull requests, mentoring, and community participation.
Customer Data Platforms
How the podcast archive frames customer data platforms as bundled tools for collecting, segmenting, analyzing, and activating customer data.
Data Activation
How the podcast archive describes data activation as moving trusted product and customer data into operational tools and decision workflows.
Data Analyst Careers
A podcast-grounded career page for data analyst paths, entry points, portfolio evidence, hiring signals, and transitions into analytics engineering, data science, and data engineering.
Data Analyst Role
Archive-backed guide to the data analyst role: SQL, metrics, dashboards, experiments, stakeholder communication, and boundaries with analytics engineering, data science, and data engineering.
Data Analyst vs Analytics Engineer
A podcast-backed comparison of the data analyst and analytics engineer role boundary: decisions, dashboards, reusable models, dbt, data quality, and team ownership.
Data Architect Role
Podcast-backed definition of the data architect role: end-to-end data ownership, modeling, cloud adaptation, stakeholder alignment, reusable patterns, and boundaries with data engineering leadership.
Data Engineer Role
Archive-backed guide to what data engineers do, where the role starts and ends, and how DataTalks.Club guests describe data engineering work in practice.
Data Engineer vs Data Scientist
A podcast-backed comparison of the data engineer and data scientist role boundary: data paths, modeling, feature work, production handoffs, hiring signals, and team ownership.
Data Engineering
How the DataTalks.Club podcast archive frames data engineering: pipelines, platforms, data quality, role boundaries, business enablement, and the shift toward AI-ready data systems.
Data Engineering Certification
A podcast-backed guide to deciding whether a data engineering certification is useful, how to evaluate certificate programs, and what project and interview evidence employers still need.
Data Engineering Platforms
How the DataTalks.Club podcast archive defines data engineering platforms: shared ingestion, storage, orchestration, modeling, governance, self-service, reliability, adoption, and cost control.
Data Engineering Portfolio Projects
Archive-backed guidance for data engineering portfolio projects that prove useful pipelines, SQL and Python depth, modeling, orchestration, quality checks, and operating judgment.
Data Engineering Roadmap
A podcast-backed roadmap for becoming useful as a data engineer: fundamentals, project sequence, platform judgment, role milestones, and when to stop studying and build.
Data Engineering Tools
A podcast-backed guide to choosing data engineering tools across ingestion, orchestration, storage, transformation, quality, governance, and activation.
Data Governance
How DataTalks.Club guests define data governance through inventory, ownership, catalogs, access controls, quality signals, privacy rules, and policy automation.
Data Lake
How DataTalks.Club podcast guests use data lakes as raw, flexible storage for files, events, logs, and long-lived history, plus the governance, table-format, and DataOps work needed to keep them useful.
Data Mesh
How the podcast archive explains Data Mesh as domain-owned data products, explicit contracts, self-serve platforms, and federated governance.
Data Mesh vs Centralized Data Platform
A podcast-grounded comparison of domain-owned data products and centralized platform ownership through architecture, governance, self-service, reliability, and organizational maturity.
Data Observability
Podcast-grounded definition of data observability across freshness, volume, distribution, schema, lineage, ownership, SLAs, and incident response.
Data Pipelines
Podcast-grounded guide to data pipelines as movement, transformation, publication, and operations across ingestion, orchestration, testing, recovery, batch, streaming, CDC, and ML handoffs.
Data Product Adoption
How podcast guests describe getting dashboards, models, analytics tools, and data products into real business decisions.
Data Product Management
How DataTalks.Club podcast guests define data product management: user discovery, role boundaries, roadmaps, adoption, metrics, ownership, and operating discipline for data products.
Data Products
How the podcast archive describes data products as owned, discoverable, trustworthy data interfaces with users and guarantees.
Data Quality and Observability
How the podcast archive frames reliable data systems: data contracts, tests, freshness, lineage, monitoring, and recovery practices.
Data Science
How the DataTalks.Club podcast archive frames data science: product-facing modeling, analysis, experimentation, hiring signals, role ambiguity, and the boundary with ML, data engineering, and AI engineering.
Data Science Careers
Archive-backed career guidance for data scientist roles: role targeting, CV evidence, portfolio signals, interviews, salary, and ambiguous titles.
Data Scientist Interview Roadmap
A podcast-backed roadmap for data scientist interview preparation: role targeting, CV evidence, recruiter screens, technical rounds, case studies, behavioral stories, and offer readiness.
Data Scientist Role
Archive-backed definition of the data scientist role: product questions, modeling, experimentation, ambiguity, role boundaries, and supporting episodes.
Data Scientist to Machine Learning Engineer
Podcast-backed transition notes for data scientists moving toward machine learning engineering through software engineering, deployment, monitoring, MLOps, and production ownership.
Data Strategy
How DataTalks.Club guests connect data strategy to business goals, operating models, governance, platforms, adoption, and tool choices.
Data Team Lead Role
Podcast-backed definition of the data team lead and head of data role: hiring order, team design, stakeholder adoption, quality standards, trust repair, and leadership boundaries.
Data Teams
How DataTalks.Club podcast guests describe data teams as organizational design around data work, including team models, platform ownership, data products, stakeholder interfaces, and scaling risks.
Data Warehouse
Podcast-backed notes on data warehouses as modeled analytical storage for ELT, dbt, BI, governance, cost control, and activation.
Data Warehouse vs Data Lakehouse
A podcast-grounded comparison of warehouse-centered analytics and lakehouse architectures built from object storage, table formats, catalogs, compute engines, and governance.
Data-Led Growth
How growth, product, and operations teams use event tracking, product analytics, and activation to build customer experiences from reliable product data.
DataOps
Podcast-grounded reference page for DataOps as the operating discipline for reliable data pipelines, analytics workflows, and data platforms.
DataOps Platforms
How DataTalks.Club podcast guests discuss DataOps platforms as the operating layer for reliable pipelines, CI/CD, observability, governance, and self-service data delivery.
Deep Learning
Archive-backed guide to deep learning as the neural-network layer of applied AI, covering vision, transformers, labels, production constraints, and portfolio signals.
Delta Lake
How DataTalks.Club podcast discussions place Delta Lake in lakehouse table-format choices, especially beside Apache Iceberg, Hudi, DuckDB, DataOps, data lakes, and governance.
DevOps to Data Engineering
Podcast-backed transition notes for DevOps, SRE, cloud, and platform engineers moving into data engineering through automation, DataOps, pipelines, cloud platforms, and portfolio proof.
Developer Experience
How data, ML, and AI platforms reduce friction for the people who build with them.
Developer Relations
How DataTalks.Club podcast guests describe DevRel as technical education, demos, documentation, community feedback, open-source work, and adoption strategy for data and ML tools.
Documentation
Podcast-grounded reference on documentation as adoption infrastructure, team memory, operational practice, onboarding support, portfolio evidence, and open-source maintenance.
DuckDB
How DataTalks.Club podcast guests place DuckDB in local OLAP, Parquet analytics, cost-aware pipelines, GitHub Actions workflows, headless table formats, and practical data engineering prototypes.
ELT
Podcast-grounded guide to ELT as a load-first data pipeline approach for warehouses, dbt transformations, analytics engineering, orchestration, CDC, quality checks, and governed data marts.
ETL
Podcast-grounded guide to extract-transform-load pipelines, ETL fit, staging, data quality, lineage, and modern platform work.
ETL vs ELT
A podcast-grounded comparison of transform-before-load and load-before-transform pipeline choices in modern data platforms.
Embeddings
How the podcast archive explains embeddings as representations for semantic search, RAG, recommendations, multimodal retrieval, and language systems.
Entrepreneurship
Podcast-backed notes on entrepreneurship in data, AI, open-source tooling, and consulting.
Evaluation
How DataTalks.Club guests judge whether ML, LLM, RAG, product, and production systems are good enough to trust.
Event Tracking
How the podcast archive describes product event tracking as deliberate instrumentation for analytics, activation, support, sales, and growth workflows.
Evolutionary Algorithms
How DataTalks.Club podcast discussions connect evolutionary algorithms to game AI, evolutionary deep learning, prompt search, optimization, and modern agent systems.
Experiment Tracking
Podcast-grounded reference page for experiment tracking as run history, reproducibility practice, and ML platform capability.
Experimentation
How DataTalks.Club guests use experiments to reduce product, ML, and organizational uncertainty before rollout.
Experimentation and Causal Inference
How DataTalks.Club podcast guests connect randomized experiments, causal reasoning, metric design, uplift modeling, and product decisions.
Founder
How DataTalks.Club podcast guests describe the founder role in data, AI, MLOps, open-source, consulting, and digital health startups.
Freelance Data and AI Work
Archive-backed guide to freelance data and AI work: finding clients, pricing, scoping, agencies, direct clients, productized consulting, and career transitions.
Generative AI
How the podcast archive covers generative AI as applied language, chatbot, agent, coding, and content-generation systems.
GitOps for Data Teams
How DataTalks.Club guests describe GitOps, infrastructure as code, access-as-code, and reviewable platform changes for data teams.
Governance
Archive-backed bridge for governance across data, ML, and AI: ownership, access, review, release controls, compliance, and accountability.
Graph RAG vs Vector RAG
How the podcast archive compares graph-driven retrieval with vector-driven retrieval for grounded LLM systems.
Hiring
Archive-backed patterns for hiring data scientists, analysts, data engineers, ML engineers, managers, and applied AI teams.
Information Retrieval
How DataTalks.Club podcast guests discuss retrieval discipline across candidate generation, ranking, RAG, and evaluation.
Interpretability
Archive-backed guide to interpretability as model understanding for debugging, trust, uncertainty, fairness, and responsible decisions.
Job Descriptions
Archive-backed guidance for reading and writing data job descriptions: role clarity, problem framing, requirements, red flags, and candidate fit.
Job Search
Archive-backed tactics for data and AI job search: role targeting, CVs, portfolios, networking, interviews, salary, and red flags.
Knowledge Graph vs Vector Search
How DataTalks.Club podcast guests compare explicit graph relationships with embedding-based retrieval for search, RAG, and domain knowledge systems.
LLM Evaluation Workflows
Practical podcast-backed workflows for evaluating LLM, RAG, and agent systems before and after production.
LLM Production Patterns
How DataTalks.Club guests turn LLM demos into production systems with model choice, RAG, agents, and evaluation.
LLMs
How DataTalks.Club guests discuss large language models as language, retrieval, agent, evaluation, production, and security components.
Leadership
How DataTalks.Club podcast guests describe data and AI leadership across manager, senior IC, platform, strategy, hiring, mentoring, and stakeholder roles.
ML Platform Engineer Role
Podcast-backed definition of the ML platform engineer role: internal ML platforms, developer experience, MLOps services, infrastructure tradeoffs, and boundaries with MLOps and ML engineering.
ML Platforms
Podcast-grounded reference page for shared ML platform systems, internal product strategy, and team enablement.
ML Product Manager Role
How the podcast archive defines the technical product manager role for ML platforms and ML-enabled data products.
ML System Design Documents
Podcast-grounded reference for ML design docs as artifacts for failing fast and for recording ownership, data strategy, monitoring, and fallbacks.
MLOps
Podcast-grounded reference page for MLOps as the operating discipline for production machine learning systems.
MLOps Engineer
A podcast-grounded guide to the MLOps engineer role: responsibilities, skills, team boundaries, tools, roadmap, portfolio signals, and how the role changes across startups, platforms, and regulated teams.
MLOps Roadmap
A podcast-backed roadmap for MLOps: reproducible experiments, deployment paths, model registries, monitoring, platform adoption, and role milestones.
MLOps Tools
A practical, podcast-grounded guide to MLOps tools for experiment tracking, model registries, CI/CD, deployment, monitoring, platform workflows, and stack selection.
MLOps and DataOps
Navigation page separating MLOps and DataOps into distinct concepts and pointing to the comparison article.
MLOps vs DevOps
A podcast-backed comparison of DevOps and MLOps operating models.
Machine Learning
How the DataTalks.Club podcast archive frames machine learning as applied modeling, evaluation, production design, interpretability, engineering discipline, and business tradeoffs.
Machine Learning Engineer Role
Archive-backed guide to the machine learning engineer role: production models, serving, maintainability, platform overlap, and boundaries with data science, software engineering, MLOps, and AI engineering.
Machine Learning Infrastructure
Podcast-grounded reference page for compute, storage, orchestration, serving, monitoring, and platform foundations behind ML systems.
Machine Learning Portfolio Projects
Archive-backed guidance for choosing machine learning portfolio projects that prove problem framing, baselines, data strategy, evaluation, production awareness, and maintainable code.
Machine Learning System Design
How DataTalks.Club episodes frame ML system design as a production discipline: problem framing, data strategy, baselines, evaluation, serving, monitoring, fallbacks, and ownership.
Machine Learning Tools
A podcast-grounded guide to choosing machine learning tools across modeling, learning, experimentation, feature work, MLOps, monitoring, fairness, open source, platforms, and AI tooling boundaries.
Marketing to Analytics Engineering
Podcast-backed transition notes for marketers moving into analytics engineering through SQL, BI, dbt, product analytics, dashboards, and metric ownership.
Metaflow
How Metaflow appears in the DataTalks.Club archive as an ML workflow tool, developer-experience case study, and open-source platform boundary.
Metrics
How DataTalks.Club podcast guests define metrics for product decisions, ML systems, monitoring, experiments, and business impact.
Model Monitoring
Podcast-grounded reference page for watching deployed models, diagnosing drift, and assigning ownership for production ML behavior.
Model Registry
Podcast-grounded reference page for model registries as the handoff point between training, deployment, reproducibility, monitoring, and governance.
Modern Data Stack
How DataTalks.Club guests describe the modern data stack across ELT, warehouses, dbt-style transformations, orchestration, activation, observability, and cost control.
Multi-Agent Systems
How DataTalks.Club guests discuss multi-agent systems through sequential flows, manager-agent orchestration, peer collaboration, tool use, memory, evaluation, and guardrails.
NLP
How DataTalks.Club guests discuss natural language processing across language data, annotation, LLMs, speech, search, and production systems.
Notebook to Production AI Systems
How the podcast archive frames the path from notebooks and experiments to end-to-end AI systems in production.
Open Source
How DataTalks.Club podcast guests discuss open source across ML and data tools, contribution work, governance, licensing, developer relations, and startup distribution.
Open Source Portfolio Evidence
Archive-backed guidance for turning open-source issues, pull requests, documentation, demos, and community work into credible portfolio evidence for data, ML, AI, and DevRel roles.
Open Source and Developer Relations
How DataTalks.Club podcast guests connect open-source stewardship, developer relations, documentation, demos, community feedback, and adoption for data and ML tools.
Orchestration
Podcast-grounded guide to orchestration across schedules, dependencies, retries, backfills, workflow engines, batch inference, and ETL boundaries.
Platform Adoption
How DataTalks.Club podcast guests describe getting shared data and ML platforms used through pain-point discovery, self-service paths, developer experience, enablement, rollout, and measurement.
Platform Engineering
How DataTalks.Club guests describe internal platform teams and self-service platform ownership.
Power Analysis
How the podcast archive uses power analysis to estimate experiment sample size, duration, and detectable effect before teams read A/B test results.
Practices
A bridge page for recurring operating practices in the podcast archive: versioning, testing, documentation, CI/CD, monitoring, ownership, and feedback loops.
Privacy Engineering for ML
How DataTalks.Club guests describe privacy engineering, access governance, privacy-enhancing technologies, and production LLM privacy tradeoffs.
Product Analytics
How the podcast archive connects product analytics to event tracking, metrics, experimentation, activation, and product decision-making.
Product Designer to Data Product Manager
Podcast-backed transition notes for product designers moving into data product management through discovery, SQL, data quality, documentation, portfolio cases, and stakeholder empathy.
Production
How DataTalks.Club guests define production systems across data, ML, and AI through deployment, monitoring, reliability, ownership, cost, security, and operational feedback.
Production Search Evaluation
How the podcast archive evaluates production search with relevance checks, RAG quality, business metrics, A/B tests, and feedback loops.
Prompt Engineering
Practical prompt engineering patterns from DataTalks.Club episodes: role prompts, examples, structured output, evaluation, context engineering, RAG prompts, compression, caching, and prompt-injection risks.
QA to ML and Data Engineering
Podcast-backed transition notes for QA engineers moving into machine learning and data engineering through testing discipline, projects, cloud practice, public notes, and interview framing.
RAG
A practical DataTalks.Club guide to RAG implementation choices and boundaries.
RAG Portfolio Projects
Archive-backed guidance for RAG portfolio projects that prove retrieval quality, context design, citations, evaluation, failure analysis, and production-minded AI engineering.
RAG vs Fine-Tuning
A podcast-backed comparison for deciding when an LLM system should retrieve external context, adapt model behavior, or combine both.
Reinforcement Learning
How DataTalks.Club podcast guests discuss reinforcement learning through agents, rewards, simulators, games, robotics, autonomous driving, optimization, and practical limits.
Reproducibility
How DataTalks.Club podcast guests make data science, ML, research, and data pipeline work rerunnable, reviewable, and explainable.
Responsible AI and Governance
Archive-derived patterns for explainability, fairness, privacy, security, human oversight, and accountable AI governance.
Retrieval-Augmented Generation
How DataTalks.Club podcast guests describe RAG as retrieval quality, context design, generation, citation, evaluation, and production tradeoffs.
Reverse ETL
How the podcast archive explains reverse ETL as sending modeled warehouse data into sales, marketing, support, analytics, and engagement tools.
Salary Negotiation
How DataTalks.Club guests discuss salary conversations in data and AI hiring: ranges, current salary, market research, competing offers, recruiter transparency, and freelance pricing.
Scikit-Learn
A podcast-grounded guide to scikit-learn as a practical ML toolkit, API ecosystem, open-source project, contribution target, and sustainability case study.
Search
How the podcast archive frames search as retrieval, ranking, evaluation, semantic matching, and product relevance.
Search, RAG, and Knowledge Systems
How DataTalks.Club podcast guests connect retrieval, RAG, knowledge graphs, and production knowledge systems.
Security
Archive-backed bridge for security in data and AI systems: LLM abuse, data exfiltration, access control, privacy, and secure ML artifacts.
Self-Service Data Platforms
How the podcast archive frames self-service data platforms: reusable systems, conventions, governance, adoption, and team design.
Software Engineer to Machine Learning
Podcast-backed transition notes for software engineers moving into machine learning through project work, ML evaluation, production systems, MLOps, and role targeting.
Software Engineering
How DataTalks.Club guests apply software engineering discipline to data, ML, and AI systems through requirements, interfaces, testing, deployment, documentation, and maintainability.
Staff AI Engineer
A podcast-backed guide to staff AI engineer scope across production AI, LLMOps, agents, and career leveling.
Startup
How DataTalks.Club podcast guests build, validate, fund, and operate data, AI, MLOps, and open-source startups.
Startups
Recurring startup lessons across DataTalks.Club podcast discussions: problem discovery, validation, MLOps scope, open-source distribution, consulting paths, funding, and startup career tradeoffs.
Streaming
How the DataTalks.Club archive discusses event streaming with Kafka and Flink, feature stores for real-time ML, and low-latency operations.
Teaching
How DataTalks.Club guests teach data, ML, and AI through projects, feedback, community, documentation, bootcamps, and public explanation.
Team Building
How DataTalks.Club podcast guests describe building data, ML, AI, DataOps, and MLOps teams through hiring order, onboarding, role design, platform enablement, leadership, and cross-functional operating models.
Technical Writing
How DataTalks.Club guests describe technical writing as explanation, documentation, public learning, portfolio proof, and developer education for data and ML work.
Testing
How DataTalks.Club guests test data, ML, and AI systems through data checks, CI/CD, evaluation sets, monitoring, and production readiness practices.
Tools
How DataTalks.Club podcast guests choose, operate, teach, and sustain tools across data engineering, MLOps, DataOps, search, RAG, open source, and developer experience.
Tracking Plans
How the podcast archive frames tracking plans as shared event-instrumentation rules for product, growth, analytics, and engineering teams.
Vector Database vs Search Engine
A podcast-grounded comparison of dedicated vector databases and search engines for semantic retrieval, hybrid search, RAG, product search, and operational relevance.
Vector Databases
How DataTalks.Club podcast guests discuss vector databases as retrieval infrastructure for semantic search, RAG, recommendations, and multimodal matching.
dbt
How DataTalks.Club guests describe dbt as warehouse-side SQL transformation plus an engineering workflow for analytics models, tests, documentation, DAGs, and reviewed changes.
Guides
AI Tools for Personal Productivity: Useful Workflows Without the Hype
A practical, podcast-backed guide to using AI for personal productivity through writing, research, coding, automation, evaluation, privacy checks, and agentic workflows.
AI-Powered Business Intelligence: Practical Workflows, Trust, and Limits
A DataTalks.Club podcast-backed guide to AI-powered business intelligence: where LLMs help BI workflows, what governance has to cover, and why trust still depends on product thinking.
Airflow: When Data Teams Need Workflow Orchestration
A podcast-backed guide to Airflow as workflow orchestration for data pipelines, analytics stacks, platform teams, and batch ML workflows.
Analytics Engineer: Role, Skills, Tools, and Career Path
A podcast-backed guide to what an analytics engineer does, how the role differs from data analyst and data engineer jobs, which skills matter, and how to build a career path.
Apache Airflow: Workflow Orchestration for Data Pipelines
A podcast-backed guide to Apache Airflow as a workflow orchestrator: where it fits in data pipelines, how to design DAGs, and when a simpler scheduler or another orchestrator is enough.
Best Data Engineering Course: Choose by Background, Role, and Proof
A podcast-backed decision guide for choosing the best data engineering course for your background, target role, project evidence, and interview readiness.
Data Analysis: Practical Work, Skills, and Portfolio Projects
A podcast-backed guide to practical data analysis: SQL, metrics, dashboards, experiments, stakeholder communication, role boundaries, and portfolio evidence.
Data Engineer Bootcamp: How to Become Job-Ready for the Role
A podcast-backed guide to choosing and using a data engineer bootcamp: SQL, Python, pipelines, portfolio proof, interviews, and job-search follow-through.
Comparisons
Data Analyst vs Analytics Engineer
A podcast-grounded role comparison for deciding whether a team needs analyst ownership, analytics engineering ownership, or both.
Data Engineering and Data Science: How They Work Together
A podcast-backed comparison of data engineering and data science: role boundaries, shared work, production handoffs, team ownership, and learning paths.
Data Product Manager vs Product Manager
A podcast-grounded comparison of product manager and data product manager responsibilities, role boundaries, technical literacy, metrics, and adoption work.
Data Product Owner vs Data Product Manager
A podcast-grounded comparison of data product owner and data product manager responsibilities, decision rights, guarantees, roadmaps, technical literacy, and adoption work.
DataOps vs Data Engineering: What Changes in Practice?
A podcast-grounded comparison of DataOps and data engineering: what each discipline owns, where they overlap, and what changes when teams add DataOps practices.
Delta Lake vs Apache Iceberg
A podcast-grounded comparison of Delta Lake and Apache Iceberg as lakehouse table-format choices, centered on storage, catalogs, engines, lock-in, and platform operations.
Roadmaps
Data Analyst to Data Engineer Roadmap
A podcast-backed roadmap for data analysts moving into data engineering: transferable analyst strengths, missing backend and cloud skills, portfolio projects, and interview positioning.
Data Engineer Roadmap: From Fundamentals to Job-Ready Projects
A practical, podcast-backed data engineer roadmap from SQL and Python fundamentals to pipelines, orchestration, DataOps, portfolio projects, and interviews.
How to Become a Data Engineer With No Experience
A podcast-backed transition guide for becoming a data engineer without prior data engineering experience: first skills, projects, portfolio proof, timelines, interviews, and adjacent-role paths.
How-Tos
Airflow Docker Compose: Local Development, Learning, and Production Caveats
A podcast-backed how-to for using Docker Compose as a local Airflow learning and DAG development environment, with clear production boundaries.
How to Build Data Pipelines That People Can Trust
A podcast-backed guide to building data pipelines with ingestion, transformation, orchestration, contracts, testing, observability, and last-mile activation.