Causal Inference for Real-World ML: Uplift Modeling, Counterfactuals, Treatment Effects & LLM Integration
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How do you move from correlation to actionable decisions — using counterfactuals, uplift modeling, treatment effect estimation, and LLMs — without falling into confounding traps or biased estimators? In this episode, Aleksander Molak, an independent ML researcher, author and educator specializing in causality, NLP and AI strategy, walks through practical causal inference techniques for real-world machine learning applications.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:00 - Episode Introduction
- 1:22 - Guest Intro: Aleksander Molak & book overview
- 2:06 - Career highlights and dyslexia prediction project
- 6:15 - Causal advocacy: democratizing causal thinking
- 7:31 - Association vs causation: limits of correlational reasoning
- 8:55 - Illustrative confounders: race example and ice cream–drowning
- 12:41 - Predictive ML vs decision-making: Zillow and IID assumptions
- 15:36 - Counterfactuals in practice: marketing and recommender systems
- 18:15 - Counterfactuals defined and Judea Pearl’s intervention view
- 21:22 - Meta-learners overview: T-learner and counterfactual estimation
- 24:24 - Conditional Average Treatment Effect (CATE) estimation
- 26:16 - Achieving unconfoundedness: A/B tests vs causal feature selection
- 27:52 - Targeting decisions from uplift estimates
- 29:17 - Deployment risks and debiasing estimators (double/triple ML)
- 32:40 - Uplift modeling: policy evaluation and business metrics
- 33:14 - Evaluating causal models: refutation tests and estimator quality
- 37:37 - Causal discovery and heterogeneous treatment effects (book coverage)
- 38:54 - Cost–benefit of causal models: complexity vs value
- 41:14 - Real-world impact: discovering wasted marketing spend
- 43:25 - Incremental rollout: A/B testing as validation baseline
- 44:26 - LLMs in causal workflows: feature extraction and scoring
- 46:54 - Text as outcome: using LLMs to score experimental text
- 49:17 - Text as treatment/confounder: style extraction and embeddings
- 54:38 - Inferring unobserved variables (e.g., gender/style) with LLMs
- 58:14 - CausalBert demo and code note (PyData Berlin talk)
- 59:33 - Causal ML without experiments: partial identification & sensitivity
- 1:04:03 - Causal graphs and nonparametric identification: minimal observables
- 1:06:07 - Recommended resources: The Book of Why, Molak’s book & GitHub
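The T-learner, CATE, and targeting chapters (21:22 to 27:52) name a technique that is easy to sketch. The example below is not taken from the episode or from Molak's book. It is a minimal illustration under stated assumptions: treatment is randomized (so unconfoundedness holds by design), there is a single feature, and each outcome model is a plain one-variable least-squares line. The names `fit_line` and `t_learner_cate`, the synthetic data, and the targeting threshold are all hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """One-feature ordinary least squares; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    var_x = sum((x - x_bar) ** 2 for x in xs)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return y_bar - slope * x_bar, slope


def t_learner_cate(xs, treated, ys):
    """T-learner: fit one outcome model per treatment arm and
    estimate CATE(x) as the difference of their predictions."""
    x1 = [x for x, t in zip(xs, treated) if t == 1]
    y1 = [y for y, t in zip(ys, treated) if t == 1]
    x0 = [x for x, t in zip(xs, treated) if t == 0]
    y0 = [y for y, t in zip(ys, treated) if t == 0]
    a1, b1 = fit_line(x1, y1)  # outcome model for treated units
    a0, b0 = fit_line(x0, y0)  # outcome model for control units

    def cate(x):
        return (a1 + b1 * x) - (a0 + b0 * x)

    return cate


# Hypothetical synthetic data: randomized treatment, true effect = 1 + 2x.
rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(2000)]
treated = [1 if rng.random() < 0.5 else 0 for _ in xs]
ys = [3 * x + (1 + 2 * x) * t + rng.gauss(0, 0.1) for x, t in zip(xs, treated)]

cate = t_learner_cate(xs, treated, ys)

# Targeting from uplift estimates: keep units whose estimated effect
# exceeds an assumed per-unit treatment cost of 2.0.
to_target = [x for x in xs if cate(x) > 2.0]
```

With observational data the same two-model recipe can be badly biased unless the features block all confounding paths, which is the gap the debiasing and sensitivity chapters address.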