Practical Algorithms for Engineers: Bloom Filters, Approximate Nearest-Neighbor & Performance
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How do engineers choose and implement the right algorithm for memory, latency, and scale? In this episode, Marcello La Rocca (senior software engineer at Tundra.com and author of Algorithms and Data Structures in Action, with experience at Twitter, Microsoft, and Apple) walks through practical algorithmic solutions that engineers can use in production. The conversation focuses on Bloom filters for memory-efficient containment checks, with real-world uses such as crawlers, routing tables, and adtech device-ID targeting, and on approximate nearest-neighbor search.
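To make the Bloom filter idea concrete, here is a minimal Python sketch. It is not taken from the episode or the book's repository. The class name `BloomFilter`, its parameters, and the choice of salted BLAKE2b hashes are assumptions made for illustration. It shows the core trade-off: a fixed-size bit array answers "possibly present" or "definitely absent", so false positives can occur but false negatives cannot.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: k seeded hashes over an m-bit array."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
        ))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: salted BLAKE2b stands in for k independent hash functions.
        # Unlike Python's built-in hash(), it is stable across processes, which
        # matters if the bit array is serialized and read elsewhere.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical usage in the spirit of the adtech example: remember seen device IDs.
seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for i in range(1000):
    seen.add(f"device-{i}")

print("device-42" in seen)  # an added ID is always reported as possibly present
```

The filter stores only bits, never the IDs themselves, which is where the memory saving comes from. The cost is that an ID that was never added can occasionally be reported as present.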
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:00 - Podcast Introduction
- 1:51 - Guest Intro: Marcello La Rocca and book announcement
- 3:11 - Career Path: web development to Twitter, Microsoft, Apple, Tundra
- 5:19 - Learning Philosophy: focus on applications over formal proofs
- 7:30 - Anecdote: mathematical proof vs practical innovation
- 9:23 - Recommended Resources: MIT course, Tim Roughgarden, Grokking Algorithms
- 10:34 - Core Data Structures: arrays, lists, sets, dictionaries, stacks, queues
- 12:17 - Abstraction vs Implementation: APIs, performance trade-offs
- 15:57 - Practicing Algorithms Outside Work: competitions and side projects
- 19:14 - Using Libraries & Profiling: spotting algorithmic wins in production
- 20:14 - Performance Pitfalls: containment checks and wrong list usage
- 22:12 - Data-Science Use Cases: Bloom filters and nearest-neighbour search
- 23:39 - Book Overview: bridging theory and practical use cases
- 25:04 - Book Structure: basics, nearest-neighbour & MapReduce, graphs & optimization
- 26:31 - Prerequisites & Format: appendices, pseudocode, who the book is for
- 28:37 - Code Repository: implementations in Java, JavaScript, Python (and more)
- 30:09 - Bloom Filter Explained: memory-efficient containment with false positives
- 34:43 - Bloom Filter Applications: crawlers, routing tables, marketing/adtech
- 35:59 - Adtech Example: device IDs and returning-user targeting with Bloom filters
- 39:10 - Nearest-Neighbour Need: KD-tree limits and high-dimensional data challenges
- 42:44 - Approximate Nearest-Neighbour: R-trees, SS-trees for geolocation & logistics
- 44:46 - Vector Similarity: embeddings, recommender systems, Faiss usage
- 47:47 - Frameworks vs Internals: when to trust libraries and when to inspect them
- 49:52 - Cross-language Compatibility: serializing Bloom filters and hash seeds
- 52:55 - Tech Interviews: algorithm emphasis, balanced assessment approaches
- 58:53 - Hands-on Learning: LeetCode, contests, open-source projects
- 1:00:39 - Language Trade-offs: Python vs C++ and using Cython for performance
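
The 20:14 checkpoint on containment checks and wrong list usage can be illustrated with a small Python sketch. It is not from the episode. The helper class `CountingKey` and the function `comparisons_for_lookup` are hypothetical names introduced here. Instead of timing, which varies by machine, it counts how many equality comparisons a membership test performs on a list versus a set.

```python
class CountingKey:
    """Hashable wrapper that counts how often equality is evaluated."""

    comparisons = 0  # class-level counter, reset before each measurement

    def __init__(self, value: int):
        self.value = value

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return isinstance(other, CountingKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def comparisons_for_lookup(container, probe) -> int:
    """Return the number of equality checks made by `probe in container`."""
    CountingKey.comparisons = 0
    _ = probe in container
    return CountingKey.comparisons


items = [CountingKey(i) for i in range(10_000)]
as_list = items
as_set = set(items)

# Equal to the last element, but a distinct object, so identity shortcuts do not apply.
probe = CountingKey(9_999)

list_cost = comparisons_for_lookup(as_list, probe)
set_cost = comparisons_for_lookup(as_set, probe)
print(f"list: {list_cost} comparisons, set: {set_cost} comparisons")
```

The list scans elements one by one until it finds a match, so the cost grows with the collection. The set jumps to a hash bucket and compares only the candidates found there. When a collection is used mainly for "have I seen this?" checks, switching from a list to a set is often the cheapest algorithmic win available.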