Lead NLP Teams: Hiring, Production Pipelines, MLOps & LLM Tradeoffs (GPT-3, spaCy)
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How do you structure an NLP team, build reliable production pipelines, and weigh the tradeoffs between GPT-3 and in-house models? In this episode, Ivan Bilan, Engineering Manager at Personio working on Identity and Access Management, offers practical answers drawn from his path from linguistics to production NLP and MLOps.
People
Use these links to connect the episode to guest notes.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:00 - Podcast Introduction
- 1:54 - Episode Overview: Leading NLP Teams & Ivan’s Current Role
- 2:55 - Personio Role: Identity and Access Management Responsibilities
- 4:39 - Career Origins: From Linguistics to Computational NLP
- 7:22 - Early Tech Stack: From Perl to Python and Web Scraping
- 8:42 - Technical Management Study: CDTM, Internships, and Organizational Learning
- 11:54 - Management Transition: From ML Teams to Web Product Engineering & Observability
- 14:07 - Defining NLP Teams: Centralized vs Cross-disciplinary Structures
- 16:45 - NLP Engineer Role: Skills, Linguistics Background, and Tokenization Expertise
- 19:16 - Path to Becoming an NLP Engineer: Practical Resources, spaCy & Hugging Face
- 22:31 - Vision vs Text: Comparing Computer Vision and NLP Challenges
- 24:36 - NLP Engineer vs ML Engineer: Inference Optimization, Deployment & MLOps
- 26:19 - Conversational Designers: Chatbot UX, Dialogue Flow & Non-coding Roles
- 28:38 - Linguists in NLP: Parsing, Information Extraction & Multilingual Needs
- 30:11 - When to Hire NLP Specialists: Task Complexity, Data Needs & Feature Engineering
- 32:21 - Future of NLP: Library Ecosystem, AutoML & Research Velocity
- 34:57 - NLP Pipeline Anatomy: Data Annotation, Task Engineering, Testing, Production
- 38:55 - Large Language Models & Prompting: GPT-3 Capabilities and Simplification
- 43:05 - GPT-3 Limitations: Cost, Control, Bias & Privacy Risks
- 46:10 - GPT-3 vs In-house Pipelines: MVP Strategy, Control & Open-Source Alternatives
- 48:39 - What NLP Really Is: Industry Productization vs Academic Linguistic Research
- 52:57 - AI Benchmarking: Human-level Claims, Dataset Limits & Real-world Gaps
- 53:45 - Machine Translation State: Google Translate, DeepL, Data Coverage & Language
- 58:08 - NLP Pandect & Related Projects: GitHub Resources for NLP, Microservices &