Human-Centered Speech Recognition: ASR for Disordered Speech and Accents
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How can automatic speech recognition (ASR) better serve people with disordered speech and diverse accents? In this episode, Katarzyna Foremniak examines human-centered ASR for atypical and accented speech. She is a computational linguist with over 10 years in NLP who has built language models for Audi and Porsche and teaches at the University of Warsaw. We trace her move from linguistics to computational approaches and cover the core phonetics and morpho-syntax concepts that matter for speech recognition.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:00 - Episode Introduction: Human-Centered AI for Disordered Speech
- 8:06 - Guest Introduction & Career Highlights (Katarzyna Foremniak)
- 9:06 - From Linguistics to Computational Linguistics: Transition & Skills
- 13:22 - Linguistics Meets Computer Science: Data-driven Approaches
- 15:25 - Phonetics & Morpho-syntax Explained: Core Concepts for ASR
- 20:33 - Phonetics and Speech Disorders: Articulation, Fluency, Voice Quality
- 23:19 - Accents vs Speech Disorders: Variation, Identity, and Comprehension
- 24:41 - ASR Progress: Modern Models (Whisper) and Improved Accent Handling
- 27:31 - ASR Fundamentals: Standard Speech Datasets and Reference Speech
- 30:24 - ASR Limitations with Atypical Speech: Training/Deployment Gaps
- 30:53 - Strategies for Disordered Speech Recognition: Specialized Datasets & Adaptation
- 37:07 - Data Augmentation for Disordered Speech: Synthetic Variations
- 37:33 - Multimodal ASR: Integrating Lip-reading and Visual Cues
- 40:17 - Transfer Learning for ASR: Fine-tuning with Limited Data
- 41:10 - Data Collection Challenges: GDPR, Clinical Data, Language Coverage
- 42:18 - Language & Dialect Effects: Bilingualism and Disorder Variability
- 44:31 - Stammering & Fluency Issues: Characteristics and Recognition Needs
- 45:16 - Pronunciation Challenges: Polish Consonant Clusters and Phonetics
- 46:17 - Practical Transcription Workflow: Amazon Transcribe + LLM Post-processing
- 47:28 - Contextual Language Models in ASR: Meaning Preservation vs WER
- 51:27 - Utterance Analysis in ASR: Phonemes, Words, and Contextual Prediction
- 54:05 - Personalized ASR: User Adaptation, Fine-tuning, and On-device Setup
- 58:00 - Assistive Applications: Communication Tools for People with Disorders
- 1:00:02 - Model Size & Deployment Constraints: Mobile and Edge Considerations
- 1:01:53 - In-Car Voice Recognition: Automotive Use Cases and Limitations
- 1:03:27 - Notable Failure Examples: Elevator/Car Voice Recognition Humor
- 1:04:13 - Closing Reflections: Human-Centered AI Priorities & Further Reading