Contribute to Hugging Face & Build an NLP Portfolio: Open Source, Datasets, Spaces
Original Episode
Use these links for the canonical episode and media sources.
- Open the original DataTalks.Club podcast page
- Watch on YouTube
- Listen on Spotify
- Listen on Apple Podcasts
Episode Overview
How do you go from beginner projects to contributing to Hugging Face and building a visible NLP portfolio? In this episode, Merve Noyan (Google Developer Expert in Machine Learning, Data Science grad student, and NLP-focused ML engineer) walks through practical steps for contributing to open source projects, datasets, and Hugging Face Spaces.
Chapter Summary
Use these checkpoints to decide whether to open the source transcript.
- 0:00 - Podcast Introduction
- 1:25 - Guest Welcome & Episode Overview
- 2:02 - Early Career: Industrial Engineering to NLP
- 4:12 - Transition to NLP: First Projects & Sentiment Analysis
- 6:30 - Open Source Discovery: Finding Hugging Face & Contribution Sprints
- 8:13 - Datasets Work: Canonical Datasets, Scripts, and CI Learning
- 10:31 - Contributor Onboarding: Sprints, Good-First Issues, and Confidence Building
- 11:33 - Contributing as a Side Project: Motivation and Timing
- 12:46 - Hugging Face Projects: Tasks, Hub, TensorFlow & Keras Integration
- 15:42 - Model Reproducibility: Hub Features and Model Registry Concepts
- 17:37 - Spaces & Community Tab: Demos with Streamlit/Gradio and Community Collaboration
- 18:31 - Developer Experience: Forum Support, Workshops, and Keras Sprints
- 21:28 - Role Balance: Engineering vs. Advocacy Time Split
- 23:26 - Hiring Signals: Evaluating Open Source Experience on GitHub
- 25:09 - Getting Started with Open Source: Sprints, Documentation, and Non-Code Contributions
- 27:23 - Structured Programs: Google Summer of Code and Hacktoberfest
- 29:26 - Learning from PRs: Contributing to scikit-learn and Code Quality
- 30:21 - Hiring Expectations: Working with Large Codebases and PR Workflows
- 33:23 - Handling PR Rejections: Discussions, Design Decisions, and Unit Tests
- 38:02 - NLP Learning Resources: Courses, spaCy, Keras Examples, and Transfer Learning
- 43:01 - Beginner NLP Projects: Sentiment Analysis and Classification Tasks
- 51:12 - Portfolio Advice: Deploying Demos with Streamlit, Gradio, and Hugging Face
- 55:49 - Content Creation: Twitch Streaming and Podcast Plans
- 57:42 - Contact & Community: Slack, Twitter, and DataTalks.Club Outreach
- 58:14 - Personal Anecdote: Mario Kart at Hugging Face