Contribute to Hugging Face & Build an NLP Portfolio: Open Source, Datasets, Spaces

S9E6

machine learning NLP open-source

Original Episode

Use these links for the canonical episode and media sources.

Open the original DataTalks.Club podcast page
Watch on YouTube
Listen on Spotify
Listen on Apple Podcasts

Episode Overview

How do you go from beginner projects to contributing to Hugging Face and building a visible NLP portfolio? In this episode, Merve Noyan (Google Developer Expert in Machine Learning, grad student in Data Science, and NLP-focused ML engineer) walks through practical steps for contributing to open source, datasets, and Hugging Face Spaces.

People

Use these links to connect the episode to guest notes.

Merve Noyan

Chapter Summary

Use these checkpoints to decide whether to open the source transcript.

0:00 - Podcast Introduction
1:25 - Guest Welcome & Episode Overview
2:02 - Early Career: Industrial Engineering to NLP
4:12 - Transition to NLP: First Projects & Sentiment Analysis
6:30 - Open Source Discovery: Finding Hugging Face & Contribution Sprints
8:13 - Datasets Work: Canonical Datasets, Scripts, and CI Learning
10:31 - Contributor Onboarding: Sprints, Good First Issues, and Confidence Building
11:33 - Contributing as a Side Project: Motivation and Timing
12:46 - Hugging Face Projects: Tasks, Hub, TensorFlow & Keras Integration
15:42 - Model Reproducibility: Hub Features and Model Registry Concepts
17:37 - Spaces & Community Tab: Demos with Streamlit/Gradio and Community Collaboration
18:31 - Developer Experience: Forum Support, Workshops, and Keras Sprints
21:28 - Role Balance: Engineering vs. Advocacy Time Split
23:26 - Hiring Signals: Evaluating Open Source Experience on GitHub
25:09 - Getting Started with Open Source: Sprints, Documentation, and Non-Code Contributions
27:23 - Structured Programs: Google Summer of Code and Hacktoberfest
29:26 - Learning from PRs: Contributing to scikit-learn and Code Quality
30:21 - Hiring Expectations: Working with Large Codebases and PR Workflows
33:23 - Handling PR Rejections: Discussions, Design Decisions, and Unit Tests
38:02 - NLP Learning Resources: Courses, spaCy, Keras Examples, and Transfer Learning
43:01 - Beginner NLP Projects: Sentiment Analysis and Classification Tasks
51:12 - Portfolio Advice: Deploying Demos with Streamlit, Gradio, and Hugging Face
55:49 - Content Creation: Twitch Streaming and Podcast Plans
57:42 - Contact & Community: Slack, Twitter, and DataTalks.Club Outreach
58:14 - Personal Anecdote: Mario Kart at Hugging Face