DataTalks.Club

What Open Source Can Do For Your Data Career

From baby steps on GitHub to Developer Advocate Engineer hero

15 Jul 2022 by Mehdi OUAZZA

Image by Kushagra Kevat on Unsplash

In today’s data world, we rely on many open source tools, from data engineering to data science and deep learning. This is a unique opportunity to grow your career in multiple ways. Merve Noyan, Developer Advocate Engineer, shared her journey to Hugging Face, a leader in making Machine Learning more accessible. She also gave advice on how to get started and shared where NLP is heading. Here are my takeaways from the excellent podcast at DataTalks.Club.

The article is organized as follows:

  • What’s a Developer Advocate Engineer

  • How to get started with Open Source

  • How platforms like Hugging Face Spaces can help you showcase your project

  • A word on the future of NLP

Developer Advocate Engineers have got you covered

First off, Merve Noyan mentioned that definitions of the Developer Advocate role differ. Some roles are more oriented toward community growth and contribute primarily educational content.

Hugging Face added the Engineer part to make it explicit that the role is deeply technical: you work on product features that support the team horizontally.

But how do we get there?

Baby steps in Open Source

Contributing to open source can be scary. With an unknown codebase, an unfamiliar way of working, and a lot of automation around each PR, where do you start?

Well, Merve Noyan emphasizes that you don’t need to code to take your first steps!

For instance, you can:

  • Update the documentation.

  • Help answer questions on Stack Overflow.

  • Submit an issue with reproducible steps.

  • Promote the project: write a blog post (or even make a video!).
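A good issue report usually includes the environment in which the problem occurs. As a minimal sketch (the function name `environment_report` and the package list are made up for illustration, not part of any project's contribution guide), a few lines of standard-library Python can collect that information so you can paste it into the issue:

```python
import platform
import sys
from importlib import metadata


def environment_report(packages: list[str]) -> str:
    """Return a short, paste-ready description of the current environment."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


if __name__ == "__main__":
    # "pip" is only a placeholder; list the packages involved in your bug.
    print(environment_report(["pip"]))
```

Pair this output with the smallest script that still triggers the bug, and maintainers can reproduce the problem without guessing at your setup.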

Next level

If you want to get your hands dirty, it’s worth looking at issues with the good first issue label on GitHub and starting a discussion before implementing anything.

It can be frustrating to have your PR rejected because it’s not in line with the project’s design decisions. Merve Noyan highlighted that maintainers will always be happy to discuss with you, as they respect your time and commitment to the project.

Several events promote open-source contributions.

Here are a few of them:

  • Contribution sprint: Many open source projects have dedicated contribution sprints where maintainers focus their time on onboarding and helping new contributors.

  • Hacktoberfest

  • Google Summer of Code

From contributing to Open Source to landing your dream job

There’s a great secret about doing work in public: it’s public. Anyone can look it up. It could also speed up technical interviews, as you may have already proven your abilities through some PRs.

Merve Noyan had already contributed to several Open Source projects before joining Hugging Face. She also gave a few workshops on NLP and TensorFlow.

After these contributions to the open-source community, including to Hugging Face projects, the company reached out to offer her a job.

How to get started in NLP

Your first project

Sentiment analysis is an excellent entry point into the NLP world, as the data representation is pretty simple: sentences and labels. There are a lot of use cases where we need to know people’s perceptions of a product, service, or brand. By contrast, summarisation and paraphrasing are more challenging NLP tasks.

Merve also recommends upskilling yourself in Transfer Learning in general. There are a lot of pre-trained models available today, like BERT or GPT. You usually get better results by fine-tuning them on downstream tasks than by training your own model from scratch.
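To illustrate how simple the "sentences and labels" representation is, here is a minimal sketch of a sentiment classifier in plain Python. The training sentences are invented for illustration, and the model is a tiny Naive Bayes classifier with add-one smoothing, chosen only because it fits in a few lines. It is not what Merve or Hugging Face use; a real project would more likely start from a pre-trained model.

```python
import math
from collections import Counter

# Hypothetical toy data: each example is just a sentence and a label.
TRAIN = [
    ("I love this product", "positive"),
    ("great service and friendly staff", "positive"),
    ("what a great experience", "positive"),
    ("I hate this product", "negative"),
    ("terrible service and rude staff", "negative"),
    ("what a terrible experience", "negative"),
]


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it on whitespace."""
    return sentence.lower().split()


def train(examples):
    """Count words per label and examples per label."""
    word_counts = {}
    label_counts = Counter()
    for sentence, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(sentence))
    return word_counts, label_counts


def predict(sentence, word_counts, label_counts):
    """Return the label with the highest smoothed log-probability."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_examples = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_examples)
        for word in tokenize(sentence):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAIN)
print(predict("great staff", word_counts, label_counts))
print(predict("terrible product", word_counts, label_counts))
```

With so little data the predictions only reflect which words appeared under which label, but the shape of the problem is the same at any scale: text in, label out.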

How Hugging Face can help promote your project

Nobody wants to git clone and read your README to set up your project. The last mile would be to deploy your project so that any non-technical user can easily access it.

Fortunately, there are a couple of platforms that help you to do so. Kaggle provides a notebook runtime to show off your projects.

Hugging Face goes a step further with “Spaces.” They host your model demos, built with Streamlit or Gradio, and make them open to everyone. You can see it as a personal portfolio for models.
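As a rough sketch of what such a demo can look like with Gradio: the `classify` function below is a hypothetical keyword-based stand-in for a real model, and the word lists and title are invented for illustration. Gradio is imported inside `build_demo` so the prediction logic can be run and checked even where Gradio is not installed.

```python
# Hypothetical stand-in for a real model: a tiny keyword-based scorer.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful"}


def classify(text: str) -> dict[str, float]:
    """Return a confidence per label, the format Gradio's label output accepts."""
    words = text.lower().split()
    pos = sum(word in POSITIVE_WORDS for word in words)
    neg = sum(word in NEGATIVE_WORDS for word in words)
    total = pos + neg
    if total == 0:
        return {"positive": 0.5, "negative": 0.5}
    return {"positive": pos / total, "negative": neg / total}


def build_demo():
    """Wrap classify() in a simple text-in, label-out web interface."""
    import gradio as gr  # lazy import: only needed when serving the demo

    return gr.Interface(
        fn=classify,
        inputs="text",
        outputs="label",
        title="Sentiment demo (illustrative)",
    )


# In the Space's app file you would finish with:
# build_demo().launch()
```

Swapping the stand-in for a real model only means changing the body of `classify`; the interface code stays the same.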

About the future of NLP

NLP tasks are shaped by your data. Examples include question answering and part-of-speech tagging.

The next big thing is solving multiple tasks with one big model without fine-tuning: zero-shot learning. It’s a trend not only in NLP but also in vision, with models like DALL-E. There are a lot of multimodal and generative models.

Two exciting papers that Merve recommends on this topic are T0 by Hugging Face and Flamingo by DeepMind.
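One way to picture "one model, many tasks" is to phrase every task as a plain-text prompt, which is the framing the T0 paper builds on. The sketch below only shows that formatting step: the templates and the `build_prompt` helper are invented for illustration, and no model is called.

```python
# Hypothetical templates: each task becomes a plain-text instruction.
TEMPLATES = {
    "sentiment": "Review: {text}\nIs this review positive or negative?",
    "summarization": "Summarize the following text in one sentence:\n{text}",
    "question_answering": "Context: {text}\nQuestion: {question}\nAnswer:",
}


def build_prompt(task: str, **fields: str) -> str:
    """Fill in the template for the given task with the provided fields."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    return TEMPLATES[task].format(**fields)


print(build_prompt("sentiment", text="The battery died after a day."))
print(build_prompt(
    "question_answering",
    text="Merve Noyan works at Hugging Face.",
    question="Where does Merve Noyan work?",
))
```

The same model would receive each of these strings as input, so supporting a new task means writing a new prompt rather than training a new model.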

Summary

In this article, we used Merve’s story to show how much contributing to Open Source can do for your career. We discussed the different steps you can take to get started and the events that can help you bootstrap your Open Source journey. On top of that, we covered how to get started with NLP and where the field is heading.

There has never been a better opportunity to contribute to Open Source.

There are tons of projects to choose from.

Many platforms lower the technical barrier to deploying and showcasing your work.

And everything you do will be public, which is gold for future reference.

So don’t hesitate, and make the leap!
