DataTalks.Club

Engineering MLOps

by Emmanuel Raj

The book of the week from 05 Jul 2021 to 09 Jul 2021

MLOps is a systematic approach to building, deploying, and monitoring machine learning (ML) solutions. It is an engineering discipline that can be applied to various industries and use cases. This book presents comprehensive insights into MLOps coupled with real-world examples to help you to write programs, train robust and scalable ML models, and build ML pipelines to train and deploy models securely in production.

The book begins by familiarizing you with the MLOps workflow so you can start writing programs to train ML models. You’ll then move on to explore options for serializing and packaging ML models post-training so you can deploy them to facilitate machine learning inference, model interoperability, and end-to-end model traceability. You’ll understand how to build ML pipelines, continuous integration and continuous delivery (CI/CD) pipelines, and monitoring pipelines to systematically build, deploy, monitor, and govern ML solutions for businesses and industries. Finally, you’ll apply the knowledge you’ve gained to build real-world projects.

By the end of this ML book, you’ll have a 360-degree view of MLOps and be ready to implement MLOps in your organization.

Questions and Answers

Tino

Hello Emmanuel Raj 🙂 Thanks for taking the time! As it feels like MLOps is currently on the rise, when do you think a company really needs to focus on MLOps? I often feel it is important to get something out there to see the impact, but the operational part is often only the third or fourth step, whereas model drift, etc. can cause a negative business impact right away. Would you recommend setting up a good MLOps framework before going live?

Emmanuel Raj

Hello Tino, good question! For any tech company focusing on making intelligent (or data-powered) products, setting up MLOps is recommended (to save time, money and energy). It saves data scientists and the SE team a lot of time on mundane tasks (deployments, manual tests, repetitive data engineering tasks, manual debugging, etc.) and lets them focus on what truly matters (making the best models and learning in real time from data) by taking advantage of automation via a CI/CD pipeline. Yes, before going live it is recommended to test the pipeline and monitoring features (model, data, feature drift, etc.) in DEV and QA environments to validate whether the MLOps pipeline provides business value (measured with business KPIs); only then go to production.
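
As an illustration of the data drift monitoring mentioned in this answer (not taken from the book or from the thread), here is a minimal sketch of one common drift check, the Population Stability Index. The function name `psi`, the bin count, and the 0.2 alert threshold are all assumptions made for the example; real monitoring setups vary.

```python
import math
from collections import Counter


def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples.

    Buckets are equal-width and derived from the reference sample only.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def bucket_shares(values):
        # Clamp out-of-range values into the first or last bucket.
        counts = Counter(
            min(max(int((v - lo) / width), 0), bins - 1) for v in values
        )
        # A small floor avoids log(0) for empty buckets.
        return [max(counts.get(b, 0) / len(values), 1e-6) for b in range(bins)]

    ref, cur = bucket_shares(reference), bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical data: training-time feature values vs. shifted production values.
reference = list(range(100))
current = [v + 50 for v in reference]

score = psi(reference, current)
# 0.2 is a frequently quoted rule of thumb, not a standard; tune it per feature.
drifted = score > 0.2
```

A check like this could run in DEV or QA against a held-out sample before the same check is scheduled in production.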

Tino

Okay cool 🙂 Got it 🙂 Thanks!!

Emmanuel Raj

Hello everyone! I am glad to answer your questions 🙂 Looking forward to hearing your thoughts!

Agrita Ga

Do you believe there should be a difference between implementing MLOps in a smaller organization (let’s say startup focusing on ML solutions) and bigger organization?

Emmanuel Raj

Yes, there will be some differences between MLOps pipelines for smaller vs. bigger organisations based on their business needs, teams and data processing abilities. Check out chapter 2, Characterizing Your Machine Learning Problem, in the book. It explains this in detail 🙂 (screenshot from chapter 2, figure 2.9)

Agrita Ga

I’ll expand this question a bit: do best practices (or some tips & tricks, or even the tech stack) differ for small vs. big teams?

Emmanuel Raj

Yes, they differ case by case (company by company), especially the tools. But some of the principles remain the same at a high level (an MLOps pipeline to build, deploy and monitor ML models). Small companies most likely need to build the MLOps platform with a limited budget, both in money and time, and their operations might be small in scale. Big companies have a high volume of data, operations and teams, so the setup of MLOps pipelines/platform will differ in most cases 🙂

Agrita Ga

Thanks for your input! Appreciate a lot! 🙌

Diego

Hi Emmanuel Raj, thanks for this opportunity of Q&A. Do you think that MLOps is definitely a skill that every data scientist needs to have if he/she wants to stay relevant in the job market or, on the other hand, should data scientists just focus on data/statistics/algorithms because otherwise they are ‘biting off more than they can chew’?

Emmanuel Raj

Good question Diego! Knowing how to set up an MLOps pipeline/platform (infra and architecture) is a bit too much for data scientists. However, it is recommended for data scientists to know how to work with the features of MLOps pipelines/platforms (once they are set up), such as registering datasets and models and packaging them on the MLOps platform. This way data scientists can take advantage of MLOps and focus on what they are good at: data, statistics and algorithms. I hope this answers your question 🙂
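
To make "registering and packaging a model" concrete, here is a toy sketch using only the Python standard library. It is not the API of any real MLOps platform; `register_model`, `load_model`, the `index.json` layout, and the example model are hypothetical names chosen for illustration. Real platforms typically add access control, lineage and deployment hooks on top of the same idea.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def register_model(model, name, version, registry_dir, metadata=None):
    """Serialize `model` and record it in a JSON index (toy registry)."""
    registry = Path(registry_dir)
    registry.mkdir(parents=True, exist_ok=True)

    payload = pickle.dumps(model)
    artifact = registry / f"{name}-v{version}.pkl"
    artifact.write_bytes(payload)

    entry = {
        "name": name,
        "version": version,
        "artifact": artifact.name,
        # The hash lets a consumer verify it loads the registered bytes.
        "sha256": hashlib.sha256(payload).hexdigest(),
        "metadata": metadata or {},
    }
    index_path = registry / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else []
    index.append(entry)
    index_path.write_text(json.dumps(index, indent=2))
    return entry


def load_model(name, version, registry_dir):
    """Load a registered model after checking its hash against the index."""
    registry = Path(registry_dir)
    index = json.loads((registry / "index.json").read_text())
    entry = next(
        e for e in index if e["name"] == name and e["version"] == version
    )
    payload = (registry / entry["artifact"]).read_bytes()
    if hashlib.sha256(payload).hexdigest() != entry["sha256"]:
        raise ValueError("artifact does not match registered hash")
    return pickle.loads(payload)


# Usage: register a stand-in "model" (a plain dict) and load it back.
with tempfile.TemporaryDirectory() as tmp:
    register_model(
        {"weights": [0.1, 0.2]}, "churn", 1, tmp, {"trained_on": "dataset-v1"}
    )
    restored = load_model("churn", 1, tmp)
```

Pickle is used here only for brevity; unpickling data from an untrusted source can execute arbitrary code, so a real registry should only load artifacts it trusts.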

Chetna

Hi Emmanuel Raj, what’s your take on the importance of cloud technology certifications? Do they make a resume more relevant for an MLOps role?

Emmanuel Raj

Hello Chetna! Cloud certifications are worth it: they give a 360-degree view of what the cloud has to offer, so you can pick and choose the best services to solve your business problems (optimisation). They surely make a resume stand out for an MLOps role; companies are looking for ML engineers who know data engineering and infrastructure setup well (certified is better) 🙂

Chetna

thanks 🙂

Lalit Pagaria

What is the importance of choosing the right cloud provider when implementing MLOps?
What things should one take care of while implementing MLOps?
In your experience, which providers do you suggest for small and medium startups?

Emmanuel Raj

Lalit Pagaria, choosing the right tools for the business problem is most important (not the other way around). Any cloud that has the capabilities to serve your needs will do (these days most of them are good enough). The downside of the cloud, though, is vendor lock-in; to avoid that we can use cloud-agnostic/open-source MLOps tools, e.g. MLflow and Valohai, which can work with most clouds. So choosing the right cloud/tools depends on the business problem at hand 🙂

Matthew Emerick

Hey, Emmanuel Raj! Thanks for doing this!
In an open world where both the data and the environment itself are constantly changing, how does MLOps keep up?

Emmanuel Raj

Good question Matthew Emerick! MLOps addresses that constantly changing environment by adapting to the changing data/environment, optimising performance for changes, auto scaling and being relevant for the changing environment 🙂

Neal Lathia

❔ How uniform do you think MLOps workflows are across companies?

Emmanuel Raj

Hard to generalize at this point as different companies are at different stages in their ML adoption 🙂

Mansi Parikh

Thank you, Emmanuel, for sharing your thoughts!
Should MLOps be a concern during early stages of an organization or only when it becomes necessary? (More specifically, at what stage of growth of a data department does this become top of mind?)

Emmanuel Raj

Nice question Mansi Parikh! If the company/organisation is sure of having ML models in its workflow, then the sooner it thinks about implementing MLOps, the better. Otherwise, the right time is when the data pipelines are set up and the organisation has the needed data setup in place. The sooner the better 🙂

Mansi Parikh

thank you so much, Emmanuel! this is great. I appreciate the thoughtful response. 🙂

Rushanthi

Hi Emmanuel Raj, thanks a lot for the golden opportunity of this Q&A.
What’s your point of view on MLOps when it comes to the job market? To elaborate, there are a number of roles in data science, including data analyst, data scientist and machine learning engineer. Taking these roles into consideration, which job role requires experience in MLOps?
But then another question arises: as we are headed towards automated ML, what will MLOps mean for the job roles in the market?

Emmanuel Raj

Rushanthi, it’s good for an ML engineer to have experience in MLOps (especially data engineering and platform setup). The data scientist is the user of the MLOps platform, so it helps if they have some experience using MLOps platforms to build and deploy models. Data analysts can do without it. Good question on where we are headed with automation - time will tell, but MLOps will probably impact every data science/engineering job role (let’s hope positively), e.g. more efficiency, less time and fewer resources.

Rushanthi

Thanks a lot for the enlightenment Emmanuel. Appreciate it a lot for clearing up my puzzle 🙌

Oleg Polivin

Hi Emmanuel Raj, thanks a lot for this opportunity! I would like to ask you two questions that are a bit related.

  1. In your opinion, what is the main added value that an MLOps person brings to a company?
  2. Is it easy to replace an MLOps engineer?
A brief thought that was the reason to ask the questions is in the thread.

Oleg Polivin

When I was working on projects that needed something I would call MLOps (creating a Docker application, deploying on a Google k8s cluster, making a pipeline using GitLab CI), I realized that I did not understand how it was working under the hood, but was just looking for “recipes” and keywords, and using tutorials or, to a lesser extent, documentation (written in the form of a “recipe” as well). Like: put this into the gitlab-ci.yaml file, click on this, that and that in Google Cloud. Sure, it took a long time to make all the parts work together.
However, it makes me think that:

  • there is no special knowledge involved in MLOps vs., say, data science, where one is expected to know math or statistics.
  • Therefore, it makes me a bit “afraid” that MLOps engineers will be replaced either by some automatic deployment solutions or, more simply, by young people who tend to grasp the many new tools that appear.

Thank you!

Doink

+1

Emmanuel Raj

Good question Oleg Polivin! An MLOps person brings added value to a data science team mainly in terms of infra setup/maintenance, monitoring ML models/systems and maintaining pipelines. Sure, it can be done/learned by others, and MLOps engineers may as well be replaced (e.g. with SRE engineers, DevOps engineers, etc.). For now it looks like data scientists are more on the verge of replacement (with AutoML) 😃 It’s not as easy to replace an MLOps engineer, though maybe with time and more automation tools we might get to a point where MLOps engineers can be easily replaced.

Lamjed Debbich

Hi Emmanuel Raj, thank you for this nice book; it covers one of the subjects that interests me a lot. As you know, there are many MLOps methods on the market, and a new user can get confused about which one to use. Do you have any tips for getting started?

Doink

+1

Emmanuel Raj

Hi Lamjed Debbich, thank you! To begin with, I suggest getting a good theoretical understanding of the MLOps workflow, then learning how to build ML microservices using Docker and deploy them on various deployment targets (the Engineering MLOps book will give a great head start on this). After that, decide on the MLOps tools (cloud or open source, e.g. Azure, MLflow, etc.) you would like to implement and then find resources that teach you the implementation. Learning by doing is the best way to learn MLOps 🙂

Alexey Grigorev

What do you think about the role “MLOps engineer”? Does it make sense? Should it exist?

Alexey Grigorev

I see that it’s often synonymous to ML engineer, which I don’t agree with. What’s your opinion about it?

Chi

From my understanding, there are (at least) 2 types of MLE defined by most companies. Some MLEs focus on the ML models and algorithms; other MLEs focus on designing “data-intensive applications”, or can we call it MLOps? Again, ML in production is not just the ML algorithms (Andrew Ng).

Emmanuel Raj

Depends. If the team needs a person to set up and monitor infra and operations, it’s a good idea; otherwise DevOps (or SRE) engineers with some knowledge of ML can enable MLOps 🙂

Alexey Grigorev

Also, what’s your opinion about the new course from Andrew Ng? I’m talking about MLEPs

Emmanuel Raj

Haven’t looked at the content, so I can’t say much, but from the look of it, it’s more focused on ML engineering/DS problems (e.g. optimisation, robustness, efficient modelling, etc.) and not much on MLOps 🙂

Doink

I see different MLOps courses: one from Made with ML, another on Udemy which is quite popular, then we have Andrew Ng’s, and then there is the Full Stack Deep Learning course covering this stuff. Which course or path do you recommend for a noob?

Emmanuel Raj

Made with ML looks good (recommended); not sure about the Udemy course or Andrew Ng’s courses (haven’t looked at the content) 🙂

Tino

Hey Emmanuel Raj, would you suggest building an MLOps system on your own or buying it from an external provider? I saw that Fiddler has an amazing solution

Emmanuel Raj

Hey Tino, it’s hard to generalize for all cases; it depends on the use case we work on. In some cases the data is too sensitive and can’t be processed using external tools. In those cases it is better to build on your own, but otherwise plug-and-play solutions like Fiddler, MLflow, Valohai, etc. are awesome; using them will save a lot of time and energy 🙂

To take part in the book of the week event:

  • Register in our Slack
  • Join the #book-of-the-week channel
  • Ask as many questions as you'd like
  • The book authors answer questions from Monday till Thursday
  • On Friday, the authors decide who wins free copies of their book

To see other books, check the book of the week page.
