Machine Learning Zoomcamp: Free ML Engineering course. Register here!

DataTalks.Club

Practical MLOps

by Noah Gift, Alfredo Deza

The book of the week from 30 Aug 2021 to 03 Sep 2021

Getting your models into production is the fundamental challenge of machine learning. MLOps offers a set of proven principles aimed at solving this problem in a reliable and automated way. This insightful guide takes you through what MLOps is (and how it differs from DevOps) and shows you how to put it into practice to operationalize your machine learning models.

Current and aspiring machine learning engineers–or anyone familiar with data science and Python – will build a foundation in MLOps tools and methods (along with AutoML and monitoring and logging), then learn how to implement them in AWS, Microsoft Azure, and Google Cloud. The faster you deliver a machine learning system that works, the faster you can focus on the business problems you’re trying to crack. This book gives you a head start.

Questions and Answers

Alex S

I wasn’t sure how it’s possible to read this book as it isn’t published until October this year. Could you let us know, Alexey Grigorev?

Alexey Grigorev

Probably you should ask Noah Gift about it 😃 But you can read it through OReilly Learning, the early release version is already available there

Alex S

Ah ok I didn’t realise that you could read the book before it was published!

Mahmoud Jalajel

Same in Germany.
And I think my trial with O’Reilly learning expired. So I’ll have to wait for the book release!

Maja

Me too. I can’t wait till October to by this book fromNoah Gift. His previous two books (Python for DevOps and Pragmatic AI) are great and have been a huge help for me. Also, his co author Alfredo Deza has such an inspiring life story.

Noah Gift

Yes, you can read online in rough draft form on the O’Reilly website: https://learning.oreilly.com/library/view/practical-mlops/9781098103002/

Noah Gift

It also should be in kindle form in around 30 days or so and in print soon after.

Praveen

Noah Gift
Are there any generic rules behind selecting MLOps tools for a given ML task ?

Noah Gift

A good place to start is by using the tools on the platform you are already on. All major cloud platforms have an MLOps solution and this is a great place to start. AWS Sagemaker, GCP Vertex AI, and Azure ML Studio

Kshitiz

Noah Gift and Alfredo Deza - First of all thanks for doing this. I want to discuss couple of things here -

  1. Should MLOps be applied to all data science/ML projects or should people be looking at some sort of maturity in the project? To put it simply - Should there be any minimum requirements in terms of size of data, number of users if it’s used in an application, how long in a problem do people have to wait to get the results validated etc. ?
  2. In what sort of problems/use cases are feature stores useful? How is feature store different than a database?
Noah Gift
  1. I do think the process of MLOps should be applied to all projects because it is an extension of DevOps. All software projects should have CI/CD and you can even do this with notebooks: https://github.com/noahgift/myrepo
  2. For feature stores they have raw materials in a form easily consumed by a ML pipeline. I.E. Containers package the runtime with the code, Feature stores package the raw ingredients for ML into a metadata system. A database is too low level by itself to be a feature store.
Eunice

Noah Gift What are the common skills between an MLOps and a Data Engineer ? And what skills are specific to MLOps ?

Noah Gift

There is a strong overlap between Data Engineer and MLOps with perhaps as little as a 5% overlap. The key 5% is that a MLOps practitioner also knows a bit about ML and can train models, diagnose their output and knows about ML Platforms like AWS Sagemaker, MLflow, etc.

Denis Volk

Larger companies are using in-house MLOps platforms, while for smaller teams, it is hard to dedicate lots of development time to set up similar machinery. On the other hand, some level of MLOps is just necessary to keep an ML project useful to business users. How to determine the right amount of MLOps for a project?

Noah Gift

I would start with whatever platform is available and use their offerings: i.e. Google, AWS, Azure. Let’s take AWS for example, if you have gigantic data and gigantic teams, say over 250 people in your company then a “big” platform like Sagemaker probably makes sense because of how much it offers.
If you use AWS but have a 3 person team, Sagemaker may or may not be the best easy win. Perhaps AWS App Runner with open source MLOps tools might be a better fit.

Alexey Grigorev

Which open source tools can you recommend?

Jon Exume

Can you talk about the specific careers that MLOps plays a big role in?

Noah Gift

Autonomous driving is a good example. I went to Tesla AI Day last week and 90% of the people I spoke with did MLOps, i.e. tools/infra around computer vision.

Jon Exume

Thanks

David Cox

I appreciate your taking the time to answer questions, Noah Gift! From your experience, what is the background of the primary people you see getting into MLOps?

Noah Gift

People with a strong DevOps/Infrastructure skill set can easily make the transition to MLOps. They just need to pick up a bit of ML training. One way to do this is to read the book I wrote and also to get AWS ML Certification certified (or similar). Note, I helped create the AWS ML certificaiton….

David Cox

Thanks, Noah!

David Cox

A follow-up question to the one above. Sometimes “new” jobs in technology are just the same skills from past positions but combined in a new way or centering around a new tool. What do you think distinguishes MLOps from past, similar areas? And, what similarities does it share with other areas/processes?

Noah Gift

I think MLOps is essentially an evolved DevOps but with the addition of ML.

Duverger PETGA

Hi Noah Gift I really appreciate your work but I have one question : between “Cloud Computing for Data Analysis” and your actual book “Practical MLOps” or “Python for Devops”, in what order we have to read your books ? For a beginner in MLOps ?

Noah Gift

You can read in any order. Since both Python for DevOps and Cloud Computing are start with either then move on to Practical MLOps. They all have a similar theme with more depth on cloud, devops or mlops depending on the book

Doink

How to decide which tools to choose? Should one choose for an open source alternative or choose a tool by a cloud service provider?

Noah Gift

How to decide which tools to choose?
whatever is simple to get started with an improves automation and quality.

Should one choose for an open source alternative or choose a tool by a cloud service provider?
I personally prefer to pay a vendor, so I would start with a cloud offering.

[10:03 AM] There are a plethora of tools coming out, how do you make a framework on choosing which tool to choose and how to choose?
If you are on a cloud platform start with what they offer and go from there.

[10:04 AM] How to practically navigate through the MLOps cycle? Some nuggets of wisdom like MLOps isn’t a tech problem but a people problem etc
Make sure you have CI/CD working and iterate from there.

[10:04 AM] Do small startups really need MLOps or is it over engineering?
MLOps is a behavior/methodology that focuses on Kaizen (continuous improvement). So it applies to anything small or big.
A. Automate everything
B. Make it better quality daily

Doink

There are a plethora of tools coming out, how do you make a framework on choosing which tool to choose and how to choose?

Doink

How to practically navigate through the MLOps cycle? Some nuggets of wisdom like MLOps isn’t a tech problem but a people problem etc

Doink

Do small startups really need MLOps or is it over engineering?

WingCode

Hi Noah Gift,
Why did you choose the cheetah as the book cover? How is it related to MLOps? Does it portray the advantages given by MLOps ? 🙂

xnot

Looks like a 🐕, probably dalmation

Noah Gift

We don’t have control of the animals.

Alper Demirel

Hi Noah Gift, thanks for being with us.
What should be the starting point for our current project for MLOps? And what are the biggest disadvantages that MLOps bring?

Noah Gift

To start with I would make sure you have CI/CD, i.e. the foundation of modern software engineering. This is the first step.

I don’t believe there are any disadvantages to MLOps. In a nutshell it just means “Kazien”, i.e. continuous improvement. Make everything better and more automated.

Lalit Pagaria

Thanks Noah Gift for this session. I have following queries
What are good observability tools are there in MLOps space? (Specially open source tools)
What is most important MLOps checklist for business critical model serve pipeline?
Do you believe current set of lowcode/nocode MLOps solutions are good enough to be used for mission critical usecase?

Noah Gift

I would start with traditional monitoring/instrumentation for you platform using whatever tools are already in place. Then add additional business logic for ML.
Additionally if you use Cloud Platforms they have default monitoring like for example Azure ML Studio which does model versioning and experiment versioning.

Noah Gift

“What is most important MLOps checklist for business critical model serve pipeline?”
Start with CI/CD, if you don’t have this you cannot do MLOps

Noah Gift

“Do you believe current set of lowcode/nocode MLOps solutions are good enough to be used for mission critical usecase?”

Yes, in many cases you don’t need to write code. A good example is Azure ML Studio AutoML.

Eunice

Hi Noah Gift, Alfredo Deza thanks for the quick answers. When a team starts using the Agile framework, they may need a Scrum Master to facilitate and help to implement Agile. Do you think an MLOps specialist may be necessary for big organizations used to other frameworks to start using MLOps? Or hire an ML Engineer and have a Lead Data and Project Manager aware of the subject may be sufficient?

Noah Gift

I think it may help to have someone who has some form of MLOps certification. One good example of this is course I just created on Coursera: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale

Noah Gift

Btw, you can also help promote a lot of my content and contribute to charity with this humble bundle, including PSF and women who code: https://www.linkedin.com/posts/noahgift_humble-software-bundle-python-2021-activity-6838263509390807040-zJ98>. Help spread the word.

Kamran Ali

Noah Gift Is this book covers any specific Cloud Platform (e.g. AWS ) or any specific tool (e.g. MLFlow) etc

Noah Gift

We cover AWS/Azure/GCP very heavily

Kamran Ali

Thanks for the response ! 🙂

Alexey Grigorev

By the way, we have another celebrity appearance - Alfredo Deza himself! Welcome Alfredo!

Maja

Hello Alfredo Deza ! Thank you so much for joining us. I am so happy to have this opportunity to e-meet you and to ask questions. From your inspiring life story we can learn that anything is possible and that geat tihngs do happen. You just have to love what you are doing and to do it in the best way you can. From your book “Python for DevOps” we have learned how to do DevOps in Python. But, I have to ask you considering that ML pipeline is more complex, what are things we shouldn’t ever do - bad practices that happen due to the lack of knowledge, or experience?

Alfredo Deza

Hi Maja! Thanks for the super kind words. This is a great question! I think that there are a few things from seeing the opposites of the core pillars of operations (DevOps/MLOps in general) like automation, monitoring, testing, and CI/CD. For example: no (or little) automation, doing things manually, no pipelines, no monitoring.
Aside from those, you have other red-flags like over-engineering. Fast, iterative processes are far better than waiting 3 months to design the perfect thing

Alfredo Deza

There is always room for improvement. I keep hearing people say “what if everything is already automated?” - well… there is always stuff to automate and improve. You are asking a critical question here, and not asking critical questions (see critical thinking section at the beginning of the book) is a tremendous problem.

Maja

I will read it as soon as I get the book. Thank you Alfredo Deza so much for your guidance!

Livsha Klingman

Alfredo Deza Noah Gift I’m a REAL beginner, but majorly interested and so far got a good repertoire of success in a few beginning projects (maybe beginners’ luck)!
Your books are all touching on my work topics and what I am facing daily and now you have exposed them for me to read up on!
As I develop, slowly, my knowledge and experience, I am discovering how much breaking into the ‘big’ world is an upward struggle between big enterprises and the well-experienced. (As in any professional field!).
What is the correct priority considering the limited manpower for startups and small businesses - veer towards automation or not? Develop pipelines or CI/CD? or using a service tool and focusing on the ML?
Do you have any advice for ‘us’ small businesses to ‘make a dent’ in the big world and gain the skills and experience to be aware of and make the educated decision of tools, methodology and topology, correctly balancing labor, to successfully develop MLOps?

Alfredo Deza

Automation is not a one time thing that takes months to achieve and is super expensive. Noah Gift taught me the right path years ago: pick any one thing you do manually and automate it by the end of the week. Rinse and repeat, and suddenly a few months later you have several things automated. It is now CHEAPER to run operations because of it and the team can concentrate in even better automation

Alfredo Deza

Always automate

Alfredo Deza

Leveraging the cloud for automation (CI/CD or pipelines doesn’t matter) is good. Leveraging anything that is already solved that is not a core competency of your business is crucial

Livsha Klingman

Alfredo Deza Thanks for your response! Taking this opportunity further… How do you suggest trying to circumvent issues in MLOps, with compounding model decays through either data discrepancy between CI and CD or training and pipeline data, or models based on a initial wrong hypothesis - collecting biased data, which then exacerbates over time growing in bias?

Alfredo Deza

This is a difficult question to get a straight answer. I don’t think there is a one-size-fits-all problem solver here. If you have biased data, but you have automation, tests, pipelines, etc… you still have a biased model in the end. MLOps can’t solve biased data. There is always the human element in all of this, and critical thinking (see critical thinking section at the beginning of the book) is essential

Livsha Klingman

Alfredo Deza Thank you for your advice.. Can’t wait to read your book and thanks for all your valuable time!

Shankar Somayajula

Alfredo Deza Thanks for taking questions. I like the focus on Automation in your book and answers to questions here.
Can the process of Automation involve an abstraction of the data structures as a data model (schema/objects) so that the artifacts of automation are reusable from one project to another.. facilitating more reuse, making the process of automation more of a Product/Platform service instead of a Project/Task output? How does one facilitate reuse (otherwise) - publishing an API?

Alfredo Deza

Reusability is the gold standard. Not entirely sure how to abstract data structures, but sharing/reusing artifacts sounds great to me. As to how to do this, well it depends! Perhaps an S3 bucket would suffice if everything is behind AWS. If you need external access, it sounds like an HTTP API is the way to go

Tony Gunawan

Hi, Noah Gift and Alfredo Deza. Thank you for being here to answer the questions. I am newbie in the MLOps field as I am a data engineer right now on financial institutional field with previous experience as ETL developer and hope my questions is not out of context. Is it possible to fully automate all the process of ML end to end, especially in model evaluation? So many data with unpredictable behavior (like in the financial case) that make a model that has been deployed obsolete like during the start of the pandemic, behavior of the people who need to borrow the money from banks or other institutional lenders have gradually changed and need to do some remodeling with new set of data behavior if I would say. In this case, what kind of things that MLOps need to consider when facing this kind of unpredictable phenomena that will happen in the future? Thank you.

Alfredo Deza

There is no silver bullet here where everything can be fully automated. You’ve mentioned one of the caveats which is unpredictable behavior. Human interaction+evaluation has to be possible. Pipelines have to be flexible. Any automation/workflow has to easily allow for changes and updates. When automating, you must think about the pitfalls and how to address them. For example, you have a pipeline that normalizes data in small amounts, what can you do today that will allow batching the normalizing if the data is gigantic?

Alfredo Deza

alfredinsky

Tim Becker

Hi Noah Gift and Alfredo Deza, thank you for answering all our questions! What would you say are the most useful MLOps skills for a data scientist? For example, if I as a data scientist want to increase the collaboration with a MLOps specialist or if I am working for a small company that does not have a dedicated MLOps person and I as a data scientist have to cover the topic as well as possible.

Alfredo Deza

if you are starting out then I would pick automation. Anything you can do to start automation is going to be super useful and empowering

Tim Becker

Do you have a good idea for a toy project that I could work on to learn more about MLOps? Do you use an example project in your book?

Alfredo Deza

The book uses a public Github repository that you can use to see examples GitHub - paiml/practical-mlops-book: [Book-2021] Practical MLOps O’Reilly Book

Noah Gift

cookbook in particular is a good recipe

Tim Becker

thank you guys 🙂

Luke Garcia

Hi Noah Gift Alfredo Deza, I’m new to DS and MLOps. Does the book mention Kedro? What role (if any) does Kedro have in MLOps?

Alfredo Deza

We don’t have anything related to Kedro (sorry, not sure what that is)

Luke Garcia

thank you

Noah Gift

If you want a deep dive on the book and how to MLOPs from Zero, watch this 2.5 hour video: https://www.youtube.com/watch?v=OMv3lkB5W20

To take part in the book of the week event:

  • Register in our Slack
  • Join the #book-of-the-week channel
  • Ask as many questions as you'd like
  • The book authors answer questions from Monday till Thursday
  • On Friday, the authors decide who wins free copies of their book

To see other books, check the the book of the week page.

Subscribe to our weekly newsletter and join our Slack.
We'll keep you informed about our events, articles, courses, and everything else happening in the Club.


DataTalks.Club. Hosted on GitHub Pages. We use cookies.