DataTalks.Club

Humans in the Loop

Season 4, episode 6 of the DataTalks.Club podcast with Lina Weichbrodt

Transcript

The transcripts are edited for clarity, sometimes with AI. If you notice any incorrect information, let us know.

Alexey: Today, we will talk about the human aspect in MLOps. We have a special guest today, Lina. Lina has over nine years of industry experience in developing scalable machine learning models and bringing them into production. She currently works as a machine learning lead engineer in the data science group of the German online bank, DKB. Previously, she worked at Zalando, which is one of the biggest European online fashion retailers, where she worked on a personalization model. It was a real-time deep learning personalization model for more than 32 million users. Now it's more, right? (2:02)

Lina: Yes. It’s constantly growing. Zalando is popular. (3:14)

Alexey: Yeah. Welcome. (3:17)

Lina: Thank you for having me. (3:18)

Lina’s background

Alexey: Before we go into our main topic, let's start with your background. Can you tell us about your career journey as well? (3:21)

Lina: Yes. Originally, I was very entrepreneur-minded and I started to study business. I got hooked on programming and basically never left. I moved over to computer science. Since then, I've worked a little bit as an architect at Zalando, which was very nice – it’s an international, big, ambitious company with an awesome tech culture. For the past year, I've been working at DKB. Basically, I have a bit of a business mind, and I also come from a web analytics and online marketing background. I always have a very customer-centric viewpoint on things. I think it’s interesting to marry ideas from customer orientation with engineering. (3:29)

Alexey: That's how you know all this marketing stuff that we talked about. (4:19)

Lina: Yeah, basically. These are sort of my pet projects and interests - how engineering can get inspired by different disciplines. (4:26)

Alexey: Okay. You said you worked as an architect at Zalando, correct? (4:37)

Lina: No, that was before. At Zalando, I worked as a research engineer. (4:43)

What we need to remember when starting a project (checklists)

Alexey: Okay. Today, we're talking about humans in the loop and keeping people in the loop. When we start a project, what important things do we need to remember? What is the checklist that we need to go through before we do this? (4:50)

Lina: What I've observed over the last year is that there are no best practices as of yet regarding what makes a good machine learning model. That means it can be useful to apply certain checklists to help your stakeholders a little bit or when you come up with your own ideas. It will help you if you have a framework so that you can see “Does this make a good ML project?” (5:14)

Lina: One thing I found quite useful is to write down the business case and check it with a stakeholder. The stakeholders sometimes use AI and they think, “Ooh, this is cool – a self-learning system that will solve anything.” So they see it as more of an automated human than a mathematical problem-solving engine. What I've found useful is to formalize the business case with them in the form of a user story and make sure I really understand what they want. Sometimes they think in terms of the solution, they say, “Oh, yeah. Make it better.” They don’t really specify how, because they somehow think AI doesn't need a proper business case, but it does. Therefore, I need them to formalize “How do they measure success? What is being optimized? What is the current way of doing it? Which kind of improvement would make it worth doing the project? Are you looking at 10%? Would a 10% improvement make it worth having an AI model there? And did you consider other solutions besides AI models?”

Lina: Also make sure that they don't fall into the trap of hearing about ‘cool new AI technology’ and thinking about where they can use it. From my practice at Zalando, people may say something like, “Okay, we want a personalized recommender system that does this and this.” So I say, “Okay, but what problem do you need that tool for?” So it can sometimes turn out like, “Okay, I have a cool new hammer – I will hit any nail with it and the outcome will be amazing!” It's better when they come to you with a business problem. For example, “We have a ‘new in’ carousel or a ‘new in’ section with a lot of articles. We don't know which articles to present to the user.” Okay, this is easy. With this problem, I can then figure out how to at least title this recommendation solution.

Lina: So basically, I recommend that you have a checklist that you go through to make sure that none of these steps are skipped and that you adhere to a really good business case. Have KPIs and have evaluated alternative solutions to make sure that you're all on the same page.

Lina: The other thing is, sometimes they kind of use you for very experimental ideas, “Ah, you're the AI team. Can you do a something-something prototype?” And this prototype is not actually their core business. Therefore, I would always insist that the project delivers core value for them. Otherwise, in the end, if it doesn't work, it's not that bad for them, so they will not invest a lot of time into your project. So make sure they have some skin in the game. You want something that is core to their business when you select a project – something that really makes a difference for them if you solve it. In cases where your group works with another part of the company, ideally, you would also ask them, “Will you give me someone from your department?” “Will you give me the capacity for this?” Because if they're not willing to do that, chances are – it's not that important to them. Or you need to go one step higher to get the buy-in. When all is said and done, if you go ahead without that, then even if it works, they do not have buy-in or the higher-ups in this area don't have buy-in. Therefore, you will either not be able to successfully bring it into production or you will have no engagement from the stakeholders on the other side when you want to actually run the thing. That’s why you need to have a checklist of all these things – that’s what makes a good project.

Alexey: So the checklist is – I hope I didn't miss anything – first, you need to formalize your project in the form of a business story. So you don't say “I'll just make it better.” You need to formalize success – “How is success measured?” “What kind of improvement are we talking about? Is it a 10% improvement or 5%?” “If we improve it by 5%, is it worth the effort or not?” (9:15)

Lina: Don’t forget – “Do you even need AI?” Some things really do not need AI to be involved. Sometimes stakeholders cannot understand the difference between something like a personalized pipeline or just a data pipeline which puts some stuff together. To them that's already an AI model. We know the difference. So, you need to quantify alternatives. (9:43)

Make sure the problem is formalized and close to the core business

Alexey: You also said, “You need to be specific about the business problem that you're solving.” So it's not just “Hey, do something cool. Here's this idea.” Instead it should answer “What business problem does this solve?” – and the closer the answer is to the core business, the better. If it's just some cool thing that some executive read about, then it might not be worth doing. (10:04)

Lina: Sometimes I have colleagues who come up to me and say “Oh, this is so cool. We should do a new algorithm.” But in real life, it rarely gives a successful product. In the end – you cannot explain it, you cannot give it a title, it's hard to find the UI, etc. You get a lot of edge cases if you do not have a good business problem. Let's say “I have new products and I want to rank them in a certain way.” That gives me lots of ideas for how to think of edge cases. But if you have a very unspecific problem, for example, “Personalize the ranking,” then you will have a lot of weird outcomes that are very hard to fix when you want to run it. So the more specific the problem is, the easier it is for you to constrain it – to give it meaning – and have a successful outcome that everybody's happy with. (10:26)

Alexey: If it's formalized, you can clearly visualize it. You can see the UI. Maybe you can even come up with some sort of mock up for the product. When you have this understanding – when you have this visualization, (even if it's in your mind) – then it's easier to imagine all these corner cases, right? (11:17)

Lina: Yes. (11:37)

Alexey: And the last thing you said – you need to get buy-in, which means you need to have somebody from the business team (from the stakeholders) engaged in the project. They need to make somebody available for your questions. Maybe if you have a demo, you want to show it to them. So you need somebody available for that. If they don't give you this person – if there is no point of contact – then maybe they don't really care about this project. (11:38)

Lina: Exactly. (12:06)

Alexey: Did I miss anything? (12:07)

Lina: No, those are the main points. There are more points. There are some cool checklists on the internet that I can recommend to you. But the aforementioned points are some of the most overlooked, that's why I mentioned them. I recommend that you write yourself a personal checklist. (12:09)

Get the buy-in with stakeholders

Alexey: This is before you even start doing anything, right? You have an idea about something cool, you sit down, and you spend some time in front of a Google document or Word document, or maybe just a notepad. You try to write everything down, you share it with your colleagues, with your stakeholders, and you need to get this buy-in before you do anything. (12:22)

Lina: I also recommend that you pair with them. In order to really understand the domain and the problem, it's a good idea to spend half a day on that. Usually we're going after important projects, right? There's rarely a low hanging fruit ML solution. Therefore, it's also worth sitting down with them. Then you may find that you had some misunderstandings about the problem, for example. (12:52)

Alexey: Maybe I'm asking a question that you already answered, but the question is, “How do we communicate?” One thing you said is – just go and sit with them for half a day. But I think it's difficult to do that sometimes. Let's say I'm a data scientist – how do I even talk to these business people? They speak a completely different language. They care about things I don't care about. I care about logistic regression, while they care about profit and things like that. (13:18)

Lina: I mean, it depends a little bit on how evolved they are. If you have a very well-functioning business team that knows about user stories and knows the KPIs, you can just tell them, “You don't need to know anything about machine learning. In fact, it's better if you don't. Just describe the problem to me in your terms.” They can prepare a document and then you can sit together and you ask your questions. If it's a stakeholder team that is not as well-educated on user stories or how to write a good business case, you need to go sit with them for longer. Basically, you need to find out the business case together with them. So it depends, I would say, on how mature they are. (13:47)

Building trust with stakeholders

Alexey: What can we do to actually build trust between us? They don’t always trust us from the very beginning. I think in order to have good communication with stakeholders, they need to trust us and we need to trust them. If we speak a different language, then they don't understand us and we don't understand them – we don't have this trust. So how can we actually gain it? (14:37)

Lina: Yes. I also find that quite challenging. There are probably books on how to develop stakeholder relationships. I have a very good book recommendation, if you care to know about it: it's called Rebels at Work. It's about leading change from within. It's basically full of ideas on how to convince people – if you have innovative ideas, how to convince other people of that. In general, I found it useful to not talk so much about what I can do for them, but to first understand their domain. (15:07)

Lina: Also maybe help them with some of their data problems that are unrelated to my project. Sometimes they ask me data questions that are unrelated to my project and I make myself available and can be a sort of ‘trusted expert’ to them. That, of course, is only useful if you plan on working with them in the long-term, but that's a pretty good way to gain trust.

Lina: When I first talk to them, I not only focus on the upsides – I also focus a lot on their concerns. When we talk about the business case, I am pretty sure that most of these concerns are actually not valid. They may come to me and say, “Oh, this is gonna be slow.” And I explain, “No, it's not going to be slow. It's going to be really fast.” But maybe they’ve had some bad experiences, so I do not judge their technical knowledge, or lack thereof. Instead, I tell them something like, “Okay, what do you think definitely shouldn't happen when we introduce a very intelligent solution here?” And they talk about all kinds of fears they have like, “Oh, it could be very slow!” Or “Weird things could drop entire cases.”

Lina: We also had concrete concerns come up in these fears. For example, when I was optimizing process costs, they were saying, “But please, it should not reduce our sales. If you reject people because of certain process cost optimization, we do not want the overall sales volume to be reduced.” Then some people who had more of an idea about how algorithms work stepped in and added, “Okay, if it only learns from the past, what happens if we want to change something in the future? Are we then unable to change the logic?” So they're asking questions like this, which I would never address when I pitch the project to them.

Lina: The whole time you're focusing on the upside – you're pitching your cool idea – but they're sitting there and they don't have space to express their concerns. This makes them not want to buy into your solution, because their basic questions have not been answered. We can also take a page from the book of marketing. When you come to a really good website, they do not sell you only on the upsides of a product; they also tell you what it's not. For example “It's good quality” because you may be worried that it could be cheaply made – all that kind of stuff.

Alexey: Or another example – “No credit card required.” When I see that something is free, I think “Okay, but they are probably sneakily going to charge me after the free trial is over.” (18:18)

Lina: Sometimes they say “No extra charges.” Basically it's a similar process. That’s why I don’t only ask them about what they want to achieve, but also about the constraints, “It should not be that. I'm worried about that.” I make these concerns into slides and for each, I write down what I’m going to do to address it. Then they feel very good about it. They feel that they’re being taken seriously and that I'm not doing something over their heads. All their concerns are addressed and then you move on to the actual solution. (18:29)

Don’t just focus on upsides – ask about concerns

Alexey: Basically, when you pitch ideas, you don't just focus on the upsides. You also ask them, “What kind of fears do you have?” (19:02)

Lina: I don’t call it “fears” so that they don't feel condescended to. That's a bit of risk. You want to meet them on their level. You don't want to talk down to them. You're peers – they know their stuff, you know your stuff. You ask them “What are your concerns?” Or “When we develop a solution, what is something you definitely want us to avoid?” More along those lines. (19:13)

Alexey: “What should we avoid?” (19:37)

Lina: Right. You need to ask them things like, “What should we avoid?” “What is your worst case scenario?” “If something goes really wrong, what do you definitely want us to avoid?” That is also very useful for us to know because we find out what their worst fears are. For example, we were developing customized OCR for incoming invoices that are scanned. We're thinking, “Of course that should produce good quality output. The text part, the supplier part – it automatically gets put into SAP.” And we're thinking, “Okay, this should work well, high accuracy, blah, blah, blah.” (19:40)

Lina: And they were like, “You should never lose an invoice.” It didn't even occur to us that we could lose an invoice. But they insisted, “You shouldn't lose an invoice.” Basically, A) We know that they're worried about that. That is helpful information. B) We can make sure we go through each step and say, “Okay, what's our scenario if it fails?” Let's say we fail to read an invoice. “What do we actually do in the OCR? Does it get stuck there? Is there a retry? What if it fails permanently?” So this approach also helps us to think of our user stories. We make this “We should not lose an invoice” into a user story.

Lina: Then we go through the different steps. We maybe even make it into a metric – we have ‘incoming invoices’ and ‘outgoing invoices’ and we set up an alert that takes the difference between those two, which should be zero. This way, we have something to show them and we also make sure that we can proactively avoid their worst case scenario. Because if you do their worst case scenario even once – maybe you don't even realize that you did it – they will call you, “You know this invoice, the customer called – we never paid it. Where is it?” And we can only respond, “Uh. It's been lying there for two months and no one noticed.” I’m making up this scenario, of course, but that’s when you lose trust. Even though it's completely normal to have bugs in any software, as we know, you do not want to lose this trust. By collecting these user stories early on, we can proactively avoid many of these worst case scenarios.
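A minimal sketch of the "incoming minus outgoing should be zero" alert described above. It is an illustration, not the team's actual implementation: the function names and invoice IDs are hypothetical, "outgoing" is assumed to include invoices handed over to a manual fallback, and the check is assumed to run over a closed time window so that invoices still in flight do not trigger it.

```python
from typing import Iterable, List, Optional


def unaccounted_invoices(incoming_ids: Iterable[str],
                         outgoing_ids: Iterable[str]) -> List[str]:
    """Return the IDs that entered the pipeline but never left it."""
    return sorted(set(incoming_ids) - set(outgoing_ids))


def invoice_alert(incoming_ids: Iterable[str],
                  outgoing_ids: Iterable[str]) -> Optional[str]:
    """Return an alert message when the difference is not zero, else None."""
    missing = unaccounted_invoices(incoming_ids, outgoing_ids)
    if not missing:
        return None
    return f"ALERT: {len(missing)} invoice(s) unaccounted for: {', '.join(missing)}"


# Hypothetical batch: INV-2 was received but never reached the downstream system.
print(invoice_alert(["INV-1", "INV-2", "INV-3"], ["INV-1", "INV-3"]))
```

Comparing IDs rather than plain counts keeps the "difference should be zero" idea and also tells you which invoice to go looking for.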

Lina: We even use that for a demo. Let's say that we were worried about losing an invoice. When we demo the solution to them, we can not only say “Look at this amazing accuracy!” But we can also say, “Oh, no! An invoice with a corrupted format! What happens then?” And then we show them “Here, there, and then this is how the fallback works.” It also helps you with design, because you can already think of things that can go wrong. Sometimes you forget to design the bad outcome like, “Does it need manual intervention? Yes or no? Who gets notified?” So this really helps for them to put their trust in you. “Okay, I've seen the worst case scenario for this solution. That would not lead to any invoices missing.”

Alexey: You said a couple of interesting things. First, you turn these concerns into slides. I imagine that maybe, for each concern you have a slide, where you address each one. This lets them know that you are listening to them and you want to address their concerns. That's the first step. But you don't stop with slides. The next thing you can do is take each concern – each fear – and turn it into a metric. With the lost invoices example, you can take the number of invoices coming in and out and if there is a difference, you send an alert. That’s how you turn a concern into a metric. Then you can also demo these concerns. It's not just happy cases, but also you show the worst case scenarios. You take a completely broken invoice and you try to process it through the system and then they see, “Okay, this is how the system behaves.” In the end, they're happy and they trust your system because they’ve seen how the bad things get solved. Is that right? (22:36)

Turning a concern into a metric

Alexey: Do they always need to see the metrics or is just a demo enough? What do we do next? (23:47)

Lina: Most business stakeholders basically just want to believe that it works. Once they believe you that it works, they will be fine. I guess it depends on your business stakeholders. Some might want regular reporting. But in most cases, once you have the trust and you have established the procedures that you will need to set up – for example, who takes care of cases where there’s manual intervention required – that should be enough. (24:02)

Alexey: So you have trust and they believe you, now you need to work on not losing it. Right? (24:28)

Lina: Yes. (24:33)

What happens when something goes wrong?

Alexey: These procedures that you mentioned – if something goes wrong – how do I tell them about this? How do I tell stakeholders about incidents where, let's say, we did lose an invoice. Suppose it was some corner case that we didn't think about beforehand. How do you communicate that to them? (24:34)

Lina: I will generally say – be transparent. But it depends a little on your stakeholders. When I worked at Zalando, the business teams were more evolved (knew more about software). There you can have post mortem reports and you can estimate the impact. When you have internal teams that are mostly used to off-the-shelf software, you need to communicate such things a bit differently. They usually don't want details, they just want to know that you are handling it and when it is (or will be) resolved. (24:53)

Alexey: Basically, you need to keep them in the loop. (25:37)

Lina: Yes, you need to keep them in the loop. You need to find out who's responsible for fixing it. You ideally plan for that beforehand. For example, what I ask my team is, “Okay, what's the impact when we have an incident and we're out for one minute? What about if we're out for 10 minutes? For one hour? For 24 hours?” Let’s say something terrible happens and we need to discuss with a stakeholder what the impact on the business is. At this point, we have an idea about the service level we need to have, as well as about our alerting. “Do we really need to have people on the weekend if that thing is down? Or does no one really care if we have a slight delay fixing it, as long as nothing is lost?” So you need to think a bit ahead and communicate the service level to the business people, but more in terms of what they understand. For example, “What's the impact on your business if that thing is down?” Everybody should be able to answer that question. (25:40)

Alexey: So you just sit with your stakeholders and say, “Imagine our thing goes down for one hour. How bad is it for whatever process we automate or whatever thing we're doing?” (26:43)

Lina: Exactly. Also think about how it would start up again. “Do you have a queue? Do you have a cache? Does it start running from there on? Or does it catch up somehow?” Basically – think ahead a bit. Then the impact of incidents should be minimal. (26:55)

Post mortem reporting

Alexey: Let's say we agreed with everyone on this, and we say, “Okay, the system should be responsive within one hour. If something happens for 10 minutes, nothing bad happens, but it would come back in one hour.” So you will define all these service level agreements and you start writing something. Then when something bad happens – it goes down and somebody has to fix this on the weekend. What happens after that? How do you communicate that to the stakeholders? Is there any special framework we can use for that? (27:14)

Lina: Internally, of course, we run a post mortem for that. As for how you communicate that to the stakeholder, as I said, I think it really depends a bit on the stakeholder. Our current business stakeholders do not care about the post mortems. We make them for ourselves. It depends on your work environment. However, I found that the post mortems for ML are a little bit different than the regular post mortems. That's also an interesting thing to consider – how do we get these ML applications back to work. I noticed that there are definitely some differences when compared to regular incidents that are from a non-ML component. (28:03)

Alexey: You mean when the system is working, but it's not working correctly. Let's say, if we take a credit risk scoring project – somebody applies for a loan. We know that this particular person will be able to pay the loan back, but the system says ‘reject’ without explaining anything. So from the standard operational metrics point of view, the system is running. It’s still up, but it predicts garbage. Is that a correct interpretation? (28:50)

Lina: Yeah. That can happen as well. First, you need to detect that. In the example you mentioned, it’s useful to have a live test set. This is something that’s useful for cases where the model is actually affecting the outcome, like fraud prediction or spread prediction. This is a set of cases where you do not reject people – a ‘small running A/B test with 1% or 2%’ as some people call it. Other people call it a ‘live test set.’ You can use that for detection and then you diagnose. In other words, after you find an outcome that says “Absolutely not!” but the person was a great candidate for a loan – what do you do? (29:23)
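One way to picture the live test set is the sketch below. It assumes applicants are assigned by hashing an ID, which is an illustrative choice and not something stated in the conversation. The 2% rate, the function names and the record fields are hypothetical as well. Applicants in the live test set are never rejected by the model, so their real outcomes can later be compared with what the model would have decided.

```python
import hashlib

LIVE_TEST_RATE = 0.02  # assumed share of applicants in the live test set


def in_live_test_set(applicant_id: str, rate: float = LIVE_TEST_RATE) -> bool:
    """Deterministically assign roughly `rate` of applicants to the live test set."""
    digest = hashlib.sha256(applicant_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # value in 0..9999
    return bucket < rate * 10_000


def decide(applicant_id: str, model_rejects: bool,
           rate: float = LIVE_TEST_RATE) -> dict:
    """Apply the model's decision, except in the live test set, which is never rejected."""
    live_test = in_live_test_set(applicant_id, rate)
    return {
        "applicant_id": applicant_id,
        "model_rejects": model_rejects,  # kept so it can be compared with the real outcome
        "live_test_set": live_test,
        "approved": live_test or not model_rejects,
    }
```

Hashing the ID instead of drawing a random number means the same applicant always lands in the same group, which makes later diagnosis easier to reproduce.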

Lina: My first message to everybody is – please use the post mortem format for debugging your ML solutions. I've seen even very experienced colleagues jumping to conclusions based on just some data. Sometimes it’s the same as with our stakeholders, “Oh, this phenomenon probably leads to this and that.” I can give you a very funny example that we had when I was working at Zalando. It’s a nice example for debugging ML algorithms. Sometimes people come to you and say, “What is this? This is a bug. This is not how it should be. What did you do?” Then you get a screenshot or something, or someone sends you an example, and you have to find out what went wrong.

Lina: We had this funny example where a colleague of ours went to the site’s homepage and on the men's homepage, he saw a woman’s bag and a woman's shirt. He told me, “This is a very, very bad recommendation. What were you thinking? I am offended. What happened?” And we said “Okay, let's look into this. What happened?” One thing you need to have is some tooling in place to read back your ML algorithm. Maybe you need to log the features that arrived in order to be able to later check what the input into the model was. It's also very important that you don't jump to conclusions. In the example, he saw a bag and a woman's shirt, and I was like, “Okay, that's very weird.” So the first thing we had to check was “What did we do there?” Interestingly enough, I found out that this was not even a recommendation box. This was a ‘last seen’ box.

Alexey: [laughs] So he actually looked at these items previously, right? (32:08)

Apply the 5 why’s

Lina: I thought that he must have. So let's use the post mortem format to debug this. Okay, it's the ‘last seen’ box – some of my colleagues spent some time debugging the problem, not noticing it's not a recommendation box. First thing, apply the strategy, “Check, check, check, OK.” Then you find out it's actually not a recommendation box. Then we need to have all the information to debug this, so we checked his history, “Has he actually seen these items?” Turns out, he had not seen these items. That would explain why he thought it was a recommendation when it wasn't, because he was surprised to see these items. He had no recollection of them. If we apply the post mortem format, okay, next step. “Why was this in his ‘last seen’ box?” “Well, because it was in his history.” “Check – it was not in his history.” If it had been in his history, we would have another hypothesis. “Why did he not remember?” Apply the five why’s, “Why did he not remember?” One reason could be that maybe this box keeps his ‘last seen’ actions from half a year ago and he came back and he doesn't remember. Then a product conclusion could be “This box should only be shown for five days.” (32:11)

Lina: This would be a very different way to fix the problem. Then, if you apply the five whys, you come to another conclusion. “Okay, he had not seen these items.” So we had to figure out, “Why was he seeing this in his ‘last seen’ box?” And then we found out that he had a shared account with his wife. His wife had been browsing these items on her app. This ‘last seen’ box had a cool feature – it collected both the desktop and app data together and showed it to him.

Lina: So now we can have different conclusions based on this. A) Should we maybe take gender into account? He is on the men’s section, so maybe the ‘last seen’ box should be made aware of its context and show only male stuff on the male part of the site. Thus he would have only seen men’s stuff. B) We could consider “What if it's a shared account?” Clearly, multiple people browse on this account. “Should the ‘last seen’ box behave differently somehow if we can detect that this is a shared account of multiple family members?” When you think of Netflix – they found that this is a problem, so they split the accounts. “Should something similar be considered?” “Are there other features that need this or just this box?”

Lina: By going through the five why's, you can immediately understand how you might accidentally go in the wrong direction. I had a few examples where some colleagues jumped to conclusions. Because it's an algorithm, you just look at it and you say “It's probably because of this.” They are all engineers, but somehow, sometimes, we don't apply a very structured approach with ML and really make sure to check each step. “Can this be true? Would that make sense? How could I find that out? Maybe I need some tooling around that as well, which I need to build.”

Lina: So I recommend a very structured approach. I recommend that you write yourself some tooling. For example, if you need to log the input features to do this kind of debugging, then you should. I also recommend that you get user feedback about bugs. Interestingly, we have a lot of bugs in our ML solutions. Sometimes it's edge cases. Sometimes it's whole groups of people. For example, here it’s users with shared accounts. Sometimes, to stay in the e-commerce area, it's people with special sizes. We also see that sometimes when we look into bias – we look at how well the algorithm works on different subgroups, like “Does it respect all small subgroups?” Or “Do some of these subgroups have bad experiences?” Basically – get some user feedback, either from clicks or from using the product yourself. I also recommend that you try to use your own product. We know this approach from software engineering, when you basically ‘eat your own dog food’. You really should use your own service. I did find quite a few bugs when we were using our own service. Also, maybe make a channel, so that internal colleagues can easily report bugs to you, “How would you get this bug report?” “How do you make it known that there’s a bug when you roll something out?” Maybe you can add that to the announcement of the roll out, “All internal colleagues, if you see anything weird, here's the email that we use for bugs.” So try to get this feedback from anywhere you can.
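As a rough illustration of the "log the input features" tooling mentioned above, here is a minimal sketch that writes one JSON line per prediction and reads it back by request ID. The record layout, the function names and the example values (`ranker-v1`, the feature names) are hypothetical, and an in-memory buffer stands in for a real log file or log store.

```python
import io
import json
import time
import uuid
from typing import Any, Dict, Optional, TextIO


def log_prediction(log: TextIO, model_version: str,
                   features: Dict[str, Any], prediction: Any) -> str:
    """Append one JSON line holding the model input and output; return its request ID."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    log.write(json.dumps(record) + "\n")
    return record["request_id"]


def find_prediction(log: TextIO, request_id: str) -> Optional[Dict[str, Any]]:
    """Scan the log for a request ID and return the stored record, if any."""
    for line in log:
        record = json.loads(line)
        if record["request_id"] == request_id:
            return record
    return None


# Usage with an in-memory buffer standing in for a real log file.
log = io.StringIO()
request_id = log_prediction(log, "ranker-v1",
                            {"section": "men", "device": "app"},
                            ["bag", "shirt"])
log.seek(0)
print(find_prediction(log, request_id)["features"])
```

With a record like this, a screenshot from a confused colleague can be traced back to the exact input the model received, instead of guessing at the cause.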

Alexey: There was a funny story a few episodes ago. We had a guest who worked at a telecom company. The company also sold phones and they did some credit scoring. So as he worked there, he applied for an iPhone and got rejected. So then he went to his colleagues and asked, “Hey, what's going on?” Then he was able to debug the model. (36:41)

Lina: Oh, excellent! Did he find the root cause? (37:10)

Alexey: Yeah, it turned out that it was because he was on a temporary residence permit. He just moved to the Netherlands. That was the reason. That was the strongest feature in the model. (37:12)

Lina: That’s interesting, because people who just move somewhere probably all need a new phone. Right? Or a new phone plan. So that’s an interesting question – is that rejection-worthy or not? (37:24)

Alexey: But sometimes people who just move somewhere, they may buy a lot of phones and go back to their home country without paying the loan. (37:37)

Lina: It's an interesting question. Maybe you need some other feature to make sure you don't catch these types of users. For example, maybe people who live within the EU and move for work are probably not a problem. I'm not sure. But it's an interesting question that you can then send back to the business person, or you find a bug in your own application that you can solve. (37:48)

If a lot of users say it’s a bug – it’s worth investigating

Alexey: Maybe it's not a bug. You just have to live with knowing that you will not get that phone. (38:13)

Lina: For him, it's probably a bug. I'm of the really strong opinion that if a lot of users say it's a bug, it's worth investigating. Maybe not in the case of the phones – there might be an argument for the company to trade off risk. But in general, I've seen a lot of cases where some people say “It’s not a bug – I won’t fix it.” That is not an acceptable answer in my book if a lot of people are complaining about it. You also will not find out whether a lot of people are complaining if you don’t go looking for these use cases, which is horrible. I tried to get my product people on board to run a sort of ‘user-oriented debugging.’ Say a user comes with a sort of WTF moment – you make this into a post mortem. That's good practice, which helps improve usability and the awesomeness of your product. (38:20)

Alexey: Now I'll have to bleep the video. [laughs] (39:10)

Lina: Oh sorry. I meant “WTF moments”. You should hear me off-video, I'm way worse. (39:14)

Post mortem format

Alexey: I wanted to ask you a bit about this ‘post mortem’ format. We also have a question in chat. What does the format look like? I think one thing that you mentioned is that you need to ask the “five whys” – you don't jump to conclusions immediately. So you need to spend some time trying to understand what the actual root cause is. This framework of “five whys” can help you do that. In other words, you don't stop at the first ‘why’ and use the answer as a conclusion, but instead you keep digging. What else do you need to do in order to have this structured approach in a post mortem for a data science project? (39:26)

Lina: One thing you may need is more technical information. Sometimes you need the cookies. Sometimes a screenshot is enough. It depends a little bit on your domain. Basically, try to find a way to get the necessary information in order to debug the issue in your application. Maybe an interesting hint is – don’t only use incidents, also use these ‘very bad user experiences’ that you wouldn't otherwise see, because there is no exception. Nothing is in the logs, so you need user inputs, which are a little bit hard to get. Get the bad experiences – then you can gather the necessary information to debug. Sometimes it may be cookie information, sometimes it's a log of whatever your service gets as relevant inputs. I would say that's it. (40:14)

Lina: Otherwise, it's a typical engineering post mortem format. I'm always borrowing from different disciplines. Sometimes we, as ML people, are software engineers, but sometimes we have, like, physicists or economists – and they might not know this format. So, for them: it's a typical format that software backend engineers use to debug their incidents. We can utilize that – just adapt it a little bit. It's quite useful.

Action points

Alexey: Do you remember what the format is? What does it look like? I think I saw that usually it has some sort of timeframe. You ask “What happened?” but without any finger pointing or blaming – just factual description. (41:48)

Lina: First you get the facts. If it's a backend service, it's likely “The service was down from that time to that time.” As in our women’s bag example, it might be a screenshot, or it might be return values. We put all the factual information together and then we do the investigation where we go step-by-step through everything – “The user saw a blouse and a women’s bag on the frontend home page. Why?” “Okay, this is surprising.” You go through the investigation, and then there's a lower part where you can add details about the different investigation parts, such as logs. Then there's a very important section, which is called ‘action points.’ Normally, post mortems are called ‘blameless post mortems,’ so you say “No one is at fault.” (42:03)

Lina: Then there are action points. These action points mean that you try to make changes to your application that ensure that this kind of unfolding chain of events doesn't happen again. This can be a process change. For example, “We found out how you work – there was no ‘four eyes’ principle implemented.” Or “You did not have good unit test coverage of edge cases for your solution.” Then you make action points that focus on that, “I recommend edge case testing,” for example. Or “I recommend a very specific change.” Often it's also very process-oriented – you do not only fix this very specific bug, but you think of the category of bugs, or this type of problem. Is there a way to fix that? Then you write the action points, and someone from the team reviews them – it's like a code change. They give you a review on which action points should be implemented. Then you make them into tickets and actually implement them. This helps with a constant cycle of improvements.
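The format described above (facts, a step-by-step investigation, then action points) could be captured in a small template. The following Python sketch is one possible layout, not the template used at Zalando or DKB. The class name `PostMortem`, its fields, and the example entries are hypothetical; the root cause shown is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PostMortem:
    """Blameless post mortem: facts first, then the chain of whys, then actions."""

    title: str
    facts: list[str] = field(default_factory=list)
    whys: list[str] = field(default_factory=list)  # one answer per "why?"
    action_points: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the post mortem as plain text for review by the team."""
        lines = [self.title, "", "Facts"]
        lines += [f"- {fact}" for fact in self.facts]
        lines += ["", "Investigation"]
        lines += [f"{i}. Why? {why}" for i, why in enumerate(self.whys, start=1)]
        lines += ["", "Action points"]
        lines += [f"[ ] {action}" for action in self.action_points]
        return "\n".join(lines)


# Hypothetical example entries, loosely based on the women's bag case.
pm = PostMortem(
    title="Post mortem: unexpected home page recommendation",
    facts=["The user saw a blouse and a women's bag on the frontend home page."],
    whys=[
        "The model ranked these items highest for this account.",
        "The account history mixes purchases from several people.",
        "Shared accounts are not treated separately by the model.",
    ],
    action_points=["Add edge case tests for shared accounts."],
)
report = pm.render()
print(report)
```

Keeping the action points as a separate list makes it straightforward to turn each reviewed item into a ticket.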

Debugging vs explaining the model

Alexey: Thank you. We have a question. We talked about debugging machine learning problems and figuring out, “Okay, the model made this decision. Why did this happen?” Do you know any off-the-shelf or open source debugging tools for that? (44:11)

Lina: Yes. It depends what you want to do with it. There's model explanation, which is a whole other research area, like SHAP values and these kinds of libraries that you can use. The question is usually what you want to achieve with them – explanations or debugging. With debugging, we want to find the root cause of a problem or an error. Usually, it's not to explain the algorithm. There’s no off-the-shelf tool to explain the root cause of a bug or a design error. You really just have to go through the format and see what you can use to debug your own logic. (44:29)

Lina: What the question is referring to is probably a tool to get explanations for the model, which is something that is often used to explain it to stakeholders or to reason about the internal workings of the model, which is a slightly different purpose. For that, you can Google “Explainable AI” – the results will explain it much better than I can. There's a bunch of libraries. But for debugging, it’s really quite different. Usually the mistakes stem from the fact that you didn't consider certain modeling assumptions. So, you actually have to change your model, or you have certain filters in place, or the UI is not correct – so it’s much broader when it comes to debugging. The root causes are often not in the model itself, but come from a wrong assumption you made, or you did not consider certain inputs that have to be treated separately. It's seldom the algorithm that’s the problem.

Alexey: Or maybe the data changed. (46:12)

Lina: Or data problems, exactly. That can also be an issue. Yeah. (46:15)

Alexey: For example, in one of the features maybe a unit changed – instead of kilometers, you have meters. (46:19)

Lina: Yes, we actually had that at Zalando. We had a colleague who was working on the fraud model. On the live path, the unit of one of the very important features changed from seconds to milliseconds, but not in the test data. That completely screwed everything up. For that, you don't need model explanations; you need to monitor the input data distributions and compare them between training data and live data. So, as you see, it depends what exactly we're talking about. I hope that answers your question. (46:28)
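A unit change like seconds to milliseconds shows up as a large shift in scale, so even a very simple comparison of training and live data can catch it. The Python sketch below compares medians of one feature. The function names, the feature name `session_duration`, the sample values, and the threshold of 5x are assumptions for illustration; a production setup would more likely compare full distributions.

```python
import statistics


def scale_shift(train_values, live_values):
    """Ratio of the live median to the training median for one numeric feature."""
    train_median = statistics.median(train_values)
    live_median = statistics.median(live_values)
    if train_median == 0:
        raise ValueError("training median is zero; ratio undefined")
    return live_median / train_median


def check_feature(name, train_values, live_values, max_ratio=5.0):
    """Return a warning string if the medians differ by more than max_ratio, else None."""
    ratio = scale_shift(train_values, live_values)
    if ratio > max_ratio or ratio < 1 / max_ratio:
        return f"{name}: live median is {ratio:.0f}x training median"
    return None


# Made-up values: training data in seconds, live data accidentally in milliseconds.
train_seconds = [30, 45, 60, 90, 120]
live_milliseconds = [31000, 44000, 62000, 88000, 119000]
live_seconds = [28, 50, 61, 95, 110]

warning = check_feature("session_duration", train_seconds, live_milliseconds)
no_warning = check_feature("session_duration", train_seconds, live_seconds)
print(warning)
print(no_warning)
```

The median is used here because it is less sensitive to a few extreme values than the mean, so a warning points at a shift in the bulk of the data.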

Alexey: So there is a variety of things that can go wrong. (46:59)

Lina: Exactly. Can we go back to the person and see if their question was answered? (47:06)

Alexey: BK62 – If you're listening, can you please let us know if your question was answered or not? If you want to add something, please let us know. (47:12)

Are there online versions of checklists?

Alexey: There is another question from the same person about any of the checklists you mentioned. “Are there online versions of these checklists?” Have you seen any of those? (47:20)

Lina: Maybe I should make a blog post. There's one checklist which is kind of nice – it's in Hands-On ML. They have a checklist, and the first chapter, I think, is also online. Some of the additions to these checklists are actually just from my personal experience, so that part is not online yet. But I have seen that different people have different forms of their checklists. You can combine them to make your own personal ‘best of’. (47:34)

Alexey: On Monday, we talked about AI Canvas, which is some sort of business Canvas. Maybe you saw this canvas. You have a piece of paper where in the center, you write the business value. On the left, you have data, etc. And you have all these different blocks. Maybe this also acts as a kind of checklist, because you have to fill all these different blocks. And then you can make sure that every aspect is covered. (48:04)

Lina: Yeah, that sounds like a good tool. (48:41)

Alexey: But yeah, I also like to think about that for us engineers. Maybe it's even simpler if you have at least a checklist. Then you just tick, tick, tick. “Okay, here is where I am missing something. Let's go and fix it.” You probably should write a blog about it. (48:45)

Lina: You have to ping me on this. (49:02)

Make sure to log your inputs

Alexey: Yeah. We'll have a transcription of this. Maybe it will be easier then to convert it. So BK62 said, “Yes. You answered my question. You mentioned writing your own tooling. So I want to see if there's anything already available that can be built on top of.” (49:06)

Lina: Oh, okay. So that question was probably just on the explanation part. Just one addition to my answer – it’s very specific to what your inputs are. Basically, make it observable. Make sure you log your features. Make sure you have some way to find out what the inputs were after the fact – what did your model say? Also, it’s important to know how to connect it all to the necessary debugging system. If you have a feature store, you have to know how to look up what the features were at the time, or log them. Something like that. (49:28)
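One way to make predictions observable after the fact is to write the features, the model output, and a request ID to a log at prediction time. The Python sketch below writes JSON lines to an in-memory buffer that stands in for a real log file or stream. The function names, field names, and example features are hypothetical.

```python
import io
import json
import uuid
from datetime import datetime, timezone


def log_prediction(sink, features, prediction, model_version):
    """Append one JSON line with everything needed to inspect this request later."""
    request_id = str(uuid.uuid4())
    entry = {
        "request_id": request_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    sink.write(json.dumps(entry) + "\n")
    return request_id


def find_request(log_text, request_id):
    """Return the logged entry for request_id, or None if it was never logged."""
    for line in log_text.splitlines():
        entry = json.loads(line)
        if entry["request_id"] == request_id:
            return entry
    return None


sink = io.StringIO()  # stand-in for a log file or log stream
rid = log_prediction(
    sink,
    features={"basket_size": 3, "is_shared_account": True},
    prediction="womens_bag",
    model_version="v1",
)
found = find_request(sink.getvalue(), rid)
print(found["features"], found["prediction"])
```

If a user later reports a strange result, the request ID (for example, shown in a debug header or attached to a bug report) lets you look up exactly which inputs the model saw.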

Alexey: People are saying that everyone's waiting for your blog post on that. (50:06)

Lina: Oh, okay. Yes. If there's interest then I may get into this. Yes. (50:10)

Talking to end users and using your own service

Alexey: We have two more questions. “Do you also talk to end users or just limit research to project managers?” I think we talked about that – you actually talk to end users. (50:19)

Lina: It depends on what project I'm working on. I do talk to end users in some cases. I also do mystery shopping. Mystery shopping is basically when you go through the process yourself. I was optimizing the credit application process in my current job – so I applied for a loan. Just on Check24. I went through the different processes to see what the user experience is – “What kind of values do I have to give? What do the other banks do for this process? What’s the flow like?” So yes – speak to end users. I also speak to experts about the topic, because sometimes they act like a summary of a bunch of end users. They can also tell you a bit of meta information. So, all of the above. Yes. (50:30)

Alexey: I hope your SCHUFA score wasn't affected when you did this. (51:17)

Lina: That’s one thing I checked, actually. Because I was like, “This should not be affected. They always promised it won’t – and I checked it.” Indeed, when we do requests, it does not get worse. I can confirm that it is not affected. (51:21)

Alexey: For those who are not from Germany – in Germany, there is this core credit score that tells how trustworthy a person is when it comes to credit. I think it's nationwide, right? (51:33)

Lina: Yes, I checked. I checked our database. I also got my free SCHUFA – you can get it once a year for free. They hide it on a sub page because they want to charge you for it. Free life hack of the day, you can get it on the sub page. Once a year, they send you the detailed information for free. You just have to wait and it comes in paper format 14 days later. I checked against that and I also had output of all the other banks. It’s quite interesting. They have different scorecards that they calibrate based on their business case. We get slightly different scores from each bank. So that was quite fascinating to sort of reverse engineer that a bit regarding how this process works. (51:46)

Alexey: That person we talked about that applied for a phone, I think he also did mystery shopping without realizing it. (52:19)

Lina: [laughs] He did. See, and he encountered a problem. Yeah. (52:23)

Your ideas vs Stakeholder ideas

Alexey: Then there is a follow up question. It actually was two questions. “Do you get your own ideas when discussing a data project?” (52:29)

Lina: To suggest problems to the stakeholders as projects? Yes, because I'm just observing the space, see what other people are doing, and basically try to go to the stakeholders and say “Do we need this? Do you think this is useful?” It's very hard to generate such project ideas. Sometimes the stakeholders don't know what they don't know. It's like a ‘chicken and egg’ problem. I am looking at what possible applications other people are doing and try to see if we have the same problem. But ideally, the business people should come to you; you should not come up with the problem yourself. Because you're usually wrong. You're just one person. To think that you're the user is usually a wrong assumption. But you can definitely try just seeing “Okay, these cool ML applications are possible. Does that make sense for my company?” Then yes. Maybe the person can clarify the question a little bit, to see if I answered it. (52:39)

Alexey: I think it comes back to one of the points that we discussed about making the problem specific. If you make the problem specific, it's easier to talk to the stakeholder. (53:34)

Lina: Ah. Okay. Okay. Yeah, so it can be a collaborative process. When fleshing out what the problem is – we're actually very involved. I don't think you can be very hands off, because ML really requires that the problem be very well defined. I've seen quite a few projects fail because the ML works, but the problem was not well enough defined. Or it could never work because the problem was not well enough defined. Then in the end, users will blame you or the ML in general. So I think it's really up to us to say “This is the requirement. The requirement is that the business case is well defined.” (53:47)

Should data practitioners educate the team about data?

Alexey: Thanks. We have a question about data knowledge. I'm not sure if we talked about this – maybe a little bit. “Regarding data knowledge within the company – is it data practitioners’ responsibility to educate the team on what data there is and what problems it can solve?” (54:28)

Lina: Wow. The person asking is probably working in such an organization – I can hear the pain behind the question. I feel you. Basically, we need the counterpart to be well-versed in data. But sometimes they're not, so what do you do? Perhaps you only take jobs in companies which are already quite advanced? Usually, we still need to do a little bit of outreach and educate people. A part of my work is, let's call it ‘community building’ in the company and talking to other like-minded people. I try to have a bit of a movement of data people who all want to work in the same direction. Also get business people interested in using data. Of course, try not to spend too much time on it. Unfortunately, with the state of data literacy at the moment – it’s also somewhat on us to do a little bit of education. Or you can pick a company where this is not a problem, which are few in number, I would say. (54:49)

Alexey: But then you learn how to deal with this, you get experience. Then maybe if you go to a company where the people are less mature in terms of data literacy, then you already have experience – you know what things should look like – you can share this experience in the company, so they can move to that level of maturity. Is that right? (56:06)

People skills and ‘dirty’ hacks

Lina: Yes. In general, I think we always need to have quite good people skills in our job, because it's so cross-functional. The main thing I've been working on during the last few years is not only the technical part – I also try to be better at convincing and motivating. Sometimes, when we start off, we don't necessarily need that or have that as engineers. It's quite useful to invest a little bit of time into this. It’s also useful for data-related problems. I also have some dirty tricks. When I worked with dirty data, I used to try to convince people to fix the data so that I could use it. But that never worked, because the data was not used – it was not attached to a big business case. Now, I have a set of dirty hacks that I just apply. For example, I start using dirty data, which mostly works, but not always. Then I say “I'm using this data, please fix this!” (56:28)

Lina: So there's a bunch of things that you also acquire on the way. It takes a bit of convincing, but also a bit of dirty tricks that you would just bring with you.

Alexey: Just to make sure I understand the dirty trick (so that I can use it later). Say there is a data source that is not cleaned, and the ones who produce it don't want to clean it, because nobody uses it. They say “We don't want to spend our time because we don't see any impact.” So you start using it and then you come to them and say, “Hey, you see – I actually use it. How about you clean it now?” (57:38)

Lina: “It’s a nice product, it mostly works. But look at this very unfortunate side effect in 10% of the cases.” Yes. Unfortunately, I used to not do that. I used to speak a lot and blah, blah, blah – nothing happened. But that was the way that got it done. But also be careful with this. In general, I'm also careful with it. For example, with every new data source I add, there's a cost to it. There's a cost of maintaining the data source – there’s a cost of having a new stakeholder monitoring it. In general, I apply the strategy where each data source, or each feature, needs to prove itself in order to be added – especially new data sources. But every once in a while, you really want it because it's useful, like some sort of event that is very good. So, just mix and match your approach. (58:02)

Alexey: Maybe we should have a follow-up conversation about ‘dirty hacks’. (58:49)

Lina: I cannot tell you. I would have to you know… dispose of you afterwards. [laughs] (58:57)

Alexey: [laughs] I can imagine a title, like ‘dirty communication hacks’ (58:59)

Lina: ‘With Lina’ (59:07)

Alexey: “Don't try it on your colleagues.” (59:08)

Lina: Then we have 30 people tuning in after 11 at night expecting different contents. (59:11)

Alexey: So. Do you have any last words before we finish? (59:17)

Where to find Lina

Lina: Thank you for having me. And if anyone wants to connect more – I'm hanging out in the MLOps channel sometimes. Also on LinkedIn. Or if anyone wants to write a blog post together or just generally share? Yeah. Look me up. (59:26)

Alexey: Okay, great. My next question was “How can people find you?” and you just answered that. I think I ran out of questions. I just want to thank you in return for joining us today, sharing your knowledge, your checklist, your one dirty hack. Maybe we will talk about others, but I think that the one you gave was already useful. Thanks to everyone who joined, listened to our conversation, and asked questions. Don’t forget, we have two more talks. Tune in tomorrow and on Friday. That’s all. Thanks a lot, Lina. (59:42)

Lina: See you. (1:00:18)

Alexey: Goodbye. Have a great day. (1:00:20)

Lina: You too. (1:00:21)
