DataTalks.Club

Introducing Data Science in Startups

Season 5, episode 4 of the DataTalks.Club podcast with Marianna Diachuk

Transcript

The transcripts are edited for clarity, sometimes with AI. If you notice any incorrect information, let us know.

Marianna’s background

Alexey: We have a special guest today, Marianna. Marianna is a data scientist at Restream, and the data science lead and mentor in the local branch of the Women Who Code community in Kiev. Before Restream, she worked at Data Robot and she also led the data science team in a fintech startup. Welcome. Before we go into our main topic, let's start with your background. Can you tell us about your career journey so far? (2.0)

Marianna: I wouldn't say that it was super straightforward. This happens for some data scientists, because they usually have just two ways, either coming from a research background or from a software development background. I used to be a Java developer and I just got interested in the field. Thanks to the Women Who Code community I also met real data scientists who were doing work in production. Five years ago, it wasn't that popular, so it wasn't that easy to find such people. The community really helped me to find a mentor. That's how I started studying a lot. Then I got my first job in a startup. I will probably be referencing that today, because this is basically the place where I got the experience of leading a team. After that, I worked in Data Robot, which allowed me to compare what it's like to work in a small startup in comparison to working in a company that’s closer to a large enterprise. Right now I’m in Restream, which is also a startup. All the experience that I got over the years, I will try to apply here, right now. (35.0)

Alexey: Okay, so when did you switch to data science? Was it five years ago? (2:12)

Marianna: Yeah. (2:17)

Alexey: This is when you joined the Women Who Code branch? (2:18)

Marianna: Yeah. (2:23)

Alexey: It was at the same time? (2:25)

Marianna: Yeah, pretty much. (2:26)

Alexey: Were you the first data scientist at your first startup? (2:28)

Marianna: No, there were several other data scientists there. It was a fintech startup and I was lucky with that because now I have more experience and have reviews and stuff like that. But unfortunately, it didn’t last long. A lot of things changed in the company. The careers and lives of my former colleagues also changed. I think after less than a year, I was the only data scientist. That's how I got into trying to organize everything myself and tried to hire a team, which was interesting since I didn't really have much technical experience back then. But I was the only person who knew the product quite well. Even if they hired someone else, this person wouldn’t know that much about the company and its product. It was really challenging, but it gave me quite a lot in terms of experience. (2:35)

Being the only data scientist

Alexey: What are the pros and cons of starting as a first data scientist with a company? (3:42)

Marianna: I would say it's a bit subjective. But maybe the best thing is having freedom in terms of what you can do. You don't have limitations – you're not limited by the technologies that you use, you're not limited by methodologies. For example, in Data Robot, there are really strict guidelines and processes in place for the projects. Compare that to startups, where you can basically come up with an idea, implement it really fast, and control the execution yourself. It doesn't take that much to go through all of that, unlike in big companies, where you need to get all the needed approvals and stuff. (3:48)

Marianna: Also, you have more influence – people listen to your opinion more. If you're working in a big company, there are a lot of data scientists and you're probably working on some specific projects. But when you're the only one, you're basically responsible for everything related to data science in the company until they hire a bigger team. It also seems to me that startups have a healthy work culture – deadlines can be more realistic because you kind of set them yourself. Of course, you discuss it with your colleagues, but still. It's not like you have a strict process and you're not pressured in terms of that.

Marianna: I think that in terms of disadvantages, you have more responsibility when you’re alone than when working in a team. You're not just given responsibility for your model, but for everything that’s happening in the company. In the first startup where I worked, I was basically responsible for everything related to analytics. However, when it came to dashboards, various other teams helped me. It's quite a huge role, I would say. Because of that, you have to learn a lot on the way. Again, you don't have anyone near you whom you can just ask. I would say that that's both good and bad.

Marianna: It’s bad in the way that you have this pressure that you need to learn a lot and you have to somehow validate yourself – whether you’re doing the right thing or not. At the same time, if you have someone to ask, you can get unmotivated to learn. At least that’s how it works for me. I developed faster as a professional when I had less support from my colleagues. I don't know how it works, but sometimes you kind of feel that when you have someone to ask something, you start relying more on that than learning stuff by yourself.

Alexey: Basically, it forces you to learn more than you would otherwise learn with your colleagues. That’s interesting. I actually observed something similar when I worked at a startup. Even though I had colleagues who worked as data scientists, I think when there's a small team, compared to a bigger team, it kind of forces you to really do a lot of stuff that otherwise you maybe wouldn't do. (7:13)

Marianna: You’re just focusing on just one specific project and you think, “Okay, I don't know this. I’ll just ask the person how to do that because I need to do that faster with some plans for the team and deadlines and stuff.” As occasionally happens, sometimes we just never get to learn that ourselves. (7:48)

What should already be in the company

Alexey: Say somebody wants to join a company that doesn't have any data scientists yet and they want to join as the only data scientist. What kind of things should companies have for you to feel comfortable in joining this startup? Are there any prerequisites or can you just go to any startup and start working as a data scientist? (8:13)

Marianna: I would say that it would be great if the company had some kind of existing pipelines and infrastructure, as well as people who are supporting them – either data engineers or DevOps developers, depending on the company’s policies. I've had experience with that. In Restream, there is already established infrastructure and people can do analytics. In comparison to that, when I first started working there was basically nothing. There was some data gathering, but sometimes it was bad data and you couldn’t use it. Because of that, it was really complicated to work. It may have seemed like you had a lot of data, but you actually didn't, because you couldn’t use it all. That's why I would say that it's important to have something that exists already. Also, it’s probably important to have some kind of analytics department. As I mentioned, in my first job in a startup, there was no one doing analytics before me. You have a huge scope of stuff you can do with data science, plus a lot of stuff that you can do with analytics – helping different teams and stuff. It's too much for one person, for sure. Usually, all of these things are of high priority, and you can’t do all of that at the same time. (8:42)

Alexey: Basically, companies should ideally have data engineers and data pipelines. They should also have analysts. Otherwise, you as a data scientist might end up doing all this work in addition to trying to do data science. But without having the data, you probably cannot really do data science. So you will first need to spend a lot of time on building all these things. Right? (10:25)

Marianna: Yeah. (10:52)

How much experience do you need

Alexey: How much experience would I need to have in order to join a company as the only data scientist? Do I need to be very experienced, like a senior person? Or if I'm just switching careers, would it be a good idea to join such a company? (10:53)

Marianna: I think it should be more like middle-senior level. The most important thing, at least for me, is being able to create a project from scratch and go through all the needed phases. Not just for model training, but you should be able to define the problem. If you don't have enough data, you should be able to somehow translate what other people mean, when you talk with people who don't really know data science. Also, you probably should be able to deploy a model and monitor it. Basically, it's a huge scope of work, I would say. Not everyone has this kind of experience because often they just get to work on one specific stage. You need experience with that, or at least the motivation to do that, and the motivation to do a lot by yourself. It's something that a data scientist should have if he or she wants to be the only one in the company. (11:15)

Alexey: Even though there are data engineers and even though there are analysts already in the company, you still need to do a lot of communication – working with others to define a problem – and then actually training a model. Then after that, you also need to be able to deploy it. So you need to be able to do the whole thing, from the beginning to end. Is that right? (12:14)

Identifying problems

Marianna: Yeah, I would say it always starts from defining if there is actually a problem that needs solving. People usually don't really exist in this context of the data science problems and they have a very different understanding of how the field works, and what we can do with the tools that we have. Basically, it's always about identifying whether there is a model needed. Maybe it shouldn't even be worked through, or maybe another problem needs more attention. (12:33)

Alexey: How do you do that? (13:04)

Marianna: I'd say it takes experience. I'm an introverted person, and it was hard for me to learn not to be like that, but it really takes a lot of communication. Especially with people who are close to the business side of the product. For example, if it’s the analytics department, it's really helpful because analysts usually know quite a lot about different parts of the product. They also already help different teams, so they know more about that company. But usually, I try to communicate with all the departments. Sometimes, especially when you don't have data scientist colleagues or you have different people with different backgrounds, I really listen to their experiences and their challenges, and I kind of automatically just translate it into the language of data science and try to understand how their problems can be solved. (13:09)

Alexey: How does it usually happen? Do others come to you with their problems or do you need to proactively find problems that can be solved with data science? (14:14)

Marianna: I would say that it depends on the company. Usually, when the first data scientist is hired there is already some understanding that he or she is supposed to solve some problem. Sometimes it’s more like, “We have a lot of data. Maybe we should do something about it but we don't know if we should.” So it depends on the company. When I started at Restream, for example, I had more specific requests and several ideas to explore, so the questions were “What has more priority? What is more feasible?” That's basically how I started working. At first, it's important to talk more about what you can do, because people can't just come to you if they don't know what you're doing and how you can be helpful. (14:25)

Marianna: First, practice reaching out to other people. That's what basically I did – organized different meetings with all the departments and talked about their challenges and problems and tried to explain how I can be helpful. Now it's more like people come to me instead of me trying to find something to solve. But I also try to always think about some kind of roadmap or plan and discuss it with other people and see if they think that would be helpful or not. I think it's also important.

Prioritization

Alexey: I can imagine if you talk to many people from different departments and from different teams, maybe half of them or all of them have problems. Then you have a problem that you have too many problems to solve. So that's why you need the roadmap, right? You need to be able to identify, “Okay, out of these 10 problems, what is the most impactful one?” Or “What is the easiest to solve?” Right? (16:01)

Marianna: Yeah. Feasibility, priority – you also need to discuss these things. The people who are my supervisors, what do they think? That's how I basically decide what is more important. (16:30)

Alexey: So you just ask “Hey, on a scale from one to ten, how important is this?” How do you understand what is more important and what is less? (16:42)

Marianna: It's more like I discuss it. For example, I’m basically a part of the business analytics department and I just ask the lead of this department what they think is more important right now. He's always in touch with all these people and has been working in the company for many years already. I can chat with him or with someone else, like the head of product and people like that, and hear different opinions. That's how I decide that something has more priority right now. (16:54)

Alexey: I am not sure I got the part about you. When you joined Restream, did you already know what kind of problems you would be working on? Or did you have to figure that out on the job? (17:32)

Marianna: I already had a vague understanding of what it could be. At first, I thought that it could be a lot like one thing, but then they switched over to another. So there’s also work for marketing right now. Also, some of the problems we initially discussed in the beginning turned out to be not so important right now – they moved on to something else. I would say that they had an initial request when I joined the team, but it got clearer over time. (17:46)

What should the company already know?

Alexey: So basically, a company should already have an idea of what they want. They should have some problem that potentially can be solved with data science and not like, “Okay, we have these 10 terabytes of data or ten petabytes of data. Now, we need to figure out what to do with all this information.” There should be something more or less already there – not super specific, but at least some idea. “Okay, look at this problem. This can be solved with data science.” Right? (18:27)

Marianna: Yeah, I also think that even if a company doesn't really know what they want to solve, they can have some kind of consulting done before that. Because it's not like they all know about this complex data analysis, and they don’t necessarily know what they can do with that. It's completely fine not to know, but both hiring and not knowing and also expecting a lot from the person isn’t a good idea. I’ve run into companies that see data science as kind of a magic pill that will be able to solve everything that's currently happening in the company. However, it's more a part of one big process and it changes. (18:54)

Alexey: Basically, the ideal situation is when a company already had consultants who explained to them what could be done, what could not be done. That potentially for this problem, data science is a good solution and they should hire data scientists. This is probably a good situation because the company already has some processes in place. They have some ideas – even though they may be vague ideas – of what could be done with data science. That's probably the ideal case, right? If they already have some data pipelines and an analytical department, that’s probably the best place for the only data scientists to start. Is that right? (19:40)

Marianna: Yeah. I also think that data science is not the first step – it’s not the first thing that you should do. That's why I said that an analytics department is necessary to have before that. Usually, people think that they need some automated pipelines or models and stuff like that, but sometimes they just need some dashboards and more simple analytics. So data science is more advanced stuff that you get over time. That's why it matters to have this in place beforehand. (20:21)

First week

Alexey: Yeah. Let's say you joined a company. You had an interview with them and you really like them. They have data pipelines, they have an analytical department, and they seem to know what they want. You think it's cool and you join this company. So what usually happens after that? Let's say – on your first day? (21:07)

Marianna: I would say the first thing that I usually do is try to discuss as much as possible with different people. What do they expect from me? How do they see my work? Basically, I try to identify the requirements and the main ideas. Usually, the first week for me is exploring the data. Just saying, “Okay, let's go look at the data,” isn’t really productive. It's always better to have some problem in mind so that you can look at the data and think “I can use this for that.” It was useful for me to know about the ideas, which were requested to investigate. So I look at data and think about what I could use, etc. (21:31)

First month

Alexey: That's the first week, you said. Let’s say, you’re there for a month already – what should you do in the first month? Do you already need to have some sort of POC or are you still in the exploratory phase? I guess it depends on the case, but what do you think should happen after a month? (22:25)

Marianna: I think it kind of depends on the person themselves. I personally tend to do a lot of stuff quickly. I try to get some results in the first month. It doesn’t necessarily have to be a model, but some kind of research or insights that can already be used. Basically, you're building a draft version of a model, even if it’s at the same time because you can try to do that iteratively. You already can test out some hypotheses. (22:49)

Alexey: At the end of week one, you already have explored the data, you already know the data well, more or less. You try to assess if you can use the data that is available to solve the problem that you were hired to solve. If that's the case, you explore the data more and get some insights. Then at the end of month number one, you already have some sort of insights or even a first model. Right? Then what happens after a quarter? (23:29)

First quarter

Marianna: I think by that time, you should already have some kind of methodology that you use. For example, I don't just build the first model, but I also build some kind of pipelines around that, so it will be easier for me later in case I need to retrain it or deploy another model. That's why it takes me longer at the beginning than later, because I have already built some preparations for other work that is going to be done in the future. Probably you should already have some kind of analysis of what you've done. For example, you can at least start running the model and do some kind of A/B testing to know if it's accurate enough or how it’s working. There’s a lot based on the feedback on how your model performs versus the processes and the product itself. At least in Restream, we follow LEAN methodologies – it's really important to do a lot of stuff in iterations on a smaller scale. (24:07)

Alexey: In a quarter – after three months – you can already test multiple hypotheses, multiple models, and maybe come up with one that really works and then try to deploy it, right? (25:26)

Marianna: Yeah. (25:39)
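
A note on the A/B testing mentioned above: one common way to check whether a model-driven variant really performs differently from the control is a two-proportion z-test. The sketch below is only an illustration under assumptions, not the method used at Restream. The function name, the conversion counts, and the choice of a pooled z-test are all hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and variant (B).

    Returns the z statistic and a two-sided p-value based on the
    normal approximation with a pooled conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical numbers: 1000 users per group, 100 vs 150 conversions.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but the normal approximation assumes reasonably large groups, so for small samples an exact test would be a safer choice.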

Managing expectations

Alexey: Plus, you should have an A/B test and evaluate it. That's cool. What if things don't work out well? I think that’s maybe a happy path, where you identify the problem, you see that all the data is there, you get insights, you deploy your model, then you run an A/B test and you see a huge uplift. But that doesn't always happen. Right? Do you know what the best way of managing the expectations of a company is? Maybe they want to see fantastic results in three months? (25:40)

Marianna: Yeah, I always say that data science is not a result-oriented field. I personally believe that any project in data science is basically an answer to some question. Even if you really don't like the answer, it's already some kind of information that we can use later. For example, when we know that a particular model doesn't work, it doesn't mean that everything is not working completely. We know that we should solve this problem differently, or we should use other data. We can already draw some conclusions from the failed model. That's why I always try to kind of educate everyone that that's how it works. For example, in Restream right now, my data analyst colleagues agree with that notion, because they have the same mindset when they work with various problems. (26:20)

Alexey: Is managing expectations something you should do before starting at the company? To make sure that the company knows that this process is not deterministic? You're not guaranteed to have success in three months? How would you approach that? (27:17)

Marianna: I would say that I actually do that all the time. Even if you talk about expectations during the interviews in the beginning, it still can be hard if it’s an IT company that’s used to having a developers team, which implements some specific tasks and it's a deterministic way of doing things. I usually just constantly talk about that. (27:35)

Solving problems without ML

Alexey: Okay, yeah. Interesting. I think we talked a bit about the fact that sometimes maybe a dashboard is what the company needs and not a machine learning model solution. How can we understand if a problem actually requires machine learning or that it can be solved with just simple analytics or a couple of queries put in a dashboard? What's the process of figuring this out? (28:07)

Marianna: I would say it's always better to start from something easier. For example, I always start from some kind of exploratory data analysis and I can already see at this point if I need a model for that or if I can do it differently somehow. It's more about not being specifically focused on just building models, and instead being more focused on solving problems and how that can be done. Basically, try to explore and research more in the beginning, rather than just trying to get all the data you can and quickly engineering some features and start training models. Because this is not the main idea. I also think iterations help in that part, in terms of the data science process. You try to at least split the work into some kind of steps and look every time if it works or not. For example, I built many prototypes for models over time, because it's easier to see if some features are working. This way, I won’t spend too much time developing something that's not going to be useful later. (28:35)

Alexey: I don't know if you can talk about specific projects, but maybe you have an example where you did something super simple? Something that didn't require machine learning and then you improved it and did a few iterations and then added some machine learning on top of that later. (29:54)

Marianna: I’m thinking about some of the projects I had. I don't remember any projects where I added machine learning after that, but in the future, if more data is added there, perhaps it’s a possibility. For example, there was a case where there was the possibility to communicate to different customers in more effective ways. At first, I tried to identify which of these customers can churn and which are the risky ones in terms of churn. At first, it was just an analysis of things that can be related to that, and then I built a model which helps to identify the probability of churn. (30:11)

Marianna: After that, I collaborated with the marketing department about trying to communicate with them. Probably later, we can build a model on top of that in order to identify which is the most effective way to communicate and which activities can be used for customer retention. But right now, it's partially machine learning and partially just analysis. I'd say, it's always kind of done that way – you try to do something which is easier and faster in the beginning so that you know if it's a very high priority and whether it's really needed for this kind of problem. Sometimes, some people can say that they need something for the solution right now, but it may turn out that it's not really that needed.

Alexey: I guess when it comes to churn, you can just analyze the churn rate in different segments? You can maybe look at people for a specific type of business or people from specific countries. Then just by doing ‘group by’ and looking at the average churn rate, you can already understand which segments are more likely to churn, right? Doesn't it work like that? Or is there another way? (31:57)

Marianna: I'd say that at first it was more trying to understand it from the point of a product, how it usually happens and what influences this process the most? I had some kind of event triggers at first and just information about that. Then the model itself. So yes, at first it was more of a basic analysis regarding what can be related to churn. (32:26)
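
As an illustration of the ‘group by’ analysis Alexey describes, here is a minimal sketch that computes the average churn rate per segment using only the Python standard library. The customer records, the segment names, and the function name are invented for the example and do not come from the episode.

```python
from collections import defaultdict

# Hypothetical customer records: (segment, churned) pairs, invented for illustration.
customers = [
    ("gaming", True),
    ("gaming", False),
    ("gaming", True),
    ("education", False),
    ("education", False),
    ("education", True),
    ("education", False),
    ("music", False),
    ("music", False),
]


def churn_rate_by_segment(records):
    """Group records by segment and return the share of churned customers in each."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for segment, has_churned in records:
        totals[segment] += 1
        if has_churned:
            churned[segment] += 1
    return {segment: churned[segment] / totals[segment] for segment in totals}


rates = churn_rate_by_segment(customers)
# Print segments from the highest churn rate to the lowest.
for segment, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{segment}: {rate:.0%}")
```

A table like this is often enough to see which segments deserve attention first, before deciding whether a churn model is needed at all.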

Project timelines

Alexey: I think we already talked about the timeline. For the first project, in three months, you have a POC, ideally. Right? And for the second project, should it be faster? Or do you expect to have it in the same timeline? How does it usually work? (32:54)

Marianna: I would say it depends on the case. Sometimes it can be some project which requires working with data you didn't work with before, or require some additional work with the pipelines and stuff. But if it's something that can reuse the previous work, then I usually expect it to be faster. (33:17)

Alexey: So, if it's not a very complex problem and you already know the data set, each next model should be a bit faster. Right? (33:43)

Marianna: Yeah. Especially if you're going to reuse some of the pipelines. I usually try to automate all this stuff. That's kind of my hobby – automating a lot of stuff – because it's kind of boring to repeat yourself all the time with the features and analysis. It gets faster with different projects to get them done. (33:55)

Finding the best solution

Alexey: We have a question about identifying the best model for a problem. Or maybe not just a model. I think it's interesting because we talked about the fact that sometimes there are things that can be solved with analytics. How would you suggest finding the best solution to a problem? (34:23)

Marianna: I would say that you can analyze it from the perspective of a data scientist, using specific metrics depending on what kind of case you have. But it's maybe even more important to try to understand what you're actually solving and how you can measure that. Something that I always focus on while planning is asking people to give me a request when they need something to be done. I always ask them, “How are we going to measure that?” and “How are we going to know if it's working or not?” Even if they don't always know, we can figure it out together. It's something that should be done in the beginning, because usually when you just come up with something on the way, you never know the correlation with the results. So it's better to define it earlier and then to try to compare your solutions using both the data science methods and the measures from the relevant domain. (34:41)

Evaluating performance

Alexey: Another question we have is, “How do you evaluate your own performance?” I think you kind of answered that. When you define these KPIs, when you define these metrics, you can already see if your models are working. This is a good way of evaluating your performance. How well you're doing and how well your models are doing. (35:49)

Marianna: It's a tricky one for data scientists. In the first startup where I worked, there was a discussion about KPIs and how they can be calculated. For example, the number of deployed models or a number of trained models can be a KPI and can usually be helpful. I personally can’t say that I found an answer for myself. But it's probably more about how you deliver the insights. But again, I'm not sure it's related to speed because people have different pacing when they work. Some people require more time, some people require less. It's more how you're oriented on what you do and how you try to provide insights, even if it's not the answer that you were expecting. A failed model is not a total failure. (36:14)

Alexey: Right. I mentioned that if your KPI is the number of models running in production, then what if – since machine learning is not deterministic – your projects are not successful simply because of bad luck? It happens. It's not that you're not working well, it’s because stuff happens. So maybe the number of experiments that you did could also be a good KPI? Maybe many of them failed, but that's part of learning. Who actually evaluates your performance? I imagine if you're the only data scientist in the company, other people might not know how well you're doing your job? So who evaluates you and how do they do this? What do they use to understand if you're doing well or not? (37:13)

Marianna: Right now, it's the lead of the business analytics department. Maybe because he's also a data analyst, it helps to be on the same wavelength and understand that experiments are experiments. Basically, he just has some ideas about what can be done and how fast it's usually done. Maybe it's a bit subjective, and I wouldn't say that it’s a very specific system of KPIs, but that's how it's done right now. (38:14)

Alexey: So having analysts in a company definitely helps because these are the people who you can talk to using the same language because they probably understand your work. At least they know that machine learning exists, even if maybe they don't have hands-on experience. But they probably have some idea about what it is, right? (38:45)

Marianna: Yes. This colleague of mine suggested hiring data scientists because he saw problems which couldn't be solved just using data analytics. Yeah, some data analysts tried stuff in machine learning or just know how it works in general. (39:05)

Getting stuck

Alexey: When you are a “lone wolf data scientist” – you're alone in the company – and you get stuck. There is a problem that you cannot solve. Who do you ask for guidance? Who do you ask for help in this situation? (39:25)

Marianna: I think it depends on what you get stuck on. If it's something business-related, then I can discuss it with my colleagues, depending on who I’m making this project for. If it's more data science-related, it’s usually not intimidating for me, I just try to read more about it. I try to communicate with people I know in the field and in the culture, go to different events, and also discuss it within communities that I know. So I don't usually feel like it's a huge problem for me. It used to be one when it was just the beginning of my career. But right now, I don't really feel like it's necessary to have someone by my side. Basically, as data scientists, we all work separately and we can collaborate like software engineers do. (39:44)

Alexey: So you use your network. Perhaps the community that we mentioned at the beginning “Women Who Code” is also where you can ask for help, right? (40:46)

Marianna: Yeah. (40:58)

Communicating with analysts

Alexey: Okay, thanks. Another question we have is, “What are some good ways to communicate information that you gained through your analysis, deploying your model, to the analytics team? Is it better to do so during research or by running things?” (40:59)

Marianna: I think it's about the format if I understood this question correctly? Or is it more about when they have situations and present something? (41:24)

Alexey: It’s more “How do you do this?” Like, “How do you communicate to the analytics team?” Let's say, you run a model and then you want to tell them about the results. How do you do this? (41:36)

Marianna: I usually create some kind of a report and visualization which could be more or less interpretable for them. Even if it's more my specific metrics stuff, I try to explain it as much as I can, so that we’re talking on the same wavelength. Basically, it's always better for me to make a report because it also helps me to understand what I'm creating, what I could miss, or if I did everything that I was thinking about. It can be just a team call or a one-on-one meeting, depending on how many people there are in the team. If you're discussing it with other teams, it helps to have some kind of company-wide discussions or tech talks. This is something that I had experience with quite often. I would say it's helpful to educate everyone else in the company as well. (41:46)

Alexey: Do you have some specific format for your reports? Some templates that you use? (42:54)

Marianna: Not really, no. I think I would like to have one, probably. But it changes over time and in different companies, it depends on how you present to other people. Over time, you have to get to know people and learn what is more insightful for them. I didn't have any automatic reports. I just do everything in code and try to present them with some kind of visualizations. (43:01)

Alexey: So usually it contains some visualization, right? For example, you could take an A/B test and say “These are the results with the model and this is the result without. This is the metric we were measuring and this is the uplift.” Right? Or “There is no uplift and our experiment is bad.” This is the kind of language you use to communicate the results? (43:32)

Marianna: Yeah. (43:54)
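
The "with the model / without the model / uplift" summary Alexey describes can be sketched in a few lines. This is only an illustration, assuming the metric is a conversion rate and that a pooled two-proportion z-test is an acceptable significance check; the function name `ab_uplift` and all the numbers are hypothetical and do not come from the episode.

```python
from math import erf, sqrt


def ab_uplift(control_conv, control_n, treat_conv, treat_n):
    """Summarize an A/B test: rate without the model, rate with it, and uplift.

    Hypothetical helper: assumes a binary outcome (converted or not) and
    a non-zero conversion rate in the control group.
    """
    rate_control = control_conv / control_n   # group without the model
    rate_treat = treat_conv / treat_n         # group with the model
    uplift = (rate_treat - rate_control) / rate_control  # relative uplift

    # Two-proportion z-test with a pooled standard error.
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    se = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    z = (rate_treat - rate_control) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

    return {
        "rate_control": rate_control,
        "rate_treat": rate_treat,
        "uplift": uplift,
        "p_value": p_value,
    }


# Made-up numbers: 100 of 1000 users converted without the model, 130 of 1000 with it.
report = ab_uplift(control_conv=100, control_n=1000, treat_conv=130, treat_n=1000)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

A report built on numbers like these can then state the metric, the two rates, and whether the uplift is distinguishable from noise, which is the language the analytics team already uses.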

Transitioning from engineering to data science

Alexey: Okay. Another question is, “Your background is that you were transitioning from software engineering to data science. Do you have any advice for people who want to do the same?” (43:56)

Marianna: I would say that the most important thing for me was to get used to this non-deterministic mindset. It's something I could discuss a lot with other people who are also trying to switch to data science from software engineering. Usually, a lot of people struggle with that most of all, as well as probably some mathematical background stuff. I'm trying to read about that more and understand that and try some stuff in practice – that's something that helps. But I also think that it's not a problem to be a software developer and not to be a researcher. Especially if you're focused on working on specific problems in the company rather than on more scientific research – then it's going to be one of your strengths. Because a lot of stuff is not just about training a model, it's about everything else around it, like maintenance, deployment, monitoring. I would say, for me, it's really helpful that I have this background because it helps me to do all this stuff in a more efficient way. (44:14)

Alexey: So basically, if you work as the only data scientist in a company, then it definitely helps to have this background? (45:28)

Marianna: For sure. Because you get to do a lot of stuff from scratch and implement a lot of things in the beginning. (45:38)

Growing the team

Alexey: When is the right moment to start asking for more data scientists? Let's say you join a company as the only data scientist. Then you did a couple of proof of concepts, and some of them really worked. You now have many projects. Some of them are successful. How do you understand that now it's time to ask for help? In other words, now it’s time to get more data scientists on your team? (45:47)

Marianna: Yeah, I personally kind of always feel that. But in terms of the project's load, it depends on what's expected, how many projects have high priority at the same time, and if I can physically do that all myself. Company-wise, I would say if you deliver value, it's easier to get this suggestion from the company itself – whether you need to get more resources and to expand or not. So it's more about your work and trying to deliver some kind of results and also about managing expectations and managing the work that you do. You always need to look at it in terms of pressure and the plans that you have for the future. (46:16)

Alexey: So when you understand that you're falling behind, that you have many projects and all of them are important, and you cannot implement them. Maybe even the management will come to you and say, “Hey, how about getting some help?” (47:08)

Marianna: Yeah. That's basically what happened at the first startup where I worked. At some point, there were only two data scientists besides me; I was the third one. There were several projects that were really important at the time, and they needed work. It takes time for me as well to mentor these people and do some kind of onboarding. But the time that’s put in usually pays off if you need to have some work done simultaneously in different directions. (47:22)

Stopping projects

Alexey: What about stopping a project? Let's say you worked on a project and it looks promising at the beginning. You did a POC and then after some time, you realize that this project is not going anywhere. The results aren't great and you’re spending too much time on it. How do you stop working on this project and perhaps pick a different one? (48:02)

Marianna: I think that this constantly happens. It helps me to discuss it with my team. If I have data analysts in my team, I always discuss priorities with them – what they think is more important. Basically, you need to always ask yourself if you’re doing the most important stuff right now and what can be done faster than anything else. Then you're going to be ready to switch to other things, regardless of whether it’s more or less interesting. (48:27)

Alexey: For me, when I work on a project, I develop an attachment to it. Even though I know maybe it's not as successful as I wanted it to be. But still, when you put some effort into a project, it can be very hard to let it go. Do you have any secrets on how to overcome that? Or advice maybe? (49:06)

Marianna: I just personally don't feel a lot of regrets when I stop a project. I don't know how it works. When I’m working on a project, I'm usually very focused on it. But if I see that it's not working, I just always tell myself that I will come back to it later if it works at all. The main idea for me is not to work on the same project all the time. Maybe I’m just like that. But the main purpose of my work is to deliver some insights and to do that in an effective way. Basically, it's more about reminding yourself about what you're doing and what your purpose is. (49:30)

Questions for the company

Alexey: Let's say you want to join a company that doesn't have data scientists yet. Do you have a list of questions that you want to ask before joining the company? Are there maybe 5-10 questions that you have to ask before making the decision to join them? (50:17)

Marianna: Yeah. I think the first and the most important question for me is asking how the company sees data scientists? What do they expect from me in the company? How do they imagine what I'm gonna be doing in the company? Basically, just hearing the answer says a lot about whether the company's ready for data science and that they don't just think that it's some kind of magic thing that's going to solve everything. From that, you can also see how mature the company is in this regard. (50:38)

Marianna: Also, additional questions would be “What kind of problems do you expect me to solve?” Because, again, that says a lot about how ready the company is and whether they are hiring you for a specific problem or whether they just have a feeling and intuition that something can be done, but they are not sure what yet.

Marianna: Also, I personally like to ask how they see deadlines – when do they expect the results to be delivered? Because it seems to me that a lot of companies that are not really familiar with data science think in terms of deadlines from software development, which are usually faster than those in data science. Again, it says a lot about the company’s preparation and whether the company is ready to accept how the process is going to run with data science.

Marianna: I would also say that it's important to get to know more about their collaboration between teams and to ask about that. Are the teams collaborating together? How often do they do something together? Can one team work on a task and then another team continues it? This is probably the essence of what we do as data scientists. I can just create some model, but if the results of this model are not used later on by other teams, it doesn't really matter that much and won't provide any kind of results. I won't be able to get some evaluation of what I've been doing. It won't be visible whether it was actually worth hiring a data scientist for the problem.

Marianna: I also ask about the future plans of the company – if they're planning to have a team in the future. Just to get an understanding of what kind of problems they have and how they perceive them. I'm interested in expanding stuff and I don’t just come to a company thinking that I will always be working as the only data scientist there for years. There are always problems that require more resources. So for me, it's also interesting to hear how the company perceives that. Do they actually see this as some kind of potential field for growth? Is there something to do in that field for the company?

Alexey: Thank you. You mentioned at the beginning that one of the first things to find out is whether the company is ready for data science. How do you assess readiness? Is it something that we talked about, like having a data pipeline, having analysts, or maybe a department with analysts? Things like that, right? (54:15)

Marianna: Yeah. And also it’s important to hear how they describe what data scientists do, because it helps to understand whether they already have some expectations and if they are close to reality or not. And if they're not close to reality, whether it's going to be a problem or not. (54:34)

Alexey: How do you ask that? “In your opinion, what do data scientists do every day?” You just ask them like that? (54:53)

Marianna: It's more like “How do you imagine the work that we'll be doing to solve this kind of problem?” Or “What kind of problem do you expect me to solve?” I would say it’s a more vague way of asking that. (55:00)

From research to production

Alexey: I managed to restore my computer and I opened Slido with questions. I see that there is a question with four upvotes. I know we ran out of time, but maybe you have a couple more minutes to answer that question. “When would it be best to take a model out of the research environment and integrate it into the live product? And what factors are important to think about when doing that?” (55:18)

Marianna: I would say that it's better at first to test it out, if it's possible, outside of the product. Sometimes, a model is not implemented inside of the product at all. For example, the churn prediction model that I worked on in Restream basically helps one specific team and doesn't really influence anything related to the product. It's not integrated inside. But sometimes it can be. For example, when I worked in a FinTech startup, there was a lot of work related to anti-fraud, and credit scoring was one of the key components of clients’ evaluation. So, before integrating all that and running all the clients through this new way of evaluating them, the thing that I did was a kind of A/B test. I just ran it in silent mode and got some kind of responses from the model and tried to evaluate that, instead of just turning it on and seeing how it works. That's a risky thing to do, I would say. If you have these first results, you already know if it's worth it to try to run it fully in production inside your product. Still, even if it’s running, I wouldn't say that it's good to turn it on for all the clients at once. It's better to do it in a kind of A/B testing manner. (55:43)
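
The "silent mode" run and the gradual, A/B-style switch-on described above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the setup used at the FinTech startup: the class `ShadowRunner`, the function `in_rollout`, the two decision rules, and the client fields are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Client = Dict[str, float]  # hypothetical client record, e.g. {"income": ..., "score": ...}


@dataclass
class ShadowRunner:
    """Serve the existing decision while silently logging the candidate model's answer."""
    live_decision: Callable[[Client], bool]
    candidate_decision: Callable[[Client], bool]
    log: List[Tuple[bool, Optional[bool]]] = field(default_factory=list)

    def decide(self, client: Client) -> bool:
        live = self.live_decision(client)
        try:
            shadow: Optional[bool] = self.candidate_decision(client)
        except Exception:
            shadow = None  # a failing candidate must never affect the live answer
        self.log.append((live, shadow))
        return live  # only the existing logic influences the client

    def agreement_rate(self) -> Optional[float]:
        scored = [(l, s) for l, s in self.log if s is not None]
        if not scored:
            return None
        return sum(l == s for l, s in scored) / len(scored)


def in_rollout(client_id: str, percent: int) -> bool:
    """Deterministically place a client in the first `percent` of 100 hash buckets."""
    digest = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


# Made-up rules: the current logic approves on income, the candidate on a model score.
runner = ShadowRunner(
    live_decision=lambda c: c["income"] >= 1000,
    candidate_decision=lambda c: c["score"] >= 0.5,
)
clients = [
    {"income": 1500, "score": 0.9},
    {"income": 500, "score": 0.7},
    {"income": 2000, "score": 0.2},
    {"income": 300, "score": 0.1},
]
served = [runner.decide(c) for c in clients]
print("served decisions:", served)
print("agreement with candidate:", runner.agreement_rate())
```

The logged pairs give the "first results" to evaluate before anything is switched on, and a helper like `in_rollout` is one way to expose only a fixed share of clients to the new model afterwards instead of all of them at once.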

Wrapping up

Alexey: Okay, thank you. Do you have any last words before we finish today? (57:15)

Marianna: Something that is the most important for me personally, when I started introducing data science in companies, is about not being intimidated by challenges and being able to learn fast. Be ready for the fact that you sometimes have to do a lot of work yourself and also educate the people around you. Hopefully, sometimes that's fun – for me at least it is. (57:23)

Alexey: Thanks. How can people find you? (57:52)

Marianna: You can find me on LinkedIn. I'm not really active on Twitter. I'm also on Facebook. (57:56)

Alexey: Okay. Thanks a lot. Thanks for joining us today. Thanks for sharing your experience. I apologize for all the technical difficulties, first with sound and then with my computer. Yeah, it happens in one stream when I go live. Okay. Thanks, everyone for joining us today and for asking questions. Have a great weekend. (58:06)
