
DataTalks.Club Behind the Scenes

Season 7, episode 1 of the DataTalks.Club podcast with Eugene Yan, Alexey Grigorev


Eugene: Welcome to another session of the DataTalks.Club podcast. It's a bit unusual today. Well, Alexey is here and I'm here, but instead of Alexey hosting the podcast – this time, I'm going to be hosting it and I'm going to be grilling him with questions. So a bit about Alexey and I – we've known each other for about…How long has it been, Alexey? A year or two? (9.0)

Alexey: Something between that. (30.0)

Alexey’s background

Eugene: Yeah, we met a long time ago, I can't remember how. But we started chatting. Since then, we've been chatting every other month. Through that, I've seen how Alexey has come to the crazy idea of wanting to start his community. I thought it was crazy. I was like, “Well, it's a lot of work.” And I've seen him really grow it to the amazing community it is today. I've always had a lot of questions for him. Today, I'm going to be taking my chance. But first, let us try to learn more about Alexey, the person he is at work. So Alexey, in 2010, you graduated with a degree in Information Technology and became a software engineer mostly focused on Java. Fast forward more than 10 years later – you're now a principal data scientist at OLX. How did that happen? (33.0)

Alexey: Yeah, so that's a long story. I don't know how I can compress 10 years into a few minutes. I'll try to be brief. Actually, Eugene interviewed me half a year ago or something like that. There is a long article that he also published on Towards Data Science. So if you want to know the longer version, you can go check out that article. But yeah, my background is in Java development, and I was doing that for a couple of years. Then, one day, I came across this website called Coursera. Maybe some of you heard about it. On this website, there was a course called ‘machine learning’. I watched that course, which is by Andrew Ng from Stanford, and I really loved it. I thought, “Okay, I think I'm wasting my time doing this Java stuff, I should do something else. I should actually use some of the math skills I picked up at university and do this kind of stuff.” (1:26)

Alexey: So I started doing more courses and then I started interviewing with companies. They said, “Yeah, you don't have enough experience or enough education.” As a result I joined a Master's program, and at the same time, I started freelancing. It was actually very easy to start freelancing. I was surprised by that. Fast forward two years, I got a portfolio with my freelance projects, with my Master's diploma thesis, and that was enough to get a job. Since then, I've been doing data science full time. My engineering background was very helpful in getting my first job as a data scientist. (1:26)

Alexey: Then at the second company where I worked, I focused more on actually engineering things – setting up the infrastructure, taking care of all of these data pipelines, and all that. Because there was nobody else who wanted to do that and I thought, “Okay, maybe this could be me who takes care of that.” And it worked out really well. Back then – and I think this is still true today – companies really value this kind of experience. A lot of people who took the courses know how to train a model in Jupyter Notebook, but what's after that? How can you deploy these things? Or before that – how can you work on data pipelines? Having this experience that I got at the startup really helped me in basically every place where I interviewed – everyone was interested in that. (1:26)

Alexey: Also, the experience I got from a startup, doing a bit of everything, was also very helpful in a corporate environment. OLX, where I work now, is more like a corporation than a startup. I've been with OLX for more than three years – I joined as a senior data scientist. I think I was doing a pretty good job since I got promoted two times there. I have a pretty good manager who noticed all my efforts, and I got a few promotions. Yeah, I guess that's it – 10 years of experience condensed into a couple of minutes. (1:26)

Eugene: Nice. Thank you for sharing that with us. Do you have any lessons from your experience that you have gotten so far? (4:57)

Alexey: Maybe not lessons, but my advice would be – you don't have to stay within your comfort zone. Try to step outside of your responsibilities. This is what I did with setting up all the data pipelines and whatnot. That really helped. This opened the door to many, many opportunities. I think that's one thing. The other thing is very similar. It's not only about technology, it's not only about engineering – also try to get into product development. Try to understand and try to learn from your product manager. They're there for a reason. They make sure that we, as a team, are working on the right thing. So try to learn from them. (5:06)

Alexey: I did these two things and that was quite helpful in my career. Another lesson that may be something that I learned the hard way is – don't chase the exciting things. If there is a neural network that can do amazing things, it doesn't mean that you need that network for your business problem. Most likely, a couple of ‘if statements’ will be enough or maybe logistic regression or something like that. (5:06)

Being a principal data scientist

Eugene: Cool. Thank you for that, Alexey. I have one other question. You shared with us about all the technical stuff that you did early in your career. But now that you're a principal data scientist – now you're someone more senior – can you share a bit about what you do in your work? I guess this is a question for people out there who are thinking, “What does a senior data scientist actually do? Or principal data scientists – what do they do?” (6:27)

Alexey: Yeah. You can think of me as an internal consultant. We have a lot of teams at OLX who work on different things. We have a team that works on moderation, a team that works on search and recommendation, a team that works on other things. We have about six teams or something along those lines. I'm like a consultant to these teams. I don't belong to any particular team but if somebody needs help with something, I come there and help. Usually, this is about things like this – let's say somebody comes up with a model and they want to find out what's the best way of deploying this model. Or before that – before we even start working on a project, sometimes product managers reach out to me saying, “Hey, can you help us figure out if machine learning is actually needed here?” Or “How would you go about using machine learning here? What kind of data do we need?” (6:52)

Alexey: So I do a lot of consultancy, sort of, and I also do a lot of coordinating and alignment. Since I don't belong to any particular team, I can see what other teams are working on. I know what kind of projects they work on and when a new project starts, I can say, “Wait a minute. This looks similar to what the other team did one year ago. Go talk to them.” I need to do this kind of thing quite often and say “Talk to this team.” Or “Talk to that team.” Also, maybe the last bit is taking part in architectural discussions, or making technical decisions that affect many teams, like, “How do we standardize the way we're doing data science?” Even things like “What kind of package manager will we use so that it’s uniform across all the teams?” (6:52)

Alexey: Maybe one last aspect is mentoring and education, like courses. For example, the last course I did was on machine learning for product managers, the goal of which is to tell them why we need machine learning, what is possible to do with machine learning, and things like that. Then there’s also mentoring people. For example, let's say we have an analyst who wants to become a data scientist – what do they need to do to make the transition? Or maybe there is a data engineer who wants to become a machine learning engineer. Or sometimes, like a couple of years ago, we had this discussion about “Who is a machine learning engineer? What kind of responsibilities do they have?” So we sat down with other teams and defined these responsibilities. That's also something that I do. (6:52)

Eugene: That's very high leverage work – getting teams to talk to each other to make sure that they don't reinvent the wheel, removing unnecessary work, and also educating people and creating courses. I can sort of see where the motivation comes from to start Data Talks Club. I remember more than a year ago, at the end of 2020, I think, Alexey mentioned this idea of building a community, which turned out to be Data Talks Club. Now I think he has almost 9,000 members, or maybe 10,000 already. (9:36)

Alexey: It's almost 10. (10:04)


Eugene: Almost 10. That's pretty crazy. Can you take us back to the very beginning? What gave you this idea? (10:05)

Alexey: Even before I started my career, even before I had my first computer, communities were already a part of my life. I was interested in programming. Actually, I did have a computer, but I didn't have internet. I became interested in programming. I had Delphi by Borland, which is a programming language based on Pascal. I needed to do some stuff there, but without the internet, it's very difficult. The ‘help’ when you press F1, wasn't super helpful, it was also in English and I didn't speak English that well back then. So I had to rely on others to help me with that. For me, it was a community. (10:13)

Alexey: I found online forums where developers who like Delphi were hanging out. I had to actually go to my mom's place or my father's place to access the internet. I would go to the forums and ask them for advice. Sometimes, after a couple of months or half a year, I would also jump in and answer questions. That was great. When I was at university, when I already had access to the internet, I was a part of many different online communities, not only development-related, but also some hobbies. For example, I was into bootleg music. So this is when you go to a concert, like a metal band concert, for example and you film the concert, and then you exchange with other enthusiasts. I was really into these online communities. And then I became a Java developer. The first thing I did was register at Java Talks, which was an online community for people who are Java developers. And this is actually where the name comes from, for Data Talks Club. I was inspired by that name. So I just started answering questions in that community. After a couple of months, people thought that I was a super-experienced person who knows everything. I wasn't. I was a junior. I had no clue what I was talking about. But in people's minds, I was somebody who knew the answer to everything. That was a really nice experience. (10:13)

Alexey: Then the same thing happened when I decided to switch from Java development to data science. I joined a couple of online communities and I tried answering questions there. That also helped in getting to know things – to see what doesn't work for people, where they get stuck, what they need help with. So I did some research and put in the answers. For me, communities were a big part of my life. At some point, like in 2020, I was quite active on LinkedIn. I would do a post every day with different career advice, like Daliana is doing right now. So I was doing that sort of thing. I was getting a lot of questions in direct messages on LinkedIn, where people would say, “Hey, I have this situation, can you help me?” Or “I have that situation, can you help me?” (10:13)

Alexey: I was also doing some career consultation. I would have a 30-minute Zoom call with people where they would ask me different questions. The funny thing is, sometimes people would come to me and say, “Hey, I am a QA engineer. How do I switch to product management?” I have no idea. I am not a QA engineer. I don't know anything about product management. But the funny thing is that I was able to actually help them by just listening and asking, “Hey, but what do you like more?” Then people would just talk and then this rubber duck thing kicks in where they just need to talk to somebody in order to make a decision. So I was doing that for a couple of months. (10:13)

Alexey: Then I thought, “Okay. It would be nice to take all these direct messages I get from LinkedIn and somehow scale that.” Because it wasn't easy to answer each and every one of them. Then the second thing was all these consultation calls that I had and “How about putting all these people together in some place and having them talk to each other?” This is how the idea for Data Talks Club appeared. I thought that they needed a place to hang out together. I woke up. I went to GoDaddy. I registered the domain there. It took, I don't know, 10 minutes. Then I went to MailChimp and I set up a landing page, it also took like 10 minutes. Then I put a link to a few places, mostly on my GitHub account. So then, when people would reach out to me, I would say, “Here's this community. You can actually not only ask me a question, but also get an answer from others. So join it.” And people would DM me, and I would reply with that, or they would just stumble upon my GitHub and find the link there and also join. Yeah, it got some traction. (10:13)

Alexey: I started talking to people. I started asking them different questions, like “Hey, what do you do? Why did you join? What brought you here?” It was going quite well. We had the first event, which also attracted quite a few people. Also, you probably know Demetrios from the MLOps Community; they also have a podcast. So I was a guest on his podcast. I really liked the format, the way he invited people, basically what we are doing right now. We are having an interview, but it's live. People can join, ask questions, and then it’s recorded and released as a podcast. So I really liked that format and I decided to try something similar. Yeah, so this is how it started. (10:13)

The beginning and growth of DataTalks.Club

Eugene: I recall that you shared some statistics with me. I think in the first month the growth was pretty bad compared to six months later. What happened? Could you refresh your memory about how the Data Talks community grew? (16:38)

Alexey: I don't remember the numbers. I think, the first year – by the end of 2020 – we had like 500 people. By the end of 2021, it was 9000. So yeah. What I did the first month, I think I told you, was welcome everyone in the community – I was trying to get to know them. I was trying to ask what brought them here and trying to learn what kind of problems they have, what people are interested in, and what I can do to solve their problems. This is what I was doing the first month, welcoming people, trying to understand what kind of problems they have. (16:54)

Alexey: That actually doesn't scale well when more people join, so at some point, I just stopped doing that. When three, four people join per day, I can do that. But when 10 people or 20 people join per day, it's not possible to welcome everyone and ask them these questions. Then, in the first quarter, we started events. A friend of mine asked me if I know a place where he can give a talk. And I said, “Yeah, actually, you know what? There is a place where you can do that.” (16:54)

Eugene: “There is a place now.” [laughs] (18:28)

Alexey: [laughs] I think, Eugene, you also attended that event. It was about deploying models with SageMaker from Dmitri. That was the first event and like, 70 people or something like that joined it. I thought, “Wow, so cool. I should do more of that.” Actually, the second event was like 20, the third event was like 10. So I think it was just the excitement of something new and that excitement gradually went down. And then, I think around 20 people was the usual attendance. So 70 is more like an outlier. This was when I started actively announcing that we have events, that we have this awesome community, I was creating LinkedIn posts. (18:30)

Alexey: Before I was more in stealth mode, but this is where we went live. In the first 6 months, we experimented with different activities, like the “Book of the Week” where we invite book authors and ask them questions about their books. I took this idea from a forum called JavaRanch. They used to do something similar and I thought it would be cool to do something like that in Data Talks Club as well. Then, we started the podcast. At some point, we even had a conference. I think you were also a speaker there at that conference, one year ago. Yeah, that attracted quite a few people. I think this is how it started. After that conference, I think more than 1000 people joined the community. And then gradually, by the end of this year, it reached 9000. (18:30)

Sustaining the pace

Eugene: I recall the first time that I really got a shock, which was when Alexey invited me to share on Data Talks Club. I said “Okay, what is DataTalks.Club about?” That's when I realized that Alexey was actually putting out two videos, or podcasts, a week. That's a lot of work on top of a full time job at OLX. I was wondering if you could share with us how you do it. How do you sustain it? Were there times that you actually felt like skipping for a week? (20:22)

Alexey: Yeah, there were times. What actually helped to not skip was planning in advance. I would sit – like right now, for example – and plan one month in advance. Then I would announce it. Once it's announced, it's very hard to skip. People already signed up, they expect me to show up. Then, of course, there is a guest who comes. So there is no way to skip after it’s announced. I try to do this for a month in advance and that really helps to sustain that. (20:56)

Alexey: The way it works is – I reach out to people and ask, “Hey, we have this great community. We have weekly events. Do you want to speak at our event or be a guest at our podcast?” Only maybe half answer and out of this half, maybe half say ‘yes’. So it's not always easy to find speakers. What I found helpful is asking for recommendations: I ask people, “Hey, do you know somebody who can speak at our event?” Then, when I reach out to that person, I say, “Person X recommended that I reach out to you and they said that you would be interested in speaking. What do you think about doing that?” That increases the chances of getting a positive answer by a lot. (20:56)

Alexey: With webinars, it's easier. Usually, there is not so much preparation on my side. Usually the guest, or the speaker, takes the initiative. It's mostly on them to prepare. I only need to get the title, the description, biography, and a picture. So it's not difficult for me to prepare for that. With podcasts, it’s a bit trickier, because I actually need to do some research and put it all together. Usually, there is a topic, so first of all, I need to know what kind of things I can ask this person. This is why I need to do some research, look at the Internet footprint of this person, and see what they usually talk about. Or I ask them, “Hey, what do you want to talk about?” But when I do this, they usually say, “Oh, I don't know. What do you want to ask me?” So I usually need to do some research, and put together a list of questions. Then I send this doc to the guest, and the guest says, “Okay, yes. I can answer this, this, and this question, but I don't like this question. Let's remove that.” We'll do a couple of iterations and once they are comfortable with the questions, we make an announcement. (20:56)

Alexey: It takes some time to prepare for the podcasts, but I actually enjoyed doing that quite a lot – doing some research and preparing these questions. After that, I announce it on a couple of platforms. We use Eventbrite and Meetup. I make the announcements there and I use automation tools like Zapier. The moment I post something on Eventbrite, it posts a link to Slack, to Twitter, to LinkedIn. It really helps me with a lot of routine stuff. So that's helpful and makes it a bit easier for me to manage these things. That's another way I sustain doing this, because I rely on tools like Zapier to make my life easier. Yeah, I guess that's the summary. (20:56)
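
The fan-out described above (one new event, the same link pushed to several channels) can be sketched in a few lines of Python. This is only an illustration of the pattern, not how Zapier works internally: the `Event` class, the `announce` function, and the stand-in senders are all hypothetical, and real senders would call each platform's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    title: str
    url: str


def format_announcement(event: Event) -> str:
    """Build the one-line message that every channel receives."""
    return f"New event: {event.title} ({event.url})"


def announce(event: Event, channels: Dict[str, Callable[[str], None]]) -> List[str]:
    """Send the same announcement to every channel and return the ones that succeeded."""
    message = format_announcement(event)
    delivered = []
    for name, send in channels.items():
        try:
            send(message)
            delivered.append(name)
        except Exception as error:  # one failing channel should not block the others
            print(f"{name} failed: {error}")
    return delivered


# Hypothetical stand-in senders: they only record the message in a list.
outbox: Dict[str, List[str]] = {"slack": [], "twitter": [], "linkedin": []}
channels = {name: outbox[name].append for name in outbox}

event = Event(title="Example webinar", url="https://example.com/events/1")
delivered = announce(event, channels)
```

Keeping each channel behind a plain callable means a new destination is one more dictionary entry, which is roughly the convenience a no-code automation tool provides.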

Types of talks

Eugene: It's really cool how you use your ability as an engineer to automate stuff and try to make it easier for you. On Data Talks Club there are a lot of different kinds of talks, right? I was looking at it – there’s Open Source Spotlight, Minis, there's also the Book of the Week. What are all these different talks about and what do you hope to achieve from each type of talk? (24:38)

Alexey: Yeah, indeed. So let's start with Open Source Spotlight. As the name suggests, it gives a Spotlight to Open Source projects. We invite open source authors to talk about their tools – the tools they're building. This is something that I'm doing more in the background. I don't think I ever announced “Hey, we're doing this thing.” Just one day, a video like that appeared. It was actually – you probably know Neil, because you also interviewed him. He is doing a model store, which is an open source framework for storing your models. I interviewed him once and I liked this format of talking to open source authors. Then I reached out to more and more people with that. Even though I haven't announced them, these videos are published to our YouTube channel and some people watched them. What I want to do there is create a page on our website called “tools” and all these interviews – all these open source tools – will end up there on Data Talks Club's website. Actually, by the way, I am now working on creating a new design for the website and this new design will already have this “tools” page. So that's Open Source Spotlight. (25:01)

Alexey: Then we have, as you mentioned, Minis. So Minis – I was preparing to go on vacation for one month and I thought, “Okay, it's not a good idea to not have any activity on the YouTube channel or in the community for a month. So what can I do to actually have some activity there? To have people in the community entertained?” So I thought, “Let me reach out to a few people who are active in the community and ask them about some things for 10-15 minutes.” That was the idea of Minis. I reached out to a couple of people, we recorded that, and then, every week of my vacation, during the time when we would usually have a webinar, I would release one of these videos to keep the channel active. That was only for one month, but I really liked the format. However, given the amount of other activities I'm doing, it's just difficult to also add that. So for now, there are no Minis anymore. But maybe one day, we'll come back to that and have more of that. Because I really like the format – it's small and focused. Unlike this one, which is long and not focused. [laughs] (25:01)

Eugene: Yeah, I know. It's great that actually all these different ideas came out just out of necessity, right? It’s not something that was planned. Okay, last question about the community, since I want to ask you about your book. So, what are some of the most popular talks or guests that you had? Not considering engagement, but more of some of your favorites? (27:51)

Alexey: Yeah, so there are some very popular or well-known guests. There was Martin Kleppmann, who is the author of Designing Data-Intensive Applications. He did an AMA in our Slack community. That was probably the most famous person we had. Of course, I put an announcement on Reddit and it had like 100 likes in one hour or something like that. That brought quite a few people to the community. So that was very well received. After that I compiled all his answers and published them as a web page, which also generated some attention. (28:11)

Alexey: Then another famous guest – I don't know if he considers himself famous, you probably know him, Santiago. He has quite a lot of followers on Twitter – more than 100,000 right now. I would qualify him as famous, because that's quite a following that he has on Twitter. I had him as a guest. He was talking about his transition process from software engineering to machine learning. (28:11)

Alexey: Another talk that was very well received was from Elena Samuylova, who was talking about “how your machine learning project would fail”. It was also a talk that she gave during the conference one year ago. It got quite a lot of attention – it has more than 1000 views. That was quite a good talk. The talk was good and it also received a lot of attention. You also asked about things that didn't get a lot of engagement, but more like my favorites. There are two talks that I really loved. I really loved these topics and my interview with the guests. These are actually the least popular episodes of the Data Talks Club podcast. I don't know why. They were really good, in my opinion. (28:11)

Alexey: The first one was about Development Advocacy for Data Science from Elle O’Brien. I guess people are not so much into Dev advocacy in the community. But the talk was very nice. The second least popular talk was from Demetrios from the MLOps Community, which was about community building. So I guess it's also kinda off topic for Data Talks Club. So these two events were quite good. I really liked the topics. I really liked the guests. It was quite an engaging conversation. But yeah, I don't think these two podcast episodes got enough attention. I suspect that both of these topics are a bit off-topic for Data Talks Club. (28:11)

Eugene: Yeah, your audience is probably more engaged in data science topics. (31:15)

Alexey: Exactly. But I really love them personally because this is something that I'm interested in. For example, talking to Demetrios about community building – most of the things that he shared with me, I could apply immediately to Data Talks Club. So that was useful for me personally. (31:17)

Making DataTalks.Club self-sufficient

Eugene: Okay. So one last question. I know we're coming up on time, but one last question before I ask you a lot of questions about your book. I know you've been trying to monetize the community, with the main goal of making the community become more self-sufficient. Maybe hiring more people, making it less reliant on you, so that you can go on vacation and Data Talks Club can still continue. Can you share how that's going? How can people support Data Talks Club? (31:37)

Alexey: Yeah, I am thinking about the word “monetize” and if I like it or not. There is a bit of a negative sentiment to me about this. I don't know why. But yeah, the idea is to earn some money with the work I'm doing. Mostly, for me, I was spending money on the community. I actually spent around 500 euros per month. (32:02)

Eugene: Wow, that's a lot. (32:31)

Alexey: That's… yeah, so that’s a lot. I have a good salary. But yes. (32:33)

Eugene: That’s a couple thousand a year. That's a lot. (32:35)

Alexey: Yeah, that's a lot. For me, I like doing this, but getting some of this money back would be nice. I was thinking, “How can I actually do that? How can I start earning some money?” Then the second thing is, like you said, it would be nice for me to be able to offload some of the things I'm doing – to hire somebody to do some things. There are a lot of routine and mundane tasks like setting up all these events that could be delegated to somebody, if I had money for that. Because I think 500 euros is already quite a lot to spend and even more on top of that. (32:39)

Alexey: For the last six months, I have been talking to different companies and I was asking them, “Hey, would you like to support the community?” It is very difficult for me. I consider myself an engineer, so I don't have a “sales” mindset. For me, it's very difficult to sell. It's very unnatural to me. I'm more used to sitting in front of a computer and coding something, rather than talking to people and convincing them to give me some money. That is very difficult for me. Yeah. So far, it wasn't very fruitful. I'm learning a lot. I'm taking courses about sales. I don't know if they are helpful. Let's see. [laughs] Yeah. (32:39)

Alexey: I did find a few sponsors. There was one person who wanted to advertise an event, so they paid me some money to advertise this event through the newsletter. I don't think people actually engaged with this link at all. There were very few registrations, so it was super low engagement. I don't know why. The event was actually good. So that person said, “Okay, yeah. It's just not bringing any value. The return on investment is very bad.” But he paid me money, so that was already good. Then another one was from TopCoder. As you may know, TopCoder hosts data science competitions as well. It was less than one month ago, I think. They also gave me some money to advertise a competition that they were hosting. They actually did like the results. The engagement was quite good, so I’m keeping my fingers crossed that they will like this and come back for more things like that. (32:39)

Alexey: I also managed to partner with Toloka, which is a company similar to Mechanical Turk from Amazon. They do crowdsourcing. With Toloka, we will organize a workshop, which is actually already announced on our website. This is a workshop about using crowdsourcing for monitoring the performance of your models. So it’s using humans (humans in the loop) to see if your model is still performing well, and if it isn’t, then how can we trigger a retrain, for example. I think the topic is pretty interesting, so I hope people will also like this, come to the workshop and that Toloka will like it and decide to do more events like that with us. So if you're listening to this right now, you can go check our website, there is an event which I think is called Crowdsourcing for Model Performance. If you think this is interesting, please sign up, because they will look at these numbers. And if the numbers are good, then they will consider repeating business. (32:39)
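
The human-in-the-loop idea mentioned above can be sketched as follows: sample some predictions, have people label the same items, and trigger a retrain when agreement drops too low. This is a minimal illustration under stated assumptions, not the workshop's actual method; the function names, the 0.9 threshold, and the sample labels are all made up.

```python
from typing import Sequence


def agreement_rate(model_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Fraction of sampled items where the model's label matches the human label."""
    if len(model_labels) != len(human_labels):
        raise ValueError("both label lists must cover the same samples")
    if not model_labels:
        raise ValueError("need at least one labeled sample")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)


def should_retrain(
    model_labels: Sequence[str],
    human_labels: Sequence[str],
    min_agreement: float = 0.9,  # hypothetical threshold, to be tuned per use case
) -> bool:
    """Signal a retrain when the model agrees with humans less often than required."""
    return agreement_rate(model_labels, human_labels) < min_agreement


# Made-up sample: five predictions and the crowd labels for the same five items.
model_labels = ["ok", "ok", "spam", "ok", "spam"]
human_labels = ["ok", "spam", "spam", "ok", "ok"]

rate = agreement_rate(model_labels, human_labels)
retrain = should_retrain(model_labels, human_labels)
```

In practice the crowd labels are noisy too, so a real setup would likely collect several labels per item and aggregate them before comparing against the model.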

Eugene: Go and sign up now. (36:45)

Alexey: Yeah. Right now I'm also in the process of talking to a few more companies. I would like to convince them to do something like longer-term sponsoring, like maybe a sponsorship for a year. So far I haven't “closed” as salespeople say. Nobody signed up yet, nobody signed a contract. But let's see how it goes. It's actually something entirely new to me. It's way out of my comfort zone. So I'm also learning that. I hope we will see new sponsors soon. I keep my fingers crossed. So, yeah – please, interact and engage with content from sponsors if you think it's valuable, because they're looking at these numbers and this is what they will use to decide if they want to sponsor more communities or not. (36:50)

Alexey: That will be very helpful for the community and for me personally, to be able to use some of this money to invest back in the community – maybe hire a personal assistant to take care of all these mundane tasks and all that. So if you're a company, or if you're working in a company, maybe you can talk to, I don't know, your marketing department and see if they would be interested in sponsoring and supporting Data Talks Club. If they would be, maybe you can connect us. That will be super helpful. (36:50)

Alexey’s book and course

Eugene: Now's the best time [audio cuts out] initially starting this while looking for sponsors. It's gonna be easier, you don't really have to queue to have an event, because he can give his full attention. So thank you for sharing with us about that, Alexey, about Data Talks Club and how much work it took [audio cuts out]. Recently, you wrote a book called ML Bookcamp. You've put a lot of effort into it and it was recently published – late last year. Can you share with us about that book? What is it about? Who is it for? And why did you write it? (38:22)

Alexey: Yeah. I'll try to be brief, because there are still questions from the community that I want to cover. This book was for people like me, for software engineers, who want to go into machine learning. In the book, I focused on teaching through projects. I also focused on being more end-to-end rather than just, “Here's how you train your model in Jupyter Notebook and here's some nice graphs.” What happens after that? (39:06)

Alexey: There are three chapters devoted to deployment: deployment with Flask, deployment with AWS Lambda, and deployment with Kubernetes. So these kinds of things are there as well. I think this is one of the things that separates this book from other books. I don't think there are many other books that talk about that. Yeah, there is also a course based on the book. It's called Machine Learning Zoomcamp, and it's free. It covers the same material, just in video format. It's cohort-based – well, it was cohort-based, now it's almost over. If you prefer this kind of content, you can check it out in our Slack – it’s #course-ml-zoomcamp. (39:06)

Alexey: In the course, each chapter is basically a different module, but there is also homework, and you get points for doing this homework. Then there are also projects. To get the certificate, you need to finish the projects. So if you don't finish the projects, then you will not get a certificate at the end. So the most important thing is to do these projects. After you see how others do something, you have to repeat it and do it yourself. Not just follow, but do something on your own. That's why I make it mandatory for people to pass two projects to get a certificate. So yeah, check it out. (39:06)

Eugene: The course that you released is like 13 lessons over four months. Why do it for free? What's your motivation behind that? And how much effort did it take to create a course? (41:11)

Alexey: It was too much. I wish I was timing myself. [laughs] I didn't know what I was signing up for when I announced it, because it's just an insane amount of work. But yeah, I always wanted to do a course. Maybe I should have started with something smaller than that [laughs]. Yeah, I always wanted to do that and see how it feels to actually do a course. Then there was also another motivation, which is to spread the word about the book. Because everyone will know that this book exists – at least everyone who has taken the course. I also wanted to attract more people to the community. So it kind of served multiple purposes. (41:23)

Alexey: I really liked the engagement. I really liked the outcome, even though it was very difficult to record. I would record some things multiple times and I would spend a lot of time editing. I still haven't finished editing the last module about Kubeflow serving. But I really liked the result. I really liked the engagement and the feedback I got from people. That went quite well. Maybe for the next course, I'll try to do something less ambitious, perhaps. [laughs] But I'm quite satisfied with the outcome. The feedback I'm getting from people is really motivating and inspiring. (41:23)

Advice for people starting in data science and staying motivated

Eugene: Well, the projects you take on have always had that quality, where they’re very, very big. But you manage to tackle them anyway: DataTalks.Club, then the book, and now the course. Well, now we have a couple of questions from the community. Lynn asks, “What advice would you give to someone starting in this field? And how would they go about finding mentors?” (42:49)

Alexey: My advice would be to join a community. This is what I did and I think it worked quite well for me. Every time I wanted to start in a new field, I joined the community. It wasn't always career-related. As I said, I was into exchanging these bootleg videos. Yeah, so join a community. Then after you join, don't just sit there and watch what people talk about. Answer questions, do some research. Then perhaps people will think that you're some sort of experienced person who knows everything. They don't have to know that maybe you're not. [laughs] Yeah, that's the number one advice I would give. (43:09)

Eugene: I have a question from Amruta. “Have you ever felt like giving up during your data science journey? What kept you going? What motivated you?” (43:55)

Alexey: For my data science journey – I don't remember actually having this feeling of giving up entirely. But sometimes I would feel frustrated if I couldn't derive a formula or if I couldn't understand SVMs. Now I look back at this and find it funny, like, “Why did I spend time on learning about SVMs? Why did I think they were important?” I wish somebody had told me, “Hey, don't waste your time on SVMs.” So yeah, sometimes I felt frustrated. I thought I was learning something important, but these things weren't important in the end. (44:04)

Alexey: What motivated me to keep going, I guess, was the interest, or the spark, that Andrew Ng created in me. When I first watched that video, I thought, “Wow. It's so cool. You can actually use data for these kinds of things.” It felt like magic to me – being able to do this sort of magic and then seeing the results in action, I think this is what motivated me. I still have that motivation. Maybe less so than seven years ago, because now some of the things have become routine, like parameter tuning, and then all these other things that are maybe less exciting. But I found out that I really like the engineering component of this, and seeing my result in action. This is what still keeps me going. (44:04)

Not keeping up to date with new tools

Eugene: I know what you mean. I think, initially, when you first started, it’s quite amazing that you can actually use data to predict what's going to happen. Now you’re building these things with your own hands. A question from William, “How do you keep up to date with new tools and advances in the field? And at work, how do you evaluate new tools and libraries?” (45:40)

Alexey: To answer the first part of William's question: I do not keep up to date with new tools and advances, because it's not possible. I remember I would use an RSS reader. It was The Old Reader, something like Google Reader. Maybe there are people who still remember Google Reader – it was awesome. There are some RSS readers like that, and I would set up an RSS feed from arXiv to see what was new. Then, after just a couple of weeks, I understood that it's too much. There is no way – it's not humanly possible to look at this amount of information. (45:59)

Alexey: At some point, I just said, “Okay, why do I do this? What does it actually bring me?” I think this understanding, or this realization, came with a bit of experience – understanding that sometimes what a business needs is not state-of-the-art, so it would be better for me to look at what other companies are doing, what works for them, and try to stay up to date with that. Not with recent developments, not with recent advances. It’s the same with tools – every day, there is a new tool. You go on Twitter and there are always some new tools that will solve all your problems. But one month after that, nobody remembers these tools. (45:59)

Alexey: It's probably a good idea to look at a tool when you see a trend last over multiple months and this tool keeps appearing – people keep talking about it. Actually, I want to try Kedro. Do you know Kedro? Have you heard about it? I've been hearing about this tool for quite a while now, so I may actually give it a try. It's like scikit-learn Pipelines, but better. This is how I understood it. So I want to give it a try. But yeah, so I just try to not stay up to date. [laughs] That's my answer. Then the second part was “How do I evaluate new tools?” I don't. There are many other problems that we need to solve. (45:59)

Alexey: Sometimes these tools appear and then we have these hackathons, when we can try to play with these tools and see if it's worth including them in the pipeline. But it's not something I do on a regular basis. It's not like I try to devote time to it. I heard the advice, “Spend 10% of your time trying to play with new stuff.” Maybe this is good advice, actually. But I do not follow it, because I have too many other problems that I need to solve that are not tool-specific. They are more focused on “Are we actually building the right thing? Do people care about that?” It doesn't really matter what kind of tool we use for that. (45:59)

Staying productive

Eugene: I agree. Besides, you're so busy at work that you don't have time to evaluate these new tools. Rob asks, “How are you so productive? What does your process look like? How do you balance your full-time job at OLX and DataTalks.Club and writing a book and doing ML Zoomcamp? Do you even sleep?” (48:56)

Alexey: Yeah, sleep is annoying, right? [laughs] We have to get it. Sleep is overrated. No, I'm joking. Yeah. Maybe I'm just good at keeping up appearances. But I actually like to slack off. I like to procrastinate. I put things off for very long periods before I start doing them. Yeah, my process looks like this – I postpone things until the very last day and then I do them right before the deadline. For me, what helps is having these deadlines and having them public. If I have that, like with the course – because every week, I have to release something – that keeps me from procrastinating too much. So I don't know if it's a secret, but yeah. [laughs] (49:16)

Eugene: That makes sense. It's like being in public, right? Once you have people who you're accountable to, people are expecting you to do something and you don't want to let people down. Therefore, you deliver. (50:14)

Alexey: But unfortunately, I have to sleep. I haven't found a way around that. [laughs] (50:25)

Learning technical subjects and keeping notes

Eugene: Alright. So we have a few more questions. Quinn asks “How do you learn technical subjects? How do you know if you're able to apply it well?” You mentioned that you keep notes about your projects. Could you shed some light on that? (50:31)

Alexey: I think the best way to learn a subject (especially a technical subject) is to do it through projects. Eugene, I think you mentioned this concept a year ago, maybe, “just in time learning”. You build a project and then there is a thing you need to learn, so you start looking it up. Then you solve your problem and you learn just enough to solve this problem – so you don't learn the entirety of machine learning just to solve some classification problem. You maybe learn just logistic regression or whatever it is you need. That's how I try to learn technical subjects – by doing projects and by focusing on the project at hand. That helps me stay focused. (50:44)

Alexey: So “How do you know if you can apply it well?” Well, I guess by asking for feedback. At work, that could mean deploying things and then doing A/B tests – things like that. But for my personal projects it’s more like, “Am I satisfied with the result? Does it work or does it break apart?” Things like this. About notes – when I try something new, I try to document everything I do, because I know that the next day I will wake up, and I won’t remember anything about it. Sometimes my bash keeps the history. Sometimes I can just go up and see the various commands, but this is very unreliable. Often, when I close my terminal, all the commands that I entered are gone. I don't know why it happens. So I do not rely on that. (50:44)

Alexey: I would create a file, or I would open a Google document or Notion – recently I have been using Notion for that. Then I would just copy/paste the commands from the terminal there. Sometimes I would take screenshots of errors, and then paste those in as well. I shared a couple of tutorials like that. I had these documents and then I put them together as docs – public tutorials – and shared them. I also have a couple of GitHub repos like that. So maybe you can check them out. (50:44)

Inspiration and idea generation for DataTalks.Club

Eugene: One more question from the Slack community and then we’ll move on to Slido. So Demetrios, who's a fellow organizer of support communities (I think he organizes the MLOps community), asked, “Where do you get your community-related ideas from?” (53:04)

Alexey: Yeah, I know where Demetrios is coming from with this question. Yeah, I did get a few ideas – I'd say I got inspired – from the stuff Demetrios was doing at the MLOps community. For example, the format of the podcast we have right now and doing it with a live audience. Today, I think this is the first time when it didn't really work well, because of the internet problems. It’s the first time in more than a year. I hope that it doesn't repeat on Friday when we have another one. This idea I “borrowed” from Demetrios. Then there are some ideas, for example, from JavaRanch, which is a Java community. They invited book authors to ask them questions. I also borrowed this idea. So I got a lot of inspiration from other communities and perhaps also added a personal twist to these ideas. (53:18)

Eugene: For example, the Minis and the Open Source Spotlight. (54:20)

Alexey: Yeah, exactly. Exactly. But mostly it comes from other communities, like the MLOps community, and from people – from the community members. Sometimes people will reach out to me saying, “Hey, I have this awesome idea. Do you want to try it?” Then usually, I say, “Yes, let's try it.” There are a few initiatives like that, which unfortunately didn't work out well. For example, networking sessions. Somebody reached out to me saying, “Hey, it would be nice to just hang out in Zoom.” We tried that and it didn't work out. But yeah, this is how these things happen. (54:24)

Eugene: Right. I think we have the last question for this podcast, which is from Slido. “Have you ever considered live coding sessions, or coding and commenting sessions?” (55:07)

Alexey: Yeah, that would be interesting. For DataTalks.Club, we haven't really tried that. For the ML Zoomcamp, actually, there were a few office hours sessions with live coding. So maybe check those out. But this isn’t something that we do regularly. It would be a good idea to try something like this. So yeah, thanks for the idea. But we are already doing a bit of that, so check out our Machine Learning Zoomcamp office hours. (55:19)

Wrapping up

Eugene: All right. So that's it for all the questions. Is there anything else that you felt that I didn't ask, but you really wanted to share with the audience? (55:56)

Alexey: No. I think I just want to thank everyone who has been with me all this time – who joined at the beginning, who joined half a year ago, who just joined this community. You guys kept me motivated – to see the community growing is awesome. This is a really good feeling. So thanks for being a part of that. I'm looking forward to this year to do even more of that. I hope I stay sane and find sponsors, so that I can delegate some of this work. But I'm really excited about this year. So thanks for being a part of that. (56:07)

Eugene: Thank you to everyone who's out there listening. Maybe next year we'll have another review of how things went this year and we'll do the same format. Alright. Thank you, everyone. (56:50)

Alexey: Goodbye, everyone. And thank you, Eugene, for interviewing me. (57:00)

Eugene: Likewise. My pleasure. (57:03)

Alexey: Yeah, goodbye. (57:06)
