
DataTalks.Club

DataTalks.Club Anniversary Interview

Season 16, episode 1 of the DataTalks.Club podcast with Alexey Grigorev, Johanna Bayer


Transcript

Johanna: Welcome, everyone. My name is Johanna, and welcome to DataTalks.Club. DataTalks.Club turns three today. Woo-hoo! So today we're talking about DataTalks.Club. For this occasion, we decided to turn the mic around, and we have a special guest: Alexey Grigorev himself. So welcome, Alexey! (0:00)

A special episode

Alexey: Thanks for having me. [chuckles] (0:27)

Johanna: Yes! How does it feel to be in the guest chair? (0:29)

Alexey: Uh, strange. I mean, it's not the first time I'm a podcast guest. But yeah, it's unusual. (0:33)

Johanna: But on your own podcast? [chuckles] Yeah, cool. I don't think I need to introduce Alexey too much. He's the founder of DataTalks.Club, as we all know. And the questions for today... (0:40)

Alexey: Maybe we should also say something about the host? (0:54)

Johanna: Yes. (1:00)

Alexey: Maybe not everyone knows that Johanna is actually the mastermind behind all the podcast episodes. All the questions you hear me asking, Johanna prepares them. So thanks a lot for doing that. (1:01)

Alexey's background

Johanna: Yeah, no problem. It's actually quite fun sometimes to relisten to the podcast, because I prepare the questions and then see what you make of them. [chuckles] Sometimes you stick to them, and sometimes you just, you know, sway. [laughs] Yeah. It's been fun. This time, the questions for this episode have been prepared mostly by the community. We've asked for questions, and you can still submit them on Slido. It's going to be like an AMA (ask me anything) with Alexey. But before we talk about the community and DataTalks.Club, Alexey, can you tell us a bit about yourself – your background and your career journey? (1:17)

Alexey: It's funny. Usually, I ask that question and now I have to answer this too. [chuckles] I'm wondering how far back I should go. So I'll probably start with graduating with a degree in information technology. I studied information technology and after graduation, I worked as a Java developer for a few years. My last work as a Java developer was at a bank – it was a Swiss bank. They have highly regulated processes for everything. While the job was interesting, it was also sometimes mundane and kind of boring. I thought, “Okay, did I study for five years to do Java? Why did I study all this math?” (2:05)

Alexey: And then, during this time, the platform Coursera appeared, and one of the courses there was machine learning. So I took that course and I recognized that, “Okay, this looks like something I want to do.” I took this course, I took some other courses – I ended up doing a Master's degree in business intelligence. This is how I actually ended up in Germany. The second year of this program was in Germany (in Berlin). I liked the city so much that I decided to stay. (2:05)

Alexey: Since graduation, I have worked as a data scientist – up to the point where I fully focused on DataTalks.Club, which happened this year. During this time, while working as a data scientist... Well, the title was “data scientist,” but I was doing pretty much everything, which included software engineering, data engineering, and ML engineering. Data science, too. But the funny thing is, my last work was at OLX – I worked there for four and a half years – and over this period, I trained a model only once. (2:05)

Johanna: Wow. [chuckles] (4:10)

Alexey: Yeah. I was mostly doing the engineering stuff there, even though my title was “data scientist”. It's a fun fact. Now, since April, I'm fully focused on DataTalks.Club. (4:11)

Johanna: That is really exciting. Do you still... What language did you use in your job? Probably not Java, right? You probably specialized in Python? (4:27)

Alexey: Right now, mostly English. [chuckles] (4:37)

Johanna: No, I mean the programming language. [laughs] (4:39)

Alexey: Yes. [chuckles] Well, for DataTalks.Club, it's English. We use Python for the courses, and I occasionally write little scripts to help with moderation and some other things. For example, all the scripts for our courses are in Python. For me, the go-to language is Python. I think I still remember some Java, and I can still do a bit of JavaScript. Some of the code we use for DataTalks.Club is in JavaScript. For example, when you finish a course, you get a certificate, and the code that generates the certificates is written in JavaScript. But yeah, it's mostly Python. (4:40)

Johanna: Yeah. I started with Java way back, but I didn't stay with it for very long. [chuckles] Python is definitely the better language, I think, especially for machine learning. Cool! Thank you so much, Alexey. I think I'll actually jump into the first question. We collected questions both on Slack and, via LinkedIn, on Slido. I think we'll start with the Slido ones. (5:26)

Plans for the future of DataTalks.Club

Johanna: The first question is “Happy birthday, DataTalks.Club!” Thank you. “What are the plans for the future?” (5:55)

Alexey: Yeah. Luckily, I had access to these questions before the interview. Otherwise, I would have trouble answering this one and some of the others. Right now, I want to focus on making DataTalks.Club sustainable. Since this is my main job now, I also want to make sure I don't starve, meaning that I have enough money to live on. Berlin is not the cheapest city [to live in]. So for me, the focus right now is making it sustainable and making sure there is enough income for me and for the team. (6:03)

Alexey: There are also two people working for DataTalks.Club: Francis and Valeria. Francis is the community manager. He's basically running everything behind the scenes. All the events are organized by him, and all the YouTube videos are published by him. Whenever we need to publish or edit something, he does it, and he edits a lot of videos. Basically, he's doing pretty much everything. I would not be able to run the community without his help. (6:03)

Johanna: Yeah, a massive shout-out to Francis. He's pretty amazing. (7:18)

Alexey: So if you were ever a speaker or a guest at DataTalks.Club (Johanna was), usually Francis takes care of reaching out and organizing everything. And then we have Valeria. Valeria takes care of the newsletter, and the social media content, and she helps a lot with other things. I need to pay them. Of course, they like what they do. [chuckles] But it's a job. [Johanna agrees] So I need to pay them. And I need to also get money. For me, for us, the focus right now is to see how to make it work. So far, it's been good but I still want to focus on getting more sponsors for the community while also trying to not overdo it. (7:22)

Alexey: I think if we push too heavily for monetization, it will not help attract more people. We need to find the right balance. I think what we're doing right now strikes that balance; we just need to find more companies that want to support us. So that's my main activity right now, my main focus. But apart from that, of course, we have a lot of activities, and we want to continue doing them. One thing that comes up quite often right now is people asking for an LLM course – an LLM Zoomcamp. (7:22)

Johanna: Oh, yeah. Of course. (9:06)

Alexey: There could be one. I cannot promise anything. Right now, I'm not sure whether people want it only because it's a hype thing. It would probably be wise to wait six to twelve months to see whether the hype dies out or is still there. If it's still there, we may consider making another course. For comparison, we are quite lucky with the three courses we have. Right now, we are running a machine learning engineering course (ML Zoomcamp), and it does not change significantly from year to year. We need to maintain the content a little, but most of it is content I recorded three years ago. So we're kind of lucky. With LLMs, things change every day, right? (9:09)

Johanna: Yes. It's so fast-moving. It's such a new field. Yeah, I agree. But it could also be broader – not only LLMs but AI more generally. Very exciting! I didn't know that. Yeah, it's quite the hype at the moment. Very cool. About the balance you mentioned: the reason I got this job is basically that I once mentioned to Alexey that I listen to every episode of the podcast, because I'm an avid podcast lover. And I fully agree that some podcasts are just ads. A couple of ads are fine, but if it gets too much, it's just not good. There needs to be a balance. You don't want to abuse your listeners in some way. [chuckles] But the creator also needs to live. It's definitely a fine balance. Yeah, very cool. (10:05)

How LLMs will change the professional data landscape

Johanna: Let's actually move to the next question, which kind of touches on what we've just discussed. “How do you think the other jobs will change as different GPT-like services come into play and extend the skills of data professionals?” (11:10)

Alexey: That's an interesting thing. Of course, nobody knows. Right? [Johanna agrees] We'll only see that in a year or two, when people start using them actively. Right now, even though it's quite a hot area, some data scientists are still hesitant to try them. But once the community starts to adopt it more, then it will become interesting. I recently spoke with a friend of mine, also my ex-colleague – he's running his own startup right now. He says, right now, the main challenge when hiring (when evaluating) candidates is ChatGPT – because everyone can just copy and paste the take-home test and GPT will just provide the solution. [Johanna agrees] (11:26)

Alexey: So they were thinking about what to do with this, and I liked the metaphor he used. You know there is weed – marijuana – which is considered a drug, and many countries try to fight it. But some countries say, “Okay, let's just embrace it and allow people to use it, because there is no point in fighting it. People will still smoke weed.” He was comparing weed with ChatGPT. [laughs] “Let's legalize ChatGPT. Let's not forbid it.” You can tell people, “You cannot use ChatGPT,” but people will still use it. [Johanna agrees] So what about just saying, “Hey, for our take-home test, you can use ChatGPT. You just have to tell us what prompts you used and how exactly you used it. What were the problems? When was it not correct, and what did you need to fix? Tell us about that.” I think this is a nice approach. (11:26)

Alexey: Instead of saying, “Hey, ChatGPT will take our jobs,” we should just learn to use it, accept it, and see what happens. Right? [Johanna agrees] We'll see how it goes. I imagine that for most mundane things, like exploratory data analysis or training a simple model, you can just ask ChatGPT, it will give you the code, and you just use it. But it's still not ideal. It's still buggy. It still hallucinates. It still comes up with functions that do not exist, and whatnot. (11:26)

Johanna: Exactly. Yeah. (14:08)

Alexey: So yeah, let's see. (14:10)

Johanna: Yeah. It's very interesting that you mention weed, because I just moved to the Netherlands. [laughs] And of course, it's one of the most progressive countries on this. But that's actually a really nice approach. I like that. I think many companies use whiteboard challenges when they're hiring, but in your normal working life, you just Google. In some ways, ChatGPT is not that much more than Googling. I mean, it can give you something nicely formatted that you can just hand over, but still. I think it's a very good idea, actually. Yeah. Very cool. Nice. (14:11)

How DTC community members can contribute

Johanna: Let's move on to the next question that goes back to DataTalks.Club, “From your perspective, how can members best contribute to and benefit from their involvement in DataTalks.Club?” (14:56)

Alexey: I think the best thing you can do is be active on Slack. For DataTalks.Club, Slack is the center of the community. Of course, we have many different activities, like the podcast and the courses, but everything still centers around Slack. Some questions in Slack are left without answers, and I physically cannot answer all of them, even when I do know the answer. Community members answering each other is already happening to a large extent. I see a lot of members jumping in and answering questions. If more of us do that, I think the world will become a better place. (15:10)

Alexey: Actually, why did I suggest that? Of course, it's good that you help, but I'm also looking back at my career and thinking about the things that helped me. When I had just started my journey as a Java developer, the first thing I did was join a Java community. It was an online forum, and I used it to ask questions. But I did not have that many questions, and I noticed that a lot of people were asking questions, some of which I was able to answer. I also noticed that there were questions I was not able to answer, and some of them were interesting. I thought, “How about I do a bit of research and find out how to actually answer these questions?” So I set aside about half an hour every day to do that – to help people. Over time, it accumulated, and I learned a lot of new things. Just by doing a bit of research, a bit of Googling, trying things out, figuring out what was wrong and how to help that person, I learned so much myself that by the time I applied for my next job, I had no difficulty whatsoever passing the interview. They also doubled my salary, because the interview went so well. I was confident that I was not only qualified for this job but could do everything it required. It's because of the communities. (15:10)

Alexey: The same thing happened when I became a data scientist. When I got my first job as a data scientist, I joined a data science community and did the same thing. I was hanging out in a Russian-speaking Slack community called Open Data Science, which actually inspired me to create DataTalks.Club. I think one of the questions is about that, so I'll talk about it later. I was just hanging out there, asking my own questions but also helping others, and it helped me tremendously. So to benefit from your involvement in the community, just go there and answer questions, even if you're not sure you know the answer. You can add a disclaimer saying, “Hey, I have not dealt with this myself, but I did a quick Google search, and this is what I found.” This will help everyone. That's one thing. (15:10)

Alexey: Apart from Slack, we have many other things. For example, we have three courses. If you graduated from one of these courses and want to help with it, maybe you want to be a teaching assistant. That's also an option. You can ping me on Slack and say, “Hey, I really enjoyed doing MLOps Zoomcamp. Can I be a teaching assistant there?” Or you can just hang out in the course channels and answer questions from other community members. That will be super helpful. Then there are a few other things – for example, we want to run a competition. (15:10)

Alexey: Last year, we did a Kaggle competition. It was a deep learning competition: we needed to classify images of kitchen utensils and other kitchen items – forks, spoons, cups – into these categories. If there is a picture of a spoon, the model needs to say that it's a spoon. It was a very fun competition, and we want to do something similar this year. If you have ideas about what the competition could be about and how to get the data for it (which is the most difficult part), please reach out, and let's think about how to do that. (15:10)

Johanna: I remember you collected the data last year, right? [Alexey agrees] Yeah, I remember that. (20:16)

Alexey: Yeah, it was fun. Maybe this year it could be something with LLMs, too. Let's say you have a lot of text data: you can label this data with LLMs, or generate data with LLMs. It could be that. Another thing we have is called Project of the Week. The idea behind it is this: most of our content is us showing you how to do something, and you repeat it. (20:23)

Alexey: For example, we have workshops (or we have the courses), and in the workshop, there is a video and somebody is doing something and you follow along. But Project of the Week is different. It's more like a do-it-yourself thing. You get some instructions where it's not something you have to do today, but it's more like a suggestion, “This is what you can do today.” For example, let's say we want you to learn FastAPI or something like that – a new framework. There are some things you can do to learn this thing. We give you a set of suggestions and every day you try to look something up on Google, try to do something, and then every day you can say, “Okay, this is my progress.” The idea is, after seven days, you have a complete project that you can put in your portfolio. That's the idea behind Project of the Week. (20:23)

Alexey: We have quite a few community members who are active there, who are helping a lot with that, and coming up with project ideas. One of them, Antonis, is in this live chat. Hi Antonis! And the other one is Liliana. So thanks a lot for your help. The reason I'm saying that – if you have some ideas, or you want to learn something new, you can help with organizing this Project of the Week as well. There's actually a lot more. I think I can talk for hours about what you can do. But we have a lot of other questions, right? (20:23)

Johanna: Yeah. DataTalks.Club is massive! I'm always surprised how you do all of these things, because we also have the webinars and the podcast. If people think they have something interesting to share, they can reach out to Francis. We are always looking for new guests. In general, I think one thing people can benefit from, which DataTalks.Club uses quite a bit, is Learning in Public. Whenever you learn something, tweet it or talk about it, and we will probably promote it. I think it's also a big part of the Zoomcamps. (22:36)

Alexey: Yes. Thanks for mentioning this. (23:15)

Main lessons Alexey learned while building the DataTalks.Club community

Johanna: Yeah. Cool. So then we move on, “What are the main lessons learned from your experience in building the DTC online community?” (23:18)

Alexey: Yeah, this is another question I was lucky to see before the interview, because it's a difficult one to answer, to be honest. I did a bit of brainstorming and came up with a few things. At the beginning, the goal was quite ambitious: to create a community for all data people, including data analysts, data engineers, data scientists, data product managers, and ML engineers – everyone who deals with data. But with time, I realized that it's too broad. It's not possible to be that broad. There are already other communities. For example, there's a community called Locally Optimistic, which focuses more on data product managers and data analysts. There is an MLOps community that focuses on MLOps. (23:32)

Alexey: So we decided to focus. DataTalks.Club became more beginner-friendly, with hands-on learning in a lot of our content. The focus is on people who are starting or continuing their careers rather than people who are already advanced, with a lot of hands-on material and an emphasis on the engineering aspects of data: ML engineering, data engineering, and MLOps. These are engineering-heavy areas where you need to code a lot. Our podcast is still kind of... Well, it's not super technical, but we talk about careers and other things that appeal more to a beginner-to-intermediate audience than to seasoned professionals. [Johanna agrees] I guess, with time, we found our focus. So one of the lessons is, “You cannot be too ambitious. You need to focus on something.” Another thing is moderation – I did not expect moderation to take so much time. (23:32)

Alexey: There was a funny story that a few times people tried to use the community as a dating website. [Johanna laughs] I don't know if there were any success stories. But I know that there were unsuccessful stories when ladies in the community did not like the attention they received from some community members. [Johanna groans] Of course, this is not why you join a professional community – to get private messages saying, “Hey, cutie. How about a date?” So yeah – I had to deal with this stuff. Maybe there was a successful case – a love story... [laughs] But I'm not aware of that. Most of the time, it was unsolicited attention. In general, moderation is... (23:32)

Alexey: People try to promote their services, either in public or in private messages. Private messages are more difficult, because I don't see them. In public, I can just remove the post, but if somebody sends unsolicited promotion through direct messages, it's very difficult to catch. Sometimes they send the promotional message to me too, and then I know what to do. But they're careful: they see, “Okay, this person is an admin. I will not send him a message,” and then they send it to other people. If something like that happens, please let Francis or me know, and we will ban this person, because sending unsolicited promotional DMs is against our community guidelines. I also automated some of this. When somebody joins the community, immediately jumps into the #general channel, and posts a big promotional message, I have a script (a bot) that removes the post and sends that person an automatic direct message saying, “Hey, this is against the rules. We have these channels, and this is how messages should be formatted if you want to publish something.” So for me, it's just a matter of adding a special reaction to the post, and the bot handles the rest. That was fun to create. Also, I'm still trying to figure out... (23:32)

Alexey: What was surprising for me was how eager people were to join the community and participate in different activities, and to also do something in the community as well. That's really cool. I'm still trying to figure out how to recognize and encourage this and reward this participation – reward the participants. One thing [I do] is invite them to the podcast. There is probably more we can do, so I'm still thinking about that. But it's really rewarding to see that people are very active in the community and are doing something for the community too. That's really cool. Thanks for doing that. Thanks Antonis, one more time. I know you're here and you're listening. You're doing a lot for this community – thanks! (23:32)

Johanna: Yeah. I think, in general, people are quite happy to... At least for me, people have an interest, and if they find people that have similar interests, it's just really great. Although, like you said, there might be some, you know... It's always good to have a code of conduct to guide these conversations. (29:12)

Alexey: Yeah. Please don't use DataTalks.Club as a dating website. [Johanna laughs] Tinder works much better. Just try that. [chuckles] (29:32)

The motivation for starting DataTalks.Club

Johanna: Oh, my God. All right. [laughs] Cool. Let's just move on. “What was the motive behind starting the club and the Zoomcamps?” I think we've touched briefly on that. (29:40)

Alexey: We did, yeah. As I said, communities have been quite a big part of my life and career. They helped me a lot. I was quite active in this Russian-speaking community called Open Data Science. Sadly, I couldn't find anything similar in the English-speaking space, meaning a community that was equally active and also on Slack. My initial idea was to create something similar. The actual story of how it happened is this: I was doing some career consultations, and a lot of people were reaching out to me on LinkedIn saying, “Hey, I'm a data analyst. I want to become a data scientist,” or “I'm a software engineer. I want to become a data engineer,” or, I don't know, “Help me figure out what I want from life.” Things like that. It was during COVID, so I also kind of wanted to talk to people. Sitting at home was not fun. (29:57)

Alexey: So I was doing these free career consultations, which involve talking to a person and then sending them a summary. I'm not doing that now. But back then, it was fun. I did like 30 of these consultations and I thought, “It would be very nice to have a place where all these people can hang out and help each other.” Because if I try to help everyone, it's not scalable. I can't help every single person. (29:57)

Johanna: It might also get repetitive or something. Yeah. (31:31)

Alexey: It becomes very repetitive. For example, the answer to “How do I become a machine learning engineer?” is something like, “Go buy my book.” But people might find that answer annoying, because it's very promotional. Anyway, I thought, “If there were a space where all these people could hang out, it would not be me helping everyone, but people helping each other. I help somebody, that person helps somebody else, and that person in turn helps somebody else.” (31:35)

Alexey: One day I woke up, registered the Slack community, went to GoDaddy, and bought a domain (I think it cost something like 15 euros). I did not advertise it anywhere. I put the link on my LinkedIn page – not in a post, just in the description – and on my GitHub. Then I watched what would happen. And people were joining. [Johanna surprised] They would stumble on the link, or somebody else would say, “Hey, I found this cool community. Join it.” In a week, there were maybe a hundred people. (31:35)

Johanna: Wow! (33:00)

Alexey: Yeah. In a month, it was more than a few hundred. I did not actively promote it. Community members would just say, “Hey, there is this cool place. Check it out.” It was pretty motivating. At some point, a friend of mine asked if I knew a place where he can give a talk and I told him to come over and give a talk. It was our first webinar/workshop and since then, we started having regular activities – regular events. I think the other part of the question was about the Zoomcamps, right? (33:01)

Johanna: Yeah, yeah. (33:46)

Alexey: I always wanted to do a course. When I was in this Open Data Science Community, there was a very good course called MLCourse.AI. It's also available in English. I did this in Russian, but it's also translated. It was a free course, with focus on the theory behind machine learning, with a lot of coding, also there was a competition. It was so well-organized. The main thing that I liked about that was that it was community-driven. There was, of course, the project lead (course lead), but also lessons were prepared by different community members, and these community members were helping each other in Slack when the course was running. I thought, “That's so cool. It's such a great thing. The course is for free, people help each other, and people share stories about how they did this and how they found a great job after finishing this course.” I thought “It's so cool. I want to have something like that in our community too.” (33:50)

Alexey: Also, I think I mentioned the book – at some point, I wrote a book about machine learning engineering. It was not selling well, and it's still not selling well. It's very hard to compete with other really great books, because there are so many good books about machine learning. In retrospect, I think it maybe did not make much sense to write another one, because so many were already out. (33:50)

Johanna: Well, since you created the courses, now, they are basically complementing the book, right? Basically. (35:25)

Alexey: Yeah... But back then I was thinking, “How can I promote the book? What's the best way of doing that?” These two ideas – a course that I really liked, and a book that I wanted to promote – kind of came together. I thought, “How about I make a course based on the book? It would help promote the book, it would help promote the community, and it would let me do what I wanted: run a course.” (35:31)

Alexey: I made a post on LinkedIn or something like that. I did not put any effort into it; I just wanted to check whether there was interest. There was a lot of interest, so I thought, “Okay, I'm doing it.” This is how we did the first course, ML Zoomcamp. As I said, it did not help with promoting the book. I mean, people knew about the book, but not many of them actually bought it. [chuckles] Because the course is free, right? (35:31)

Johanna: Yeah, exactly. [chuckles] (36:38)

Alexey: Why pay for it? I don't think I would pay for the book, to be honest. But yeah... And to be honest, I don't get a lot of money from selling the book. It did help, to some extent, with promoting it. Without the course, even fewer people would buy it. But now, actually, some people say, “Hey, I bought this book and use it as a textbook for the course.” Which is cool. (36:40)

How the COVID lockdown contributed to the growth of the community

Johanna: How much do you think that COVID contributed to the fact that it's an online community? Would you have created a meetup or something in Berlin otherwise? (37:04)

Alexey: No. [chuckles] (37:15)

Johanna: Okay, so it's not a COVID baby. [laughs] (37:16)

Alexey: It is a COVID baby. I don't think people would be as eager to join an online community today, for example. The timing was right. Even though my idea from the beginning was to create an online community, the fact that it happened during COVID helped. In 2020, it was okay, but by 2021 there was already so much happening online. By then, though, this community already had some momentum: there were already community members, the courses, and so on. I think starting during COVID helped, especially in September. I remember that the summer was fine and people were hoping normal life would come back, but then in September, cases started to go up again and everyone was locked at home again. It was just the right timing. If I started DataTalks.Club today, I don't think it would do as well as it did three years ago. (37:20)

Johanna: Yeah. Although, on the other hand, I think a lot of communities were created, at least from the podcasts side, because I listen to a lot of podcasts – a lot of podcasts were created during COVID but they all stopped now. So I think it is something special that this community is still alive, because a lot of them didn't survive the COVID craze. (38:37)

Alexey: Maybe if it were just the podcast, it wouldn't be active anymore. [Johanna agrees] We have so many other things. (39:04)

Typical success stories from DataTalks.Club

Johanna: Yeah. So that's part of the business model, right? [Alexey agrees] So many things! Cool. Here's an interesting one, “What typical, not extraordinary, success stories of your students can you share over the last year?” (39:11)

Alexey: Not extraordinary? Oh! Meaning “usual”. Well, we had cases when people wanted to switch their careers and they did. I don't know what else to say. [chuckles] For example, just yesterday, one of our podcast guests, Marijn Markus, reached out to me. I interviewed him a year or two ago, and the talk was called Hacking Your Data Career. He reached out through LinkedIn saying that he attended PyData Amsterdam and somebody at that conference approached him saying, “Hey, I listened to your interview on DataTalks.Club and this interview changed my life.” This person got so inspired that he switched his career and now works as a data engineer. It wasn't only that interview, though: he also did our Data Engineering Zoomcamp, and that's how he changed his career. Marijn wrote to me saying, “Hey, this is such a cool thing to hear, when somebody approaches you at a conference and says, 'Hey! Look, not only do I know you, but you also changed my life!'” [chuckles] (39:32)

Johanna: Oh, wow! That's pretty much the dream. Yeah, wow. (40:50)

Alexey: Stories like this are typical, not extraordinary. Of course, there were other stories, like the one about a school student who did our ML Engineering Zoomcamp and got an internship. He did pretty well there, but then decided, “Okay, I want to study at university,” and he left. I think it's pretty cool when people this early in their career, even before university, already have access to materials like that, can learn something, and are then ready to do the job. But I don't think it's a typical situation. I don't think most of our community members and students are school students. [Johanna agrees] Usually, most of our students are not students anymore. They're already working in some roles and they want to change. (40:56)

Johanna: Yeah. In general, I think podcasts are good. Usually, I listen to them for the inspiration and motivation. It doesn't have to be, “Oh, I changed my career after that.” But I often wake up in the morning and just don't want to work. Then I listen to someone who did something amazing and I'm like, “Okay, I can do this. This person did these amazing things, so I can go and finish this analysis (or whatever).” Sometimes it's even small things. Very cool. I do think that these communities impact so many people in so many ways, which is really good. Cool. (42:05)

Johanna: What topics or trends in the data world are you most excited about exploring in upcoming club events or interviews? (42:52)

Alexey: The thing with me is, I'm not really interested in chasing hype topics. That's why we had our first interview about LLMs only one month ago, which was super late, considering how long ChatGPT has been around. We were super late to the party with an LLM interview. I'm more interested in interviewing a specific person and then seeing what topics they want to share. Usually, there are a few things anyone can talk about, and you, Johanna, know this better than anyone because you do the initial calls with our guests to explore what they can share. For me, it's more about the person, and then figuring out what this person can talk about. Maybe we have covered a topic before, but there can be different angles on it. (43:00)

Alexey: That's why I don't know exactly... I'm not really following trends. Actually, there were two talks about LLMs. It was a coincidence. In both cases, there were specific people that I wanted to interview, and it turned out that the topic was LLMs both times. The talks didn't conflict with each other; they were slightly different. This is how I like to approach it. But maybe you can suggest some topics. If there is a topic that you're interested in and we haven't done a podcast interview about it, just reach out to one of us and suggest it. If you also know a speaker who can talk about it, even better. (43:00)

Johanna: Yeah. And make sure that it really has not been suggested – go through that whole history. [chuckles] No kidding. [laughs] (45:04)

Alexey: It's kind of difficult. I think it's okay to repeat a little bit. If we talked about mentoring three years ago, maybe we can do another podcast episode about mentoring from a different perspective. (45:11)

Johanna: Exactly, because it's always a different person, so it's definitely a different angle. (45:26)

Alexey: I think we would run out of topics by now otherwise. [chuckles] (45:30)

Johanna: Yes, we would. [laughs] I mean, we hear about LLMs from everywhere at the moment, right? (45:34)

Alexey: Maybe it's not a good idea to have two to three more LLM talks. (45:44)

A funny DataTalks.Club story from past experience

Johanna: Yeah, exactly. Cool. The next question is, “Do you have a funny story to share that happened in the last three years of DataTalks.Club?” That question comes from Antonis, actually. (45:48)

Alexey: I was thinking about that. I don't know – for me, the funniest story was that people considered using DataTalks.Club as a dating website. [laughs] I could not even imagine that this would happen. It just did not occur to me at all. Of course, I did not think of putting this on the website or in our code of conduct, the community guidelines. It just did not occur to me that somebody might use a professional community to reach out to... In this case, it was a man reaching out to female participants. It's kind of funny, but also gross. So, I don't know. [chuckles] But it was memorable. (46:05)

Johanna: Yeah. Usually these people don't join the community for the professional part in the first place. (46:55)

Alexey: I would imagine, yes. (47:02)

Johanna: That's my experience with communities and being approached in that way. But yeah, it's definitely... You can never anticipate everything. People always surprise you. (47:04)

Alexey: But also, if you're a female participant and you get some attention that you don't want to receive, the knee-jerk reaction would probably be to leave the community. But I would ask you to give us another chance. Report that person and stick around. It's not appropriate behavior and we do not encourage this behavior. We would deal with this person and we want to have a safe space where everyone can learn, share knowledge, and so on. (47:21)

Johanna: Yeah. That's something that I find remarkable, actually. As a woman, I'm in a lot of communities in the more techie space, and the percentage of women in DTC is actually quite high. You can always reach out to other women, or to me if you want to [chuckles], or to Alexey. (47:57)

How Alexey wrote the book on Machine Learning

Johanna: “How did it come that you wrote the machine learning book?” I think we've touched on that one a little bit. Do you want to talk about that? (48:28)

Alexey: Yeah. I see that we don't have a lot of time, so I'll try to give a short answer. [laughs] I don't think I'm capable of giving short answers in general. [chuckles] I'll try. It's not my first book. So the first book... Well, this is a long story. I'll tell you. Some time ago, I was reviewing books for Packt Publishing. I was still transitioning from a Java developer to a data scientist, and they reached out to me saying, “Hey, we found your blog about Java and we have this book about Java. How about you review this book?” I did that. I was a technical reviewer. Then I reviewed another book and another book, and then I said, “Look, maybe you can start sending me machine learning books, because this is what I'm more interested in.” And they said, “Okay. Here's the machine learning book.” They did not even try to check if I knew this topic, which was cool. By then, hopefully, I had learned enough machine learning to be able to help. (48:38)

Alexey: I reviewed maybe five more books about machine learning, and they wrote to me saying, “Hey, you're so good at reviewing books, maybe you want to write one.” So I thought, “Hmm, maybe I do.” And they said, “Okay, it looks like you know Java. You know machine learning. How about writing a book about Java and machine learning?” Back then I thought it was a good idea. Don't do that. It's not a good idea. [Johanna laughs] Because who cares about Java and machine learning? Everyone uses Python. But back then I thought it was a good idea, so I wrote the book. It's called Mastering Java for Data Science, something like that – I don't remember. So that was my first book. Usually, when you write a book, the contract includes an advance payment from the publisher. Then, with each sold copy, you get a royalty percentage. With this book, they gave me an advance, and the book still hasn't sold enough to earn it back. It's not selling well. But I would also have expected more promotion from the publisher. (48:38)

Alexey: Anyway, then there was another book that I wrote with co-authors – I wasn't alone there. There were like four or five authors. I did not like that book; that's why I don't put it in my CV. It was a book about doing projects with TensorFlow. It had like 10 chapters, and each chapter was a project. But the funny thing is, even though I did not like the book (I did not like the outcome, I did not like the process, and I wanted to pretend this book never happened), people from a different publisher noticed it and reached out to me saying, “We saw this book. We really liked it.” It was a surprise for me: “Why would anyone like it?” But they did, and they said, “We want to have a similar book. How about you write this book?” We came up with the concept that every chapter is a project. Then by the end of each chapter, you have a project that you can put in your portfolio. (48:38)

Alexey: So this is how we came up with this concept, and I started writing Machine Learning Bookcamp. Actually, it was their idea to name it that: somebody from marketing at Manning decided that this was a good title. It took two years, I think. Manning is really... They put a lot of effort into making sure that their books are of very good quality. It was very painful, all these processes. They are there for a reason, but for me as an author, it was like, “Ugh, there are 10 more comments that I need to address. I'm so tired. Why did I agree to this?” [chuckles] Finally, this book came out, and the rest is history, you know – how it became a course and so on. (48:38)

Johanna: That's pretty amazing. I didn't know that. But it also seems that your blog was kind of the starting point. [cross-talk] (53:14)

Alexey: Yeah! And it was in Russian. I don't know how they found it. (53:21)

Johanna: Oh my God. So is that the takeaway? If you want to start publishing, maybe start with a blog? What I wanted to ask is, at what stage of the publishing process were you reviewing for Packt? (53:23)

Alexey: It was when each chapter was ready – they would send a draft and I would review the draft. (53:41)

Johanna: That's really early then. (53:47)

Alexey: Yeah. It wasn't like, “Okay, here's a book. Review it.” As the book was being written, they would send each chapter to me. I would then review it and send it back. For my book, it was a different process. Manning has in-house technical reviewers who review each of the chapters, but every three or four chapters, the manuscript is also sent out to external reviewers who are not in-house. So the process is slightly different. The Packt reviews were for work-in-progress books. (53:50)

Johanna: Wow. There's also this Discord channel by Packt, where they have these giveaways where you get books – but they're finished. Basically, you're the reviewer from the public. You can get them, and you can read them, and then write a short review. (54:31)

Alexey: Amazon, right? (54:49)

Johanna: Um, I'm not sure. No, you send it back to Packt. It's the last stage before the book gets published, and they solicit feedback from the community. But it's a good way to get books for free. [chuckles] They've done that a couple of times, so that's why I was interested. (54:50)

Alexey: Well, there are good books on Packt. Definitely. (55:15)

Johanna: Definitely. Maybe we have time for one last question. What do you think? (55:19)

Alexey: Yeah. If you have time, maybe we can take a few more. (55:25)

Things on the DataTalks.Club backburner

Johanna: Okay, cool. So then, “Congratulations on your huge success, Alexey. Is there anything you would love to do but didn't have the time to do?” (55:28)

Alexey: Oh, yeah. I think I have a list somewhere with all the ideas that we wanted to try but never managed to. For example, doing a hackathon is one thing that comes to mind. In the courses, we have a lot of questions, and these questions are very repetitive. We have FAQ documents where most of these questions are already answered, so there was an idea to somehow automate this and write a bot. (55:39)

Alexey: But interestingly, even without running a proper hackathon, one of the community members, Alex, came up with this idea: “Hey, how about I do this myself?” When he reached out, he already had a working version: “Hey, I did this. How about we test it now?” It was so cool. Yeah, it's working fine. He did this for the MLOps course and now he's doing it for the ML engineering course. So it's really cool and it works quite well. The bot looks at the question and then looks at all the... Of course, it uses LLMs. It looks at all the other questions in the FAQ document and says, “Okay, for this question, this is the answer.” This is a typical use case for LLMs: you have a knowledge base and a question, and you want to retrieve the answer to the question from this knowledge base. He wrote a post on LinkedIn describing the stack behind it. It's a really cool project. So I wanted to do a hackathon but never managed to, and then... It just happened. [chuckles] (55:39)

Alexey: Another thing: for all the events we have, we use Eventbrite. I don't really like Eventbrite; it's not the most convenient platform. But the cool thing about Eventbrite is that it stores the email of each person who registered. With this, we could build a recommender system that says, “Okay, this person would be interested in these events.” For example, when there is a new event, we could potentially see which community members would be interested in this particular event, and then send them emails or somehow notify them about it. Of course, we need to think about opting in and opting out, but as an idea, this is what we have. (55:39)

Alexey: In addition, since I don't really like Eventbrite, creating something in Django for managing events would be another thing. It could be another hackathon project. But with so many things already happening, it's always hard to find time to squeeze in one extra thing. If any of you are listening to this and you like the idea of a hackathon, reach out to us and we'll see how we can make it happen. (55:39)

Johanna: Yeah, hackathons are amazing, but they're the epitome of time-intensive, right? You sign up to give a couple of days of your life, if you want to do it properly. But they're very cool. I've done several hackathons and I love them. But yeah, it's definitely easier when you're a student, because you can just disappear for a few days. (58:51)

Alexey: Maybe we can do one more question. (59:17)

Evaluating the success of DataTalks.Club

Johanna: Yep. “How do you evaluate the success of the efforts of the initiative and purpose?” (59:19)

Alexey: There are a few metrics that I look at. In terms of bringing in money, the most effective thing is our newsletter. If you're subscribed to our newsletter, you've probably seen that there is a sponsored block in it. This is where maybe 50-60% of the money that we earn at DataTalks.Club comes from. For us, the number of subscribers who open the email is an important metric because it directly translates to money. So this is what I look at. When it's growing and people keep opening the email, despite the fact that there is a sponsored block, that's really good. This is what I personally look at. (59:32)

Alexey: Then, of course, the number of people in the community. Right now, it's almost 40,000. But another interesting metric is the number of active people. Of course, not all 40,000 are active. Somebody comes for a course, takes the course, and leaves the community, which is totally fine. Or maybe somebody comes and finds out, “Okay, I'm not really interested in this community,” and leaves. This is also fine. But the number of active people is another metric that I look at. Also, engagement on social media is something that I think is quite important, because social media helps us bring new people to the community. Then there are YouTube views: if the content is interesting, a stream usually gets more views on YouTube than uninteresting content does. That gives us a signal about what really resonates with the community. (59:32)

Alexey: Again, another big thing is, “Are sponsors willing to give us money? Do they find this community attractive enough to actually give us money?” That's probably the main success criterion, I'd say. If they do, then we're doing something useful, because there are companies who want to support us. (59:32)

Johanna: Yeah, very cool. I think we have maxed out the hour. There are two more questions left, but maybe you can answer them in Slack or something. (1:01:50)

Alexey: Yeah, maybe. (1:02:05)

Johanna: Yeah. Cool. That was really super interesting. I've been around for quite a while, but I've learned a lot. I hope this was interesting for the community. Thanks, everyone, for submitting your questions and for participating. Next time, you will see Alexey in the host chair again, and me behind the scenes, probably. (1:02:06)

Alexey: Totally. We probably need to invite you for another interview. (1:02:34)

Johanna: Oh, yeah. [laughs] I can talk about... (1:02:37)

Alexey: It's been a while. [chuckles] (1:02:39)

Johanna: It has been a while. And I can talk about the liberal weed policy in the Netherlands. [laughs] (1:02:40)

Alexey: That would be unusual but interesting. Thanks, Johanna, a lot for joining us today and being the host for this interview. It was very fun. (1:02:49)

Johanna: It was a lot of fun. Cool. All right. (1:02:57)
