

Shifting Career from Analytics to Data Science

Season 3, episode 2 of the DataTalks.Club podcast with Andrada Olteanu


Transcript

Alexey: This week, we will talk about career transitioning from data analytics to data science. We have a special guest today, Andrada. Andrada worked for two and a half years as a data analyst. Then she decided to make a career switch and she became a data scientist. Today, she will share her experience with us. Welcome. (1:37)

Andrada: Hello, everybody. I am super glad I could join. I hope you are going to get something interesting from this. (2:02)

Andrada’s background

Alexey: Before we go into our main topic, so let us start with your background. Can you tell us a bit more about your career journey so far? (2:10)

Andrada: Yeah. I did my undergrad in statistics. During that time, I wanted to get a little bit of domain knowledge hands-on. I was a data analyst at Avon Cosmetics in my country, in Bucharest, Romania. Afterwards, I did a Masters in Data Science Analytics in the UK. During that time, I found my passion on Kaggle. Afterwards, I came back and I started working at Endava as a data scientist. That is about it. (2:21)

Alexey: Now you are in Bucharest? (3:02)

Andrada: Yeah, I am in Bucharest. It is super sunny and I am super excited about it. (3:05)

Alexey: Does it often happen that you have sun? (3:10)

Andrada: Yeah, but in the UK, it was not the case. (3:12)

Alexey: You were studying statistics, you were working in data analytics. How did you realize that you want to go into data science? (3:18)

Andrada: It was a multitude of factors. The main reason was a girl in my statistics class who talked obsessively about data science. I was a data analyst, I was doing my thing and I liked it — I liked it a lot, actually — but she talked a lot about data science. I was also reading some machine learning articles and I was like, “Okay, what is this?” It was so confusing to me. Afterwards, I started researching. During that time, I also knew I wanted to do a Masters. Because I was already working at Avon as a data analyst, I thought, “Okay, I want to do a Masters that goes a little deeper, not on what I already know. Then I will start doing data science.” Because I wanted to get ahead and not be a super noob when I went into that Masters — I did not know how hard it was going to be — I started some courses on my own, on the basics of data science. Then I discovered how fantastic it was, and I started enjoying it more and more. Because I felt like I already knew some data analytics, although that might not have been the case, I went into data science full-on and I have been doing it since then. (3:32)

Alexey: Do you remember which courses you took? (5:21)

Andrada: Yeah, very vividly. The first thing that introduced me to data science — and I highly recommend it to anybody who has not done Python and does not know about machine learning; I only knew about linear regressions, because I had done them in statistics — was the “Python for Data Science and Machine Learning Bootcamp” on Udemy. It is by Jose Portilla. He is absolutely amazing. He made the course so nicely, and he teaches you — okay, this is Anaconda, this is how you start a notebook. Let us start with Python. Now I am going to explain some Pandas and some NumPy. Okay, now what is linear regression? Now, I know you did not come only for linear regression, so here are some random forests or decision trees and so on. At that moment I understood what data science is and I had my foundation. Afterwards, you can build on whatever you want. But that was the foundation I really needed. (5:23)

Alexey: Did you do this course during your undergrad studies in statistics or something later during your Masters? (6:48)

Andrada: I did them before my Masters. It was the summer before my Masters. I was still working but I wanted to prepare a little bit. (6:56)

Alexey: So, there was a girl in your class and she told you “okay, there is this thing”. You started to read more and more about that. Then you found that course on Udemy, did it, and understood “okay, this is a super cool thing”. Then you decided “okay, I want to spend more time doing that” and went on to do a Masters. (7:07)

Kaggle and StackOverflow

Andrada: I guess Kaggle was a big factor as well. It is the community and the comments and the medals. And the fact that you can interact with somebody. You post on StackOverflow… once I posted and they downvoted my post because, “oh, you did not put the question right”, although I put a disclaimer there: “I have not done this before, sorry”. I have never asked a question on StackOverflow again. Whereas on Kaggle, you can just ask and people are so nice. Kaggle was a big factor as well. (7:31)

Alexey: How did you discover Kaggle? You were doing this bootcamp course on Udemy. Did you find Kaggle after the course, during it, or during your Masters? Or when? (8:18)

Andrada: My first interaction with Kaggle was... Somebody told me [about it] during my undergrad studies. And I entered. I did not understand anything about what was going on… This is a spaceship, I am out. Afterwards — and this is why I am saying that course was a big foundation — during that Udemy course, at some point or towards the end, he was like, “you can practice more on Kaggle”. I was like, “oh, okay”. I remembered that spaceship I was so frightened about. So, I went back into Kaggle carefully… I knew it was complicated. I did not want to do anything with it, just the courses. Give me the courses. I want to understand more and more. Afterwards, because they tease you from the courses into a competition, I entered the world. This is how it happened. (8:32)

Alexey: And what happened then? (9:38)

Doing notebooks on Kaggle

Andrada: Then, I did not even understand what I was doing… I did not even understand what I was doing when I was sharing the notebook. I remember the only thing I wanted to do was to improve my model. It was the Iowa House Price Prediction, which is the natural progression from the machine learning courses on Kaggle — which are, by the way, absolutely amazing and were, in my opinion, the best thing after that course on Udemy. Because I had my foundation, but then I found out how to ask new questions to improve my models and make them better. (9:43)

Andrada: I started trying to improve the models. Then, because I wanted to understand more, I learned visualization — now I am going to do a notebook on visualization. My only purpose in doing the notebooks was to use the datasets on Kaggle. They were interesting for practicing the skills I wanted to gain. I was not posting to gain medals or anything. I did this notebook, which is highly appreciated, although I do not understand why — because it is so basic.

Alexey: Maybe that is why. It is basic and simple and people can just open it and understand what is going on. I remember that on Kaggle, sometimes there is a notebook that gives you a good score, but you have no idea what is going on. (11:24)

Andrada: It is for smart people, very very smart people. (11:43)

Alexey: Maybe that is why — because it is so simple. People can understand and relate and really follow through and see what is going on and learn from this. So, you took some courses on Kaggle. Then they teased you to try competitions. The competition was about predicting prices for houses, and you did a notebook there and it became popular? (11:47)

Andrada: No. That one has 20 upvotes after a year and a half or something. No. But, in my opinion, it was pretty good. I became obsessed with the leaderboard, and at some point I jumped after some hardcore hyperparameter tuning. I remember I stayed one day, nine hours straight — that is why I say hardcore parameter tuning. It was so exciting, because the error kept dropping. I was like, “Oh, my God”. I remember I went to the top five percent or three percent or something. It was super addictive. (12:18)

Andrada: But what happened afterwards is that I started trying to gain more knowledge on multiple things. I remember I did not have any knowledge of Pandas and how to manage data frames. It was taking me a lot of time to do anything — visualizations, everything. It took me a long time because I was a beginner. I did that Brazil Fires analysis, which was just that — it was the moment when that problem was happening in the world, and I also wanted to practice on a dataset I was passionate about. That notebook, I remember, I did in September last year. Then I never went back to Kaggle until late October — I was doing my Masters. When I went back to Kaggle — I already had two or three notebooks with no upvotes whatsoever, and I was thinking, “How do people even get these? I do not understand” — I had a silver medal or something. Do people actually see this? What? And that is when everything actually started.

Projects for learning data science

Alexey: We talked quite a lot about Kaggle. As I understood, that was one of the main drivers for your learning. You picked up Pandas, some other data wrangling skills, visualization, exploratory data analysis. When you learn something, the important thing is to do projects. Would you say that the projects you did to learn data science mostly came from Kaggle, or did you also do other projects? (14:26)

Andrada: 90 percent of them came from Kaggle; the other 10 percent were my dissertation and a few other projects from my Masters. Two of them I actually translated into Kaggle notebooks. So, still Kaggle. (15:05)

Alexey: How did you translate them? You were writing your thesis, you said. How did you put this on Kaggle — did you just upload your notebooks there? (15:32)

Andrada: Not really, because some of them used sensitive data. For example, I had this amazing module in my Masters that was hands-on and we had three projects. Two of them were in teams. We would change teams after each project, which was exhausting, but they wanted to push us to interact with as many people as possible. The last one was solo. The first project was on audio files; I used that project heavily in the bird call competition on Kaggle. (15:42)

Andrada: The second one was on fraud detection. The data they used was from Kaggle, but they altered it a little bit. What I did was translate what I had done in that fraud detection challenge to Kaggle, using the dataset from Kaggle — not the one they provided. The last one… I cannot even remember what the project was about. Oh, the fraud detection was the last one. The one in between was the one with sensitive data that I could not use — it was on optimization, and it was super complicated. For my dissertation, I did sentiment analysis on the bike sharing scheme in Belfast. I was in Belfast and I learned so many interesting things.

Andrada: What I did after I finished… I remember the person who actually hired me, who is also a grandmaster on Kaggle, Gabi Preda… Shout out to Gabi Preda. He is absolutely amazing. He texted me on LinkedIn and said, “Have you seen that dataset of Covid tweets?”. I was like, “Yeah”. “Do you want to do a sentiment analysis on it?”. And I was like, “Oh, yeah!”. Then I translated everything I learned in my dissertation into a fun, interactive Kaggle notebook on that dataset.

Finding a job and a mentor with Kaggle’s help

Alexey: So, you found a job because somebody from Kaggle reached out to you? What was his name? (18:09)

Andrada: Gabi Preda. (18:23)

Alexey: So, he found you on Kaggle? (18:24)

Andrada: I think it was more from me than from him. Kaggle was full of opportunities for me. When I started, I never realized it. This is why I encourage it every time... And I think people are getting bored of it — if you listen to my talks, I am a Kaggle maniac, just because it brought me so many opportunities. My job was one of them. What happened is that at first, when I started doing Kaggle, I had not come across any Romanian names. It took me half a year to even find the rankings… Kaggle is super complex. I guess I was not curious enough. When I went into the rankings, I wanted to take a look at people, especially at the top, and follow them. (18:32)

Andrada: I recommend this to everybody — follow people on Kaggle, on LinkedIn, on Twitter, or anywhere. If they are in the top 200, it means they are super passionate, and you probably have something to learn from them. When I saw a Romanian name in the top five or six or something, I was shocked. I was like, “Oh, we can do this too? What?” I followed him. I was super chill, I did not text him or anything — I was a contributor at that time. I do not know how he noticed me, but at some point he followed me back and left a super sweet, super genuine comment on one of my notebooks — it was a sentiment analysis on Rick and Morty, which is one of my favorites. He left a comment there and I remember I was so excited. I was super happy about it.

Andrada: Afterwards, when I realized I wanted to come back to Bucharest and started searching for jobs, I texted him as well. I would recommend this to anybody, because when looking for a job, you need to apply to 100 positions to get back five responses, and maybe one is going to say “yes”. So I would recommend reaching out to anybody. I knew he was elite… Gabi, I hope you are not listening to me.

Andrada: I really, really wanted a mentor. The data science field is changing a lot, and fast. You cannot be a good data scientist if you are not open to change. Because if you are using BERT today, in two years it is going to be obsolete. Maybe BERT is not a good example, but simple vanilla neural networks are not used anymore — you use more complex algorithms. Or you are not using regression, you are using XGBoost or random forest or whatever. So, you need a person who is open, who likes it, who reads, who is passionate about data science. And I knew he was, just because he was active on Kaggle. He liked it. I saw he had an interest in it.

Andrada: So, I was thinking that I really wanted to find a job... But it was not just the job I was looking for, it was a person who would teach me things. I had a lot of interviews where I could have gone further, but I just stopped them — because I knew that if I got the job, I would not like this person and would not be able to work with him. Yeah, so I found Gabi on Kaggle. I am super lucky that he saw an opportunity in me, and I am super grateful for that.

The process for looking for a job

Alexey: Maybe we can talk a bit about this process for looking for a job. You said that you need to apply for 100 positions to get just one job offer. For you, how many jobs did you apply to? (23:25)

Andrada: It was not 100, because it took me a long time. It depends a lot. You do not need to go into looking for a job thinking that it is going to take a long time, because it might not. But in my case… To give you a comparison, I am usually the struggling one. I had another colleague in my Masters, and she is super smart. She found a job much quicker. I started looking for a job in June and it took me until October to get an offer. She, on the other hand, started looking at the beginning of August and by late August she told me she already had an offer. So, it depends. It is hard. You look at other people and maybe they are luckier, but you need to be resilient. I did not apply to 100 jobs because… I remember I had a habit: every Friday, I would spend two or three hours applying. I could not find 100 jobs to apply to because it was Covid. (23:41)

Andrada: I was trying in Bucharest, which is a metropolitan area — it is the capital of Romania. I was also trying other countries, to work remotely. There were not many positions for a junior data scientist. Everybody looks for a junior data scientist with senior knowledge but the salary of a traineeship. I did not fit. It took me a while and I had many rejections as well, which hurt.

Main difficulties of getting a job

Alexey: Sorry to hear. What were the main difficulties for you? You said you started in June and you found it in October. So it took five months. (26:07)

Andrada: Yeah. (26:18)

Alexey: What were the main difficulties for you in this process? Was it a lot of competition, or that people do not want to hire juniors to work remotely, or…? (26:19)

Andrada: The main difficulty — and it still is a big difficulty for me — is the coding part. All the interviews I had, besides the one for Endava with Gabi, tested me on algorithmic coding. I had an interview for this company — I am not going to name it, because it does not matter — and my partner was interviewing for exactly the same company. I was interviewing for Romania and he was interviewing for London; it does not matter, but it was the same company, and our interviews were within a day of each other. He had a coding interview and I also had a technical interview, not really a coding one. He is in cyber security, and he got an easier algorithm than I did — and I was the one applying for a data scientist role. I was like, “What is happening?” (26:27)

Andrada: When they tell you, “Okay, try to solve this”, and they watch you… And I am like, “I have no idea what you are telling me — can we please talk about datasets, can we please talk about projects?” I was struggling a lot, just because it was a lot of algorithmic coding, like the online exercises, which I hate. I genuinely hate them, because they only show up in interviews, whereas for Endava I just had a normal conversation. They already knew that I knew data science and that I knew — at least roughly — how to code, because of my projects on Kaggle. The only things we talked about were: “Okay, how would you solve this problem? What about this? What projects have you been working on? How did you solve them?” So, it was not “I am going to see if you know this algorithm that you need to know by heart, or I am not going to hire you”.

Alexey: So, most of the interviews were of this sort… One famous example I often got in the past: you have a string with opening and closing brackets and you need to find out whether it is balanced or not. It is the kind of stuff that you do not need at work at all — it is more like a brain teaser. So I guess most of the interviews were asking these kinds of things. (29:25)
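For readers who have not met this puzzle, a minimal Python sketch of the usual stack-based check for Alexey's bracket example might look like the following (the function name and bracket set are illustrative, not taken from any actual interview):

    # Check whether a string of brackets is balanced, using a stack.
    PAIRS = {")": "(", "]": "[", "}": "{"}

    def is_balanced(s: str) -> bool:
        stack = []
        for ch in s:
            if ch in "([{":          # opening bracket: remember it
                stack.append(ch)
            elif ch in PAIRS:        # closing bracket: must match the last opening one
                if not stack or stack.pop() != PAIRS[ch]:
                    return False
        return not stack             # balanced only if nothing is left open

    print(is_balanced("([]{})"))  # True
    print(is_balanced("([)]"))    # False

The whole trick is to push every opening bracket and check that each closing bracket matches the most recent unclosed one.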

Andrada: They asked me to create a class that would receive a… I am trying to remember the problem correctly. It is a popular one — if you try to find it on Google, you will find it, but I cannot remember the name. It is the one where you have a tree and, for any node you are given, you want to get the other nodes that have the same neighbors. It was something like that. And I do not even work with classes, usually. If you know a few examples, you can make them, but I was like, “I have no idea how to… can you…?” So, it was very difficult for me. Usually you finish these interviews negatively impacted, thinking, “Oh, I am super stupid, I should not be doing this”. But it is not true. (30:00)

Andrada: The other interview I had was on Pandas. But they did not let me google, which I also think is stupid, because you google a lot at work. It was on Pandas, but they asked me to chain these weird, long Pandas functions to get something — a filtering of some sort from a table. I managed to get almost to the end. And I remember the guy being like, “Oh, no, you should know this, this is super simple”. When they tell you it is super simple, you feel even more stupid. In my opinion, they are not realistic.
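The actual interview task is not described, but a hypothetical example of the kind of chained Pandas filtering Andrada mentions (with invented table and column names) could look like this:

    import pandas as pd

    # Hypothetical orders table — the real interview data is unknown.
    df = pd.DataFrame({
        "country": ["RO", "UK", "RO", "DE"],
        "amount": [120, 80, 300, 50],
        "status": ["paid", "paid", "refunded", "paid"],
    })

    # Chain several operations: filter rows, group, aggregate, sort.
    result = (
        df[df["status"] == "paid"]
        .groupby("country", as_index=False)["amount"]
        .sum()
        .sort_values("amount", ascending=False)
    )
    print(result)

Being asked to produce such a chain from memory, without Google, is exactly the kind of exercise she describes as unrealistic.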

Alexey: That must be a very frustrating experience. (32:07)

Andrada: We all go through this. (32:12)

Project portfolio and Kaggle

Alexey: What I heard is that your visibility on Kaggle and your passion — because you are super passionate about Kaggle and the community there — helped a lot. People know you, people see your notebooks and what you do there, the community recognizes you, and that helps with getting a job. I guess this was one of the factors in how you eventually got that offer from this company. (32:14)

Andrada: It is better than only having a CV that says, “I know Python” — to also be able to say, “I know Python. Look at my Kaggle profile.” (32:55)

Andrada: “I use Python, I am familiar with PyTorch” — that is just the CV. Whereas this other person is like, “I know PyTorch — look, these are three projects on Kaggle that I did using PyTorch”. I do not know how familiar companies are with Kaggle, but I think they are starting to be. Having a portfolio is super important, rather than only having a CV. A CV — sorry, a resume.

Alexey: I think it is the same thing. Okay, it can be Kaggle, it can be just a GitHub profile. (33:37)

Andrada: Anything. (33:47)

Alexey: Home page with projects or blog posts. The good thing about Kaggle is you have a community element there. If you just write blog posts, maybe nobody sees this blog post. But on Kaggle, you have this community, you also have this social element, where people can follow each other. Every time you create a notebook — I do not know how many followers you have on Kaggle — but all these people see your notebook and then they rush and upvote the notebook and… (33:48)

Andrada: I do not think the followers see it. I follow many people, and I do not know how the algorithm works, but I usually do not see them in the feed. But yes, there is a lot of interactivity. If you are doing some amazing work, people are eventually going to follow. I have many notebooks that I am not that proud of that are super upvoted, and other notebooks that are truly my pride and joy that have very few or almost no upvotes at all. So, it is about what you want to create. Upvotes matter — it would not be true if I said they do not matter at all. But it is not good to start working on Kaggle with that goal in mind, because you are not going to get far by looking at the numbers. (34:17)

Alexey: We have a related question: why do you prefer Kaggle over GitHub for showcasing your project work and code? (35:30)

Andrada: I guess just because I am more familiar with it. I link GitHub with coding and with libraries, whereas I link Kaggle with projects, with community, and with showcasing algorithms or results. At some point I thought about moving my notebooks to GitHub as well, but I do not think it would do me any more good or justice. But there are people who have quite amazing GitHubs. If you are the kind of person who makes life easier — writing functions and creating libraries or amazing stuff — go ahead and use GitHub. (35:41)

Helpful analytical skills for transitioning into data science. Check the data!

Alexey: When you were making the switch from analytics to data science, the biggest problem for you was coding — especially all these coding interviews that are difficult and stressful. Sometimes they say, “Oh, it is easy”, but for you it is not, and it is demotivating. But were there also things you did as a data analyst that were helpful for you? I do not know, some data wrangling skills... What are the things that are useful for you as a data scientist now? (36:41)

Andrada: I guess the most important thing was the process of solving a problem. I did statistics, and at work I was doing data analytics, which do not quite go hand in hand, but the two as a mix gave me… I remember the most important thing I learned from Avon was to triple, quadruple check whatever you did. If you did a group by, check the data to see if everything is fine. Because I once sent some reports that were quite messed up to 60 people, and 60 people came back to me saying, “No, you are wrong”. Then I sent them back saying, “Look, now they are good”. Then 60 people came back to me again: “No, you are wrong again”. (37:29)

Andrada: So, triple check the data — which is also super important in machine learning, because you can have data leakage, you have missing data that needs to be tackled, you have variables that you need to understand. You need business knowledge and domain knowledge, which is also super helpful. From statistics, I remember the distributions, understanding the data, understanding what a tabular dataset is — not really a data frame, but a tabular dataset — understanding KPIs. These are all super important in the data pre-processing and analysis part.
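As a small illustration of the “check your group by” habit she describes (toy data, not from her actual reports), one way to sanity-check an aggregation in Pandas is to reconcile totals and look for silently dropped keys:

    import pandas as pd

    # Toy sales table, just to illustrate re-checking a group-by result.
    sales = pd.DataFrame({
        "region": ["North", "North", "South", None],
        "revenue": [100, 250, 175, 60],
    })

    by_region = sales.groupby("region")["revenue"].sum()
    print(by_region)

    # Sanity checks: totals should reconcile, and missing keys should be accounted for.
    print("Total revenue:", sales["revenue"].sum())                      # 585
    print("Revenue captured by the group by:", by_region.sum())          # 525 — rows with a missing region were dropped
    print("Rows with a missing region:", sales["region"].isna().sum())   # 1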

Andrada: Now that I am talking, I realize maybe this is why I like that part so much, rather than hardcore modeling. I like modeling as well, but it needs a little bit more coding expertise, which I am struggling to gain. As I move forward, I want to gain more, but it is just an area that, for me, does not come as easily as exploring — losing yourself in the data and understanding things that are super interesting, correlations as well, all these particularities and subtleties.

Andrada: Now, many people come to me and ask, “But is this not the harder part, and the coding the easy part?” Actually, I am participating in some courses now where the class is asking the teacher, “Can you please teach us more coding and fewer slides?” He comes back and says, “Well, you will see, the slides are more important than the coding”. Now, I disagree — this is my opinion and this is my background talking, and maybe a software engineer will disagree with me as well. We can agree to disagree. But in my opinion, it is easier to understand the concepts. You do not really need to go so deep into the mathematics — only if you want to do research will you need to. Just to use the algorithms, you do not need to go that deep. Whereas for the coding part, you need to know how to do things. It does not matter if you know the idea; you need to know how to actually do it and make it work. The most frustrating part is when I know what to do — it happened to me these days. I know what to do, I understand the problem, but how do I do it? It takes me a little bit longer, which is annoying, but I am going to get there.

Becoming better at coding

Alexey: I see that in other data analysts too: they are pretty good at understanding data, getting insights from it, querying data, doing data manipulation, calculating statistics, doing exploratory data analysis. But when it comes to coding, it becomes difficult. I know you are still going through that and learning. Maybe you can share your plans for how you want to do this? How do you want to approach it? Are you taking some courses to improve that side or…? (41:49)

Andrada: At the moment, I have stopped doing courses completely. I found out through experience that the best way to learn something is to do projects — exactly like you said. The usual plan is to find a competition. I cannot wait for an NLP competition, actually. Kaggle, can you please give us an NLP competition? I remember last year, when I was like, “No, no, I am not doing NLP, only computer vision”, they had a bunch of natural language processing competitions. And now they do not have any, and they have a lot of computer vision. What I usually do is focus on a subject or something that sounds interesting. Then, I think 80 percent of the time, I join a competition without having any idea how to solve the problem. No clue whatsoever. Then the plan is to study a few notebooks very, very well. Because — and I say this a lot in my notebooks — the baseline is from other people. It is not mine. (42:33)

Andrada: I feel like sometimes people are ashamed that they are not original. You cannot reinvent the wheel. If you are learning, you need to practice, and you can practice from these people who are super knowledgeable. So, take their code, it is fine. Twist it, do something with it, and then build upon it. And say, “This is the baseline from this person, amazing, thank you so much” — give them credit. I take the code from somebody and I try to explain it very, very deeply — because knowledgeable people on Kaggle, although they explain a lot, do not explain enough. I need to go even deeper and start coding myself, rename the variables — to understand them. Afterwards, once I have understood everything, I can start trying to build upon it — research and look at many more Kaggle notebooks. This is the plan, and it has been working pretty well.

Learning by imitating

Alexey: So, you take a notebook that somebody shared, you try to decompose it, understand what every line is doing, you try to rearrange the code, you try to change the variables until you understand every bit of… every line of code, every letter there in the code. Then you start to improve that code, try a new model, maybe do something else there. (45:16)

Alexey: We have a question from Gerth. What do you think about learning by imitating — if we try to imitate and just copy other notebooks? Do you think it is a good way? I think you just answered that, because this is what you are doing. Maybe not imitating, but taking what others did, trying to really decompose it into its simplest pieces, understanding each piece separately — and then having the big picture.

Andrada: I recommend this a lot. Do not fork the notebooks, because then you already have the code. Start a new notebook and keep the one you are imitating on the left side, or somewhere else, to look at. Then take each line of code one by one and write it yourself — because this is practice. It is not just reading; you are actually writing it, and it is going to imprint itself in your memory. If you do this five times, you are going to know every single letter in that notebook. This way, you can start printing things, adding or subtracting — because you want to really understand it. You want to get your hands dirty with it. This is what I am trying to say. (46:16)

Alexey: How much time do you spend on decomposition? I imagine that is not five minutes. (47:41)

Andrada: Few days. A few days. (47:48)

Alexey: A few evenings, right? (47:50)

Andrada: Yeah. Well, it depends but it can take from six, seven, eight hours to… I had an error, it took me a while to debug. That took almost five full days, but I ended up with a pretty nice notebook afterwards. (47:53)

Alexey: Yeah. I also try to follow something similar. I am not active on Kaggle anymore, but I was at some point, and I was trying to do something similar. I was trying to decompose a notebook, and then I thought, “Okay, now I understand what is going on”. Then I rerun it and my model is two times worse than the original one. “Okay, what did I change? When did it stop working?” In my head, the code is the same. (48:21)

Andrada: It happens to me as well. (48:51)

Alexey: Then, I would spend many, many hours trying to understand what I broke. What I ended up doing was, “Okay, I am throwing away this notebook. I am starting from scratch”. Then executing every cell, changing a tiny bit, and making sure the results stay the same. That takes a lot of time. Many days. (48:53)

Andrada: I can feel your pain. I can relate, I know. But this is how you learn the best. (49:18)

Is doing masters helpful?

Alexey: If we go back a bit to your Masters. We have a question from Zach: do you think your Masters was helpful for your career? (49:27)

Andrada: Everybody has their own experience. My Masters put things in some sort of order. When you start data science, you are like, “Oh, machine learning, okay, deep learning, computer vision, natural language, XGBoost, what… PCA, what is all this?” It is extremely wide and broad, and you do not know where to start. My Masters really helped me decompose that knowledge and go step by step. I remember when I was doing my statistics module, I was just finishing and understanding a little bit of how to process the data. Then I went into understanding Python. Then I started doing more and more pre-processing on Kaggle. Then the data mining module was about machine learning, and I was doing more machine learning on Kaggle. I was pairing whatever I was learning. (49:58)

Andrada: However, unfortunately for me — and this was a real bummer — I was doing so much Kaggle and learning so much on my own that I did not quite get more from my Masters, besides learning R. I need to give them that. But besides that, unfortunately, even deep learning I learned from YouTube. It took me two months on YouTube — and again, printing everything and so on. Unfortunately, they did not manage to give me more.

Alexey: I followed a similar path to you, even though my background was in software engineering. I also decided to do a Masters. But eventually, what helped me really understand machine learning was online courses and Kaggle. The Masters was still useful — like you said, it explained where each piece of the puzzle belongs. (52:15)

Andrada: I would repeat it if I were in the same place again. I would repeat it 100 percent. (52:46)

Getting into data science without a masters

Alexey: What I am trying to ask is — for people who go from analytics to data science, it is not always possible to do a Masters, to spend one year doing that. Was it one year for you? I think in the UK it is one year, right? In continental Europe — at least in Germany — it is two years. Not everyone has the luxury to stop working for two years and completely immerse themselves in that. Maybe you have some recommendations for them. What would you do if you could not do a Masters and you had a full-time job as a data analyst? Would you follow the same path — do this course on Udemy, do courses on Kaggle, then get into notebooks and all the things we discussed? Or would you suggest something else on top of that? (52:54)

Andrada: Yes. Exactly what you said. I would not suggest anything else, just because I am afraid to suggest things that I did not do myself — it would not be appropriate. However, if you start from scratch, if you do that course, you can safely go on to Kaggle and start from there. You do not really need that course — you can start with Kaggle straight away. For six months to maybe one year, three evenings a week, you immerse yourself for a few hours in the data science part. At the end of that year, you will have knowledge, you will have something to show — which is super important — and you will have a community, if you are engaging with people on LinkedIn and Twitter. (53:57)

Andrada: Also on YouTube there is Ken Jee, who is one of the ambassadors for Z by HP and Nvidia. He is awesome, by the way. He is doing the 66 Days of Data challenge — you can go and follow him. You do not need a teacher to tell you what to do. There are lots of resources out there. You can pivot from data analysis to data science in at most one year, even if you relax, take your time, and enjoy the process. Maybe one year.

Alexey: And then this aspect of going through coding interviews — that is probably the stressful part, but you have to go through this. There is no way around that. (55:54)

Andrada: Unfortunately, no. (56:11)

Alexey: Maybe become a Kaggle grandmaster? (56:12)

Andrada: It does not help. For me, it did not help. They still want to stress you and put you under pressure. (56:15)

Alexey: You also need to find the right people. This is where the networking aspect comes into play. You have that aspect — networking on Kaggle and other social platforms like LinkedIn and Twitter. You connect with people, and then it helps them to see you, to see what you do. They already know you, they already know what you are capable of — they can see your notebooks. That probably helps to at least start the conversation. (56:27)

Andrada: Yeah, exactly. (56:59)

Kaggle is not just about competitions

Alexey: One thing we did not talk about — we mentioned that you are a Kaggle grandmaster in notebooks. Maybe you can spend a couple of minutes talking about that? For many people, when they hear Kaggle, what they imagine is — I think you mentioned that once — a spaceship of models. All these models put on top of models on top of models. (57:01)

Andrada: Like a battens. (57:27)

Alexey: But Kaggle is not just about that. You do not need to focus only on competitions to learn from Kaggle. What you did — you followed the notebooks tier. You do not have to focus on competitions — it is difficult, right? It is difficult to get a gold medal in a competition; you need to spend so much time there. But you do not have to do this to learn things. (57:28)

Andrada: Besides discussions — they started giving medals in discussions because there were some people who were very helpful in their comments, and I completely understand that tier — you can learn from any of the three other tiers. Competitions, of course, just because you are training models and looking at the leaderboard. Notebooks, again — not all my notebooks are on competitions, but you can do notebooks on competitions to understand the process better. You can also do them on different topics that you like or something you are passionate about — like I was about sentiment analysis. Do a notebook on sentiment analysis, see how you can gather the data. (58:15)

Andrada: Talking about data and datasets — I feel like they are a little bit underappreciated, but they are super important. I told you about Gabi. Gabi Preda just became a Datasets grandmaster. He is a three-time grandmaster now on Kaggle. He scrapes the information and he keeps it updated, which is an effort on its own. The idea of scraping the data and making little scripts or big scripts — I do not know what he is doing or how advanced it is — just gathering the data and putting it in a clean format, is a skill on its own. Datasets, I feel, are the hardest to gain a medal in, just because it is very hard to put out a dataset that is important and very valuable to people.

Andrada: If you want to be a data engineer… I know a guy who wants to be that. He wants to start Kaggle only for datasets. He likes to take clutter and make it neat. He is a clean freak or whatever. So, you can learn in any tier.

Alexey: Thank you. Anything else we did not cover that you wanted to share? (1:00:54)

Andrada: I do not think so. (1:00:58)

The last tip: use social media

Alexey: Any last tips? (1:01:00)

Andrada: Yeah. Use social media. (1:01:02)

Alexey: Which one? Twitter, LinkedIn? (1:01:06)

Andrada: Both. Heavily. (1:01:09)

Alexey: How should we use this? Let us say, somebody's looking for a job. What should they do? (1:01:13)

Andrada: Try to build — and this takes a lot of time — try to make your social media, especially LinkedIn and Twitter, as focused on data science as possible. Try to showcase your work there, share data science things, and everything that gathers and builds a community. Because that is where the opportunities come from, and more knowledge too. You meet more people. Having a community is extremely important — it may be more important than being very good at something. You need to have the community; people do not talk enough about this. So use social media to showcase your work, to show that you are passionate. Be kind, be gentle. Thank the people who helped you — super important. Try to add more to this community, try to be a bonus, not somebody who just uses all that information. Give and share, and the world will be a better place. (1:01:22)

Alexey: That is a great way to finish today's conversation. I guess that is all for today. Thanks a lot for joining us today and sharing all this experience, being open about all the struggles you have had. I think many people who are going through a similar transition now will appreciate that, especially the interview part. I know it is not easy, so thanks a lot. Thanks everyone who is listening. And yeah, drop by next week for our next events. (1:03:01)

Andrada: Yeah, and subscribe. (1:03:50)

Alexey: Yeah, of course. (1:03:53)

Andrada: I always wanted to say that on YouTube. (1:03:54)

Alexey: So, subscribe and press the ‘Like’ button. Exactly. Likes, comments, subscribe. It was nice talking to you. Have a great weekend. (1:03:58)


