Advancing Big Data Analytics: Post-Doctoral Research

Season 6, episode 5 of the DataTalks.Club podcast with Eleni Tzirita Zacharatou

Transcript

Alexey: This week, we'll talk about doing postdoctoral research. We have a special guest today, Eleni. Eleni is a postdoctoral researcher at the DIMA Group at TU Berlin, which is the same university where I got my Master’s six or seven years ago. This is the same group where I started, and Eleni is working there as a postdoctoral researcher. She's doing research on spatial big data analytics and she's interested in stream processing in the IoT environment. She published her research in many data management venues like VLDB and others. One of her papers received the ACM SIGMOD best demonstration award. Is that right? (1:13)

Eleni: Yes, it was a demo paper. This is a special category of paper where you actually demonstrate or showcase a technique or a system that you have developed. You go to the conference and you actually have a laptop and show how it works live. (2:03)

Alexey: We'll talk about that as well. Eleni also holds a PhD in computer science from EPFL. Welcome. (2:23)

Eleni: Hello. Thanks for having me. Happy to be here today. (2:31)

Eleni’s background

Alexey: Before we go to our main topic of doing research, I wanted to ask you about your background. Can you tell us about your career journey so far? (2:39)

Eleni: Yes. You practically already covered most of it. But let me also say some things from my perspective. I'm currently a postdoc at the Technical University of Berlin and I have been in Berlin since October 2019. So, that’s roughly two years now. Prior to that, I spent six years in Lausanne, Switzerland, where I did my PhD in computer science – more specifically in Data Management at EPFL, which is the École polytechnique fédérale de Lausanne. During my PhD, I also spent some time in New York for a few months, where I worked at NYU. Before my PhD, I was in my home country of Greece, where I did my undergraduate studies in Electrical and Computer Engineering at the National Technical University of Athens. To sum it up, I went from an undergrad in Athens, to a PhD in Lausanne, and then a postdoc in Berlin. (2:49)

Eleni: I've been in academia my whole career so far. (2:49)

Spatial data analytics

Eleni: My main research area so far has been spatial big data analytics. This is the broader topic of my PhD and a topic that I'm still working on, even though right now I'm not focusing on that entirely. So, what is it? Well, broadly speaking, it is essentially the process of analyzing spatial data to find trends, gain insights, and answer questions. Then the next question you might have, of course, is “What is spatial data? What is this data that we are analyzing?” Spatial data is any data that contains some geometrical or geographical feature. Spatial data can be points in space that can, for example, correspond to GPS locations. (3:50)

Eleni: Say we have data that is about something like taxi rides. That’s spatial because we have the location and the trajectory of the ride. It can also be lines that represent the road, or a river on a map. This can also be polygons that represent regional boundaries and things like that. I think you get the idea. These are examples that are in two dimensions, but we also have three-dimensional spatial data. That can represent something that has volume in space. There is also another category of spatial data, and the main representative of that category is satellite images. Because with such images, you have a collection of pixels, and each pixel corresponds to some geographic location. So this is also an example of spatial data. (3:50)
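As a rough illustration of the two-dimensional spatial data types described above (points, lines, and polygons), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the coordinates, and the "taxi pickups in a district" query are hypothetical and do not come from the episode. The containment check uses the standard ray-casting test, not any technique from Eleni's research.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


@dataclass
class Point:
    """A single location, e.g. a GPS fix for a taxi pickup."""
    xy: Coord


@dataclass
class LineString:
    """An ordered sequence of coordinates, e.g. a road or a trajectory."""
    coords: List[Coord]


@dataclass
class Polygon:
    """A closed region, e.g. a district boundary (first vertex not repeated)."""
    ring: List[Coord]

    def contains(self, p: Point) -> bool:
        """Ray casting: toggle 'inside' for every edge that a horizontal
        ray starting at p and going right crosses."""
        x, y = p.xy
        inside = False
        n = len(self.ring)
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            if (y1 > y) != (y2 > y):  # edge spans the ray's y-coordinate
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


def count_in_region(points: List[Point], region: Polygon) -> int:
    """A tiny spatial aggregation: how many points fall inside the region."""
    return sum(region.contains(p) for p in points)


# Hypothetical data: a rectangular district, a road, and three pickups.
district = Polygon([(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)])
road = LineString([(0.0, 1.5), (4.0, 1.5)])
pickups = [Point((1.0, 1.0)), Point((5.0, 1.0)), Point((3.5, 2.5))]

pickups_in_district = count_in_region(pickups, district)
```

This brute-force loop checks every point against the polygon. With many points and many regions, that cost is what makes such queries a data management problem.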

Responsibilities of a postdoc

Alexey: What do you usually do at work? What kind of responsibilities do you have now as a postdoctoral researcher? (5:48)

Eleni: I do a few different things, I would say. First of all, I participate in research activities, which cover many different things, but broadly it means working to solve some research problems, or even thinking about new ones. It also involves participating in project meetings with other colleagues and writing papers when a project is mature enough and there are results that can be published. I also mentor students. So far, it’s mainly for Bachelor's and Master's theses. But I also have some experience co-mentoring some PhD students. Currently, I also teach a Bachelor course. Sometimes I also have to disseminate my research by giving talks at conferences or other venues. Finally, another thing that I'm doing – it's not strictly tied to my job description – but I'm also contributing to the international community in my field by reviewing papers for conferences or journals. If I look at my week, I would say this is usually what's on my schedule. (5:56)

Alexey: Would you say this is a typical day for somebody working as a postdoc in academia? (7:19)

Eleni: I would say it’s more like a typical week. I try to avoid multitasking as much as possible because I don't find it effective. I try to have some days where I can focus entirely on one activity or another, thus I would say this is what a typical week looks like. Over the course of a month or a few months, the distribution of these tasks can also change. Maybe there are periods where some task completely goes away. For example, as you know, there are semester breaks where you don't have to do any teaching or I don't always have papers to review for conferences. But other activities, like mentoring students, or having some regular meetings with them, are usually permanently on the schedule. (7:29)

Publishing papers

Alexey: You said that when you're writing papers, “if there is some outcome to share”. I'm curious, how often does it happen that you do some research and you don't have an outcome to share? You don't end up writing the paper? Is this something that happens often? (8:29)

Eleni: Yeah, that is a good question. What can happen is that you don't have research results that are good enough to write a really good paper, and so, it can’t be published at one of the top conferences in the field. To have an idea where the results are top level – that’s really hard. What can happen is that initially you’re targeting these top level venues, and in the end, your results make it hard for you to publish there. But in my experience, if you still did some thorough and decent work, even if the results are a bit disappointing and not what you were hoping for, there's still a way to publish your work. There are always some interesting findings in it, but you just might need to go to less competitive venues to publish. But for me, if there are some reasonable results, and you can do a good job – which is usually the case – then it still makes sense to publish them somewhere. (8:46)

Best places for data management papers

Alexey: The top venues where you publish are usually VLDB, and what else? (10:03)

Eleni: SIGMOD as well. These two are the really top ones. Yes. (10:13)

Alexey: So, VLDB is “very large databases” right? And SIGMOD is “special interest group…” MOD? (10:15)

Eleni: “Management of data.” (10:32)

Alexey: Right. MOD is management of data. These are the places where researchers who work in databases, data management, big data, analytics, and all these things – these are the conferences where these researchers want to publish first. Right? (10:34)

Eleni: Yeah, these are the top venues. There is a third one that I would say is equally good. This is called ICDE, which is the “International Conference on Data Engineering”. I think these three. (10:53)

Differences between postdoc and PhD

Alexey: Interesting. Is being a postdoc different from being a PhD student? You mentioned that you need to do a lot of things like thinking about research problems, writing papers, mentoring students, teaching a course, reviewing papers. But PhD students also do these things, right? (11:11)

Eleni: Yes, you're quite right about that. Overall, I feel that the main difference in the postdoc position is that you have more responsibility. So it's the amount of responsibility that changes and maybe also a little bit on the distribution of time on these different tasks. Going back, as a PhD student, my primary job was to work on my PhD thesis. It was clear that this was the top priority. I was also teaching as an assistant, but that was not more than let's say, 20% of my work time. From time to time, I also mentored some Bachelor’s and Master’s students during my PhD, but not so many and not so frequently – like not every semester, for example. (11:33)

Eleni: Also, in my PhD research, I was mainly conducting research on my own. I had a lot of alone time to work on my thesis. What has changed now, I would say, is that I have more mentoring to do – now I mentor students every semester. This mentoring also comes with more responsibility because I mentor not only Bachelor’s and Master’s students, but also some PhD students as a co-mentor. It is important to ensure that they will be successful. You don't want your students to just waste their time working on something that is not promising. (11:33)

Eleni: Teaching also comes now with more responsibility. Before, as I said, I was an assistant. I didn't have to think about the tasks that needed to be done – I was just assigned these tasks and needed to carry them out. That was it. But now, I am the one that manages the course, so I need to organize everything and make sure that the course runs smoothly. This responsibility falls on me. Then as I said, in my PhD, I spent quite some time working alone. But now I would say I do less research completely on my own and more in collaboration with others. In my everyday life, this also means that I spend more time in meetings with other people. (11:33)

Eleni: I also mentioned reviewing papers – I have to say that this is something that I didn't do much in my PhD. But that's normal, because you are invited to review papers for conferences when you're a little bit more senior in the field. Usually you don't get a lot of these invitations as a PhD student. (11:33)

Alexey: To summarize, you get a broader scope of work after the PhD. Previously you would just sit and do the research yourself, but now you delegate more of the work to others. So other people do research and you're helping them with that instead of just sitting there by yourself and doing it. Did I summarize it correctly? (14:41)

Eleni: I would say currently I actually kind of work in two modes. Yeah, there is a little bit of a delegating aspect, which you can say that only happens when I work with students. If someone is doing their thesis with me or their PhD, they are still expected to own their work. It is their thesis. I'm going to help them but I'm not supposed to improve the thesis for them. So in that sense, I'm providing guidance. But I also – not on a constant basis, but maybe in the past year – I also do some collaborative research with other researchers that are more at the same level of seniority as me, let's say. So in that sense, it's more of a collaboration. Nobody's delegating anything to someone else, but rather it's more sharing the work. (15:06)

Helping students become successful

Alexey: I understand. Thanks for clarifying. You also mentioned one thing that I think is quite important. You said that now you have more responsibility, and a part of your job is that you need to help others be more successful. You have Bachelor’s students, you have Master's students, you have PhD students that you co-mentor or co-advise. How do you actually make sure that they are more successful? How do you do this? How do you help them select topics? Because a PhD thesis is quite a large piece of work, right? It's a five-year commitment. That's a lot of responsibility on you as well, when you come up with a topic. So how do you help them be more successful? (16:06)

Eleni: Yeah, that's a great question. Okay. Maybe I can also talk a bit about the Master’s and Bachelor’s students. In this case, I have to say that it's easier, of course. Most of my experience is also working with Master’s and Bachelor’s students. In this case, I'm typically the one that proposes a list of topics – I advertise these topics and then the students pick the ones they like and then we potentially tailor it together to their interests. There, it's a little bit easier. The scope is also smaller, because this is research that needs to be completed within, let's say, six months. That also limits a little bit what you can do during the time available. (16:58)

Eleni: In terms of PhD students, so far, I have only acted as a co-supervisor, which means that the responsibilities are shared. But overall, how it works at DIMA – at least what I have seen so far during my time at DIMA – is that we generally have a few bigger research themes in the group. Bigger umbrella topics that are mostly, I guess, determined by the leader of the group, Professor Markl. The students typically need to identify a sub-topic that interests them within these bigger research themes. Already the fact that there's this kind of bigger umbrella topic provides some guidance. (16:58)

Eleni: From there, they come up with ideas by reading, brainstorming, talking to their mentors, but also to other colleagues and fellow students. So the whole group can provide some guidance there. For PhD students, I don't dictate the topic. They mostly have to come up with it on their own within a bigger research theme. How I help them is mostly by asking them questions and talking to them to make sure that they really understand the problem that they picked, and also make sure that this is indeed an open research problem. I expect the student, for example, to be able to clearly articulate what the problem is, succinctly and ideally in a sentence, why it is interesting and important, why it is hard, and why it hasn't been solved before. Usually I challenge the student and the student has to convince me that they can answer these questions about their chosen topic. Then we can go ahead with that topic. (16:58)

Alexey: Is this some sort of formal process or does the student just come to you and you then have a discussion over a cup of tea and you ask them tough questions? (19:47)

Eleni: Well, it is a little bit of a more formal process in the sense that we have some regular meetings with some structure. Also, in some cases, it might be that I ask the student to actually answer these questions in written form. Sometimes writing down things can help you structure your thoughts. As I said, I'm the co-mentor, right? There are also other people that interact with the students and make sure that they're working on something good. (19:58)

Alexey: But from what I understood, it's actually the responsibility of the PhD student to come up with a good topic – you're just helping them to make sure that the topic they pick up is good, and they really understand what they want to solve. But it's more like coaching. (20:36)

Eleni: Yeah, this is how it usually works. In all of academia, I have seen it done this way. This is also more or less how it was for me when I was writing my PhD – I was not assigned to specific topics that I worked on, or specific problems. I was given a broader direction, but then finding the specific research problem within that direction was my own responsibility. I have seen in other groups in academia, especially for first year students, where it might be that the professor actually gives them some more concrete topics, similar to how I said that I give topics to Master’s students. This would be to get them started with the first research problem that they will work on. But yeah, as you said earlier, the PhD is a large piece of work. Usually you're expected to publish two or three papers that will all be part of your PhD. So even if you are assigned a topic at the beginning, it’s still expected that for the next paper you will be more independent in choosing your own topic. (20:52)

Alexey: I'm curious, how does it work? You have a broader direction and then a student comes with more narrow research? Do they just read a lot of papers and then see which ideas they like? Then they just run these ideas by you and you say, “Yeah, these are cool ideas. Let’s dig deeper.” Something like that? (22:05)

Eleni: Yeah, something like that. It could also be that we provide a few more pointers about what is still open in this project topic. Since the topics are broader, it also means that there are other people that are working in this broader area. Thus we also want to make sure that each student has a separate topic from the other students and that they don't overlap too much. We also have some pointers like “Okay, this topic or this direction is not taken yet.” (22:22)

Research at the DIMA group

Alexey: By the way, you said there are a few bigger research themes in the group? What are these themes at the DIMA group? I know IoT is one of these topics. What are the others? (22:54)

Eleni: There are practically two bigger topics now. Yes, one is IoT, or building a general purpose data management system for the IoT. This is a system that we call NebulaStream. Then there is another topic that we call Agora. This is a unified data infrastructure for building ecosystems. In this ecosystem, we want to bring together different data assets that can be the data itself, but also algorithms, models, computational resources, so that users can combine all these resources and use them to develop their applications. (23:08)

Alexey: Is the DIMA group still involved with Apache Flink? I remember when I was studying there, it was a very big thing and a lot of research was focused on Flink. But now, since Flink is a more independent entity is it still the case? (23:59)

Eleni: Not really. Basically, in a sense, this NebulaStream system is kind of the next Flink, you could say. This is representative of how the DIMA group works. Before there was Flink and a lot of researchers were working on different problems related to that system. Now we have NebulaStream and Agora and a lot of the research goes into building these systems. (24:15)

Alexey: Cool. So you're working on the next thing? That's awesome. So the working name now is ‘nebula’, right? (24:45)

Eleni: Yeah. (24:53)

Identifying important research directions

Alexey: Interesting. In general, I'm curious – How do you know what the important topics in research are? There are some topics, or themes, in the group in general and you try to stay within these themes. But this is still generally quite broad, right? How do you know what is an important topic? What are the trends in these topics? How do you stay on top of that? (24:55)

Eleni: Mm-hmm. Well, this question is hard to answer, and it is also kind of hard to do, I guess. It also depends a little bit on how far forward we want to look into the future. I think, generally, you can tell what the current important topics and trends are, mainly by looking at what is being currently published in the major conferences in your field. It can help even more if you attend these conferences to exchange ideas with your peers. This helps you identify the trendy topics in the present time. (25:28)

Eleni: As I said, I have also been reviewing papers for conferences. In general, this means that I have access to papers before they are actually published – while they're still works in progress. To some extent, this can help foresee trends that might come up in the near future, let's say within 1-3 years, maybe. You still get access to some work that's not published and maybe won't be published in that cycle. But you see what people are working on, so this helps you form an idea. (25:28)

Eleni: But what is really hard, of course, is to predict the next big trend that will come in maybe 10 years or so. Unfortunately, I cannot claim that I’m in a position to do that. But if you can really do this, it is very valuable. These trends in research sort of come in waves, and if you are at the beginning of such a wave, and you are one of the first people to work on a certain topic, before it is broadly popular, when that topic becomes popular later – if your prediction was correct and this becomes an important topic – you will be a pioneer in the field. Your work will really get a lot of attention and a lot of people will refer back to it and cite it, which is something that matters. (25:28)

Eleni: Something else that I can mention here is that, in the database research community, there is a tradition that every few years – I think every five years or so – a group of more senior database researchers gather together in the same location (they meet physically) and they brainstorm on what the next big trends are and how we as a community can get more impact out of our research. After the gathering, they write down their findings in a report. In general, I find that this can be quite helpful for more junior researchers to help them identify future research directions. I mentioned before that going to conferences can also help and one of the reasons that it can help is that there are also discussion panels, where again, you see more senior people discuss the future of the field. (25:28)

Alexey: So, basically networking, right? Conferences and seeing what is published. Do you think it's important to also be in touch with industry partners to see what is going on in industry and to see what will possibly become important there? Or are the conferences enough? (28:30)

Eleni: That is a great point. I have to say that generally, in our conferences, there are also people from industry. We are a field that is quite open to industry. Yeah – definitely. I think this is also something that can help. (28:57)

Alexey: A funny thing that you mentioned about the fact that picking things that will be important in the future is difficult. Well, the DIMA group, I think, managed to do just that at least once – with Flink. I remember when I was choosing whether to go to Berlin or to some other city, I saw a pitch from Professor Markl where he said that at DIMA, they do not teach things that are important now, but they teach things that will be important in three to five years. He would then go into Flink – back then it was called ‘stratosphere’, I think – and talk about how cool this thing is. He was actually right. Because back then, I don't think it was really appreciated in the industry – this Flink thing. But now, people like it. (29:20)

Eleni: Yeah, definitely. I would say that, in that sense, Professor Markl is definitely a visionary person. Again, for me as a more junior researcher, it also helps to have him as a mentor. (30:13)

Reviewing papers

Alexey: But the main way of seeing trends in research would be to attend conferences, doing these gatherings and even discussion panels. I think you also mentioned reviewing papers. How typical is it for people to also work on reviewing papers? Do all postdocs do this? Or not everyone? (30:27)

Eleni: I would say most postdocs. To get to review papers, you have to be invited. But it is also a volunteering activity – in general, you're not getting paid or anything to do that. But I would say most postdocs are known enough to be invited to do such things. (30:54)

Alexey: You have to have made a name for yourself, right? (31:28)

Eleni: Yeah. Well, someone has to invite you. So they need to know who you are. (31:30)

Alexey: They need to publish a few papers and then they say, “Okay, this person clearly knows this particular area of research. Let’s invite them and ask them to review papers.” Does it work like that? (31:34)

Eleni: Yeah. I would say that you need to have published some papers on your own, because it's a little bit weird to judge other people's research if you haven’t proved that you're actually able to conduct good research yourself. But it might be the case that someone has published a lot of papers, but they're still not invited for these conferences, just because they don't know a key person in the committee that can actually invite them to be on the committee. In that sense, you have to do some networking. In my case, I guess I was lucky that I have been in research groups that have visibility. So both the DIMA group and also my group at EPFL, which is called THEOS. Our research groups have international visibility and are among the best in the field. So that also helped me to get visibility. (31:48)

Underrated topics in data management

Alexey: Yeah, thanks. Speaking of trends in research – sometimes you come to a conference and there is one topic that everyone is talking about – it's clearly a trend. But the next year, you come and it seems as though everyone forgot about this topic. I'm sure you can think of such things, that come in waves, and then a few years after that, nobody remembers them. I would call this ‘hype’. (32:49)

Alexey: But there are things that don't get the attention they deserve. Sometimes, there are things that people get more excited over, but there are other things that maybe people don’t notice. Do you think there are some research topics in your area of research that are underrated right now? (32:49)

Eleni: Yeah, obviously that's a difficult question, I think. First, I'll try to give a bit of a more general answer. One of the things that I have personally noticed in the field – and to some extent, it bothers me – is that in most cases, people only focus on improving performance. This typically means improving runtimes, or measuring performance by runtime. So when you come up with a new technique or a new solution, in order to convince the reviewers at the conferences to accept your paper for publication, you usually have to show “My new technique is X times faster than all previous techniques.” Then it's easy. (33:38)

Eleni: Of course, it is important to improve (decrease) the runtime, but there are also a lot of metrics that can be important. Something like, the ease of adoption of a certain technique, or the usability, the programmability, or even things like energy consumption. And these metrics, in my experience, are typically a bit overlooked. To some extent, I think this happens because they are a little bit harder to quantify and put a number on. That's a bit of a general answer, but as a result, if your work focuses on improving a metric that's other than performance, it is usually a bit undervalued by the community in my opinion. It becomes harder for you to publish your work. (33:38)

Eleni: Then there is another broader topic that I think – well I do not think it is undervalued, but maybe you could say that it's underrated – it's a topic where everyone understands its importance, but it's still not so popular in the sense that not so many people want to work on it. This topic is data cleaning. I think everybody would agree this is a really important open problem. But I don't think there are enough people working on that right now. I think this is probably due to a combination of factors, because it is a really hard problem. So to begin with, it is a bit scary, I think. At the same time, again, maybe there is this issue that, in the case of data cleaning, it is harder to quantify the impact of new approaches and convince the community about their value. Again, you cannot just go and say, “Oh, I made things work faster.” It’s a bit more complex than that. (33:38)

Research in data cleaning

Alexey: I'm wondering – data cleaning is more like an art than a science, how do you actually quantify that? There are no metrics that measure the “cleanness” of data, so this is a tricky one. Is there research in this area? (36:04)

Eleni: Yeah, there is research, of course. I don't have experience with it – I haven't done research in this area. I have to say that, early in my PhD, my professor mentioned this as an option back then. I have to say that I was also a little bit negative or scared of the topic. Because to some extent, it still might be that you recognize that the problem is important, but you also need to like it. And if for some reason, you don't like it, you cannot force yourself to work on it, even if you believe that it's important. But there are people that do – it is an existing research area, definitely. I just think that overall, it's still not popular enough. It's not as sexy or fancy as other topics that are probably less important. (36:21)

Alexey: Right now, at least in the industry, this is a very manual process. You need to do a lot of trial and error. Then you need to handle all the corner cases. The code you have for cleaning gets bigger and bigger with every corner case. I guess the area of research would be how to actually automate this, instead of relying on this infinite loop of trial and error. Instead you would actually have a way to automatically figure out what the data cleaning problems are. It would be handy. (37:14)

Eleni: Yeah, the main effort would be either to fully automate – or maybe that's not so realistic – but at least to make it easier for the data scientists. (37:45)

Alexey: I can relate to that problem. I can imagine having a black box and saying, “Hey. Here's a pile of dirty data, please make it nice.” Yeah, I would use that. (37:59)

Eleni: Definitely. Can you imagine? (38:10)

Collaborating with others

Alexey: Yeah, thanks. You also mentioned that now, as a postdoc, you do a lot of work in collaboration with other postdocs. Since they are on the same level, you're not necessarily mentoring them, but you're more collaborating with them. Do you collaborate, or work, with a lot of people from different industries? Maybe, let's say, from different groups that are not necessarily data management groups? (38:13)

Eleni: Yeah. I have some experience working with people that are not in Data Management. In general, there are connections – it's easy to find connections – from data management to other disciplines. For example, in my PhD, I was involved in a large, multidisciplinary project that is called the Human Brain Project. One of the main goals of that project was to build three-dimensional models of the human brain and then run simulations over them. For that project, I had collaboration with neuroscientists, who were the ones building the brain models. (38:40)

Eleni: Within that collaboration, I developed some tools that allow them to access sub-regions of those models more efficiently, which was the data management component. In my experience, having these kinds of collaborations does require some extra effort, because you need to find a common language – you need to find a way to communicate with someone who is outside of your field. That's not always easy. It goes both ways: some things that were obvious to me were not obvious to someone outside the field, and I had to find a way to explain them. But I also had to understand what they were telling me. But it can also be quite rewarding because you see your work being applied to solve a real problem. (38:40)

Alexey: I remember in DIMA, it was a seven-floor building and we were on the seventh floor. Sometimes I would go on the sixth floor and there was a group that was doing some video encoding (I don't remember exactly), but it was like a completely different thing. I don't remember what was on the fifth floor, but it was also something unrelated. What I found interesting is that these groups don't really communicate with each other. Is this still the case or are you trying to somehow find ways of connecting with each other? (40:34)

Eleni: Yeah, that's a good point. Maybe now it has improved, I would say. Actually, there is one issue that I find to be a problem concerning where the groups are, which is the building itself. I think it's actually not so great that the DIMA group is on the seventh floor by itself. Everybody on the floor is from the same group and the other groups are on different floors. Then you have a common lounge area. But when you are there, you will only see and interact with people from your own group. There's no chance that you'll meet someone outside of your group, where you can initiate some maybe interesting discussion. So, I think that this is not great. (41:10)

Eleni: If we had some lounge areas where people from different groups can hang out, this would really help a lot to bootstrap some collaborations. I have to say, in my previous university, there was a bit more mingling between the different groups. However, you mentioned the sixth floor – I'm not sure which group you're referring to. But, for example, right now I am involved in a collaboration with another group at TU Berlin, which is the remote sensing group. In this group, they are working with satellite imagery and how to efficiently index these images, but based on actual content, which requires some deep learning techniques that they have developed. We have a collaboration with them to help them tackle the problem from the data management aspect as well. (41:10)

Alexey: Sorry if it came out as a criticism. I actually wanted to lead this into asking how you collaborate with other groups and how you find ways of collaboration. What you brought up, the remote sensing group – it's quite interesting. Do you know how it happened? How did you meet and decide to work together? Because I guess this is also related to your area of research – spatial analytics, right? (43:05)

Eleni: Exactly. Yes, it does. However, although I am involved in these collaborations, and have been in collaborations before, I was not the one that bootstrapped or initiated them. This was usually done by the professors leading the groups. So I would say this was probably the case here. I was put in touch with a professor that works in remote sensing through the professor that I'm working with. So he made the connection. (43:33)

Choosing the field for Master’s students

Alexey: Yeah, thanks. I noticed that I've been ignoring questions, so apologies for that. We have a few questions. Amin is asking, “For computer science Master's students, what fields would be good to work in, in order to be able to apply for a PhD position at a top European or American University?” (44:17)

Eleni: Okay. That's a good question. But, of course, I would say it really depends on your interests. Since I’m in Data Management, I would advertise and support my field. I would say to do research in data management. But at the end of the day, you should pick the topics that you like. In the end, what matters is the quality of your research rather than the topic. That said, of course, I cannot ignore that right now there is a big focus in academia on AI and machine learning. In that sense, there are probably more opportunities and open positions in these domains than in others. But there is also more competition, so this is also something to take into account. That's why again, it goes back to really making sure that it is something that you like and want to do because you like it, and not because it's popular right now. There is a lot of competition, so if you don't like it enough, there will be someone else that does and they will be better than you. (44:45)

Alexey: I guess for any field you take, you will be able to find a university or a group at some university that does this type of work, and you'll be able to go there. But I'm curious, how do you go about selecting topics there? So let's say you pick a broader field, and then you want to find some topics. I think it goes back to the discussion we had of identifying trends. I guess this is what you can also do, right? You can take a look at the conferences and bigger venues in the field, and then see what the trends are there, right? Then you can pick some of these trends and try to see if you like them. (46:18)

Eleni: And maybe two more things. One thing is to maybe try to do research already in your Master’s. Because this can help you identify what you like. This can mean that you may work on your Master’s in the field, and you say that you really like it and you continue in your PhD. But it can also mean that you try something in your Master’s and it's a good way to find out that you actually don't like it. Then you move on to a different area. This was a little bit the case for me, for example. My Master's thesis topic was related to signal processing, it was not in databases. (47:06)

Eleni: Another thing that I also did is an internship, prior to my PhD in the same group, where I ended up doing my PhD. But I first worked there for a few months as an intern, again, to kind of make sure that this is something that I like, before I committed to something bigger, like a PhD. (47:06)

Eleni: In addition to that, in my previous university, EPFL in Switzerland – and there might be other universities where the same applies, I'm not sure about that – there was a chance that if you're selected with a fellowship, you can actually spend your first year at the University working in different groups until you decide which group you actually want to stay in and do your PhD. That also gave some students the flexibility to try different things and identify what they like. (47:06)

Choosing the topic for a Master’s thesis

Alexey: Maybe this is something you can also help answer. This is a question that comes up in the community quite often, maybe at least once a month, especially during the time span when students need to select a topic for a Master’s thesis. “How do I actually select a topic?” Do you have any advice for those students? How can they pick a topic for their Master's studies – for the thesis? (48:46)

Eleni: Yes, as I said, at least in my group, everybody advertises their topics, right? So you don't have to come up with one. Like, my students don't really have to think from scratch about the topic, but they can go through a list of advertised topics, each of which comes with a description as well. I would say again, it goes back to what you find interesting. But one other aspect is to maybe also make sure that you have enough background for that topic. Or if you don't, that you are aware that you don’t, so that you're really prepared to cover the missing background. Otherwise, it can be a negative experience to work on something that you don't have the skill set for. Then, to some extent, sometimes who you're working with can also be as important as the topic. In some cases, even more. So I would also say to be careful in choosing your mentor. Make sure that you have a good connection to that mentor, and that you’re aligned in terms of how you work together. (49:17)

Alexey: In practice, it basically means finding a PhD student whose research you like and then trying to see how you can help them in their PhD research, right? At least this is how it was for me and this is how it works. I remember we had a sort of meeting, and then PhD students would come and pitch the topics that they're working on. Then whoever likes the topic would come and directly approach the PhD students saying “Hey, I like what you're working on. Let's do something.” (50:43)

Eleni: Yeah. That's true. PhD students are also involved in teaching, so it's actually an opportunity for Master’s and Bachelor’s students to interact with them and see what they're working on and have some discussions about potential topics. In that sense, you can also cover what I said – have a mentor that you're happy with, because you actually can work with a person that you are already familiar with, because you had that person as a teacher in some course. To me, it happens a lot that the students who end up doing a thesis with me are students that I already know from some course where I was involved – they're not complete strangers. (51:20)

Should I do a PhD?

Alexey: What advice would you give to somebody who's just graduated from Master’s and they are not sure whether they want to continue researching and do a PhD or they just want to work in the industry? I think this is quite a difficult problem. I remember, for me, it was a difficult decision. Do you have any suggestions or advice for people who need to decide? (52:07)

Eleni: Yeah, to some extent, it’s a bit of what I already said, which is to give it a try, but on a smaller scale. So before you start a PhD, try to do some research either in your Master’s or as part of an internship. One important thing that comes with that is that you can have more fun actually doing research when not in your PhD because there's less pressure – there is no pressure to publish since doing research in your Master’s is completely optional. There is no clock ticking, telling you that in three years you need to publish your papers. So you can actually have more fun, I think. I am someone that, unfortunately, doesn't have industry experience, but I would say the same way – you can try to do some internships in industry as well. Work for some companies, try it on, and then hopefully, it will become clear what you like more. (52:38)

Alexey: So doing a thesis and then also doing an internship and seeing what you like. That's actually a good point about not having pressure to publish. Because I remember when I was writing my thesis, we decided to publish it as a paper after I graduated. So nobody forced me, like “Hey, write this paper or you will not graduate. You will not get your Master’s.” But with a PhD, if you don't write your paper, then you do not graduate, right? (53:44)

Eleni: Yeah – with no publications, you cannot graduate. (54:22)

Alexey: Because with a Master's thesis, you just defend it and your supervisor says “Okay, this work gets this mark.” And then the people, the jury, who watch the defense, say, “Okay, this gets this mark.” And then you get your papers. With a PhD, you still have to do that, but on top of that, you need to have quite a bunch of papers published at top venues, right? That’s tough. (54:27)

Eleni: Yeah, different groups have different requirements in terms of how many papers you are expected to publish. But yeah, in general, it's always at least one. Broadly speaking, at DIMA, it’s typically three and they have to be at top conferences. So it is not easy. (54:59)

Promoting computer science to female students

Alexey: One more thing I wanted to ask you. This is actually something we also talked about with a guest a few podcasts ago – with Barbara, who takes part in the Women in Data Science community. There, the question that somebody from the audience asked her was “How can we promote computer science to female students?” Because this is not very popular among female students – and in general, data science, computer science – I would say these are male-dominated areas. What do you think we can do to attract more female students to do research in computer science? (55:19)

Eleni: Yeah. This is a topic that I like to talk about and I personally also find it very important. There are a lot of discussions on this topic, so I would like to try and to give my personal experience and perspective, and share with you what attracted me personally to, first, study computer science, and then to pursue a career in research. Then I could try to see how this could apply more generally for other women as well. (56:02)

Eleni: So to begin with, in my case, my choice was influenced by my father, because he's actually a professor of computer science in Greece. In my case, I was a little bit lucky, let's say, that I had this influence in my home. But taking this to the general case, I think it is important that we organize events for younger students in high school, or even primary school – in particular female ones, although these events could be addressed to all students – where students can see what computer science is, but in a very gender-neutral way. (56:02)

Eleni: Yeah, I really want to put emphasis on this gender neutrality, because I think it is really crucial that students stop making this association in their minds that computer science is only meant for geeky guys that sit in their basements wearing glasses and dirty T-shirts. Now, I'm exaggerating the stereotype from the other side, but this is how CS is often depicted in pop culture, right? In the movies, yes. Inevitably, this actually influences people. But yeah, I think at these young ages, it is really important for students to realize that your gender and your fashion choices are completely irrelevant, so just focus on the essence. (56:02)

Eleni: Again, going back to my personal experience, through my father, I got attracted to the field, but later on, when I was doing my PhD, it was quite helpful for me that my PhD advisor was actually a woman. So in that sense, I had a female role model. Actually, what helped me even more was the fact that a lot of my PhD peers in my group were also female. To me, it was actually more helpful because I could more easily relate to other women around my age, who were just one or two steps ahead of me, and I could feel that I'm not alone. That helped me to realize that if they are successful, then I can also be successful. So again, extrapolating to what this means in general, I think it is important for universities to employ females at all levels – as professors, but also to make sure that there are also female postdocs and female PhD students. (56:02)

Eleni: Maybe, finally, the hardest part is to actually keep women in research and more specifically, in academic research, in the long run. If you look, there are all these different studies that show that in the current situation, at every stage, women are more likely to drop out than men. So to begin with, we don't have so many Master’s students, and they get even fewer at the PhD level, even fewer at the postdoc level and professor level. I think for me, this is maybe the hardest problem and it happens because it's quite hard to combine creating a family with the academic expectations that require you to be mobile and change cities or countries until you get the permanent position and you also have to work long hours. So this is one of the hardest problems, I think, to solve. (56:02)

Eleni: Of course, some solutions go towards the direction of having better childcare support for women. But yeah, my feeling is that it somehow goes beyond that – that somehow universities need to acknowledge and be aware of the additional challenges that women may face in their personal lives and make some adjustments for that. (56:02)

Alexey: We should start with schools, right? Like you said – have these events. I guess it wouldn’t be helpful if somebody who comes and presents at this event is a guy in this t-shirt, like you described. So maybe it will be helpful to have a female presenter as well. Right? (1:00:26)

Eleni: I can really tell you from my personal experience that this problem is real. When I was younger and I was starting with my PhD, I somehow really felt that the way I dress and the fact that I wear makeup and all that, that it’s not compatible with me being in computer science. I thought maybe that it was a sign that I'm actually not good for that, which is crazy. But going back, I can really see that I was, to some extent, having these thoughts and had the feeling that I don't fit into the stereotype and it’s probably because I'm doing something wrong. Not because the stereotype is wrong, but because there's something wrong with me. So, it can have an impact. (1:00:51)

Finding Eleni online

Alexey: So, we should be wrapping up. If people have questions for you, how can they find you? (1:01:34)

Eleni: I guess through you. I have already shared my information on your website, right? (1:01:41)

Alexey: There is a link to your DIMA page with all your contacts. (1:01:50)

Eleni: This information is up to date. There is my email on my website – I guess that's the easiest way. (1:01:54)

Wrapping up

Alexey: Okay, thanks a lot. Thanks for joining us today. Thanks for sharing your story. Thanks for telling us about the work you're doing and thanks for talking about challenges openly, and all these things. So thanks a lot. Thanks, everyone for joining us today, for asking questions. Yeah, thanks, Eleni. (1:02:02)

Eleni: Thanks a lot, Alexey. It was great being on the podcast. Great to meet you. (1:02:23)

Alexey: And sorry for technical difficulties. I hope it was fine that I was on my mobile phone. (1:02:30)

Eleni: No, no, no. After the first issue, it was perfect. Yeah. (1:02:36)

Alexey: Maybe I should just do this from my mobile phone all the time. Okay. Thanks a lot and have a great weekend! (1:02:41)

Eleni: Yeah, you too. Thanks. Goodbye. (1:02:50)
