DataTalks.Club

Season 23, Episode 6

Data Engineer Career in 2026: Roles, Specializations, and What Companies Look for | Slawomir Tulski

Transcript

The transcripts are edited for clarity, sometimes with AI. If you notice any incorrect information, let us know.

From Measuring Glaciers to London’s Tech Scene

Alexey: Hi everyone. This week we are going to talk with Slawomir about the reality of data engineering in 2026. Slawomir spent most of his career at Meta. He has moved between individual contributor and management positions, focusing on scaling data engineering support for the Meta ads ranking system. (0:00)

Alexey: He has also contributed to global hiring by conducting hundreds of interviews and helping shape the company's data engineering recruitment process. Now he works as an independent consultant helping businesses make the most of their data engineering teams. On the personal side, Slawomir is a dad, a husband, and he talks about being a failed bass player. (0:00)

Alexey: I am really curious about the last one because I am a failed drum player. The reason is that I always wanted to have a metal band. My friends and I were deciding who plays what. My friend said he was playing the guitar, and another friend said he was playing bass. (0:33)

Alexey: I figured that meant I was playing drums. So then I came to my parents and said I was starting a band and needed drums. They said there was no way. That was as far as I went in the journey. (0:33)

Slawomir: I actually have a bass. I am just really bad at playing it. It is so much harder than data engineering. Data engineering is simple compared to bass playing. (1:10)

Alexey: That is why I failed. I had a band in the past, but it was a failed one as well. What genre was it? (1:20)

Slawomir: It was back in the school days. So we did not even talk about genres. There was a guy playing guitar and there was a lady singing. So that was more like a pop, but I just wanted to play anything. (1:28)

Alexey: Was it like Behemoth or anything? (1:38)

Slawomir: No, unfortunately not. I would love to do that. They are probably the most famous Polish band. (1:46)

Slawomir: Of course. Everyone in Poland knows them. They are quite an interesting band for that country, especially the background. (1:54)

Alexey: We are here not to talk about your bass player career, although that could be interesting, but more about data engineering. First of all I want to welcome you to our interview. It is really nice to have you here. (2:05)

Slawomir: My pleasure. Thank you very much for inviting me. (2:22)

Alexey: In all our interviews, I usually start by asking about your career so far. So can you tell us about your career journey? (2:25)

Slawomir: I will try to be brief here. There is a lot written already in the introduction. So pretty much I started in academia. I was a researcher and I almost started my PhD before quickly realizing that I needed money and I would not make money that way. (2:34)

Slawomir: So I dropped my assistant professor position and I dropped my research. That was remote sensing and geosciences. It is pretty much satellites, getting the satellite data and doing things with that. (2:56)

Alexey: It is like internet of things kind of stuff? (3:05)

Slawomir: No, it is more like measuring natural phenomena with satellites. So I was supposed to measure the movement of glaciers, for example, or I was doing forest measurements from satellite imagery. Those kinds of topics are really close to the life sciences. It was pretty interesting. (3:08)

Slawomir: It was amazing. We were actually doing machine learning before it was a thing. But I dropped that because I needed money, and you probably know that universities are not the most generous when it comes to money. So I dropped it and became a software engineer. (3:29)

Slawomir: So I moved to London. I joined a bunch of startups and I was a software engineer. I gravitated toward data engineering because it is software engineering plus data. So that was a natural match. (3:46)

Slawomir: I became a data engineer and after startups I joined Meta. At Meta I was a data engineer. It was still Facebook at the time I joined, so I was one of the leading data engineers there, a founding data engineer. Then, as the company became Meta, I also moved into management and leadership. (3:54)

Slawomir: Recently, as you mentioned, I am an independent consultant. That was a personal decision to leave the UK. As I was thinking how to do it properly, I also decided that I will try to be independent together with that move. (4:10)

Alexey: Poland is a very good country. I lived there. It is amazing. (4:29)

Slawomir: I was choosing between Spain and Poland. So I was traveling across Spain and then I was trying to choose between these. (4:32)

Alexey: So how did Poland win? (4:40)

Slawomir: There is a lot of family background here. You might know this, but honestly, it is like choosing between the rational mind telling you Poland and the heart and the weather telling you Spain. Nothing can beat Spanish weather and the chill. But there are some other things than the weather I have to take into account. (4:44)

Alexey: The Polish economy has really been growing recently, over the last five or ten years. It is really nice to see that. It is also good to have smaller taxes. I got used to the British taxes where I was paying an effective rate at some point of forty six percent. (5:06)

Alexey: Almost half of my salary goes to that and it hurts. Since I am in Germany, I know that hurts. (5:20)

Slawomir: Things are better in that respect in Poland. (5:26)

Alexey: So when you started working, was data engineering a thing back then, or was everyone software engineers? (5:31)

Slawomir: Those were early days of data engineering. So the term data engineering was coined shortly before I switched to software engineering. I started as a software engineer, but data engineering was already a little bit of a thing. (5:40)

Alexey: What was the state of data engineering back then? Everyone was talking about Hadoop I guess? (5:56)

Slawomir: That was the time when Hadoop was becoming the thing. Everyone was saying that big data will change the world. So we had no AI. Big data was supposed to change the world and we had just data warehouses and Hadoop. (6:01)

Slawomir: That was the time when there were so many consultants doing Hadoop, and every single company, regardless of size, would go to Hadoop just because everyone needs Hadoop. Right now everyone needs AI. Back in the day it was all Hadoop. Big data was the AI of those days. (6:01)

Alexey: If you have ten gigabytes of data, you do a Hadoop cluster. I remember the pain of setting up a Hadoop cluster. It is not something I miss. There is a reason why we do not see Hadoop anymore. (6:30)

Hadoop vs. AI: Lessons from the Original Big Data Hype

Alexey: You worked at startups and then at Meta. I assume in Meta, in Facebook, you have a lot of internal stuff? You do not need to talk about Meta only, just things you can disclose. (6:47)

Alexey: But I imagine that Meta is known, as all the big tech companies are known, for reinventing the wheel? (7:03)

Slawomir: Every single technology either has an open source equivalent at Meta, like Presto, or it has an internal counterpart. Data Swarm would be the example. Nowadays industry is running dbt for transformation. So if you are building pipelines and you have transformation, most of the folks will use dbt for that. (7:11)

Slawomir: At Meta you have a framework called Data Swarm. It was public information as Data Swarm was described by some articles, so they are open about that. So we used Data Swarm there. But honestly, it is not a problem. (7:34)

Slawomir: Those tools are, at the end of the day, just a different syntax or different formula, but mostly it is the same thing. (7:54)

Alexey: Since you have been working or observing this role, because later in your career you switched to management, but I believe you were still exposed to data engineering? What has changed? Do we still, as the industry, agree on the definition of a data engineer or is it still different? (8:01)

Slawomir: No, we do not, and that is the big problem. I think the biggest game changer for me was actually leaving Meta and, before that, starting a social media presence. That was the moment when I realized how different my data engineering perspective was from other people's. People cannot agree on this. (8:20)

Slawomir: The role is more than a decade old but people cannot really agree beyond the basics. Everyone will agree that a data engineer probably builds some data platforms, integrates the data, brings the data into the platform, transforms the data, and exposes it to the user. Ingestion, extraction, transformation, loading, and building pipelines; we are going to agree here. (8:28)

Slawomir: But there is going to be such a wide variety of actual tasks between the companies and different environments. It is a little bit for me like saying I am a software engineer. Software what? What do you do? Do you build websites? (9:01)

Slawomir: No one builds websites nowadays, but are you front end, are you back end, are you this, or are you that? If you tell me you are a software engineer, you probably build code and build things with code. That is the same for data engineers. If you are a data engineer, you probably do something with data, but that does not tell you much. (9:17)

Slawomir: I think that is a big problem. I had constant discussions around whether data engineers are doing this or not doing this. Do they build dashboards? No, they do not build dashboards. Do they build data models? (9:42)

Slawomir: No, they build pipelines. No, they are not only the pipelines, they are building business capabilities. Oh no, that is analytics engineers. So there is a mess. It is interesting that despite the role being over a decade old, there is still a lot of vagueness and useless discussion. (9:57)

Alexey: When somebody says I am a front end engineer, it is clear that they are probably working with React or Vue. But with a data engineer, you say it is still not clear? It could be maybe I am just using Fivetran and Fivetran is taking care of everything? Or maybe I am writing everything from scratch and using Kafka? (10:16)

Slawomir: Exactly. That is the point. (10:41)

Alexey: Have you observed any clusters or types of data engineers? (10:47)

Slawomir: Yes, my mental model is that there are two of these. Of course there are generalists doing anything like a full stack. But I think the biggest two clusters would be what I call platform data engineer and product data engineer. Think about this again going back to the software engineering analogy of front end and back end engineer. (10:47)

Slawomir: Platform folks would have a strongly technical software engineering skill set which builds the platform. They build data warehouses, they build the platform, they care about the infrastructure, they are good at DevOps, infrastructure, and system design. They pretty much build the entire ecosystem. That would be one bucket, things around that part. (11:12)

Slawomir: The other bucket would be more on the product side. You already have the data there in the platform, but business needs to do something with that. It is no surprise that, by itself, data is nothing. If it just sits there in the cloud, that is just a storage cost. (11:37)

The Data Identity Crisis: Platform vs. Product Engineering

Slawomir: You need to do something with data and all the kind of tasks and responsibilities of doing something with that data. That would be the second bucket of data engineering which is more working closely with data scientists, analysts, and product owners to build actual analytical capabilities. So let us call it building business capabilities with the data. (11:54)

Slawomir: That is different than building the platform itself. Those would be the two biggest buckets or specializations within data engineering. (12:11)

Alexey: When it comes to the team setup or organizational setup, would you say that platform engineers typically work as one single team and then product data engineers are embedded in product teams? (12:23)

Slawomir: It depends on the organization structure. Usually, from my experience, when you have lean startups, you would probably have all hands on deck where data engineers do full stack. They are more like generalists. When you think about the more mature companies, they tend to split those. (12:34)

Slawomir: You have a platform team which will have those platform data engineers and then you will have all the rest. Internally they will probably even have the same titles. So all of them will be data engineers and their day to day is vastly different. Their skill set is different to sometimes a crazy degree. (12:59)

Alexey: I am just taking a note for myself because I want to ask you about this. Their skill set is different and their day to day is different. Can you maybe describe the product data engineer first? What kind of skills do they need and what do they do day to day? (13:24)

Slawomir: What they do day to day, I will talk about a theoretical perfect scenario because what they do sometimes can be a sad reality. But in theory, the skill set you need is definitely SQL because you are going to interact a lot with the data. I guess that is for everyone. If you are an engineer you have that. (13:41)

Slawomir: So that is kind of generic. You also need some analytical skills and some business sense, kind of BI skills. You should be able to do basic dimensional modeling, basic analysis, and a dashboard. You can do data transformation with dbt and orchestrate your jobs with Airflow. (14:04)

Slawomir: So you take care of the transformation, building models, and building KPIs. You probably do dashboarding and you talk to the business and try to help them to uncover certain things. That would be more on the product side. (14:29)

Alexey: How is it different from analytics engineering? I do not see a difference. (14:41)

Slawomir: That is a part of the confusion. What happened in the industry is that analytics engineers appeared about three or five years ago. The role emerged because there was a gap: they needed a way to describe what they were doing, since they were falling under data engineering. They felt like they were not platform folks. (14:45)

Slawomir: Because they would fall under data engineering which was more about building the platform, they started doing this analytics engineering thing. But there are many companies that did not adopt analytics engineering. They still frame this as a data engineer. I actually have those discussions about how data engineers are different from analytics engineers. (15:19)

Slawomir: Honestly, it is just an attempt to break down this umbrella term which is data engineer. I think in the long run the industry will adopt the idea that the data engineers will become solely the platform side. Then this product side will be taken away by the analytics engineers. It sticks with us but right now we are in the period where it is definitely not separated yet. (15:43)

Slawomir: It really depends on the company. Analytics engineering is still rather niche. They are getting traction but there are still many times data engineers are doing analytics engineering. I do not see much of a difference if you ask me. (16:07)

Slawomir: We either make clear data engineering specializations or we break data engineers into analytics engineers versus platform data engineers. We need to go one way or the other because otherwise it is confusing. It is like in Poland; if you live in the west or if you live in the east, you use a different word for potatoes. But at the end, it is potatoes. (16:23)

Alexey: In one case it is because there is strong influence from Germany, so the word is German. On the other side, it is a Polish word, but at the end it is just a potato. So you cook it and you eat it in the same way. (16:42)

Slawomir: That is a big problem. You have mentioned analytics engineers here, but they are not the only ones in the mix. You have BI developers and there are still some leftovers of ETL developers. There are data architects. (16:59)

Slawomir: How are they different than analytics engineers? There are some companies where there is a big difference between tech native companies and something I call tech by necessity companies. (17:13)

Tech-Native vs. Tech-by-Necessity Company Cultures

Slawomir: Those are the companies who have the tech departments just because they operate at a global level and they just need technical departments. But there is a difference between these and sometimes they frame things differently. I would say that outside tech they are more old school and they will have more of those classic roles. (17:29)

Slawomir: You find ETL developers there. (17:52)

Alexey: I have worked with clients that still have enterprise architects, data architects, ETL developers, and business analysts using Informatica and things like that. They were on SAP. (17:59)

Slawomir: It is not far away from that. They were heavily on SAP and all the integrations were SQL Server and a bunch of other things. (18:08)

Alexey: Would you recommend somebody join those companies? We have a data engineering course and from what you describe we lean more towards tech native stacks rather than tech by necessity. But if some student takes our course and later finds a job at a bank or insurance company, would you recommend them joining this company as their first job? (18:21)

Slawomir: Let us forget about the market conditions right now. For some people getting a job nowadays is a big thing. As long as you are happy with what you do and you have the career growth path and space to breathe, join the company and stay with them. But if you join an old classic corporation or a hardcore bank, I would be really careful about what your role responsibilities are and how long you should stay there. (18:52)

Slawomir: There is a high chance that you will become a cog in the machine where you do very narrow things for a long time. Sometimes with some of my clients I remember talking to data engineers who just do dbt models for two years. Throughout their career, the only thing they would do is sit and wait for the data analysts. (19:26)

Alexey: Well imagine if they had to use something like SSIS integration service from Microsoft? (19:57)

Slawomir: In some sense yes, but in the other sense it is the same thing. There you would have drag and drop and now they would have dbt, but it is still the same all over the time. It is very narrow and you are going to be replaced anyway probably if that is the only thing you do. If your job is to sit there and wait for some business folks or data analyst to tell you what to do and how to do it and then you implement it in dbt, that role is a dead end. (19:57)

Slawomir: If someone completes the course, manages to get a job, and finds themselves in a situation like that, I would say push yourself to start upskilling. It is fine for now, for a year or so, but that is a dead end and you need to do more than that. Otherwise it leads to stagnation and replacement. (20:37)

Alexey: Let us talk also about this tech native and platform engineers. What are tech native companies? (21:05)

Slawomir: The tech native companies are the big four like Facebook and Google. But also all the companies where the fact that they are a technical technology company is making their products better. I would call Deliveroo tech native. Even though their operations are physical, so there is a courier who is delivering, the reason why they took over the world is because they use technology at a master level to make their services better. (21:05)

Slawomir: Those services can be physical but that is not a problem. Now compare this to, let us say, Coca Cola. Coca Cola has always been there selling beverages. They are not using the tech to the extent where the reason they have the tech departments is to drive the product. (21:51)

Slawomir: Maybe nowadays they are pushing harder, I do not want to be harsh on Coca Cola since I haven't worked with them, but you get the point. Those are the companies which came later in the game just because they have to use the technology. That is how the world worked. I think they tend to be a little bit less agile and advanced many times. (22:18)

Slawomir: They tend to have more management layers and tend to use a little bit more legacy code. I have worked with one client who was a really classic corporation and they still did the waterfall. They did not have the agile. They had the waterfall big planning and big everything. (22:35)

Slawomir: It is still there sometimes but that would not happen in the tech native company like any SaaS platform or Uber. (23:02)

Alexey: We talked about different types of companies and different technologies they use. We also talked about the two kinds of data engineers. So one is product engineer, which has a lot of overlap with the role of analytics engineer. They focus on business skills and analytical skills. What about platform engineers? (23:09)

Alexey: I assume the focus is more on the engineering side rather than on the business side. What kind of skills do they need and what is the day to day? (23:34)

Slawomir: Yes, definitely. Number one, which is very underrated, is going to be any sort of DevOps skills. They are either going to have to work a lot with DevOps or they will have to set up all the deployments, CI/CD, and so on themselves if there is nothing there. Platform engineers have to maintain the platform. (23:47)

Slawomir: Number two is of course all the system architecture things. Once they have a warehouse, lakehouse, or data lake, they need to know what kind of features it comes with and what the technologies, tools, and architecture look like. All the data architecture things they need to know as well. (24:12)

Slawomir: The other one is this kind of cloud engineering. Everything is in the cloud right now and you need to be able to operate within those environments. Then you will have some vendor specific stuff. Probably your company is going to be on Snowflake or on the Azure Databricks or whatever is out there. (24:34)

Slawomir: You need to have the exposure for this and the cloud providers. That is part of the cloud engineering. You need to add to that all the processing engines like Spark or Presto, whatever you use to compute to transform. Those would be the core for me. (25:03)

Slawomir: If you have this, you are well on the way. An extra addition on top of that which I think many times is missed, but gives you a big competitive advantage as an employee, is being cost aware. (25:24)

The Competitive Advantage of Cost-Aware Engineering

Slawomir: I would say that the big thing is that you are able to match your platform, the platform you are building and designing, to what your company actually needs. Doing it cost efficiently and being cost efficient in the process later on when you manage is important. The reason for this is twofold. (25:33)

Slawomir: One, skyrocketing cloud bills are a common thing in the industry right now and people are not cost aware. We have this idea that cloud is cheap and storage is cheap, but then we quickly realize it is not as cheap as you think. It adds up. (26:05)

Slawomir: The second bit is trying to build this behemoth platform for a company which is not at that stage. That is another classic joke. You do not go there. It is a startup and then you overengineer. (26:22)

Slawomir: And the flavor of over engineering here is saying we are now ready for real time and batch and we have this lakehouse thing. Now what are we going to do with that? We are going to ingest CSVs. So amazing; I have this Ferrari kind of platform which costs a lot and we are not using this. (26:37)

Slawomir: That is another common problem. So I would say that if you know all the skill sets I described and you want to get the competitive advantage, those would be other things you need to look for. It is a disease. It is surprisingly common across the industry to make both of those mistakes. (27:01)
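The "it adds up" point about cloud storage can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions: the function name, the starting volume, the monthly growth, and the price per gigabyte are all made-up values, not figures from the conversation or from any provider's price list.

```python
def cumulative_storage_cost(start_gb, monthly_growth_gb, price_per_gb_month, months):
    """Total storage bill over `months` when the stored volume grows linearly."""
    total = 0.0
    stored = start_gb
    for _ in range(months):
        total += stored * price_per_gb_month  # bill for the current month
        stored += monthly_growth_gb           # data nobody deletes keeps piling up
    return total


# Hypothetical inputs: 1 TB to start, 500 GB added per month, 0.02 per GB-month.
first_month = cumulative_storage_cost(1000, 500, 0.02, 1)
three_years = cumulative_storage_cost(1000, 500, 0.02, 36)

print(f"first month: {first_month:.2f}")
print(f"36 months:   {three_years:.2f}")
```

With these invented inputs the first monthly bill looks negligible, while the three-year total is several hundred times larger, because every month is billed on everything stored so far. Compute costs, which are often the larger part of a bill, are not modelled here.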

Alexey: So when it comes to overengineering and being pragmatic, what would be your recommendation for startups? I am not Meta, I am not Google, and I do not have the petabyte scale. I just need my analysts to be able to create dashboards. So there is some data being generated and I want to capture this data and I want to let my analysts use this data. (27:21)

Alexey: What kind of technologies or how would you recommend approaching it? (27:46)

Slawomir: First of all, do not spend millions. That is the big thing you do not have. Nowadays, with current technologies and compute, you could even have a single database crunching something which used to be crunched by multiple parallel processing systems. You really could get things like DuckDB, Airflow, and dbt. (27:54)

Slawomir: You could even go simpler and have just dbt plus a database plus some visualization, and you can probably run far with that. Add Superset on top of that; I think Superset is a good open source tool nowadays. But you can pretty much get very far even with open source tools. And you do not need real time. (28:30)

Slawomir: You are probably super fine with daily batches or, if you need to go below that, that is fine, but you do not really need expensive tooling. You do not need expensive systems because if you are just a startup and you are just starting out, you need to be careful. You do not have that big data. Something which used to be big data is no longer big data. (29:03)

Slawomir: No one is even using the term big data anymore. It is a reminder from the past. We can crunch data even on a single instance and you can go far with that. But you do not want to spend too much time on setting up all of this and maintaining all of this because every single complication adds up. (29:18)

Slawomir: Let us say we want real time. Getting real time is not a simple problem. You can have Kafka, but real-time processing is a different piece than batch processing. Do you really need real time right now? (29:44)

Slawomir: I am ninety nine percent sure that you do not need that. Even if there is a need for close to real time data, you can just run it every five minutes. (30:03)

Alexey: Exactly and it is much simpler. (30:11)

Slawomir: If you run it every five minutes, each run is much smaller because the increment is smaller. So it is just fast enough and it just makes things so much simpler. But I have seen companies pushing real time in so many strange ways. (30:18)
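The "run it every five minutes" idea can be sketched as a watermark-based incremental load. This is a minimal sketch under assumptions, not the speakers' implementation: the `events` and `report` tables and the `run_increment` function are hypothetical names, and Python's built-in `sqlite3` stands in for whatever database is actually used (DuckDB or any other engine would follow the same pattern). In practice a scheduler such as cron or Airflow would trigger each run.

```python
import sqlite3


def run_increment(conn, watermark):
    """Copy only events newer than `watermark` into the reporting table.

    Returns the new watermark and the number of rows processed in this run.
    """
    rows = conn.execute(
        "SELECT id, amount FROM events WHERE id > ? ORDER BY id",
        (watermark,),
    ).fetchall()
    if not rows:
        return watermark, 0  # nothing new since the last run
    conn.executemany("INSERT INTO report (event_id, amount) VALUES (?, ?)", rows)
    conn.commit()
    return rows[-1][0], len(rows)  # highest id seen becomes the new watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE report (event_id INTEGER PRIMARY KEY, amount REAL)")

# First scheduled run: three events have arrived so far.
conn.executemany("INSERT INTO events (amount) VALUES (?)", [(10.0,), (20.0,), (5.0,)])
watermark, processed_first = run_increment(conn, 0)

# Five minutes later: only two new events, so this run touches just two rows.
conn.executemany("INSERT INTO events (amount) VALUES (?)", [(7.5,), (2.5,)])
watermark, processed_second = run_increment(conn, watermark)

total = conn.execute("SELECT SUM(amount) FROM report").fetchone()[0]
print(processed_first, processed_second, total)
```

Each run only reads rows past the stored watermark, which is why frequent small batches stay cheap and avoid the separate streaming stack that Kafka-style real time would need.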

Alexey: At what point do I need to consider these big names like platforms like Snowflake or Databricks? Do I need them at all? Maybe I implement the platform myself? (30:33)

Avoiding Over-Engineered Platforms and Modern Data Stacks

Slawomir: I would not implement the platform myself probably because even if you are implementing it yourself with just open source, you are still paying for engineer hours. So it still has a cost. Probably cloud will not unless you are a social media influencer where anything is possible. (30:56)

Slawomir: But jokes aside, I would probably go at some point to a little bit more enterprise solution. But that is where the scale comes. If you have more and more data analysts and you have data scientists and you do bigger things, at some point you will hit the ceiling. The things will start to get clunky. (31:20)

Slawomir: Management will get a little bit harder and you are probably going to miss a lot of good features. So you will want to move at some point. But just do not move for the sake of moving if things are working fine even if you are in your basic setup and you can get the business value. That is kind of what too often people miss. (31:48)

Slawomir: At the end of the day, it is all about the business value being able to provide. So if you are a startup which is scaling and then things go like you have product market fit, the adoption is growing, you start to hire more, and you start to scale all the other functions, that is probably the moment you are going to also scale your data engineering. You start to think about moving the platform to something more enterprise grade. (32:04)

Slawomir: Before that I do not see the reason. What usually happened though is different. This is not a problem for startups honestly unless they got a huge funding. It is usually the problem the other way around where you have a mature corporation. (32:34)

Slawomir: They used to have some other departments covering for the platform like some IT and engineering and they were not really data engineers. They were maintaining things. Then you had a bunch of BI developers with some vendors and things kind of worked. But there is money in the company and there is a big digital transformation coming. (33:05)

Slawomir: They are saying they are now going to have this top level data platform. They have money, so they have a lot of money to waste and they tend to go all in. They say they are now just going to do a massive transformation. And I think that, from my experience, that is the moment where we sometimes failed to put the scale where it should be. (33:22)

Slawomir: We think we have big data and we need this or that and there are all those promises. There is a huge marketing machine behind all of this. The FOMO is really real; you think you are missing something if you are not running this. So I think that the risk is more on those companies which have money and want to go all in into the data. (33:48)

Slawomir: They think they now need to drop millions of dollars. Startups go from the other way around. They scale in luxury and they usually are safer. (34:05)

Alexey: Maybe these companies they are just used to spending a lot of money on SAP and similar stuff? (34:11)

Slawomir: Oh yeah. They will hire Deloitte or whoever is one of the big four players and they will give them a big check because they do big important work. (34:17)

Alexey: What would you say about Spark? Is Spark considered legacy? Would you use Spark for any new projects? People are still using PySpark. (34:30)

Slawomir: If I start a new thing, I do not know what scale I would have to have. I would go with either Presto, DuckDB, or anything simple. If I am just starting out, my mantra would be just make it as simple as possible. (34:40)

Slawomir: So maybe I am biased towards Presto because I was at Meta. That is why people are not so generous towards Presto. But DuckDB is a perfect solution. Any others will do as well. (35:08)

Slawomir: I would say Spark is not legacy yet. I still see a lot of Spark. But both Spark and Presto still require all this big data core. (35:32)

Slawomir: You need a team of people for this, for both of them. It is going to be hard if you are a team of one or two data engineers pushing that. I would not go with either. I would go simpler or use a vendor solution which gives you the tool. (35:41)

Alexey: I like Spark, I use Spark a lot, and I teach Spark in our data engineering course. But I think sometimes the cost of owning the cluster is huge. With Presto it is the same thing, but in AWS there is Athena, which is nothing else but managed Presto. This is convenient. (36:01)

Alexey: I was just curious to know what your opinion about Spark is. I recently spoke with another data engineer and he confirmed that for them Spark is still there and they still use it, but for new stuff they wouldn't necessarily use it. (36:27)

Slawomir: I think the question here is whether you buy or build. If you want to own your infrastructure and you have a team of people and a very specific reason why you want to do it that way with Presto or Spark, you can do it. It is still there and people are doing this. But you are one hundred percent right that the cost of management is higher than you think. (36:44)

Slawomir: Managing those clusters yourself is a lot of work. If you just buy something, I think AWS and GCP have their own solutions. I would probably go with those solutions, especially since you probably do not have a big team. If it is up to me and I am setting up something for my startup, I would go with those vendors. (37:08)

Slawomir: I would not try to set up Spark myself. (37:31)

Alexey: Would you set up Kafka? (37:38)

Slawomir: Unless I really need real time, no. Kafka is amazing. Don't get me wrong. If you really need real time, go with Kafka. (37:38)

Slawomir: But I struggle to see many real time use cases for the kind of classic advanced analytics. (37:52)

The Real-Time Myth: When to Use Kafka and Spark

Alexey: What about fraud detection? (38:01)

Slawomir: That is a very specific fintech case. If you have a transaction coming in and you want to check every single transaction, go real time. That is a great use case. But how many startups are there doing the fraud detection? (38:07)

Slawomir: Not many. Real time recommendation is another use case. Let us say you want dynamic pricing or you want something super dynamic then yes. (38:26)

Slawomir: But now the big question is whether we are doing classic analytics or something in between AI engineering, machine learning, and software engineering. (38:34)

Alexey: It sounds more like software engineering to me? (38:45)

Slawomir: They will involve the analytics teams especially if the machine learning folks are on the analytics side in that given company. But when you think about just reporting, dashboarding, and analytics, it is hard for me to find a single case for real time. If we look more broadly at advanced analytics, ranking, fraud detection, machine learning, and AI, I start to see the point there. (38:48)

Slawomir: But we are going very specific here. My claim is also that if you do not know what your revenue is, you do not have basic reporting and basic analytics, and you do not know what happened in the past, is it really the time for you to invest in real time insights? You do not even have the insights from last week. (39:15)

Slawomir: Maybe real time insights are not the thing. But if you need real time I would go with Kafka. Definitely Kafka is the industry standard. There are a lot of things built around Kafka. (39:39)

Slawomir: Everything speaks Kafka. So if you need real time I would go with it. (39:54)

Alexey: We have quite a few questions from people who joined us. So I want to start with these questions. One of the questions is something we briefly touched on when I asked you about these traditional companies and whether it is good to work for them. You said let us set aside the market conditions for now. (39:59)

Alexey: But if we don't set aside the market conditions now, first of all what are these market conditions? How tough is it now? Especially for juniors because in our courses we have a lot of people who are switching their careers. They are not necessarily juniors, but we have those too. (40:30)

Alexey: They already have some background, not always in software engineering, but they have some background and now they want to switch careers. How tough is the market for them right now? (40:49)

Slawomir: I can tell you only the things which I know because of my social media presence. I talk to a lot of people and many people come to me with the problems of getting a job. What I keep hearing is that it is rather tough. It is hard to get even through the CV screening. (40:52)

Slawomir: Of course you can see fewer job postings; I see that myself. I see fewer junior roles and fewer postings overall. People report to me that getting the interview itself is a big success, not to mention getting through the interview. So my understanding right now is that every report I have seen and every person I talked to says the market is rather tough. (41:12)

Slawomir: So that is why I set this aside. Now there is a different way I would treat it for people who are already within the industry within data related positions. Let us say you are a data analyst and you want to switch to data engineering. You are in a way better position than someone who used to be an accountant and knows Excel. (41:43)

Breaking into Data Engineering: 2026 Market Reality

Slawomir: The question is from a person who is a civil engineer. Civil engineering is a good point because they at least have the technical background. What is civil engineering? Construction, designing, and calculating structures like roads and bridges. (42:08)

Slawomir: So you are good at math, you are good at physics, you are an engineer. You are really an engineer, so that is a good point. It is still a little bit weaker than actually being in the tech industry and having an analyst or software engineer role. But it is still way better than a lawyer trying to get into data engineering having zero engineering background. (42:40)

Slawomir: The tiering for me would be this. First, the person completely outside engineering trying to break in; I do not envy that situation. Then people trying to get in with some engineering background like robotics, math, whatever; it is going to be hard as well, but it is better in the sense that they have the formal training and they are engineers. Then I would say it is a little bit easier for folks who are already in the data industry as data engineers, data analysts, or software engineers. (43:09)

Slawomir: Machine learning engineers usually do not move since people want to get into the machine learning and AI side. But if you are there, you have the slight advantage because you already have a job within the market. My advice here would be to reuse your current role to extend your responsibilities. A classic example would be if you are a data analyst and you want to move to data engineering. (43:50)

Slawomir: Why don't you use your current role and extend the responsibilities? Instead of a data engineer building a pipeline for you, you do it yourself and you ask your data engineer counterpart to review your work or help you. You could do these things when you are already in the industry. That is a big advantage. (44:06)

Alexey: I have seen multiple people in my previous company do exactly that. They were analysts. Analytics and data engineering are very close. (44:29)

Alexey: I saw a few very successful cases where people did exactly what you said and now they are lead engineers. (44:44)

Slawomir: But now for the scenario of the person outside the industry trying to get into it, my advice here would be to try to get a job. I am going to be very brutally pragmatic: get a job. Your journey will probably be longer than other people's because you do not even know if you are more on the platform side or the product side. (44:52)

Slawomir: Maybe you will actually love the analytics engineering kind of thing. You are just starting out, so get a job and then try to build from there. That would be my advice. Unless you really know that you are passionate about platforms and building data platforms; then narrow down your search and narrow down your learning towards this. (45:16)

Slawomir: There is no way you are going to learn everything end to end, all the frameworks from both sides of data engineering. There is just simply too much. Companies are guilty of throwing a bunch of random requirements out there like dbt, Airflow, and all of that. Then you go to the job and you build a dashboard. (45:45)

Slawomir: So narrow down. You won't learn everything. If you are outside the industry, it is going to be harder. It is doable, but I personally would just try to get a job with a related title, keep this as an anchor, and extend from there. (46:08)

Alexey: By related title, you mean like data analyst? (46:30)

Slawomir: I would go rather with data engineer. If I am really desperate, I could go that way. But that depends really on the personal situation and how desperate I am. (46:30)

Alexey: Is it actually a good strategy? Because people come to me with this advice. It doesn't always apply to data engineers, it could be ML engineers. (46:50)

Alexey: The idea is that there is a stepping stone, a different role which is easier to get into first. You take this role, work there for a couple of years, and then transition to your target role. For data engineering, if you are already a software engineer, it is way easier to transition to data because you already have the skills. Data analysts are already very exposed to the data stuff, so for them the transition is also smoother. (46:51)

Alexey: Would you actually recommend using this strategy of first getting some other role, or trying to go directly? (47:27)

Slawomir: My answer would be that it is not possible to answer it without knowing the person's situation. Let us not forget: what is your age? Do you have a family? You only have so much time. (47:35)

Slawomir: If you need stability, you need the role faster, and you have some time to push after hours or within the role, I could advise that. It takes longer, but there is this kind of transitory period where you could manage your finances and manage the transition more smoothly. It is possible and I could take it into account. But if you are not desperate and you want to spend more time on extra learning and extra projects, then do not do it because it will take your time. (48:03)

Slawomir: If you can spend one month just doing super hardcore curriculum and no work, you can learn everything. You go to your community, you learn your stuff, and in two months you can get an ML job. But if at the same time you took a transitory role, half of your day is that role. You will only have a certain amount of time. (48:49)

Slawomir: So there are pros and cons. I would say look at your personal situation. If you can spend time learning and pushing harder without distracting yourself with transitory roles, do not do it. But if your situation requires it and your background is completely different and you need to have a job, I would consider that. (49:13)

Alexey: What I usually think about is that data engineering and data analytics are actually quite different. At the end, what data analysts do is very different and the things you need to learn for data analytics are very different. SQL is common and data understanding is common, but the rest could be broad. (49:41)

Alexey: So maybe if your target role is data engineering, it doesn't mean you'll enjoy being a data analyst? (50:05)

Slawomir: They suffer. I mean, they are two different roles because of a reason. So that is why I am saying it really depends. I know people who just want to get to tech or just want to get to the data. (50:12)

Slawomir: If you have a really honest discussion with them, they are saying anything will be better than what I am doing right now. I don't care if it is data engineer or data analyst, just give me a data position. If you are in this situation, that is a completely different discussion than if you are a software engineer who has an opinion about what they like and what they want to do. (50:28)

Alexey: And I see there are like five questions that are about the same thing. I will group them. There is a lot of concern about an AI invasion. (50:54)

AI Automation: Why Strategic Builders Outlast "DBT Monkeys"

Slawomir: Does it even make sense to try to become a data engineer now when AI can do all this for us? That is a simple one: yes. The caveat is this: if your data engineering role is what we described earlier as this kind of corporate factory model where you sit and you implement dbt all day long, you should actually upskill because that is going to be taken away. (51:04)

Slawomir: There are many things AI will take away. There are the tasks which are right now done by data engineers which won't be done by data engineers because they are trivial tasks. Building dbt models, text to SQL, answering trivial analytical questions; these will go. There is the whole genre of data engineering which is data ops. (51:34)

Slawomir: Triaging failures, where a pipeline fails and you don't know why it failed; some companies have data ops positions that go there and look for the fix. It is like a plumber who goes there and looks for the problem. AI can probably do this. AI can rerun the things and can understand the logs. (51:56)

Slawomir: Sooner rather than later AI will even be able to push a PR with the fix for the failure. So those roles are doomed. But that doesn't mean that the other opportunities are not there. The platform won't build itself. (52:28)

Slawomir: Even when you build this platform and there is AI, the better AI is integrated with the platform, the better for you. It needs context, it needs semantics, it needs understanding of your company and your data. Ideally it is well classified and there is metadata. You can integrate your models and you can do all of those things. (52:43)

Slawomir: And it won't magically appear there. So who will be doing all of this? Most probably data engineers. If I am integrating all the fancy agentic AI into my data platforms, those are going to be platform data engineers who are doing this. It is not going to magically appear there. (53:06)

Slawomir: Then AI needs to work on something. The cleaner and more structured the data is, the less work AI needs to do to infer, and the better it will work. I will give you a classic example. (53:24)

Slawomir: I have companies which have integrated vendor solutions for AI to do conversational analytics. They ask what the revenue is and the AI tells them, but the revenue number is wrong because the data sucks. The data modeling sucks and they have just a mess. You put this AI on top of the mess and someone needs to clean the mess. (53:42)

Slawomir: So there is a lot of work for data engineers with the caveat that you need to be this kind of versatile data engineer and not a dbt monkey or a triaging person who just checks what failed and restarts. That kind of data engineering is at risk. That won't be there. (54:05)

Alexey: I guess it applies not just to data engineering? (54:21)

Slawomir: That applies to everyone. Data engineers are not so special. That applies to software engineers and that applies to data scientists as well. The more trivial and more repetitive, the easier it is to automate it. (54:29)

Slawomir: If as a data scientist all you do is tune parameters, we don't have good news. That is exactly the same situation. (54:41)

Alexey: So a few questions about the interviewing process and hiring. You said getting an interview these days can already mean success because you speak with people and people share their stories with you. So how do I maximize my chances of getting an interview? Should I improve my CV? (54:51)

Alexey: And if yes, how? What kind of projects can I work on? Do I need certificates? What would your suggestion be? (55:12)

Slawomir: Definitely, I am assuming you don't have a solid network. There is the advice that you should network and your job will probably come from the network and I fully agree with this. But the problem with networking is that you already had to network before. If you now want to get a data engineer job and you never networked, the first thing you need to do is start networking. (55:28)

Slawomir: If there is one career mistake I made, it is that I started networking way too late. I got my Meta job because a person I knew really well went to Meta and they recommended me. When I left Meta, my first consulting gigs came through my network. I could have had full time jobs which were not even posted on the website because I knew the people. (56:01)

Slawomir: If you have the network, you are going to get your job from the network and you don't need to worry. But most people don't network. So what you need to do now is you need to definitely polish your CV and you need to start sending those CVs to recruiters. (56:34)

Alexey: How do I polish my CV? (56:51)

Slawomir: First of all, the CV needs to be clean and nicely formatted. I would definitely focus on the outcomes. What the hell are you doing? For example, if I screen a CV and you tell me that you know how to build pipelines, that is probably not that big of a competitive advantage. (56:53)

Slawomir: If you tell me that you proactively built something to reduce the cost or whatever, that is already a big thing. Show the traits companies are looking for. (57:24)

Portfolio Strategy: Framing Side Projects for Maximum Impact

Alexey: What if I haven't built a pipeline that reduced cost? What if I am just doing my personal projects? How do I present them in such a way that it is attractive? (57:35)

Slawomir: First of all, that is already a good thing because many people don't even do personal projects. Real examples from a job are the strongest evidence. One level down are your personal projects. They are not as good as the real examples from work, but they still count a lot. (57:44)

Slawomir: Then tutorials and certifications are the weakest argument. If you don't have real work, go with the side projects and advertise them. Don't get insecure about these. If you don't believe they are strong enough to show your skills, how will someone else feel about them? (58:15)

Slawomir: If you have a strong side project, present it as a strong side project and put some marketing hat on your head and fight for yourself. (58:38)

Slawomir: The mistake I see sometimes is people doing their personal projects and almost apologizing for the fact that this is a side project. It's like saying it's just a side project I did. Don't apologize for the fact that you spent time to actually do this side project. Show the best of this side project. (58:45)

Slawomir: Don't be shy on this and actually advertise this. It is not obvious. Not many people actually do this. What most people do is they watch tutorials and they repeat the things done in the tutorial. (59:10)

Slawomir: If the tutorial was how to build a dashboard using this or that, they will build a dashboard which looks almost like the one from the tutorial. But if you have a real side project which you came up with yourself and extended, you can be proud of yourself and present it that way. So definitely side projects, and framing the business impact. (59:24)

Slawomir: Framing what changed on your CV instead of describing what you did is another thing. Run your CV through different people so they look at it and they see if they even understand it. I remember seeing one of the CVs and I couldn't understand half of it because it had abbreviations and acronyms. (59:56)

Slawomir: I was like what the hell is that? After talking to the person, we realized that this is a great CV for that given industry because in that industry everyone knows what those metrics are. But if you want to get something else, just make your CV understandable. I don't know what those numbers are or what those acronyms are. (1:00:11)

Slawomir: So that's another thing. Then just try to network at the same time and try to push your CV. Try to apply at the same time and build as many side projects as you want and prove yourself. Will it magically work? No, probably you still have an uphill battle but you are increasing your chances. (1:00:50)

Alexey: Do you have time for one more question? (1:01:07)

Slawomir: Yeah let us go. (1:01:10)

Alexey: So we talked about optimizing the CV. What about projects? How can we optimize the projects we choose, because there are infinitely many projects I can pick from? Will picking some particular projects be more beneficial for me? (1:01:16)

Slawomir: This is why I am talking about data engineering and different specializations. The clearer you are with what you want, the better for you. Let me give you my personal example. I knew that I would have a referral to Meta and I knew that I needed stronger data modeling skills for Meta. (1:01:35)

Slawomir: I was heavily on the platform side and I didn't have much exposure to data modeling. I knew that there was a project coming in to remodel our data warehouse at my company at the time. I said I was going to do it with the external contractors, because they were hired for that, and I would do it in extra hours. I just asked them to pay for a course for me, and I would learn and put in extra hours. (1:01:58)

Slawomir: So I was working more but they paid me for training and I had real exposure. That is an amazing side project to take on. I was working more but I was very clear about why I am taking this project. I was very clear what I want here. (1:02:33)

Slawomir: If you are like I don't really know what to pick up, probably you're not clear enough what you want to do. What you can do is either pick anything and see if you like it, or if you don't like it switch to anything else. Or spend time figuring out what you want to do. Now if you know what you want to do, let's say I want to go to data engineering. (1:02:40)

Slawomir: I'm not in data engineering, I want to go to data engineering and I want to be on the platform side. That is already quite specific and then you can see what kind of things you will be doing. You are going to be building data platforms. So an obvious choice for your side project is to build a toy data platform. (1:03:08)

Slawomir: What is a data platform? A data platform would be the entire ecosystem of tooling around processing, storing, and using data. I have some incoming data and my goal is to have a dashboard. Everything in between is what I would call a data platform. (1:03:23)

Slawomir: Probably some people from the industry will shout at me that a data platform is just Databricks or Snowflake. For me it is the entire ecosystem that integrates the data, brings data, stores data, translates data, and exposes it to the user. (1:03:41)

Alexey: I misspoke, I said income, but this could actually be a good example. Let us say I get some data from my bank account with all the transactions. This is my CSV file and my goal is to have a dashboard where I understand my spending. (1:04:06)

Alexey: Everything in between, all this processing, could be this platform. And I could use the example you had, DuckDB, dbt, and Superset glued together? (1:04:24)

Slawomir: Exactly. So this is already a small platform. It's like all the pieces you will do as a data engineer. (1:04:33)

The Ultimate Portfolio Project: Building End-to-End Platforms

Slawomir: It is connecting to some sort of APIs or scraping the data, which simulates getting data from upstream systems. You are going to have cleaning of data because probably whatever you get is going to be dirty and unusable. So you have to clean data and you need to have the data quality checks as well. (1:04:42)
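
Editor's note: the cleaning and data quality step described here could look roughly like the sketch below. It is not from the conversation. The column names (`date`, `description`, `amount`), the sample rows, and both function names are hypothetical, and a real bank export will likely differ. The duplicate rule is also an assumption that would need checking against real data, since two identical purchases on the same day are perfectly possible.

```python
import csv
import io
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical raw export; a real bank CSV will have different columns.
RAW_CSV = """date,description,amount
2026-01-03,Grocery store,-54.20
2026-01-03,Grocery store,-54.20
2026-01-05,  Salary ,3000.00
not-a-date,Broken row,12.00
2026-01-09,Coffee,abc
"""


def clean_transactions(raw_text):
    """Parse rows, set aside malformed ones, and drop exact duplicates."""
    cleaned, rejected, seen = [], [], set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            record = (
                date.fromisoformat((row.get("date") or "").strip()),
                (row.get("description") or "").strip(),
                Decimal((row.get("amount") or "").strip()),
            )
        except (ValueError, InvalidOperation):
            # Keep bad rows for inspection instead of silently losing them.
            rejected.append(row)
            continue
        # Assumption: an exact repeat is an export artifact, not a real purchase.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned, rejected


def run_quality_checks(records):
    """Return a list of human-readable failures; an empty list means all passed."""
    failures = []
    if not records:
        failures.append("no rows loaded")
    if any(not description for _, description, _ in records):
        failures.append("empty description")
    if any(amount == 0 for _, _, amount in records):
        failures.append("zero-amount transaction")
    return failures


if __name__ == "__main__":
    cleaned, rejected = clean_transactions(RAW_CSV)
    print(len(cleaned), "clean rows,", len(rejected), "rejected rows")
    print("quality failures:", run_quality_checks(cleaned))
```

Keeping rejected rows in a separate list, rather than discarding them, is one way to make the "dirty and unusable" part of the input visible in the project write-up.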

Slawomir: Then you need to store it somewhere and you need to store it in a nice form. You probably want to do some data modeling. Another kind of check box. It may cost you something if you are willing to pay some money, but if not, that is still fine. (1:05:04)

Slawomir: And then you want to do something with that. So you build these dashboards and you can even simulate getting business value out of it. You can actually have some sort of analysis or some sort of use case behind it and you can have this end to end flow. It is a really valuable end to end project. (1:05:15)
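
Editor's note: as a rough illustration of the storing, modeling, and dashboard-feeding steps, here is a minimal sketch. It is not from the conversation. It uses Python's built-in `sqlite3` purely as a stand-in for an embedded engine such as DuckDB, and the table name `fact_transactions`, the view `monthly_spending`, the categories, and the sample rows are all hypothetical. A BI tool such as Superset would then read from a view like this one.

```python
import sqlite3

# Hypothetical, already-cleaned transactions: (ISO date, category, amount).
# Negative amounts are spending, positive amounts are income.
TRANSACTIONS = [
    ("2026-01-03", "groceries", -54.20),
    ("2026-01-17", "groceries", -31.80),
    ("2026-01-05", "salary", 3000.00),
    ("2026-02-02", "groceries", -40.00),
]


def build_warehouse(rows):
    """Load rows into an in-memory database and define a reporting view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE fact_transactions (
            txn_date TEXT NOT NULL,
            category TEXT NOT NULL,
            amount   REAL NOT NULL
        )
        """
    )
    conn.executemany("INSERT INTO fact_transactions VALUES (?, ?, ?)", rows)
    # The view is the "modeled" layer a dashboard would query.
    conn.execute(
        """
        CREATE VIEW monthly_spending AS
        SELECT substr(txn_date, 1, 7) AS month,
               category,
               ROUND(-SUM(amount), 2) AS spent
        FROM fact_transactions
        WHERE amount < 0
        GROUP BY substr(txn_date, 1, 7), category
        """
    )
    return conn


def monthly_spending(conn):
    """Return (month, category, spent) rows ordered for display."""
    query = (
        "SELECT month, category, spent FROM monthly_spending "
        "ORDER BY month, category"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_warehouse(TRANSACTIONS)
    for month, category, spent in monthly_spending(connection):
        print(month, category, spent)
```

Separating the raw fact table from the reporting view keeps the modeling decision (what counts as spending, how months are bucketed) in one place that can be changed without reloading the data.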

Slawomir: It touches every single point of a data life cycle. You are learning core skills. It's a big one and you are going to learn a lot. That is a great side project. (1:05:39)

Alexey: And maybe you are like me and every year you have to report taxes? I assume most of us have to do this, especially if you live in Germany. So every year I have to go through the same process where I need to go to my bank accounts, multiple accounts, and they are different. I need to understand what my income was because I need to declare my side income. (1:05:53)

Alexey: I need to understand how much and what the transactions with this side income were and where exactly it is coming from. If I had a system with a report at the end, I think this would be like a perfect side project. It is amazing. It is very simple and it solves my problem and then it's a portfolio project. (1:06:15)

Slawomir: One thing which I like in this project is that you will personally benefit from it and care about it. I'm talking just from simple human psychology. If you have to force yourself to work on something which you don't care about or which is very abstract, you will probably have less motivation to push hard on it. (1:06:31)

Alexey: But if you are solving something for yourself or something you are passionate about, you will get this thrill of working on it and you will want to do it. So it is also much easier to push those kinds of projects. Okay cool. Well, I don't want to keep you longer. (1:07:05)

Alexey: We still have a few questions but I think some of these questions are not really on topic for this discussion. You can ask these questions in our slack and I'll be happy to give an answer but with that I want to thank our guest. Thanks a lot for joining us today for sharing all your experience with us. (1:07:13)

Slawomir: Thanks everyone for attending too. I really enjoyed all the questions you had. It was a really lovely discussion. (1:07:25)

Alexey: Thanks Slawomir. (1:07:32)

Slawomir: Thank you very much for having me here. If anyone has any more questions they know how to reach me. If they don't they just put my name on LinkedIn. I'm a real person and I always say I do talk to people. (1:07:32)

Networking Advice and Local Gdansk Culture

Slawomir: I told you about networking. I do network and I do like networking. So if you have questions, reach out to me. (1:07:49)

Alexey: Do you attend any meetups in Gdansk? (1:07:58)

Slawomir: I haven't found any yet. I need to find out what the scene is. I am still in Poland for just my second month I think. So I still don't know the scene. I don't know the things here. I am still in Gdansk. (1:08:00)

Slawomir: That's the plan, to stay here longer. If somebody is from Gdansk and wants to hang out, definitely reach out to me. Anyone from Poland, reach out to me. I don't know anything, so you will be of help for me as well. (1:08:08)

Alexey: Pierogi; I really like pierogi. It is my favorite. (1:08:21)

Slawomir: Dumplings are amazing. (1:08:26)

Alexey: Okay. Thanks Slawomir. Thanks everyone. And it was really nice chatting with you. (1:08:26)

Slawomir: Thank you very much. My pleasure. (1:08:32)

