
Becoming a Data-led Professional

Season 3, episode 8 of the DataTalks.Club podcast with Arpit Choudhury

Did you like this episode? Check other episodes of the podcast, and register for new events.


Alexey: This week we will talk about becoming a data-led professional. We have a special guest today — Arpit. Arpit is actually one of the first people who joined DataTalks.Club — I think you were one of the first 10 or 20. I remember asking you for some tips. I checked your LinkedIn and it was something about growth and community. I immediately became interested and started to ask you different questions about growing the community. Arpit is also the founder of Data-led Academy, which is the go-to place for anybody interested in data who wants to learn how to work with data. If you want to ask any data-related question, this is the place to look for answers. Welcome, Arpit. (2:21)

Data-led academy

Arpit: Thanks Alexey for the intro. I’m excited to be here, thanks for having me. DataTalks.Club is a really great community, and I’m excited to be part of it. Just to let people know: Data-led Academy is not exactly a community — it’s not a Slack community. Essentially it’s a place to learn how to work with data. We have a lot of free learning content. We’re also creating a repository of common questions about tools, technologies, people and processes related to data. You can go and find answers to those questions there. I’m excited to be here, thanks again. (3:28)

Alexey: And you also have a podcast. (4:06)

Arpit: We have a podcast called “The Data-led Professional”. You can check out the podcast on our website. We talk about different data-related topics that are mostly relevant for less technical people — people who are not exactly data engineers or data analysts. Of course, they can also benefit from the content, but our core audience is people working in product, growth and operations roles, or marketing. They want to learn about data and different data-related topics. Typically our goal is to answer common questions that people have about data. (4:09)

Arpit’s background

Alexey: Before we go into our main topic of becoming data-led, maybe we can start with your background. Can you tell us about your career journey so far? (4:54)

Arpit: I’ve been working in the technology industry for pretty much my whole career. I’ve worked in different types of companies. I got into the data space when I was working as a consultant. I was building a lot of integrations for SMBs, and that led me to Integromat. I was a user of Integromat — it’s a workflow automation solution like Zapier. I was one of the earliest people to join the Integromat team. I built the Integromat community and then eventually I led growth at Integromat. I moved on from Integromat soon after it got acquired last year. Since then I have worked with a few other companies in the data space, solving different problems in the customer data infrastructure space. (5:06)

Arpit: Now I am building Data-led Academy, and I’m also helping a few data companies with their content and community strategy. For me, content and community is all I’ve been doing. At Integromat, content and community are what helped us grow really fast. It’s really important for data companies to build their presence across the different communities where their prospects, customers and partners hang out — communities like DataTalks.Club. A lot of people are building new communities, but I believe that there are already a lot of great communities. That’s why I like to be active in existing ones. That’s the idea behind the academy — not to build another community, but to create a place where people can get concrete, actionable and unbiased answers to their questions. A lot of these questions are answered by experts from these communities.

Growth marketing

Alexey: Growth — this is something you did at Integromat. I was curious what growth managers actually do. When I checked, I found out that it’s very data-driven — they use a lot of data. When I look at “growth marketing”, it doesn’t sound too data-related to me. But when I actually read about it — they run a lot of A/B tests and they need to make a lot of decisions, and all these decisions are based on data. I guess this is how you ended up creating Data-led Academy — you saw how useful it is. (7:21)

Arpit: Yeah. Without data, there is only so much you can do. If you don’t have data when you are building growth experiences — whether it’s for acquisition, activation or retention — you’d be forced to build linear experiences. Every customer, irrespective of their behavior, will go through the same path. But when you have data, you can create personalized experiences. The customers or prospects will see content or interact with your product based on where they come from, what their industry is and how they interact with your product. (8:09)

Arpit: That’s the main difference. It’s very hard for growth professionals or product managers to build personalized customer experiences across different touch points and channels without having access to data in the tools that they use. It’s not enough just to have access to data in a data warehouse or a BI tool. It’s more important to have access to data in the tools that they use — to build and craft these customer experiences.

Alexey: Tools that marketers use to make different decisions? Most of them cannot just go and run a SQL query in a database. (9:28)

Arpit: Yeah. Marketers, growth professionals, product people, operations people. They don’t need to know how to write SQL. Of course, if they know it, it’s an advantage. But there are a lot of great tools out there that allow you to visually query the data. If the data is made available, they can easily use the data for whatever they are doing. Whether it’s creating in-app experiences or creating lifecycle email campaigns or doing A/B tests like you mentioned, or doing an SMS campaign. Irrespective of the channel that they are using to engage with customers, they can use this data, if the data is available in the tools that they use. (9:46)

Being data-led

Alexey: What is data-led? I think you have a definition. What is that? (10:38)

Arpit: A data-led professional is someone who understands where data comes from and what it looks like — and because of that, they are able to question its accuracy, rather than just blindly believing the data that they see. They are comfortable working with data and they have the skills to build experiences powered by data. It sounds like a lot, but you don’t really need a technical background for this — to understand where data comes from, what it looks like, to question its accuracy and to work with data to build data experiences. (10:45)

Data-led vs data-driven

Alexey: Is there any difference between being data-driven and data-led? Data-driven is something I hear quite often. To be honest, I still have no idea what it actually means. What is it and what are the differences between data-driven and data-led? If there are any differences. (11:33)

Arpit: It’s a buzzword, but I like to think that being data-driven means to base decisions exclusively on available data. Every company wants to be data-driven. They are investing heavily in their infrastructure, and once data is available, they want to be data-driven — but they don’t always question the accuracy of the data, or use intuition and experience before making decisions. Data is all about making decisions. Good data helps you make good decisions. But you cannot always blindly follow what data tells you. There are so many ways the data that you see can be inaccurate. There are so many data quality issues. It’s important to combine your intuition and your experience with the data that you see. (12:00)

Arpit: So when you’re data-led, data is leading you or guiding you rather than just telling you what to do — rather than just blindly following what data tells you.

Alexey: So, a data-led professional is somebody who knows and understands where the data comes from, can question its accuracy and is comfortable working with data. The second point — about questioning — is the main differentiating factor between the two. (13:09)

Documenting your data: creating a tracking plan

Arpit: Yeah. You can only question its accuracy if you understand where the data has been collected and how it is being collected. If nothing is documented, you cannot really understand anything, and you cannot really question it. It’s important to have proper documentation on your data sources. (13:34)

Arpit: In the context of product and growth, we need to have a data tracking plan. It could be a simple Google doc or a Google sheet, or there are purpose-built tools to create your tracking plans. In these tools, you can define every event that is being captured, the related properties that are being captured, and even the data types of each property. When these things are well defined, any product or growth professional can look at that information and understand: “okay, this event is captured when someone performs this particular action”. For example: “The sign-up event is captured not just when someone submits the sign-up form, but when the sign-up is actually completed — it’s a server-side event rather than a client-side event.”

Arpit: When you specify these things, you have enabled people to understand where data comes from. It's important to document everything to be able to pass on that knowledge to others. The person who would implement it cannot be around forever to explain where the data is coming from. But once you have that understanding, then you are able to question the accuracy. When you see an anomaly, when you see that the data doesn’t look right, you can go and drill down and figure out what the issue is — instead of flying blind or blindly trusting what you see.
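As a rough illustration of the tracking plan Arpit describes — every event documented with its description, properties, data types and client/server side — here is a minimal sketch. All event names, properties and owners are illustrative; in practice this would live in a spreadsheet or a purpose-built tool.

```python
# A minimal tracking plan sketched as plain data. All names are illustrative.
TRACKING_PLAN = {
    "signed_up": {
        "description": "Fired when sign-up completes (user row exists in the DB)",
        "side": "server",  # server-side, not fired on form submit
        "properties": {"email": "string", "industry": "string", "role": "string"},
        "owner": "backend-team",
    },
    "project_created": {
        "description": "Fired when a user creates a project",
        "side": "server",
        "properties": {"project_id": "string", "template_used": "boolean"},
        "owner": "backend-team",
    },
}

def describe(event_name: str) -> str:
    """Let a non-technical teammate look up what an event means."""
    plan = TRACKING_PLAN[event_name]
    return f"{event_name} ({plan['side']}-side): {plan['description']}"
```

With this in place, anyone on the team can call `describe("signed_up")` and learn when the event fires and on which side — the knowledge no longer lives only in the head of whoever implemented it.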

Alexey: This data tracking plan is the document which describes all these events. When do you need to do this? Let’s say we work for a startup or for a mature organization. We already have some data sources. Then we hire a growth marketer. Now that person needs to make decisions based on data. (15:25)

Arpit: It depends on the stage of the company. I have seen some big companies not having a tracking plan at all, which is a problem. Every company that has a tech product or a tech-enabled product or even an e-commerce business needs to have this documented. It could be a tracking plan. Some companies use tools like Miro for that. (16:01)

Arpit: It’s usually done even before you instrument your data — before you set up a product analytics tool, or any tool that depends on event data. Then you instrument your product events. The process of tracking product data is often referred to as “instrumentation”.

Arpit: There is no rule when it’s done. Typically it’s done when companies are ready to invest in product analytics tools or other event based engagement tools, like Customer.IO, Braze, etc. There are many tools where you can use events to personalize customer experiences. This is relevant to both startups as well as big companies.

Arpit: At a big company if you do not have this stuff documented, new people will come in and they will have no idea what to do. They see a bunch of events. They are supposed to create some in-app experiences or some email campaigns. But they wouldn’t know what an event means. Or they might see conflicting events. They might see an event called “signed_up” and another event called “SignedUp” with a different casing. All of those issues come up if you do not have things documented and well instrumented.
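The kind of conflict Arpit mentions — a “signed_up” event next to a “SignedUp” event — can even be caught programmatically. A hedged sketch (the function name and normalization rule are illustrative, not a real library):

```python
def find_conflicting_events(event_names):
    """Flag event names that differ only in casing or separators,
    e.g. 'signed_up' vs 'SignedUp' — a common symptom of an
    undocumented, inconsistently instrumented event schema."""
    seen = {}       # normalized key -> first name seen with that key
    conflicts = []
    for name in event_names:
        key = name.replace("_", "").replace("-", "").lower()
        if key in seen and seen[key] != name:
            conflicts.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return conflicts
```

Running this over the event names in your analytics tool would surface duplicates such as `("signed_up", "SignedUp")` that a tracking plan should have prevented.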

Arpit: I’d recommend every company to do this sooner rather than later. Of course you need to have customers first, you need to have users to be able to make sense of this data.

Understanding your data

Alexey: You mentioned growth marketers. But there are a lot of different users of data. We have data analysts, product managers... Many people need to make decisions based on data. They need to analyze the data and they need to understand each event — what’s the origin of this event? If they see something strange in the data, they need to be able to question it. “There’s a spike in registrations. What does it mean?” And then they would be able to go and drill in and understand what’s happening. Right? (18:27)

Arpit: Absolutely. An example would be — you see a ridiculous spike in your signups. You know that it doesn’t look right. If you have the event instrumented properly and you are also capturing relevant properties with that event, you can figure out where the signups are coming from. Oftentimes you might realize that these are fake signups. (19:12)

Arpit: We talked about product, growth and product marketing, but these events are extremely useful for sales people as well. Sales people look at a CRM. They look at data about a company or a prospect. They can have more context about customers in the trial period if they see what they are doing inside the product. They can go after the right accounts rather than going after everybody. They see that this particular account has five users. They have already performed a bunch of actions. If it’s a project management tool, they have already created a project and a bunch of tasks. Now would be a good time to reach out to them versus just reaching out to everybody who shows up in your CRM. It’s extremely useful for sales.

Tools for creating a tracking plan

Alexey: It’s very important to document your data. You mentioned that you can do this with Miro. Or things like an Excel spreadsheet or Google spreadsheet. Are there any special tools for that? Or people use whatever they are comfortable with? (20:47)

Arpit: There are purpose-built tools for that: AVO, Iteratively, TrackPlan. These are tools built for companies to create their tracking plan in a collaborative manner rather than relying on a spreadsheet. They have useful features to maintain data quality, maintain taxonomy and collaborate on each event. You describe an event, and the developer who is supposed to instrument it might have questions about it. You can discuss it there, so these tools are really useful. (21:16)

Alexey: You said “a developer is supposed to instrument this event”. The way I understand it, first you create this tracking plan: you write down all the events you want to capture. It doesn’t mean you capture them yet. But you want to start tracking them. Then an engineer needs to implement this to start tracking this data. The data doesn’t just appear magically on your dashboard. You actually need to capture this data. (22:04)

Data flow stages

Alexey: Now we have a tracking plan. We tell the developer, “can you please instrument this event?” We start collecting the events. Then we have the other end of this — an analyst or somebody is looking at the dashboard. Then they make a decision, like “this variant of the campaign is better than that variant”. There are a lot of things that happen between these two ends. (22:50)

Arpit: Data collection is the first step. Even before, let’s talk about a startup that doesn’t have any data infrastructure in place. Let’s think of a SaaS product — could be a project management tool or an invoicing tool. Before you implement any product analytics tools or any tools that rely on data, you need to first create a tracking plan. You need to describe all the events that you want to capture and then describe all the properties of these events. You also need to describe user properties and organization or account properties — all the different pieces of data that you ideally want to collect. (23:27)

Arpit: When you start doing this, you feel like you want to collect everything — you might end up with 50 events. That’s fine as a first pass. The next step is to remove all the events that you don’t need in the near future. Having too much data at the beginning is one of the biggest issues: it takes more time to implement and more time to test. So, bring it down to the seven or ten events that you really need to understand the customer journey from acquisition to activation.

Tracking events — examples

Alexey: Maybe you can think of some examples? So you said, a SaaS product which could be a project management tool or account or invoicing. (24:43)

Arpit: You start by tracking your sign-up event. I recommend that people also track the “email verified” event — depending on your sign-up flow, you might have people who sign up but don’t verify their email. It’s useful because you might want to create some emails based on that event. (25:04)

Arpit: If it’s a project management tool, one of the first things is to create a project. Then “invite a user”, because a project management tool is no good if you don’t invite another user to work with. So, “project created”, “user invited” — those would be the core events that you track. Then, of course, you want to see whether they create a task, so “task created”.

Arpit: If it’s an invoicing tool, it could be “invoice created”. With an invoicing tool, you probably add a client. So, it could be “client added” or “client created” and then “invoice created”. Then you invite your colleagues, so “user invited”. These are the most common events. You start with these.

Arpit: For each event, you would describe relevant properties. For the “sign up” event you describe the user's name and email. If in the signup form you ask which industry they belong to and what their roles are, you want to capture that information alongside that event.
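Putting the examples above together, a tracked event is just a name plus its properties. A minimal sketch in the style of Segment-like `track` calls — the helper, field names and values here are illustrative, not a real SDK (real SDKs also attach timestamps, context, and so on):

```python
def track(event: str, user_id: str, properties: dict) -> dict:
    """Build an analytics event payload: event name, who did it,
    and the properties captured alongside the event."""
    return {"event": event, "userId": user_id, "properties": properties}

# The sign-up event carries the answers from the sign-up form as properties.
signup_event = track(
    "signed_up",
    user_id="u_123",
    properties={"email": "ada@example.com", "industry": "fintech", "role": "founder"},
)
```

The same shape works for “project created”, “user invited”, “invoice created” and the other core events — only the name and properties change.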

Arpit: Once you are happy with your events, then you bring in an engineer. You discuss it with them, get their feedback. Engineers will have a lot of good feedback. You get their feedback, polish your tracking plan.

Collecting the data

Arpit: I recommend that for every event you describe if it’s a client-side event or a server-side event. That is really important. It’s better to track server-side but you might want to track some client-side events. (27:00)

Alexey: What’s the difference — for a sign-up — between client-side and server-side? (27:18)

Arpit: It depends on how you end up implementing it. If you are tracking client-side, the event will be fired as soon as someone clicks the “sign up” button and submits the form. But if you are tracking server-side, it’ll only be fired once the signup process is actually completed — once that user is added to your database. A lot of times a user will click that sign-up button and there will be a validation error — the password is not right, or the email is not right — but the event will be fired anyway. So, you might see a sign-up even though an actual signup has not taken place. (27:23)

Arpit: Ideally you would track it server-side, but for some use cases tracking client-side events is better — for example, if you want to track whether someone clicked a button to use a feature. Even if they don’t end up using the feature, you want to know that someone actually tried. That’s very useful information, so that could be a client-side event.
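The client-side vs server-side distinction Arpit draws can be sketched in a few lines. This is a toy illustration — the `track` helper, validation rules and event name are all made up for the example:

```python
events = []  # stand-in for an analytics backend

def track(event, **properties):
    events.append({"event": event, **properties})

def sign_up(email: str, password: str) -> bool:
    """Server-side sign-up: the event fires only after validation
    succeeds and the user would exist in the database. A failed form
    submit therefore never shows up as a 'signed_up' event."""
    if "@" not in email or len(password) < 8:
        return False  # validation error: no event is fired
    # ... create the user in the database here ...
    track("signed_up", email=email)  # server-side event
    return True
```

A client-side implementation would instead call `track` in the button’s click handler, before validation — which is exactly why it can over-count sign-ups.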

Arpit: Once you have that, you start working with your engineers. Multiple engineers might be involved. You should even specify which engineer is going to implement which events. You can specify who owns which event. Once you have everything done and once you actually have data flowing in, you are actually done with the collection stage.

Storing and analyzing the data

Arpit: The next step is to make sure that this data is stored in a warehouse. For early stage startups it might not always be possible to set up a warehouse. But it’s not hard to set up a warehouse today, it’s very affordable. (28:52)

Arpit: But you need to store the event data or product data somewhere. If you don’t, you would obviously be sending it to some product analytics tool, like Clicky, Mixpanel or Amplitude. But you wouldn’t really have access to this raw event data to use it in the future. So, you should store it.

Arpit: Then you want to analyze this data. Typically event data is analyzed in a product analytics tool. Some people end up doing this in a BI tool, but BI tools are not purpose-built to analyze event data: you’d need an analyst to write a whole bunch of SQL queries to create a simple funnel report in a BI tool, whereas you could do it in a product analytics tool with a few clicks.
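To make the funnel idea concrete, here is a minimal sketch of the computation a product analytics tool does behind those few clicks: counting how many users reach each step of the journey in order. The step names and event shape are illustrative.

```python
from collections import defaultdict

FUNNEL = ["signed_up", "project_created", "user_invited"]  # illustrative steps

def funnel_counts(events, steps=FUNNEL):
    """Count how many users reached each funnel step, in order.
    Each event is a dict with 'user' and 'event' keys."""
    progress = defaultdict(int)  # user -> index of the next expected step
    for e in events:
        user, name = e["user"], e["event"]
        if progress[user] < len(steps) and name == steps[progress[user]]:
            progress[user] += 1
    # users counted at step i are those who got past it
    return [sum(1 for p in progress.values() if p > i) for i in range(len(steps))]
```

Writing the equivalent as SQL over a raw events table is exactly the multi-query work Arpit says you’d otherwise hand to an analyst.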

Data activation

Arpit: Irrespective of the tools you are using to analyze the data, you derive some insights from it, and then you want to activate that data — act upon it. That’s the most important thing. You cannot just look at data and be happy with it; you need to do something about it. That’s when data activation comes into place — once this data is available in your activation tools: your email tools, your support tools. (30:03)

Arpit: Support is a really good use case — it’s something that a lot of people don’t think about. If you make this event data available in your support tools, you enable your support teams to see what users have done in the product. When somebody opens a ticket, they don’t have to ask people “did you try doing this?” or “did you try doing that?” or “can you try doing this?”. They can see what actions users have performed. So, they don’t have to ask, and they can provide better support.

Alexey: Are there companies who actually have this? All my experience with customer support was... (31:02)

Arpit: More and more companies realize the importance of this. It’s not a priority for most companies, but there are companies that provide great support. They have this, and it’s really not that hard: once you have the data, you send it to Zendesk or whatever support tool you are using. That’s the thing with data. Companies have access to this data, but only 5 out of 100 actually use this data across different channels and make it available to different teams. (31:12)

Arpit: It’s also useful in sales. People don’t generally have access to product data in the tools that they use. But now more and more companies realize the importance of this. Now there is a new breed of companies, they are building new tools dubbed as “CRM 2.0”. With these tools, sales people can access product data and be more responsive. They don’t have to create these linear experiences.

Arpit: So once you have the data and the right tools, you can activate the data. You can build a personalized customer experience. Then you send this data back to your product and personalize the product experience. HubSpot is a good example and there are a bunch of other companies that do this really well — the way people experience the product is personalized based on what they have done earlier in the product.

Arpit: That’s the ultimate destination. You are not just analyzing data. You’re using it to create experiences outside and inside your product. Amplitude, which is one of the most popular product analytics tools, has launched a new feature called “Amplitude Recommend” where you can do this using Amplitude. You can send the data from Amplitude back to your product and to other tools to create a unified customer experience.

Tools for data collection

Alexey: There was a lot of information and I have so many questions. Maybe we can take a step back and start from the beginning. You said for creating this tracking plan, we can use a specialized tool like TrackPlan. We create this tracking plan. Then an engineer would go and implement this for data collection. What are the tools we’d use for that? (33:41)

Arpit: There are CDI tools — customer data infrastructure tools. The ones I mentioned — TrackPlan, AVO and Iteratively — allow you to also collect your data, not just create the plan. Another popular tool is Segment Connections — one of the most popular tools for tracking product data. There are also RudderStack and MetaRouter, which is relatively new. Freshpaint enables implicit tracking — you don’t even have to define the tracking; once you install it, it starts tracking all the events automatically. Some companies do this using code, and some companies build microservices just for tracking purposes. But then, of course, there are all these great tools that I mentioned. In fact, I have written about this and I am happy to share the content. (34:16)

Alexey: Please. Send the link and I will include it in the description. (35:21)

Arpit: Yeah, for sure. (35:25)

Data warehouses

Alexey: We mentioned that we collect the data and then we can send it immediately to a product analytics tool, but it is better to store it in a warehouse. So what is a warehouse? It’s a database that you own, right? Can you tell us what it is? (35:27)

Arpit: A data warehouse is a database that is purpose-built for analytics. It’s not a typical database. Companies use it to store large amounts of structured data. They create data models in the data warehouse, they transform the data, they clean the data there. There are tools for transformation like DBT or Trifacta. Once you have this clean structured data in the warehouse, you can analyze it in a BI tool. (35:56)

Arpit: Also, there are product analytics companies that are now warehouse-centric. You have a warehouse and you are already sending data there. If you want to implement a product analytics tool, you don’t have to use their SDKs to send data to them directly. You can send data from your warehouse to these products. There is a tool called Rakam. You don’t even need to send data there. It just sits on top of your warehouse like a BI tool and offers data analytics features.

Arpit: So, it’s really important to set up a warehouse. Popular warehouses worth mentioning are Snowflake, BigQuery and AWS Redshift. There is a new one called Firebolt. In fact, we have someone from Firebolt in our community.

Alexey: I think I saw somebody recently in DataTalks.Club. (37:21)

Reverse ETL tools

Arpit: Exactly. That’s what I meant. There’s also Panoply. Once you have the data in the warehouse, you can do a lot of things with it. You can send it back to your BI and analytics tools. You can even send it to your engagement tools. There is again a new bunch of companies that have come up to solve this problem — they are referred to as “reverse ETL” tools or “operational analytics” tools. Companies like Census, Hightouch and Grouparoo are solving this problem. You have the data in the warehouse and you want to send it to a lot of different tools — your sales, marketing, advertising, support tools, or whatever product analytics tools. You can do that using these tools. (37:25)

Customer data platforms

Arpit: There’s a lot of different tools for solving different pieces of the puzzle. To implement all of these tools, you’d need a lot of resources. You’d need a data team, or at least one dedicated data engineer. It’s not possible for early stage startups to do everything. It’s worth mentioning CDPs — customer data platforms. They are an all-in-one bundled solution, where you can track data and send it to different tools. You can create audiences and create your models and segments inside a CDP. Of course, it has limited capabilities. You cannot do everything there that you can do in a warehouse, but it gives a lot of flexibility to marketers and growth professionals to work with data without relying on data teams. (38:20)

Modern data stack for growth

Alexey: I was trying to take a note of all the tools you mentioned. But there are simply so many of them. Let’s say my co-founder and I just started a startup. We understand that data is important. We want to use it. We look at all these tools and there are just too many. How do we make a decision which tool to choose? (39:16)

Arpit: Yeah. It’s first important to define what your goals are. A good way to think about this is to just list down 10 questions that you want to answer with data. Then work backwards and figure out the tools. There are ready-made tools, but a lot of companies end up fixing these problems or implementing these solutions without buying ready-made tools, by building them in-house. So, it depends on what your resources are — these tools can also get expensive. (39:54)

Arpit: At the very least, you need to collect data. You need a tool like a CDI — Segment Connections, Rudderstack, MetaRouter. I will share a list. I’ve written a lot about this stuff. A lot of these tools have free tiers, free plans. You can explore different tools and see what works for you.

Arpit: Once you have the data, you want to analyze it in a product analytics tool or even like a simple BI tool — or both. It makes sense to have both. They both serve different purposes. Of course, you want to have an email tool where you send this data to create personalized emails, to have a great onboarding experience — if you have a SaaS product, you want to do some in-app onboarding. There are tools for that and you can send this data to those tools.

Arpit: There are 4-5 different tools. I like to think of this data stack as the “modern data stack for growth”. If you hear the term “modern data stack”, you usually hear it in the context of analytics — “modern data stack for analytics” is how I would describe it. You have an ELT tool, like Fivetran, Stitch, Xplenty, etc. You’re ingesting data from all the third-party tools into a warehouse. Then you have a BI tool and a transformation tool. DBT is worth mentioning here — it’s growing so fast; so many companies are adopting it for their transformation and modeling needs. Then BI tools, like Looker, Mode, etc. That’s the modern data stack for analytics. (41:30)

Arpit: In terms of modern data stack for growth — you need a data collection tool, a product analytics tool and a warehouse. Every company should have a warehouse.

Arpit: Then you have tools that make data available in your downstream SaaS tools — sales, marketing, support tools. That could be a customer data platform — if you are using tools like Segment Connections and RudderStack, they can do that. If you are storing the data in the warehouse, then you can use a reverse ETL tool. Each of these categories obviously has multiple tools. A lot of them are very similar; some have different capabilities. It’s time-consuming to evaluate all of these different tools and understand the differences. That’s actually one of the things I am trying to solve with Data-led Academy.

Arpit: I am launching “company profiles”. You can go and learn about a product in a very simple manner, understand what the product does, what the core benefits are, who the product caters to and then get answers to questions about the product. I agree, companies spend a lot of time figuring out the right tools. That’s a problem that needs to be solved.

Buy vs build

Alexey: I have an engineering background. I look at these tools and I think, “why are they so expensive? I could implement something like this.” But I know that once I implement it, there will be bugs, and it’s difficult to maintain later. You probably have had this experience: you go to a company, you say “this is a great tool”, and an engineer says “No. I am going to implement it myself”. (43:50)

Arpit: I have experienced that a lot. Very few people actually like using these tools; for most people it’s a headache. It’s an additional task — whether for engineers to implement the tool, or for business teams to use it. Everyone just wants answers to questions, especially when it comes to data. They just want data to be available in the right format so that they can use it and derive insights from it or act upon it. Implementing new tools is not easy at all, especially when it comes to data tools. There are a lot of security challenges, a lot of compliance issues. It’s difficult. That’s why it’s important to understand the problem that the tool solves. (44:24)

Arpit: If you have the resources, then you can do the buy-vs-build analysis: if we build this in-house, how much will it cost us? Do we have the resources to maintain it? Versus buying a ready-made solution.

Arpit: A lot of these tools are open source. RudderStack is open source; Grouparoo, a reverse ETL tool, is open source. There are a bunch of open-source BI tools. There’s now an open-source product analytics tool called PostHog. In every category you will find open-source tools. But it takes a lot of effort to implement an open-source tool; just because it’s open source doesn’t mean it’s easy to implement. If you have the resources, you can go that route.

People we need in the data flow

Alexey: We talked about tools. We discussed the flow from the moment we start capturing events to the moment when data is activated. I’ve heard that we need to have at least a data engineer — somebody who implements this. Who else do we need to have in a team to implement the whole flow from the beginning to the end? (46:13)

Arpit: For early startups, you may not have a dedicated data engineer; any engineer, backend or frontend, could help you. But eventually you’d need a data engineer to manage all of these different data pipelines. It’s not just about implementing the tool once: you have to maintain it and make sure that everything is working properly. Teams have continuous requests for new events to track and more data to collect. So, at the very least, you need a data engineer. (46:48)

Arpit: If you are using a BI tool and you have a warehouse, then it makes sense to have a data analyst. The analyst will be analyzing the data and making it available in the BI tools. So, you need those two. I have seen companies hire one data person who does pretty much all of this stuff. There’s also now the analytics engineer, a more specialized role that sits between data engineering and data analysis. They work with tools like dbt and build data models.

Arpit: We also have DataOps people who make sure that all tools are working and all teams have access to the tools and resources that they need. Some companies are creating product ops teams. Companies that don’t have dedicated data teams call this “product ops”. They work on prototypes and often do a lot of the work that data teams do: they take care of all the tools, your ETL pipelines, your warehouse.

Arpit: You have an analyst, an engineer, and someone who understands the product really well. It doesn’t matter what you call it; depending on your resources, you decide how many people you would want in a team like that. It definitely makes sense to empower product teams and growth teams to understand all of this stuff. Then they can support these teams by taking up their work instead of adding more work for data people.

Arpit: One of the typical challenges is that data people are overwhelmed with requests from different teams — teams have requests to track new data or create dashboards. If you enable them to do that work themselves, then you can make your data people more efficient. That’s data democratization — you aren’t just making data accessible for people, but you are enabling them to work with data. You’re implementing self-service analytics tools. People can go in and derive insights from data themselves. You are investing in upskilling people. Have people learn basic SQL, so they can run simple queries and won’t have to rely on data people. They will be able to take a SQL query, put it in the right place, run it and then understand the result.
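The “basic SQL” self-service that Arpit describes can be as small as one GROUP BY. Here is a minimal sketch using Python’s built-in sqlite3; the events table, its columns, and the event names are all hypothetical, just to show the shape of a query a non-technical teammate could run themselves.

```python
import sqlite3

# Hypothetical events table: who did what. In practice this would live
# in the warehouse, not in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", "signed_up"), ("u1", "created_project"),
     ("u2", "signed_up"), ("u2", "signed_up")],
)

# The kind of simple query self-serve analytics is about:
# how many distinct users triggered each event?
query = """
    SELECT event, COUNT(DISTINCT user_id) AS users
    FROM events
    GROUP BY event
    ORDER BY event
"""
for event, users in conn.execute(query):
    print(event, users)
```

Running this prints one row per event with its distinct-user count; the same query pasted into a BI tool would render as a simple bar chart.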

Arpit: These things are becoming more and more common. It makes sense for every team to learn these skills and become data literate. I strongly believe that one doesn’t need to have an engineering or technical background to work with data.

Alexey: We need quite a few roles, but not every company needs all these people. Early-stage startups just need somebody like an analytics engineer, who can do both analytics and data engineering. Then as the team grows, they can have a data engineer and a DataOps team to support all these tools. (51:06)

Data democratization

Alexey: We also talked about data democratization. As I understood, this is about enabling people to access the data, analyze it themselves, and implement things on top of it themselves. Is there more to it than that? (51:40)

Arpit: Data literacy is a big part of it. You cannot democratize data just by making data available or just by giving people access to different tools. Invest in data literacy: help people from less technical or business teams understand how data works. Invest in documentation, invest in data cataloging or data documentation tools — tools like Atlan and Secoda. (52:07)

Arpit: So, invest in data literacy within an organization and then make data available in the right tools — tools where people can actually use their data, not just look at dashboards. Then, of course, up-skill them, so they can efficiently use product analytics tools to self-serve their analytics needs. Teach them some SQL to go and use a BI tool.

Arpit: To sum it up, data democratization is about investing in data literacy and making clean, accurate data available in the different tools across an organization.

Motivating people to document data

Alexey: Also documented. Available and documented. (53:30)

Arpit: Yes. Especially in a remote world, documenting data is one of the most important things. (53:33)

Alexey: How to motivate people to write data documentation? I know it is difficult. As a data scientist, I am the one who’s constantly producing new data for others to consume. But this step of documenting the data is annoying. How do you convince people like me and others to go and document this data? (53:48)

Arpit: It works when you start small and you start early. The earlier you start, the easier it is going to be; if you keep delaying it, it becomes difficult. And if there is a tool that can solve your problem, you should explore and adopt it. Tools like Atlan — these new-age data documentation and data discovery tools — integrate with all your data sources and automate documentation and data exploration. If you have a lot of data in different places and you don’t have a way to document it efficiently, then it’s worth exploring these tools. (54:15)

Arpit: But before you invest in a data documentation cataloguing tool, you need to have the right data infrastructure in place. It comes after you have data collection, warehousing, analysis, activation. All of that should be in place when you invest in these tools.

Arpit: If you are starting today, then it should be table stakes that everything being tracked is documented. That’s how you go about it. If you don’t start early, it’s going to be a mess later on.

Alexey: So, it’s more about the culture and mindset? You just say, “We need it. We start early. For all the data we produce, we must have documentation. Because if we don’t, it becomes a mess.” Right? (55:45)

Arpit: Yeah. It’s important to start early and it’s definitely a culture thing. Thanks for mentioning that. (55:59)

Product-led vs data-led

Alexey: One thing I wanted to ask you at the very beginning, but we didn’t cover yet. There’s a thing called “product-led”. What is product-led? What is the difference between being product-led and being data-led? (56:08)

Arpit: Being product-led requires you to be data-led. The whole idea of being product-led is using your product to drive growth rather than investing in sales. Although product-led and sales can coexist, one of the principles of product-led is that users or prospects or customers should be able to try your product before they buy the product. You want to have a free trial or a free plan. People should be able to use the product and derive value from it before they are asked to buy it. (56:27)

Arpit: This takes us back to the earlier conversation. I mentioned that the salesperson can see that the prospect has actually used the product and derived some value. Or, reached the “aha moment” of the product. I often refer to it as the “activation” event. In a project management tool, it could be creating a project, adding one user and creating three tasks. That’s when you feel that the user has derived some value from the product.

Arpit: Typically it applies to organizations, not individuals. In a product-led company you invest in these product-led efforts, for example a self-serve onboarding experience. For that, you need data. If you don’t have data, then you can’t give prospects a personalized experience when they are starting to use your product. A common example is the in-app or onboarding walk-throughs that you see when you start using a new product. You can personalize them based on the role or industry of the user, and then keep personalizing based on the features they are using in the product. You can do that only if you are data-led, if you have the data. Using the data to build onboarding experiences or trigger emails is a great example.

Arpit: You might have received an email from a company saying “you should try using this feature” after you have already used that feature. That’s very common, and it’s a bad experience. You actually opened that email, so whoever created that campaign will be happy because they see a good open rate. But it annoyed you, and their emails become useless: the next day, you are not going to open their emails, because they are telling you to do things you have already done.

Arpit: If you’re data-led, you don’t do that. You make sure that if someone has used a feature, they will not be asked to use that feature again.
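That suppression rule is a one-line filter once feature-usage data is available in the email tool. Here is a minimal sketch; the user IDs, feature names, and the shape of the usage data are hypothetical.

```python
# Hypothetical feature-usage data, e.g. synced from the warehouse into
# a marketing tool via reverse ETL: user id -> set of features used.
feature_usage = {
    "u1": {"dashboards", "exports"},
    "u2": {"dashboards"},
    "u3": set(),
}

def recipients_for_nudge(feature, usage):
    """Users who have NOT used the feature yet, so the nudge is relevant."""
    return sorted(uid for uid, used in usage.items() if feature not in used)

# u1 already used "exports", so only u2 and u3 get the email.
print(recipients_for_nudge("exports", feature_usage))  # ['u2', 'u3']
```

The interesting part is not the code but the plumbing: being data-led means this usage data actually reaches the tool that sends the campaign.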

Alexey: Let me attempt to summarize what you said. Being product-led applies to a company, not to an individual. It’s about taking feedback from users and being led by what users want. To do this, you cannot call every user; you need to track the data to make, if I can say that, data-driven decisions. (59:26)

Arpit: Data-informed decisions. Feedback is an important part of it. Gathering feedback while people are using the product, running micro surveys in the product or even gathering data by looking at heatmaps and session recordings… All of that stuff drives towards a product-led approach. (1:00:05)

Wrapping up

Alexey: Thank you. We should be wrapping up. I wanted to ask you if you want to say any last comments? Anything you want to mention. (1:00:29)

Arpit: Reach out to me in the DataTalks.Club Slack community if you have any questions; I’d love to answer them. You can check out the website and sign up to my newsletter. It’s not weekly; I send one out roughly every two weeks, sharing different lessons. You can check out the past issues on the website, where a lot of the stuff that I talked about is already written down. That will be helpful for you. Thanks again for joining and listening to us. (1:00:40)

Alexey: Thank you. I will add to what you just said that you should also go and check the podcast. How many episodes do you have already? (1:01:13)

Arpit: We have seven episodes published already. If you are from a data company, I’d love to chat with you; if you want to be on the podcast, feel free to reach out to me. (1:01:24)

Alexey: It means that you recorded some which are not published yet? (1:01:45)

Arpit: Yes. (1:01:49)

Alexey: Can you tell us how many there are already? (1:01:50)

Arpit: We have three more in the pipeline, ready to go. (1:01:53)

Alexey: Okay, stay tuned. (1:01:58)

Arpit: Absolutely. Thanks again for having me, it was great. (1:01:59)

Alexey: It was great indeed, a lot of information. I took so many notes. I will reach out to you later today to get the links to these tools, and then I will put them in the description. Thanks everyone for joining us today and listening. I hope you enjoyed it. We didn’t get any questions from the audience, but there were a lot of questions from me, so I kept you entertained. (1:02:03)

Arpit: Yeah, it was great. I always like chatting about this stuff. Hopefully we will keep the conversation going on Slack. (1:02:42)

Alexey: Definitely. Thanks again! (1:02:42)

Subscribe to our weekly newsletter and join our Slack.
We'll keep you informed about our events, articles, courses, and everything else happening in the Club.
