

Conquering the Last Mile in Data

Season 5, episode 8 of the DataTalks.Club podcast with Caitlin Moorman


Transcript

Alexey: This week, we'll talk about the “last mile of data” and we have a special guest today, Caitlin. Caitlin is the VP of data and business operations at Trove Recommerce, where she helps brands buy back and resell their products at scale. Previously, she led data teams in crowdfunding and in self-publishing companies. She's also an admin and a co-founder of an amazing Slack community called ‘Locally Optimistic’. Actually, there is a funny story. When I just started DataTalks.Club, one of the first members, Arpit — he also was a guest on this podcast — asked me “Hey, why did you create this community, if there is already Locally Optimistic?” I replied “Locally what?” Then he invited me to Locally Optimistic, and I found out about this Slack community. So, if you're into analytics and data things in general, do check it out. But who knows what would have happened if I knew about Locally Optimistic? Maybe we wouldn't be talking now. (3:07)

Caitlin: Yeah, but I think the fact that there are multiple organic emergences of these communities really just speaks to how much data practitioners really need that community. We're still figuring out so much. I find the community to be so helpful. (4:13)

Alexey: Welcome. (4:36)

Caitlin: Thank you! (4:38)

Caitlin’s background

Alexey: Before we go into our main topic of ‘conquering the last mile of data’ and what it actually means, let's start with your background. Can you tell us in a few words about your career journey so far? (4:40)

Caitlin: Yeah. I started my career working for a small, very involved private equity firm, which was a really good six-year mash-up of financial modeling skills, like investment banking, plus bouncing from project to project, the way you might in consulting. I did everything from evaluating investments, to designing incentive compensation plans, to serving as a temporary General Manager for one of our companies. I spent a lot of time in that role making decisions based on data, but I actually didn’t even know where it came from. I would just email someone, describe what I needed, and then get a CSV back. So, it was very black box. I was instead really focused on how to use that data to make decisions and really spent a lot of my time analyzing it, framing trends, really understanding “where we should go from here” – and also the typical spending of hours creating the ‘perfect chart in PowerPoint’. Ultimately, for a lot of different reasons, I decided to leave private equity and settle into a single company. (4:52)

Caitlin: To be honest, I wasn't super thoughtful about it, but I was really lucky and I ended up in an analyst role at a self-publishing company. As a data team of two, I thought the role was going to be more like FP&A – in my wheelhouse of using data but not creating it. Very quickly, it became a lot more technical. I went from very nervously changing ‘where’ clauses to trying to write PHP for home-coded daily sales emails – it escalated quickly. That was a really amazing experience. It was around the same time Redshift was emerging. We were by no means on the cutting edge. Everything was kind of home-baked and we didn't have any of the user-friendly tools that we have today. From there, I moved to the Bay Area and started working for a company that was in the middle of the transition to a much more modern data stack. That began my love affair with modern data tooling and enabling teams.

Caitlin: I ended up leading a team at that Bay Area startup, ultimately leaving to build out a team from scratch, which has been a really fun experience. I still really love the technical side. It makes me really happy to just disappear for a few days and go write some code. But especially as my role has grown, I am keenly aware of that moment where data actually changes decisions, and I'm super focused on figuring out how we can get more effective at creating those moments – making sure the data is there, making sure that the right people are in the room, and to some extent, actually thinking about whether the right answer is that one very well crafted PowerPoint slide. Sometimes it is. Those challenges are something I am spending a ton of my time thinking about lately.

Alexey: Do you still have to use PHP? (7:54)

Caitlin: No… no. I don't know whether I ever was actually successful when I did need to use it. But I spent a lot of my time beating my head against it. (7:57)

Alexey: You were saying that you're still doing a bit of hands-on work? Sometimes a bit of coding, right? (8:09)

Caitlin: Yeah, much less so very recently, but I do still like to get in there and dig around. I wish that I had more time for it now, but that's the reality of more strategic roles. Someday, I think that cycle of my career is likely going to be continuously going from creating a team from scratch to growing it, and then realizing that I'm too far away from it and coming back. I haven't reached that point yet. But someday, I'm going to boomerang back. (8:17)

The last mile in data

Alexey: You said that you're really interested in seeing and understanding how data can be used to change decisions. This is what you're focusing on right now. The topic today is ‘conquering the last mile’ and I think these two things are related. So I wanted to ask you – it’s maybe a bit of a story – when I reached out to you and invited you to this podcast to have an interview, you wrote to me that you've been thinking a lot about the last mile in data. I thought, “OK, so what is the last mile?” and then I started to look it up. I googled it, and then I checked. So what I want to ask you, what is the last mile and where does this analogy come from? Why do we use this when we talk about data? (8:48)

Caitlin: The ‘last mile’ is a term that is colloquially used to refer to the last stage of a process, whatever that is. It originally comes from delivering physical goods or services to their final customers. Getting a physical product into a store or a warehouse is a scale problem. So it's relatively straightforward to design and implement solutions for problems where you're dealing with a lot of things moving all at once. But then getting that product from the warehouse into your house – getting that pint of ice cream from the grocery store to me in under an hour when I order from Instacart. That is the last mile. That's where there's a massive amount of complexity, when you think about it operationally. (9:43)

Caitlin: In these classic ‘last mile’ challenges, often half or more of the cost of getting a product to you is really that last mile. It's something where if you solve the big problems, it feels like you're most of the way there, but really, there's still a lot ahead of you. But if you don't solve that last mile, you never get the value out of the thing that you're building. I started in data long enough ago that everything before the last mile used to be really hard. It was really challenging to actually implement tools. It was really challenging to make changes to ETL. All of this was really difficult. When you‘re in smaller companies, you might not ever actually implement a lot of this stuff. I was basically writing queries against a copy of the production database. There was no data warehouse – there were no transformations – so it was writing raw SQL queries every time. That was kind of the way that you would operate.

Caitlin: If you think about this as ‘the last mile’ analogy, the era of that kind of data work was really like when you had to build the railroads. It used to be really hard to get from the center of one city to the center of another city. You build some railroads and suddenly, that becomes much easier. As our tooling has gotten easier in data, it has become much simpler to just set up a pipeline that gets all your data into one place. You clean it up, maybe with dbt or whatever transformation tool you use. You've got a beautiful warehouse that is pretty easy to use. That really opened up the world of “What can a data team do?” And it made the challenges seem much more surmountable. We have this general theme in a lot of analytics communities that if you can empower a great analyst, you've solved value delivery. And yes, it has changed – the amount that one smart person can accomplish is crazy compared to what it used to be.

Alexey: Because of the modern data stack – Redshift and other things that we didn't have before. You mentioned that a few years back, you didn't have this, so you had to do a lot of things without these tools – just trying things out. Now, you're saying it’s like the railroads. These modern data stacks are the railroads. Right? (13:00)

Caitlin: Or like the interstate highways, whatever the primary form of industrial transportation is in your particular locality. I'm in the US, it's all interstates here. But in many places, it is railroads that are much more efficient and less environmentally damaging. We'll say that it's a really great rail system – something that gets stuff to the warehouse, gets it to the middle of the city. But you still see that in every organization, people are really frustrated that data isn't available, or that the data team’s work doesn't feel like it's impacting the business, and analysts are feeling like they're doing all this work that doesn't seem to be valued. That's where we're seeing the pain of the last mile. So we're getting the data most of the way there, but we're still not really delivering. (13:24)

Caitlin: When you think about data problems, you can kind of separate out these scale problems. “How do you get it in the warehouse? How do you transform it? How do you get to the most basic dashboard and get clearly defined metrics to a user?” Then there's the last mile of that, which is “How do you actually get a team to change what they're doing based on the data and enable them to make better decisions based on the data?” Much like the last mile of delivery is all about how many different houses there are and navigating very different terrain because this one is uphill and this one is deep in the woods – it's very similar because you just have so many different stakeholders and so many different ways of making decisions. The effort to understand that landscape and actually get plugged into it is really substantial. But if you aren't getting the data to the decision, then your team just isn't having the impact that you want them to have.

Alexey: This ‘last mile’, does it have anything to do with marathons? Because this is what I found – when I was Googling a few minutes ago. I was looking it up and trying to understand what ‘the last mile’ actually is, and I found an analogy with marathons. The last mile, when you run a marathon, is the most difficult one. You're tired, but it's already pretty close to the end. You really have to force yourself to actually run this last mile. Have you heard anything like that? (15:20)

Caitlin: I've never thought of that as the source of this analogy. But I think it has a lot of parallels. You get to the point where you're like, “Oh, well, I've done most of the work. I ran 25 miles. That’s pretty close, right?” Getting things across the line and actually finishing them can be really challenging. It can feel much easier to start on the first mile of the next problem than it is to tie everything up with a neat little bow and make sure that people understand how to use your products. Getting to the point that people are really understanding the data and that they understand how to actually bring these two things together at the time of decision – to learn from the data that you've provided. (15:56)

The Pareto Principle

Alexey: Yeah, there's also the ‘Pareto principle,’ also known as the ‘80-20 Principle’, where from 20% of the work, you get 80% of the results. Or the other way around – the remaining 20% takes 80% of the effort. Would you say the last mile is this remaining 20%? (16:45)

Caitlin: I think that potentially, it's similar. But I think it's really challenging to get value out of data at all, especially if you don't really understand how to connect it to decisions. That might look really different in a lot of organizations. There are a lot of organizations where users are really savvy and they understand the data – really all they need is access to it. Then you create a really solid data set or a really useful dashboard, and the people are good and they're going to use the data. They're going to get it into the meetings. They're going to make decisions based on it. That is really kind of all you need to do. But then there are organizations where, for whatever reason, there are incentives to not look at the data, or just a lack of comfort with it, or a lack of fully understanding it – there are lots of reasons for why people don't take that last leap. There, it's not really helpful at all to put the data out there if people aren't going to use it. (17:07)

Caitlin: I tend to think of 80-20 as more figuring out how to tackle particular problems. For example, if what you're trying to do is optimize marketing spend, you're gonna get 80% of the value out of 20% of the effort in the sense that you can answer all kinds of questions – you can dig really, really, really, really deeply. What you need to do is find those high leverage questions, where once you answer them, you get the value. But even within that, you still have to make sure that the stakeholders – the operators who are really taking action – fully understand how to use it and that you understand how their decision-making process works. You need to enable the data to be in the room at the right time, whether that's by a data person being in the room, or making sure that the team really understands the tools they have. Or… there are lots of different ways that this might take shape. But ultimately, if you're not at that point of decision, or not really well plugged in to how operators are using the data, then you can't even get that first 80% of the value.

Alexey: Okay. So this is binary. You do all the work, and then there is the last mile. Similar to a marathon, if you don't run the last mile, you haven't finished the marathon. So you need to make sure that people – the decision-makers – use your dashboards to make decisions or use your machine learning models to affect the customer, or whatever data product you have. You need to make sure that decisions are made based on this or else all the effort is in vain. Is that right? (19:29)

Caitlin: Yeah. (20:00)

Failing to use data

Alexey: When I was preparing for this, I read a few articles. What we are talking about here is more like “We have some data, but we're failing to use it. This is the last mile problem that we need to solve.” So, “We did all this work, how can we now use this?” The article said that fundamentally, failing to use data isn't a technological problem, but a social problem. I think you mentioned something like that – right now we have all these modern data stack tools that make it easier for us. We have these railroads that connect the cities. Now, technologically, it's easier. This article is saying all that and that it's a social problem. So why do you think that's the case? (20:02)

Caitlin: I think your data products are fundamentally products. So if you want someone to use a product, what they get out of it – their benefit – has to be greater than what it costs them. The cost is how hard it is to use, whether that's monetary cost, time cost – any of that. There can be a lot of different factors that contribute to that equation being off. Either the benefit is too low or the cost is too high. Most of the sources of those issues are really social problems. It's about how people think about this or how they use it, and not whether the data is available. (20:55)

Caitlin: There are two ways to make that work out. You either have to make the benefit bigger or make the cost smaller. I think of the major driver of the benefits of good data-driven decision-making as being cultural. You have to have a culture of measuring people's results and rewarding them. Your better decision has to matter. If you're in a situation where your budget next year is going to be based on how big your budget was this year and how much of it you spent, then you should just spend your budget, period. What you're actually doing with it doesn't matter that much. If your manager just gives you a list of things to do, and you're rewarded by just doing them, then just do them. Ship the feature, run the campaign. Check the box. On the other hand, if you have a really clear target that's driven by metrics – you're really focused on improving conversion, acquiring new users, etc. – you start to actually care about which activities have the highest leverage and (to get back to the Pareto principle) how to drive 80% of the results with 20% of the effort. The only way you can understand that is if you start to really dig into the data and understand how your various campaigns are performing and how various parts of the conversion funnel are behaving. If you don't have those incentives, then there's not a lot of benefit from using data and “Why bother?”

Caitlin: On the other side of the equation, you have to keep the costs low. So you have to know how to find the data. You have to know how to use the tools. You have to know how to interpret the data. You have to have trust in it and not constantly be concerned about data discrepancies. “Is this real?” “Is this true?” Hopefully, you don't rely on an analyst for every question you ask. Because all of that just adds cost to the process. By getting the balance right, you then get to a place where people use the data and bring it into the decision-making process. They're really using it for prioritization. Maybe this is also one path to building a culture of experimentation, where people really want to test things and understand how they performed, so that they can make better choices next time. All of that healthy data culture comes from the incentives, the skills, and the training, all of which are really people problems.

Alexey: So, we need to have a healthy data culture. I think you mentioned that to have this, data must be discoverable – people know how to find it. It must also be interpretable – people need to know how to interpret the data. Finally, people need to be able to trust the data. Because if they see that something is off, they will say, “Okay, I don't want to base my decisions on this dashboard. It would be better if I base my decisions on my gut feeling because I don't trust this dashboard.” So you have to have all this. You also said that everything should be measurable and people should be able to see the impact of their work as a number. When we have that, then our data products make sense. Then we can use them and show people that it's actually better to use our data product to make decisions. ”Look, if you do this, your numbers improve.” This is not a technological problem. Is that right? (24:13)

Caitlin: Yeah, absolutely. You have to have the baseline. Obviously, the technical side of it is table stakes but this is where you get into the last mile. The last mile is all the ‘people part’. It's making sure that people know how the incentives are aligned correctly for it. To the extent necessary, you might have to actually sit in the room with them and help them understand “How do I understand this campaign data? How do I tell which ones are performing? How do I tell what happens if I put more money behind the same campaign? Am I getting the same impact from the next dollar as I did from the last dollar?” There are real questions people don't understand and the barriers are often not because the data is not available. It's often hand-holding, training, and helping them to understand. (25:24)

Making sure data is used

Alexey: Is there any other way? Let's say we have everything measurable. We have an analyst who can sit with the decision-maker and explain everything. Is this enough to make sure that the data is actually used? (26:21)

Caitlin: I think it depends. It takes a lot of work to understand why the data isn't being used. If the data doesn’t exist, then it's not a lot of work to understand what the path is to fix this. You know that “I have to bring it into the data warehouse. I have to clean it up and make it useful. Create some reporting on top of it. And voila!” Once you've got that, if it's not getting used, there are lots of different things that might be going wrong. You really have to spend some time understanding what those barriers are and what those look like. (26:36)

Caitlin: In a lot of ways, it looks more like user research. You build a product, you put it out there, and people aren't using it the way that you thought they would. So, what's the barrier? “Do they know that it exists? Do they know how to use it? Does it solve the problem they actually have?” Really interrogating and understanding where the gaps are is the key to being able to fix those gaps. Obviously, if people just don't know about it, that's relatively easy to solve. If it fundamentally doesn't answer the question they're asking, then that can be a little more challenging to solve and it really requires another round of work and understanding “What are you really trying to do with this data?”

Alexey: If it's difficult to use, you either solve it by educating or simplifying it? (28:04)

Caitlin: Yeah. Hopefully a little bit of both. It’s all about creating the right balance of what's possible in your tools, how much you think people are really going to learn and work around. So if the problem is just not knowing how it works, then the solution is just teaching them. If it's really hard, then you might have to really think about “How much can I simplify this? Do I need to take a different approach? Is there a totally different way to get to the same result that would be more user-friendly?” (28:10)

Communicating with decision makers

Alexey: I'm thinking about an example we have at OLX. We do a lot of experiments, usually A/B tests. Let's say we have some traffic of users coming in – for some users, we show one variant, and for some other users, we show a different variant. Then we compare different metrics and see if the new feature gets uplift in some metric. Standard A/B tests. Then I think the problem we had at some point was that the people who were looking at this – we were showing them too much statistical stuff, like P values, test power – all these statistical things. They were just overwhelmed. “What does this all mean? All these ‘confidence intervals’ or this ‘P value?’” It required a lot of iterations to first teach them, and then to think, “What can we not show them? Do we really need to show all that to them? Do they really need to know about the P values and what these P values mean?” Or maybe we can just show them “Okay, this is significant.” And that's it. I guess this is quite in line with what you're saying, right? (28:42)

Caitlin: It’s very similar. I've been in similar A/B testing situations before and even within your users, I bet there's a variation of how much people are willing to learn or are interested in learning. It's all about figuring out ”Who are you optimizing for? How do you trade-off between simplicity and functionality?” We did the same thing and for us, that looked like setting a default P value for significance, but also allowing people to change it if that was something they were comfortable with and understood all that. So the experiment was communicated in terms of “Significant. Not significant. Here's how you interpret it.” (30:03)

Caitlin: But we had enough toggles for people – a couple of power users – to go in and say, “Actually, in this situation, I'm good with this level versus this level. This is how we want to measure this test in particular.” But, it's really hard to get the right balance and to make it feel really useful to people. When you have such a deep knowledge of the data and the space, you can often think, “Oh. Well, I can look at this and the depth of understanding I get is really rich.” But often business users just really want a simple, super easy-to-interpret result.
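
To make that concrete, here is a minimal sketch of the “sensible default with an escape hatch” idea described above: a plain significant/not-significant verdict with a default threshold that power users can override. The two-proportion z-test, the function name, and the numbers are illustrative assumptions, not the actual tool discussed in the episode.

```python
import math

def ab_test_summary(conv_a, visitors_a, conv_b, visitors_b, alpha=0.05):
    """Summarize an A/B test as a plain verdict instead of raw statistics.

    Hypothetical helper: a two-proportion z-test with a default significance
    threshold (alpha) that a power user can override.
    """
    p_a = conv_a / visitors_a
    p_b = conv_b / visitors_b
    # Pooled conversion rate and standard error under the null hypothesis
    p_pool = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    verdict = "Significant" if p_value < alpha else "Not significant"
    lift = (p_b - p_a) / p_a
    return f"{verdict}: relative lift {lift:+.1%} (p = {p_value:.3f}, alpha = {alpha})"

# Most users see the default threshold; a power user can pass their own.
print(ab_test_summary(480, 10_000, 540, 10_000))              # Not significant at 0.05
print(ab_test_summary(480, 10_000, 540, 10_000, alpha=0.10))  # Significant at 0.10
```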

Alexey: If you let data scientists build the tool, then they will show all these things like “This is the Mann-Whitney U test. This is the power of the test. This is the P value. This is the confidence interval.” All these technical things. If you talk about machine learning, then “This was the accuracy, precision, recall, ROC curve, AUC.” Things like this. If you show this to the business, and then they say, “Okay, what is that? How much money is this thing actually going to make? Why are you showing me all that?” I guess this is what ultimately matters to them. So, bridging this gap is not a technological problem, it's more like a social problem. Like “How do you actually communicate this?” (31:35)

Caitlin: It's a lot like building any technical product. I think a lot about Zapier as a really good example of this. Zapier exists to take something that is technical and make it non-technical – to allow people to leverage APIs to accomplish automation without knowing what an API is. It's really hard to get the right level of abstraction when you start to talk about something like that. It's really difficult to build a thing that serves both someone who's comfortable parsing out text and doing a bunch of interim steps, and the person who's just like, “When I get an email, can you just send me a Slack message too? I just want it to be exactly the same.” (32:25)

Caitlin: It really is a lot like product design. There's the challenge that your data teams are really small compared to your product team, most likely. You have to learn to find the right points of leverage, find enablers, train power users and have them train their teams. You have to find all of these ways to scale the work that you have to do. Because you don't have a whole team dedicated to doing a lot of research and spending a ton of time on design. It's a scaled down version but ultimately kind of the same problem.

Working backwards from the last mile

Alexey: You wouldn't believe it, but I actually read another article. So I think I read 3 in total. I think I talked to you about the first one, which was about comparing the last mile to a marathon. The second one was talking about it being a social problem vs. a technological problem. Then the third one said that you should prioritize the last mile of the analytical journey and work backwards. This is probably quite a long sentence. There are many things to unpack here. So they say that you should prioritize the last mile and work backwards. Do you know what they mean here? Why should we prioritize it and how do we work backwards from that? (34:00)

Caitlin: I interpret that as really focusing first on what success looks like for the thing that you're building. For different types of projects, that's going to be very different. For your A/B test results dashboard or tool, what you want is for a product manager to be able to make the right decision on whether to roll out a feature or not. There are other goals around that. You want to make sure that they wait long enough to get meaningful results. You want to make sure that they are running an experiment in a responsible way – there are some subsidiary goals there. (34:42)

Caitlin: But what you're really focusing on is, “I want a product manager to be able to come here, look at their experiment, and understand whether it worked or potentially what the business impact was.” Maybe you actually need to convert it into dollars. Focusing there helps you really understand what you're going to need to build. If you need to communicate it in dollars, then you need to start thinking from the beginning about how you're going to build toward that. Instead of immediately thinking, “I need an A/B testing dashboard – I need to start thinking about what our event data looks like,” you're thinking about the user first. That's going to change how you think about the data sources, the transformation jobs, what else you need to join in, and how you want to build the dashboard – you're really starting with that decision that you're trying to drive.

Alexey: I think at Amazon they have this ‘working backwards’ principle. I’m not sure if it's at Amazon or somewhere else, but I heard this concept before. Let's say you're working on some product. It can be a feature – it doesn't even have to be a data product. The first thing you do is you write an announcement – you write a blog post that you would publish when the feature is done. You write a couple of pages and once you have that, you then work backwards from this announcement. So you work as if this feature already existed – as if you already did it. Then you think, “Okay, what do I need to do to actually build this feature?” (36:27)

Alexey: What you’re saying, as I understood it, is to think of the end user, right? In the case of a data product, such as an A/B testing system, it could be a product manager who will need to make a decision based on what you show them. So they will need to understand if the feature we're testing is making an impact and what this impact is, “How much better is this thing than the current thing?” We think about this, and I guess we can start involving the user – the product manager – immediately. Before even building this thing. So you ask questions like “What kind of things do you need? What kind of problems do you have? How can we solve your problem?”

Alexey: If we involve them from the beginning, it makes it easier for us to build this thing, because we're already thinking about the end user. So the last mile here, as you said, is making sure that the data is used – or that the product is used. If we involve the end user, if we think about the end user from the very beginning, then it makes it easier. Did I interpret this correctly?

Understanding how data drives decisions

Caitlin: Yeah, I think that's exactly right. It's talking to the users – we were talking a lot about how data drives a decision. It's literally sitting in the meetings where those decisions are made right now and understanding what that process looks like. Maybe, for this A/B testing example, the decision could consistently look like “I have to share the results of my test against my intended impact metric and then I also have to share these other two to make sure that we're not adversely affecting something else too much.” That's the kind of insight where, if you understand that this is what the product team really needs to make the right decision, you build from that as well. (38:15)

Caitlin: It’s about sitting in the meetings where these decisions are being made and talking to the people who are making them. I love pen and paper. I spent a lot of time sketching things out and saying things like, “Okay. If it looks like this, what does that tell you? What's missing? What does the deck you're building look like? What does the deliverable you're creating look like? Does this dashboard get you there? Does this tool get you there?” So, it’s really getting hands-on. I love writing the press release first. It's very similar, like, “What do people say about the thing that you built?”

Sketching and prototyping

Alexey: This sketching – it’s a lot like prototyping, basically, “How would this look at the end?” Then you show it to the decision maker – whoever the end user is – just using a piece of paper. “Does this look like what you have in your head?” Then they say, “Okay. You know, it actually looks completely different.” So, you start talking and they say, “OK. I want this thing here and this thing is not what you understood initially, but it's a different one.” Then you start discussing this. Right? (39:32)

Caitlin: Yeah. That conversation is easier to have if they don't think that you put a lot of time and effort into it. Not that you didn't put enough thought into it, but “I just sketched this out really quick. So feel free to speak up if it doesn't speak to you.” Versus “I created this really robust prototype in Figma. It's really beautiful, and you're going to be calling my baby ugly if you tell me that this thing doesn't work for you.” It's a lot easier to get real feedback if the bar is relatively low. (40:05)

Alexey: Maybe these days it’s not so easy, but also using a whiteboard. Get around the whiteboard and start drawing there. Then we get a lot of feedback, right? (40:36)

Caitlin: Yeah. (40:52)

Showing the power of data

Alexey: We have a few questions, so maybe we can start covering these questions? So, a question from Aideen. I hope I pronounced your name correctly. “When data challenges the traditional decision system, how can we show the power of data? Do you have some experience with this issue?” (40:53)

Caitlin: Again, this comes back to making sure the right incentives exist in the organization. But ideally, you want people to be really incented by good results. Then what you need to do is actually show better results. So you're thinking about the marketing team and helping them make better choices. Well, the results of that are, “We spent the same amount this month as last month, but we acquired 30% more users.” Usually, the results are not that clear. Obviously, if you have a culture of A/B testing and the tools for A/B testing, that's an amazing place to start – to be really confident in your results. But a lot of times, this is really more anecdotal. You might not have super robust ways to report on results. But creating those moments of comparison is still really helpful. (41:18)

Measurability

Alexey: Would you say it's a must that we have everything measurable? Or can we already start convincing people to use our data product when not everything is measurable yet? (42:18)

Caitlin: You're never really at the point where everything is measurable, right? So being as close as you can get is what you need. For example, I work in an environment where we've got a warehouse where some activities are basically invisible in our data – they're fully manual. You can't effectively measure how long a process took or anything else. But a lot of times, when we make changes to process or make changes to our tools, we literally just spend a couple of hours doing a time study. Someone sits with a stopwatch and times people and says “Did this take more or less time?” Is that precise? No. Is that a real experiment? No. But it's better than nothing. And if you are talking about sufficiently large changes, then it's compelling. You can tell, “I cut this time in half. That must be real.” Versus… it's indistinguishable. We should go with the process that's more scalable or the process that's better for some other reason, and start to make decisions from there. (42:29)

Alexey: Yeah, that's interesting. I was also thinking about, if you work in ecommerce – at least if we're talking about a website only – then all these clicks are relatively easy to measure. I mean capture, track them and put them somewhere in your data warehouse and then have all these dashboards. But if we’re talking about some manufacturing line somewhere, or a warehouse where the actual people move things – not robots, but actual people – then it becomes tricky. You cannot put trackers on people, and watch how they move because A: they will not like it and B: it's probably not cheap. Right? (43:39)

Caitlin: It can be really hard, but I really love a good proxy metric. “What's the closest we can get to measuring this thing?” I think anything related to employees is a great example of this. We're never gonna have a large enough sample of employees that we're running A/B tests on employee engagement, but we're gonna throw a survey out there and see how people feel. Ultimately, you kind of make your best decision. But, in most businesses, there are a ton of parts of the business that are really easy to measure. I think that's a really good place to start for these cultural changes. You want people to use data, start with the data you have. Then you'll get to a point where you're starting to talk about “How do we optimize these less visible parts of the process?” Then, hopefully, you've got the trust from everybody, and you've got enough culture of data-oriented thinking that you can start to find ways to feel good about those areas as well. (44:30)

Driving change in data

Alexey: Marketing is probably a good start in many cases. Because you basically have some sort of web page and you can play with different wording there or different positional things. Even with that, you can already start measuring and then show that you can measure this. Then people see that it is useful to have things like this, and then you start using this as a convincing argument. People start believing you and then you take care of more complex things. Right? (45:35)

Caitlin: Yeah. I think Emilie Schario does a really good job talking about, when you're trying to drive change with data, how to focus on as narrow a slice as possible. I'll have to dig this up, because I‘m not sure where she wrote this. But I'm going to credit it to her. As you think about how much you can scope down your work, you want to really focus on, “I want to enable this salesperson to make this better decision based on the data. And so, I'm going to focus entirely on that until that end is accomplished.” That means all the infrastructure work that’s necessary, all of the transformation work – whatever it takes to get there. (46:11)

Caitlin: But when you narrow in, then you've got a really clear success story. You've got an advocate in that stakeholder. You've got everything that you need to start to build the case for a bigger role for data in general. You move on to the next team and you say, “Okay, how can I help marketing address this decision? How can I help product address this decision?”

Alexey: And I didn’t hear the last name, Emily... (47:23)

Caitlin: Emilie Schario, I'll add a link. (47:27)

Asking high-leverage questions

Alexey: Okay. I'll put this in the description. We also have a question from Kurt, who is asking, “You have emphasized asking high leverage questions. Do you have any tips on finding these points as both an analyst and an executive?” (47:30)

Caitlin: Part of this is having enough bandwidth for analysts to do a little digging and understanding this themselves. If the business isn't looking at data, then you probably don't know the highest points of leverage off the top of your head. But if you spend a little time in the data, I think you'll start to understand that. Often, the best place to start is actually not with the data in your data warehouse, but with your financials. (47:52)

Caitlin: So it’s sitting down with someone from your accounting and finance team to understand, “What does our performance look like? What's our biggest cost center? Where are we spending money?” Wherever you're spending money is a really good place to bring the data to understand how you can do that more effectively and more efficiently. Either spend less or get more for what you're spending. That has consistently been a good approach for me to identify those points of leverage.

Caitlin: You don't have to solve the biggest problems first. Sometimes the biggest problem is really hard. But you have to solve a big enough problem that people care about it. So you have to find that sweet spot between “We spend X dollars a month on marketing, so I want to focus on improving the efficiency of our marketing spend, even though we spend 10x on our warehouse employees. But we don't know what they're doing, so I'm not ready to tackle that problem yet.” [laughs]

Resistance from users

Alexey: I also imagine that you can get some resistance. Let's say you're starting with financials, you find the biggest cost center – this is the warehouse. You go to the warehouse manager, and you're saying “Hey, how about using data?” And they respond, “How about no.” [laughs] I imagine that this can happen, right? So what would you do in this case, if there is some resistance from people? If they are not really eager to use the data? (49:25)

Caitlin: I think that there are two separate answers. If I am independently trying to push this project through and push this cultural change, I would not start with that particular warehouse manager. I want my first project to be with someone who has already bought in with me. And I'd rather work on something smaller, or something harder, and have someone who's in the boat rowing in the same direction with me, than try to convince the primary stakeholder that this is a good idea. You want to find someone who will be your advocate, someone who really wants this. In most organizations, you're going to be able to find one person who really wants more data and wants to make decisions with that data. (49:59)

Caitlin: So if it's totally up to me, I just say, “OK, cool. Thank you. I am excited to talk more about this in the future.” And we come back when there's much more of a snowball of a healthy data culture coming and they're more likely to buy in. If this has to be the first area, then I assume that it’s coming from someone else – say, a COO hands this off to you and says, “Work with this person in the warehouse and figure it out.” Then you have to be a lot more delicate around how to get that person on board and how to convince them of the value of it.

Caitlin: I would say, generally, always focus on upside, not savings. Don't talk about how “We could ultimately need half as many people in the warehouse and that's why we should do this.” Talk to them enough to understand what they're not doing that they wish they could do. Then you can start to talk about it and say, “Well, if we were more efficient in this part of the process, then you would have enough people to do this other thing that you want to do.” Or find ways that show that driving better performance really benefits that person instead of potentially feeling like something's being taken away from them. I think that's usually the most impactful part of getting somebody in the boat.

Caitlin: Just really sell on the benefits and find something that bothers them that you can help with. Start to kind of build that rapport and that trust. Honestly, in data roles, this is almost always manual Excel processes. Find what they do in Excel and find a way to make that better, even if it has nothing to do with the project you're working on. [laughs] Start to make them your advocate. Start to make them appreciate you.

Alexey: I was saying that, if you're looking for low-hanging fruit, this could be the marketing department. I think marketers have realized by now the importance of using data for making decisions. Things like “Which channel is more effective? Where should I put more money?” They will probably be more welcoming to you, your work, and using data in general. They're probably using some data already, but they will be happy to use more of it. Especially in growth marketing, I think. I took a course in growth marketing and I was surprised by how much stats there were – A/B testing and tracking data. It's basically some data analysis plus a marketing sort of position. So, I was surprised by that. (52:45)

Understanding domain experts

Alexey: Okay, we have another question. “What kind of questions do you ask domain experts to understand their domain? And can you recommend some literature on that?” (53:46)

Caitlin: Oh, that's a really big question. I think it depends a lot on the domain and the person that you're talking to. I think coming to any conversation with just a really genuine curiosity can get you a really long way. Framing questions from genuine curiosity can make all the difference when you say something as open-ended as, “So what do you do here?” That can be a really curious question or that could be a really judgmental question. You have to make sure that you're genuinely coming from a place of wanting to understand and wanting to really get a grasp on what they do, what's hard for them, what challenges the team overall is facing – and just building rapport. (53:57)

Caitlin: That will depend a lot on how much you know about the person and the organization and how embedded you are. Sometimes it really just starts with not talking about what they do at all and getting to know them as a person and building a relationship before you start to build a work relationship. This would be just coming from a real, open-minded place of wanting to understand what they do – that’s the biggest part. Ideally, after you start to understand what they do, you could document their job for them and just ask all the questions you would want to know and kind of write the ‘handbook’ to this person's job. Certainly don't frame it that way to them, because that sounds a little bit scary [laughs] and a little bit like maybe you're trying to onboard the next person.

Alexey: So you have to be quite good at understanding people. How you approach them and even in what tone you ask a question. Because if you ask a question like “What do you do here?” it can sound curious or it can sound judgmental. So you really have to be careful. (55:35)

Caitlin: Yeah. I certainly don't want to frame that in a way that intimidates anyone. I've worked with a lot of data teams, and we definitely over-index on introverts. I am a strong introvert. I wouldn't necessarily say that early in my career, I felt super confident about my social skills. I was not a person that someone would say, “Oh, yeah. She has great EQ – very high emotional awareness.” That's something that I've had to build over time. I think if you just genuinely really focus on the curiosity side of it – it'll work out. Maybe I wouldn't worry too much about the missteps but, even for yourself, frame your questions as wanting to understand and not immediately as wanting to make better. I think that makes all the difference in the way that you approach the situation. (55:58)

Alexey: So the recommended literature would be some books about emotional intelligence? (56:50)

Caitlin: Yeah, maybe. Or I think it also depends on the area. If you're working on something super-specific and if you're working with digital marketers, then read a book on digital marketing. If what you're trying to understand more generally is “How do I influence without authority?” Then there are some really good books around how to do that. There's Dale Carnegie's ‘How to Win Friends and Influence People’. It's the classic, but there are others, if that's not your particular cup of tea. You can find books that are more focused on just “How do you build rapport with people? How do you really build those soft skills?” That might make you feel more comfortable as well. (56:57)

Alexey: Yeah, I actually tried reading this book at some point. It was difficult for me. (57:45)

Caitlin: Yeah, I've actually never finished it either. I don't love it. But it is the classic and people who have read it generally speak very highly of it. So… I don't know. (57:51)

Alexey: It's like the book ‘Getting Things Done’. Some people love it. Some people hate it. (58:03)

Caitlin: Yeah. Yeah. (58:08)

Linear projects vs circular projects

Alexey: It's always binary, it’s never in between. Okay. We have the last question. We still have a couple of minutes. So a question from Eileen is “Sometimes data projects can’t give expected results and this is normal. But this is creating trust problems in data projects. What do you think about this? How do you approach it? Would you approach it in such a way that it doesn't create trust problems?” (58:11)

Caitlin: Yeah. I actually wrote two blog posts with Alexis Johnson Gresham – I can share links to those – about this exact problem, because I think it's a really difficult one. Alexis first shared this phrase with me, which was really a light bulb moment: linear projects versus circular projects. Even within data, you have both of them. There are linear projects, where you can chart out the next step. You have a high level of certainty that if you do step one, you can do step two. After you do step two, you can do step three. And then there are circular projects, where you don't know what you don't know. And a lot of data projects fall into this category. (58:36)

Caitlin: Something like building a data pipeline to bring something into the warehouse – that's probably pretty linear. You know there is an API – it might not have all the data you want in it, but you can look at the docs pretty quickly and understand that. A circular project is one where, until you know what's in there, you don't know if you're gonna be able to do it. A lot of data science projects fall into this: “Until I test it, I don't know how good my results are going to be.” And a lot of analysis projects are like this too: “I want to answer the question – why was conversion up last month? I have no idea if I actually have the data to answer that and I won't know until I dig into it really substantially.”

Caitlin: I'll share more about this, but the very high-level overview is first just to set expectations. Acknowledge ahead of time that it is a circular project. You don't know if it's going to be successful. You lose a lot of trust by saying you can definitely do something and then not delivering. But people understand if you say, “I'm not sure if this is possible, I need to dig into it.” Break it down into as small pieces as possible, so that you can quickly make progress and report back and say, “I've gotten this far. Looks good so far. Next stumbling block is this.” Or “I spent two days on it. I think it's gonna be really hard for this, this and this reason. Here are alternatives and things we could do to make it less difficult or more possible.”

Caitlin: Really lean into that communication and alternatives. “Here's what we can do” rather than “Here's what we can't do.” Or “Here's what we’ve learned.” Hopefully, you then start to also build a culture of “failure is learning” and “Let's talk about it. Let's be really excited that now we know this thing is not possible with the data we have. At a minimum, we don't ever have to think about that problem again until something dramatically changes. We don't have to put resources against this. It's not on the backlog anymore. We've checked whether it's going to work, and it's not.” So, find some ways to help people celebrate the learning.

Recommendations for data analyst students

Alexey: Do you have a couple more minutes? There is one more question that popped up, and it’s very interesting. So Kurt is asking, “I'm currently a data analyst student. Do you have any recommendations, resources, or habits that helped you achieve success in your career?” (1:01:30)

Caitlin: Getting started in data is really interesting. It's really hard, outside of a data role, to even remotely approximate what it's going to be like. Public data sources are completely different from the data you're going to run into in a company. My biggest advice is just to be really curious and to think a lot about why things matter. The logistics of being a data analyst, you can learn on the job, no problem. You'll learn how to write queries. You'll learn how to approach problems. But building that curiosity and that sense of impact and tying results back to the business is often the hardest part. (1:01:50)

Caitlin: Sometimes that looks like taking more business classes or taking the time to understand the scale of impact that you can get from a data source in whatever project you're working on. If you're in an econometrics class – do all of your analysis and then understand “What would this mean if this were true holistically? If this policy went out, what would the real impact of that be?” And how you think about that versus other options. That skill is, hands down, the most useful skill for a data analyst to have.

Alexey: So it would be more business skills, business acumen, and the understanding of how business works, versus just being really good at SQL and being really good at other things that analysts do technically? (1:03:17)

Caitlin: Yeah. The technical skills are useful, but also are not nearly as hard to teach. Most good analytics leaders know “If I find someone who is really smart and understands the importance of the data, I'm going to be able to teach them SQL.” Whereas the reverse might not always be true. (1:03:30)

Finding Caitlin online

Alexey: Where can people find you? Locally Optimistic? (1:03:53)

Caitlin: Locally Optimistic, yeah. I should really join your Slack as well. I was thinking about that earlier this week. I will join today and I'll be there if there's a particular channel that you tend to chat with people in, but I'm also always on Locally Optimistic. Always happy to chat there. (1:03:56)

Alexey: In your blog post, did you draw your pictures yourself? (1:04:12)

Caitlin: No, we have an amazing illustrator who does all of our blog posts. It’s like my favorite thing about the blog. (1:04:16)

Alexey: Yeah, the illustrations are amazing. You were saying that, at some point, you would just take a piece of paper and start sketching. So I thought maybe it was actually you who creates all these illustrations. But, no. (1:04:23)

Caitlin: Sadly, I am not quite that artistically talented. My sketches really encourage people to say “That's not quite right.” [laughs] (1:04:37)

Alexey: I see. [laughs] Okay, thanks a lot. Thanks for joining us today. Thanks for sharing your experience. Thanks for answering questions. And also thank you, everyone, for joining us and for asking questions. Yeah – thank you, Caitlin. (1:04:46)

Caitlin: Yeah, thank you so much. This has been awesome. (1:04:58)

Alexey: Yes, thanks. Have a great rest of your day and have a good weekend. (1:05:01)

Caitlin: You too. (1:05:05)
