Links:
With over 4 years of industry experience as a Machine Learning Engineer, I have demonstrated success driving impactful projects spanning Multimodal LLMs, Generative AI, and Computer Vision domains. My passion lies in translating applied research into cutting-edge AI solutions, leveraging a diverse skill set across ML and Engineering to build and deploy high-quality products.
I hold a Master’s degree from Carnegie Mellon University, where my focus was on research in Multimodal Deep Learning and Text Information Extraction.
Check out our research on assistive technologies for the visually impaired: https://aiguidedog.wordpress.com/
The transcripts are edited for clarity, sometimes with AI. If you notice any incorrect information, let us know.
Alexey: Hey everyone, welcome to our event. This event is brought to you by DataTalks.Club, a community of people who love data. We have weekly events, and today is one of them. If you want to find out more about the events we have, there is a link in the description. Click on that link and check it out. (0.0)
Alexey: We actually have quite a few events in our pipeline, but we need to put them on the website. Keep an eye on it. We will publish them soon, and you’ll see them. Don’t forget to subscribe to our YouTube channel. This is the most reliable way to get notified when our stream starts. (19.0)
Alexey: Last but not least, do not forget to join our data community where you can hang out with other data enthusiasts. During today's interview, you can ask any question you want. There is a pinned link in the live chat. Click that link, ask your questions, and we'll cover them during the interview. (35.0)
Alexey: That’s the usual introduction I do. I’m a bit sleepy, but I hope it goes well. If you’re ready, I have the questions prepared in front of me. (1:01)
Aishwarya: Yeah, sounds good. Ready. (1:20)
Alexey: Today on the podcast we are joined by Aishwarya. Do I pronounce your name correctly? (1:26)
Aishwarya: Yes, that is right. (1:32)
Alexey: Good. Aishwarya is a Machine Learning Engineer at Waymo, formerly part of Tesla's Autopilot AI team and a Carnegie Mellon University alumna. She has worked across some of the toughest applied AI problems: financial recommendation systems at Morgan Stanley, multimodal research at CMU, perception and video understanding at Tesla, and now gesture and pedestrian semantics at Waymo. She has also contributed to AI for social good, including a malaria mapping project in Africa that achieved real-world impact at scale. Welcome to this event. (1:33)
Aishwarya: Hi, thank you for having me. (2:06)
Alexey: That’s quite a nice bio. Probably what you do now and at Tesla is very challenging, but for me Morgan Stanley also sounds very challenging because I worked a little near high-frequency trading. I wasn’t actually working on the trading system, but we were doing analytics on top of this data. It was huge, so probably quite challenging. Can you tell us more about your career journey? (2:12)
Aishwarya: Like you mentioned, at Morgan Stanley it was a lot of data. I was a Big Data Engineer there and handled huge amounts of data. I was there when Morgan Stanley was doing the acquisition of E*TRADE, so we had much more data coming in. My role was handling this data, connecting the different dots, and analyzing them together. (2:51)
Aishwarya: From there I realized that data has so much value that we don’t need to do everything manually. That was back in 2018, and the AI domain hadn’t yet fully formed, but it was getting there. People were realizing its importance, and finance was one of the last fields to take on machine learning and AI. We were onboarding systems, and that’s when I decided to get hands-on with AI, starting with smaller systems like recommendation engines that were already well known in the AI domain. (3:16)
Aishwarya: I started with that and then got more research-oriented. We tried out graph neural networks, which were more complex topics at the time, and I realized there was so much to learn and the field was vast. That’s when I decided to pursue a master’s and joined Carnegie Mellon University. My program was a mix of data science and machine learning, so I could draw upon both my experiences and interests. (3:55)
Aishwarya: At CMU, I was more inclined toward projects involving computer vision. I worked on a navigational app for blind people called AI Guide Dog. It captures the world around you and helps people without vision navigate. From there, I got into Tesla because it was also in the computer vision and navigation domain, and that’s where my self-driving journey began. From Tesla to Waymo, it’s a similar domain but with different kinds of products. (4:33)
Aishwarya: That’s where I am now at Waymo, in the self-driving domain. I started in finance and reached a completely different field, but it has been an interesting journey. (5:34)
Alexey: That’s an interesting journey. The app that you developed for blind people, can you tell us more about it? How does it work? Do they hold their phone and it tells them where to go or describes things? (5:39)
Aishwarya: For this app, the goal was that people without vision should be able to navigate the world just as sighted people do. The app is basically their eyes. You hang it around your neck and walk with it, and it captures the world in front of you. Then, via live audio instructions, it tells you to keep walking straight, take a left or right, stop at a traffic signal, or notice a pedestrian crossing. It gives instructions via audio. (6:03)
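As a rough illustration of the loop such an app might run, here is a minimal Python sketch. It is not the actual AI Guide Dog code: `predict_instruction` stands in for the trained navigation model, and the camera and text-to-speech parts use common open-source libraries.

```python
import cv2       # camera capture
import pyttsx3   # simple offline text-to-speech


def predict_instruction(frame):
    # Placeholder for the trained navigation model, which would map a
    # camera frame to an instruction such as "keep straight",
    # "turn left", or "stop at the traffic signal".
    return "keep walking straight"


tts = pyttsx3.init()
camera = cv2.VideoCapture(0)  # the phone camera hanging around the neck

while True:
    ok, frame = camera.read()
    if not ok:
        break
    instruction = predict_instruction(frame)
    tts.say(instruction)   # speak the instruction out loud
    tts.runAndWait()
```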
Alexey: Interesting. Was it a pet project, a company project, or an AI for social good project? How did you get involved? (6:41)
Aishwarya: My program has something called the Capstone Project. Every year, you either pair up with a professor or someone from the industry who has an interesting project, and then you work on it. This project was from a CMU alumnus who now works at Pinterest. He started the project, and ours was the third iteration, where our team joined and continued the work. (6:49)
Alexey: Was it a community AI social good project or a company project? (7:26)
Aishwarya: It was more of a volunteer project. Every year, a group of CMU volunteers work on it, make progress, and then pass it on to the next group. (7:32)
Alexey: That’s a very interesting concept. (7:45)
Aishwarya: Yes. Before me, two batches had worked on it, and they were mentors to us. When I finished, I mentored the next batch. It’s now in its fifth iteration, with two more batches after me making more progress on it. (7:51)
Alexey: That’s a really nice idea. At DataTalks.Club, we have courses, and I immediately started thinking how we could implement something similar. It sounds amazing because it allows people from previous iterations to mentor those currently taking courses. Having a project like that helps them sharpen their skills. (8:12)
Aishwarya: And it doesn’t need to be done all at once. It’s a big project with many moving parts, so it can’t be built in one year or even six months. You pick small pieces: the first team worked on data, the second built baseline models, and the third improved evaluation. It’s iterative. It’s a good idea. (8:38)
Alexey: Do you know if this app is accessible in the App Store? (9:07)
Aishwarya: Not yet. The next community of students will be working on the app. We have the model, but it’s still in the beta phase. We are doing a lot of testing because it’s a sensitive use case. (9:14)
Alexey: I can imagine. I recently participated in a marathon. People who know me would roll their eyes because I wouldn’t stop talking about it. I had been preparing for so long. (9:26)
Alexey: During the marathon, blind people and those with vision problems also took part. They were running with guides who held their hands and ran together. I thought it was amazing to include them because they also want to be part of these events. (9:38)
Alexey: Since they cannot see, it’s difficult, but the organizers allowed guides to join and lead them. It was wonderful. Maybe with this app they wouldn’t yet be able to run a race, but it’s one step closer to that. (10:04)
Aishwarya: Yes, that’s the hope. They won’t need to rely on someone else. They can have their app as their guide for the world. (10:33)
Alexey: Also, with VR glasses, you probably heard that Meta and other companies have them. You put them on, and they have cameras that provide a broader view than mobile phones. Maybe this is something future alumni can work on. (10:45)
Aishwarya: Yes, those are the things we’re trying to work around. It needs to be cost-efficient, and we can’t afford to use expensive hardware like LiDAR. Since everyone already has a mobile phone, we’re trying to make it work with what people already have. (11:22)
Alexey: How expensive is LiDAR? (11:42)
Aishwarya: It depends on the quality. You can find some that are really cheap and others that are extremely high-end. (11:50)
Alexey: How do you pronounce it? LiDAR? (11:58)
Aishwarya: It’s LiDAR. (12:03)
Alexey: LiDAR. I know radar emits radio frequencies and waits for the waves to return to estimate objects and their movement. Bats do this with sound. LiDAR is similar, but instead of radio waves, it uses light, right? (12:08)
Aishwarya: Yes, that’s right. It uses light rays. That’s why it’s called LiDAR. (12:50)
Alexey: I thought it was based on lasers. (13:01)
Aishwarya: It’s similar. I think it’s one of the light frequencies. (13:03)
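As background: both radar and LiDAR are time-of-flight sensors. A pulse goes out, reflects off an object, and the distance is half the round-trip time multiplied by the wave's speed, which for LiDAR is the speed of light (the pulses are typically laser light). A back-of-the-envelope example:

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Distance to an object from the round-trip time of a light pulse."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2


# A pulse that comes back after 200 nanoseconds hit something about 30 m away.
print(distance_from_round_trip(200e-9))  # ~29.98 meters
```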
Alexey: I don’t know if you can talk about your work, but these systems are often used in cars, right? (13:08)
Aishwarya: Yes, for self-driving. It depends on the company’s stack. Some use LiDAR for fully self-driving systems where there’s no driver at all, while others like Tesla rely solely on cameras. (13:21)
Alexey: I’ve taken a few rides in Teslas as taxis, and it was fun to watch the screen showing cars and people around. It was interesting to see when it made mistakes or didn’t. It made the ride more entertaining than a normal car because I could watch the screen. (13:46)
Alexey: This system works with cameras, right? Sorry, I think I caught the flu this evening. I have a cold. (14:20)
Aishwarya: I hope you recover quickly. (14:40)
Alexey: Thank you. Could you please repeat how the Tesla visualization system works? Does it only use video cameras? (14:45)
Aishwarya: Yes, that’s the unique part of Tesla’s system. Since LiDAR is expensive, they focus on scalability and rely on cameras. But it’s not just one camera. There are cameras all around the car providing a 360-degree view. (15:05)
Aishwarya: The models process these different views to understand the surroundings and see all around the car. The car has a more holistic view of the world than a human driver, who cannot see behind or to the sides at the same time. (15:28)
Alexey: Yes, that makes sense. The goal of self-driving is to make driving safer once the AI reaches that level. (15:54)
Alexey: But what is this screen for? The self-driving system is there, but in the taxis I took, the drivers were still driving. So what’s the point of the screen? (16:06)
Aishwarya: At this point, it serves two purposes. On long drives or in stop-and-go traffic, you can use autopilot mode and don’t need to stay constantly alert. The car assists you. (16:24)
Aishwarya: I remember two years ago, I took a trip to Las Vegas, a 13-hour drive each way. The car handled about 95% of the driving, and I was just there monitoring. It made the trip so much easier because driving that long alone would have been exhausting. (16:44)
Aishwarya: It’s like an assistant system. Some people still prefer to drive because they don’t fully trust it yet. It’s also about building that trust factor. (17:05)
Alexey: I wonder if there are statistics about that. (17:18)
Aishwarya: Yes, there are statistics about failure rates and performance. (17:26)
Alexey: People often say, “It’s better if I drive than AI,” but the question is who actually drives better. Some people might be overconfident in their driving skills. (17:30)
Alexey: For long straight routes, like your trip to Vegas, maybe the AI handles it better. Was it mostly a straight highway? (17:47)
Aishwarya: Yes, it was a highway, so there weren’t many traffic lights or turns. You just go straight for miles. There’s a stretch that’s about 150 miles of straight road, and a normal person would be bored out of their mind driving that long. (17:52)
Alexey: In Berlin, it’s probably more difficult because there are many bikes. The streets are narrow, and cyclists can appear suddenly. I guess that’s why they don’t use self-driving cars here. (18:10)
Aishwarya: I think Tesla is trying to enter the European markets with its autopilot. When I was there, I worked on recognizing European road signs, such as speed-limit signs. (18:32)
Alexey: But it’s also regulated, right? So maybe they can’t use self-driving yet. (18:45)
Aishwarya: Not yet. (18:53)
Alexey: That makes sense. I’m originally from Russia, and I know that in Moscow some cars already drive without drivers. I guess in San Francisco too, right? (18:59)
Aishwarya: Yes. San Francisco also has Waymo, which has no driver at all. People find it very interesting. (19:09)
Alexey: So you get something like Uber, and then a car comes with no driver, right? (19:16)
Aishwarya: Yes, there’s no one there. If you visit San Francisco, be sure to take a Waymo. It’s quite the tourist attraction right now. (19:22)
Alexey: Okay. That’s where you work, right? (19:34)
Aishwarya: Yes. (19:39)
Alexey: Is there an app called Waymo? (19:41)
Aishwarya: Yes, there’s a Waymo app you can use to hail a ride. In some cities, they’ve also partnered with Uber and Lyft so you can call a Waymo through those apps as well. (19:46)
Alexey: How much about your current position can you talk about? You mentioned you work on gesture recognition. Can you tell us more about that? (19:57)
Aishwarya: I can give a high-level picture. It’s about trying to understand gestures from people such as police officers or construction workers who guide traffic, for example when there’s a big event or roadwork. (20:17)
Alexey: So there is a police officer who tells you to slow down? (20:29)
Aishwarya: They tell you to stop or to go. As human drivers, we slow down when there is a police officer, and the car also tries to follow traffic rules. My work is about understanding what the officer wants to communicate and modifying the car's route or behavior accordingly. (20:35)
Alexey: Say my mom orders a ride-hailing car that arrives without a driver, and suddenly there is an event with a lot of people. What should the car do? Such cases, like a traffic light breaking or a police officer controlling traffic, are less common in training data. (21:05)
Aishwarya: I think all of these cases are covered. Waymo has been in business for around 15 years, and they have worked to cover many of the cases we see. Broken traffic lights and large crowds are handled well. There are sometimes big events and game nights where it performs very well. In those cases, police officers direct traffic, and my job is to make Waymo better at understanding them. (21:36)
Alexey: How much can you talk about this project? What kind of technology do you use, and how fast does it need to be to make decisions in real time? (22:17)
Aishwarya: There are in-house models that use cameras, LiDAR, and other sensor information from the car. Waymo does not publish the exact models it uses. The internal models are optimized to run on the car and to run very fast. They are not necessarily the same networks used during training. We use techniques so they can detect multiple times per second and understand what is happening in the world in real time. (22:43)
Alexey: What is this process called when you take a big model and make it smaller and faster? (23:28)
Aishwarya: There are many ways. One publicly known method is quantization. Quantization makes the model smaller and faster. There are many similar techniques and additional internal optimizations. (23:35)
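For reference, this is what post-training dynamic quantization looks like in PyTorch. It is only the publicly documented technique mentioned above, applied to a toy model, and says nothing about Waymo's internal optimizations:

```python
import torch
from torch import nn

# A toy network standing in for a much larger perception model.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Dynamic quantization: weights of the listed layer types are stored in
# int8, shrinking the model and speeding up inference on CPU.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```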
Alexey: I do not want to put you in an uncomfortable position by asking too many details. One project I find interesting is the malaria mapping project in Africa. Can you tell us more about that? (24:05)
Aishwarya: This was when I was at Morgan Stanley and I found the domain really interesting. I joined Omdena, which runs AI for good projects. They pair nonprofits with volunteer ML engineers who form teams of thirty to forty people. One nonprofit, Zap Malaria, led fumigation efforts in Africa and wanted to make fumigation more efficient by targeting areas with high mosquito probability. (24:31)
Aishwarya: Initially, they would just go to places and fumigate to prevent breeding and spread. They wanted to target only regions with high probability of mosquitoes instead of everywhere. We thought satellite images or knowledge of marshy lands and stagnant water could identify breeding grounds. (25:18)
Aishwarya: That approach would let fumigation teams focus efforts and save manpower and cost, which is crucial for nonprofits. Our team split into groups; one used satellite images to detect stagnant water bodies. My model used topographic information from Google to identify low-lying areas likely to collect water. (25:49)
Aishwarya: We trained models to detect those regions and combined satellite and topographic approaches into an ensemble. They integrated this system and reported very good results. It was a volunteer-based, AI-for-good project. (26:24)
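A minimal sketch of the kind of ensemble described here, assuming each model outputs a per-cell probability of a breeding ground over the same map grid (the numbers, weights, and threshold are made up for illustration):

```python
import numpy as np

# Per-cell probabilities of mosquito breeding grounds on the same grid.
p_satellite = np.array([[0.1, 0.8],
                        [0.4, 0.2]])    # from the satellite-image model
p_topographic = np.array([[0.2, 0.9],
                          [0.7, 0.1]])  # from the topography model

# Simple weighted average; the weights would be tuned on validation data.
p_ensemble = 0.5 * p_satellite + 0.5 * p_topographic

# Fumigation teams visit only the cells above a chosen threshold.
targets = p_ensemble > 0.5
print(targets)
```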
Alexey: That sounds interesting. How accurate was this approach? Did they share feedback from the ground? (27:03)
Aishwarya: They did. They reported the model performed well and allowed them to focus on critical areas. They shared that it helped them save time and improve effectiveness. It was fulfilling because it had a clear social impact. (27:17)
Alexey: That must have felt rewarding. Working on projects that create real-world benefits is always motivating. (27:47)
Aishwarya: Absolutely. It showed how machine learning can help with problems outside the corporate world. It also gave me the confidence to work on complex challenges in different domains. (27:57)
Alexey: How did you end up at Waymo after that? The transition from finance to self-driving cars sounds big. (28:19)
Aishwarya: It happened step by step. I worked in finance, then healthcare, and finally moved to autonomous vehicles. Each step helped me build different technical and domain skills. I wanted to work on something that combines AI with real-world physical systems, and Waymo was perfect for that. (28:31)
Alexey: What does your day-to-day work look like at Waymo? (29:05)
Aishwarya: I work mostly on perception models, helping the car understand the world around it. That includes identifying pedestrians, vehicles, traffic lights, and understanding dynamic situations. I also work on improving accuracy and speed. The models must process information very quickly while maintaining reliability. (29:11)
Alexey: How do you test those systems before putting them on the road? (29:45)
Aishwarya: There are multiple stages. First, we test everything in simulation, where we recreate millions of real-world scenarios. Then we move to closed tracks with controlled conditions. Finally, we do on-road testing with safety drivers. Only after extensive testing do we deploy updates to cars operating without drivers. (29:51)
Alexey: That makes sense. The safety requirements must be very strict. (30:29)
Aishwarya: Yes, extremely strict. Safety is the top priority in every decision. Each change goes through rigorous validation and approval. There are layers of redundancy to ensure that even if one system fails, others keep the car safe. (30:35)
Alexey: What kind of data do you collect from the cars? (31:02)
Aishwarya: We collect sensor data like camera images, LiDAR scans, radar, and GPS. We also gather metadata about driving conditions and system responses. The data is anonymized and used only for improving performance and safety. (31:07)
Alexey: That must be a huge amount of data. (31:37)
Aishwarya: It is massive. Waymo has been operating for years, so the scale of data is huge. Managing and labeling it is a major challenge. We rely heavily on internal tooling and automation. (31:42)
Alexey: Do you use human labelers or automatic labeling for that data? (32:09)
Aishwarya: Both. Human labelers handle complex cases, while automatic systems take care of repetitive tasks. The combination improves both speed and accuracy. We constantly refine labeling quality to ensure the models learn from the best data possible. (32:14)
Alexey: How often do you update the models in the cars? (32:43)
Aishwarya: Updates depend on project cycles and validation results. Some improvements roll out every few weeks, while major updates take longer. Every release goes through multiple safety checks and real-world validation before deployment. (32:48)
Alexey: How large is the team you work with? (33:19)
Aishwarya: Waymo has many specialized teams. I work closely with perception, data, and simulation engineers. Collaboration is key because every component affects others. Even a small change can influence performance across the system. (33:24)
Alexey: It sounds like a huge operation. (33:56)
Aishwarya: It is. Building autonomous driving technology requires expertise in software, hardware, sensors, and safety. Each team contributes to a different layer, and together they make the system work smoothly. (34:00)
Alexey: What do you enjoy most about your work? (34:29)
Aishwarya: I enjoy solving real-world problems with tangible impact. Seeing a car drive safely using models we built is very satisfying. It feels like working on the future of mobility. (34:33)
Alexey: Do you sometimes ride in Waymo cars yourself? (34:59)
Aishwarya: Yes, I have. It is an amazing experience. The first time feels a bit strange because there is no driver, but after a few minutes you realize how smoothly it drives. It follows all rules and handles complex scenarios confidently. (35:04)
Alexey: That must be surreal. How does it handle unexpected events like pedestrians suddenly crossing? (35:34)
Aishwarya: The car constantly monitors surroundings with multiple sensors. It predicts motion paths and can react within milliseconds. It prioritizes safety and will slow down or stop immediately when needed. (35:43)
Alexey: How much of this technology is transferable to other domains? (36:12)
Aishwarya: Quite a lot. Perception, prediction, and planning models have applications beyond autonomous driving. Similar approaches can be used in robotics, drones, or industrial automation. (36:17)
Alexey: Do you collaborate with research teams or external partners? (36:44)
Aishwarya: Yes. We collaborate with academic and research groups to advance the field. Waymo also contributes to open-source tools and publishes papers to share learnings with the broader community. (36:49)
Alexey: That’s great. What’s one thing you learned from working at Waymo that surprised you? (37:18)
Aishwarya: How complex real-world driving is. Even simple-looking actions involve multiple models and systems working together. It made me appreciate how much coordination and testing is required to make everything reliable. (37:25)
Alexey: What advice would you give to someone who wants to work in self-driving technology? (37:57)
Aishwarya: Start by building a strong foundation in machine learning and computer vision. Get hands-on experience with data and simulation. Learn about safety-critical systems because reliability is key in this field. (38:03)
Alexey: Do you think it’s a good time to enter the industry? (38:33)
Aishwarya: Yes. The field is evolving quickly, and there’s still so much to explore. Many challenges remain, especially around edge cases and scaling. It’s a great time for engineers who like solving complex problems. (38:38)
Alexey: What skills are most valuable for someone entering this space? (39:09)
Aishwarya: Strong programming skills, understanding of ML fundamentals, and experience with large-scale systems. Knowing data pipelines and simulation helps too. Curiosity and persistence matter a lot because the work can be challenging. (39:15)
Alexey: Do you think the industry is close to full self-driving adoption? (39:49)
Aishwarya: We are getting closer. Fully autonomous driving in all conditions is still hard, but progress is steady. It will likely start with limited areas and then expand gradually as technology matures. (39:54)
Alexey: Do you see Waymo expanding internationally? (40:29)
Aishwarya: Potentially, yes. Each country has different regulations and infrastructure, so it takes time. Waymo is focused on perfecting safety and scalability first before expanding widely. (40:34)
Alexey: What kind of challenges do regulations create? (41:03)
Aishwarya: Regulations vary by region. Some areas allow testing with safety drivers, while others permit fully driverless operations. We work closely with local authorities to ensure compliance and transparency. (41:08)
Alexey: How does public perception affect your work? (41:40)
Aishwarya: Public trust is essential. We focus on transparency, safety, and consistent communication. People need to feel comfortable sharing the road with autonomous cars. Every successful ride helps build that confidence. (41:45)
Alexey: If you could apply your skills to another big problem in the world, what would it be? (42:18)
Aishwarya: Probably climate change. Machine learning can help optimize energy usage, improve transportation efficiency, and support sustainability efforts. Applying technology to make a positive global impact is very motivating. (42:24)
Alexey: That’s inspiring. Thank you for sharing your experience and insights. It was great learning about your work and the impact of AI in the real world. (42:56)
Aishwarya: Thank you. It was a pleasure talking with you. (43:07)
Alexey: For me, I think it would be two weeks. If I needed to get into modern papers like this, I don’t know if I would just go to arXiv and pick any paper about NLP or computer vision. It’s good that we have ChatGPT now, so I can ask it to explain things. (43:16)
Aishwarya: That is so good now. It wasn’t there back when we were in school and college doing assignments. Where was ChatGPT then? (43:38)
Alexey: Interesting that you mentioned you didn’t have any prior experience with reinforcement learning, because I thought reinforcement learning is something used quite often for driving too. For me, before this whole LLM space appeared, AI mostly meant machine learning. But reinforcement learning can get an agent to act in an environment and learn from it. (43:44)
Alexey: There were companies I interviewed with that were creating environments for self-driving cars to test in. They had environments with streets and everything looking very realistic. It was in Germany, so I guess companies like BMW and Audi that work on self-driving could use it to test their cars. They used reinforcement learning frameworks where the car could go wild and learn that hitting a pedestrian results in a penalty, so it learns not to do that. (44:20)
Alexey: It was interesting, but it didn’t work out, so I didn’t join the company. It was funny—it was a company with four people in a basement and a lot of GPUs. They thought they needed one more person. (45:07)
Aishwarya: That doesn’t sound very stable. Wow, okay. But it was a good experience. Reinforcement learning is interesting. (45:24)
Aishwarya: My first interaction with reinforcement learning was with robots. In college, we had Robo Wars where you build your robot, go to a tech festival, and compete against others. Reinforcement learning is still a big part of robotics. (45:37)
Aishwarya: So far, my career has been in computer vision and robotics, mostly on the perception side, understanding the world. Reinforcement learning comes in when teaching an agent how to behave in the world. These are two parts of the stack: perception and behavior. (45:55)
Aishwarya: I work on perception, making the agent understand the world, while reinforcement learning is for behavior. Even though I’m in the self-driving industry, I’ve never worked on that part. I skipped reinforcement learning courses in college because I found them hard. (46:31)
Alexey: But I imagine you would not let a car go wild in real life and learn how to interact with pedestrians by trial and error. That would not be fun. That would surely go wrong. (46:50)
Aishwarya: Honestly, I don’t know if we use reinforcement learning. I’ve never tried to find out because it looks more like a fun project to do in an emulator. (47:07)
Alexey: Yeah, but in real life, you still need some rules. There’s actually a question from Ole. He’s asking about self-driving: is it full AI or a mix of rules and AI? I assume he means self-driving because full AI would be reinforcement learning when the car learns to drive by itself. But we still need to add rules. What’s the current state of self-driving AI? (47:19)
Aishwarya: I think all environments, even in reinforcement learning or other training methods, have constraints. You impose the rules of the world, like not driving against traffic. It’s not completely free to learn however it wants. (47:56)
Aishwarya: It’s definitely constrained by many rules. As you expand into different countries or continents, new rules appear. Even within a country, different cities have different driving patterns. Some are aggressive, and some follow rules more closely. It needs to adapt and remain constrained. (48:30)
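To make the idea of a constrained environment concrete, here is a toy reward function of the kind used in driving simulators. It is purely illustrative and is not how Waymo or Tesla actually train their systems:

```python
def driving_reward(progress_m: float, collided: bool,
                   ran_red_light: bool, wrong_way: bool) -> float:
    """Toy reward: progress is rewarded, rule violations are heavily penalized."""
    reward = progress_m        # meters of progress along the route
    if collided:
        reward -= 1000.0       # hitting anything is the worst outcome
    if ran_red_light:
        reward -= 100.0
    if wrong_way:
        reward -= 100.0        # driving against traffic is never allowed
    return reward


print(driving_reward(progress_m=12.0, collided=False,
                     ran_red_light=True, wrong_way=False))  # -88.0
```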
Alexey: In Italy and Germany, driving is very different. In Berlin, people drive slowly, and it’s easy to cross the street. But in southern Italy, good luck—you just have to walk across, and then they stop. Otherwise, they keep going. (48:59)
Aishwarya: It still needs to learn all these patterns. That’s why it’s such a hard problem. Everything changes so much across geographies. (49:24)
Alexey: I was thinking about chess. In chess and Go, reinforcement learning was used to build state-of-the-art models. They let the AI explore the game freely instead of learning from past games. Because of that, it could play better than humans. (49:30)
Alexey: With self-driving, it’s different. You still need to obey rules. In chess, you also have rules like how a knight or bishop moves, but they are fixed and limited. (50:15)
Aishwarya: The difference is that in chess, the rules are fixed: every piece has a clear purpose, and that’s it. You can explore within those limits. But in self-driving, the rules are constantly evolving. (50:29)
Aishwarya: You have an infinite number of rules, so it’s hard to teach the model in such a changing environment. I honestly don’t know if we use reinforcement learning, but constraints definitely play a role in any model we use. (51:02)
Alexey: There’s a question from Adonis: how does testing work for sensitive cases like autonomous driving? Do developers inherit tests, or are there stages? How is testing organized? (51:28)
Aishwarya: It depends on what change you’re making. I work on pedestrians and gestures, so we have evaluations around cases with pedestrians or past events. We rerun new models on those cases as the first stage. (51:56)
Aishwarya: Then we evaluate a broader set of real-world scenarios involving pedestrians. Like the question mentioned, there are stages where you start small, then move to larger evaluation sets. Finally, you roll it out slowly to drivers and expand. (52:16)
Alexey: Another question about LLMs. We already talked about how they can do computer vision, but they are slow. Can they be applied to self-driving? Does it make sense to use generative models for that? (52:53)
Aishwarya: There have been many attempts. The latest research and some companies, like Wayve, are using multimodal LLMs for end-to-end self-driving. There’s room for it because LLMs are pretrained on massive data, so they have world knowledge. (53:20)
Aishwarya: The challenge is making them fast enough. Tradeoffs and techniques are needed, but it’s actively explored in research and by companies. Some systems already use LLMs for self-driving. (53:57)
Alexey: We talked about patterns in different countries, like Italy versus Germany. LLMs might already know these differences from their training data. Maybe that makes it easier for them to adapt. (54:17)
Aishwarya: That’s the hope, that they have some knowledge about various things that curated datasets might miss. LLMs have broad world knowledge, so they can help tune systems for global use. (54:53)
Alexey: Okay, maybe last question. If I want to work on self-driving cars, what should I do? What should I study? How can I get into this industry? (55:25)
Aishwarya: It starts with deep learning. I was good at deep learning and got into the AI Guide Dog project, which used vision and navigation. That led me to Tesla. It’s about knowing the fundamentals and doing relevant projects. (55:45)
Aishwarya: If you do computer vision projects, that’s a great start. It helps your resume stand out so companies know you’re familiar with the space, and you can get interviews and improve from there. (56:12)
Alexey: So, a good pet project could be building an app that uses your camera to describe objects in your room. It could say there’s a bed, a clock, and so on. (56:24)
Aishwarya: Yes, and with LLMs, it’s even easier. If you prompt an LLM correctly, you don’t need to train anything; it just tells you what’s in the room. (56:52)
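A minimal sketch of that "prompt it, don't train it" approach, using the OpenAI Python SDK as one example of a vision-capable LLM. The model name, prompt, and file path are placeholders:

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("room.jpg", "rb") as f:  # a photo taken by the phone camera
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model would do
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "List the objects you can see in this room."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)  # e.g. "a bed, a clock, a chair..."
```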
Alexey: You can even ask ChatGPT to write the app and then learn from how it did it. (57:05)
Aishwarya: That’s two steps removed: you don’t even write the app yourself; the AI does it. (57:12)
Alexey: I recently used a tool like Cursor, a coding agent. I asked it to implement a multi-agent system for evaluating GitHub projects, and half an hour later, it worked. (57:21)
Alexey: I tweaked it a bit with prompts, and it was running. Then I studied the code to understand how it was implemented, and I thought, okay, now I know how to do it. (57:40)
Aishwarya: It’s fascinating. These projects used to take weeks in college, but now they’re done so quickly. Still, sometimes it gets stuck on weird bugs. (57:51)
Alexey: Then you have to intervene and learn what the AI built before fixing the bug. You have to understand the entire codebase. (58:05)
Aishwarya: It’s great for setting up frameworks when starting new projects from scratch. But for details, you still need to know what you’re doing. (58:23)
Alexey: Okay, it was amazing talking to you. Thanks a lot. I’m glad we made it work. Sorry you were sick. I hope you recover quickly. It’s late for you, so you should rest, and I’ll go have breakfast. (58:35)
Aishwarya: All right, have a good day. (58:55)
Alexey: You too. Thanks, everyone, for joining us today. (59:01)