Active Learning, Data Selection, Data Auto-Labeling, and Simulation in Autonomous Driving — Part 6
Let’s go for Waabi!
Waabi
In a recent workshop on self-supervised learning for autonomous driving at ICCV 2021, Raquel Urtasun talked about the labeling mechanisms at Waabi. She mentioned that there is no need for humans in the labeling loop and that the entire loop can be made automatic.
Here is the Autonomy workflow used at Waabi:
We have access to a fleet of vehicles as well as data collection platforms, so while we can collect a large amount of data, labeling all of it is prohibitively expensive. At the same time, datasets need to change and evolve, and in industry and the real world this happens frequently: the world changes as we drive to different cities and see different scenes and situations, and the cities themselves change due to, for example, construction. We therefore need to update our datasets and train our models on them in order to handle the situations we see but cannot yet handle. Annotating these datasets is costly, and data curation can be the solution.
There are several techniques for selecting which samples to label.
Interesting
We can choose a data point that we believe is interesting, exciting, and beneficial to learn from, whether for training or testing purposes.
They have measures in place to select data from the logs collected by each vehicle. They accomplish this by using an intermediate process to tag logs with various properties, which they can then use to generate various notions of what might be interesting. They then rank the examples and select the best ones. Automation is critical in this process.
While humans are capable of tagging, the process must also be automated. They have devised two distinct methods for automatic tagging. The first is modular tagging, which involves running the perception system offline on the data to perform detection, tracking, prediction, and so on, and then using the outputs to determine whether the data is interesting and how complex the scene is.
The other approach is learning-based: a sophisticated neural network learns to tag the scene directly. This yields rich tags about what is happening in the scene, which can then be fed into the complexity measures.
Here we focus on the modular tagging approach.
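To make this concrete, here is a minimal Python sketch of what a modular tagger might look like. Everything in it (the `SceneTags` fields, the `perception` object and its `detect_and_track` method) is hypothetical and only illustrates the idea of turning offline perception outputs into tags; it is not Waabi's actual pipeline:

```python
from dataclasses import dataclass

# Hypothetical sketch: run the perception stack offline on a logged snippet
# and summarize its outputs as tags. All names here are illustrative.
@dataclass
class SceneTags:
    num_actors: int
    num_pedestrians: int
    has_intersection: bool
    max_actor_speed: float  # m/s

def tag_snippet(snippet, perception) -> SceneTags:
    """Offline detection/tracking on a few seconds of driving, then tagging."""
    tracks = perception.detect_and_track(snippet.lidar, snippet.images)
    return SceneTags(
        num_actors=len(tracks),
        num_pedestrians=sum(t.cls == "pedestrian" for t in tracks),
        has_intersection=snippet.hd_map.contains_intersection(),
        max_actor_speed=max((t.speed for t in tracks), default=0.0),
    )
```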
Each scenario, which is a few seconds of driving, has tags, an HD map, and information about the self-driving car and all other traffic participants. They then calculate complexity measures and combine them into a single value for the scene's complexity. All scenarios are ranked by this complexity, and the top ones are chosen.
So, when is a scenario complex and interesting enough to be selected? Many factors play a role: map complexity measures, actor complexity measures, and Self-Driving Vehicle (SDV) complexity measures.
For map complexity, scenarios can be selected based on the following items (in the following images, left is more complex and right is less complex):
- Motion paths: roads with high curvature or odd shapes can be more interesting and have higher complexity.
- Lane-crossing: for example, an intersection where many lanes cross each other and the car can move from one lane to many others.
- Crosswalks: scenarios with more crosswalks, which are likely to have more pedestrians, can be more complex than others.
- Intersections: scenarios with more complex intersections can be interesting too. The left intersection has a more complex topology than the right one, which is a very common, classic intersection.
- Traffic controls: intersections with many traffic lights controlling many different movements are more interesting.
- Map slope: scenarios where the map and the ground have varying slopes can be interesting, and you need to make sure your self-driving car can handle those too.
For actor complexity measures, the following cases can be considered:
- Dynamic or static actors: scenarios with more actors can be more complex and interesting, and our self-driving car needs to be able to handle them.
- Class diversity: scenarios with various classes such as bicycles, pedestrians, animals, vehicles, etc., are more complex.
- Spatial diversity: scenarios in which actors appear in many different locations in the scene can be interesting.
- Velocity: scenarios with actors moving at diverse velocities can be more complex.
- Path: variability of the actors' paths in the scenario can make it complex too.
The other case that can be considered for complexity measurement is the SDV complexity measure. The following items can be important in this case:
- Path/Velocity diversity: similar to the corresponding items in the actor complexity measures.
- Route: the SDV's route can make a scenario more complex. For example, at an intersection, turning is more complex than going straight.
- Interaction with traffic lights: scenarios in which the SDV needs to deal with traffic lights can be more complex.
- Interaction with actors: we can also look at how we interact with other actors. Is somebody cutting in front of us? Is somebody slowing down behind us? Is somebody entering our lane? These are pretty interesting scenarios that we need to make sure our car can handle.
For more details on the mentioned measures, check the paper “Diverse Complexity Measures for Dataset Curation in Self-driving”.
After selecting interesting and complex scenarios based on these measures, we can label them. What is important to note is that, depending on your goal, for example object detection, motion forecasting, or motion planning, certain things will be more interesting than others. The way to handle this is very simple: compute a weighted sum of the complexity measures, projecting the high-dimensional complexity vector into a single number.
In summary, the complexity of each scenario is calculated for each of the above-mentioned items; then, based on the task, a weight vector is selected and multiplied by these complexity measures to obtain a single number for that scenario.
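As a rough illustration (not Waabi's actual code), the projection and ranking could look like the sketch below; the complexity values and the task-specific weights are made up:

```python
import numpy as np

def rank_scenarios(complexities: np.ndarray, weights: np.ndarray, k: int):
    """complexities: (num_scenarios, num_measures); weights: (num_measures,)."""
    scores = complexities @ weights   # weighted sum -> one scalar per scenario
    order = np.argsort(scores)[::-1]  # most complex first
    return order[:k], scores[order[:k]]

# Hypothetical example: 4 scenarios, 3 measures (map, actor, SDV),
# with weights tuned for, say, a motion forecasting task.
complexities = np.array([[0.9, 0.2, 0.1],
                         [0.3, 0.8, 0.5],
                         [0.1, 0.1, 0.1],
                         [0.7, 0.6, 0.9]])
weights = np.array([0.2, 0.5, 0.3])
top_idx, top_scores = rank_scenarios(complexities, weights, k=2)
```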
Diversity/Coverage
In addition to the scenarios that are interesting, we should consider the dataset’s diversity and coverage. At the end of the day, we are attempting to develop vehicles capable of operating within our operational domain, and we must ensure that they are trained to handle all possible situations.
One way to handle this is, once you have decided which scenarios to label so far, to look at what they miss: which areas of the space are not covered by your currently selected scenarios, so that a new scenario would be far from all of them. It is also important to add geographical diversity here. We can then iteratively select examples that are far away from the already selected scenarios and label them, repeating this procedure until the selected dataset has enough diversity and no new data point is far enough from the selected points.
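One common way to implement this kind of coverage-driven selection is greedy farthest-point selection in some scenario feature space. Here is a minimal sketch under that assumption; the feature embedding of each scenario and the distance threshold are taken as given and are purely illustrative:

```python
import numpy as np

def select_for_coverage(features: np.ndarray, seed_idx: int, min_dist: float):
    """Greedily pick scenarios far from everything selected so far."""
    selected = [seed_idx]
    while True:
        # distance of every candidate to its nearest already-selected scenario
        dists = np.min(
            np.linalg.norm(features[:, None, :] - features[None, selected, :], axis=-1),
            axis=1,
        )
        far_idx = int(np.argmax(dists))
        if dists[far_idx] < min_dist:  # nothing left is far enough: stop
            return selected
        selected.append(far_idx)

# Hypothetical example: 2-D features for five candidate scenarios
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [9.0, 1.0], [4.9, 5.1]])
chosen = select_for_coverage(features, seed_idx=0, min_dist=1.0)  # -> [0, 3, 4]
```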
Model Improvement
The other factor for selecting scenarios is model improvement. We can select data points that help improve the model's performance. So, scenarios are selected based on their expected improvement in model performance, or on how uncertain the model is about them. A human intervention on the road can also be a signal of model failure in that scenario and shows that there is something to learn from it. Basically, the active learning techniques we mentioned at the beginning of this post can be used here.
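As a small example of such an uncertainty score, here is predictive entropy over a model's softmax outputs; examples with high entropy are the ones the model is least sure about, so they are good candidates for labeling:

```python
import numpy as np

def entropy(probs: np.ndarray) -> np.ndarray:
    """probs: (num_examples, num_classes) softmax outputs."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

probs = np.array([[0.98, 0.01, 0.01],   # confident  -> low entropy
                  [0.34, 0.33, 0.33]])  # uncertain  -> high entropy
scores = entropy(probs)  # label the high-entropy examples first
```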
One point that needs to be taken into account is that, in addition to prioritizing some scenarios over others, we can think about which regions within a scene matter most. Some regions are more important than others, and since annotators are paid per click, it is important to select and label the parts of the scene that are informative and rich, rather than the entire scene. For example, in the following scene, the cars in the middle of the image are more important than the parked cars on the left road.
Here is the flowchart they propose in their paper called “Just Label What You Need: Fine-Grained Active Selection for Perception and Prediction through Partially Labeled Scenes”:
Scoring the unlabeled examples can be based on the scores mentioned earlier, such as entropy. And here is their proposed algorithm:
After selecting scenarios based on their interestingness, diversity/coverage, and model improvement, we need to label them.
The good news is that we don't need to do this auto-labeling online, so we can use more sophisticated models in offline mode without worrying about latency and other online-inference bottlenecks. We also have access to past and future timesteps, which can help with the annotation.
For example, for the task of labeling an object's trajectory with its bounding boxes, we can first run the offline model to get an initial estimate of where all the actors in the scene are and how they are moving, and then apply sophisticated ways of correcting these results to obtain better trajectories. Here is the method they propose in the paper called "Auto4D: Learning to Label 4D Objects from Sequential Point Clouds":
There are two branches in this approach. The first takes the initially estimated trajectory and fixes the size of the object. The second branch takes the fixed-size trajectory together with the object's point cloud across time frames, fuses them, and outputs a refined trajectory. Here is the result, with very nice and smooth trajectories:
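To make the two-branch structure concrete, here is a toy PyTorch sketch. It only mirrors the high-level idea (a permutation-invariant branch that fixes the object size, and a branch that residually refines the trajectory given that size); the actual Auto4D networks use much richer point-cloud encoders, and all dimensions here are invented:

```python
import torch
import torch.nn as nn

class ObjectSizeBranch(nn.Module):
    """Pools the object's points across all frames into one fixed box size."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, 3)  # (length, width, height)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (num_points, 3); max-pool makes the result order-invariant
        feats = self.encoder(points).max(dim=0).values
        return self.head(feats)

class MotionPathBranch(nn.Module):
    """Residually refines the trajectory given the fixed object size."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Linear(4 + 3, feat_dim), nn.ReLU(), nn.Linear(feat_dim, 4)
        )

    def forward(self, traj: torch.Tensor, size: torch.Tensor) -> torch.Tensor:
        # traj: (T, 4) = (x, y, z, yaw) per frame; size: (3,)
        size_rep = size.unsqueeze(0).expand(traj.shape[0], -1)
        return traj + self.refine(torch.cat([traj, size_rep], dim=-1))

# Toy usage: 20 object points and a 10-frame coarse trajectory
points = torch.randn(20, 3)
coarse_traj = torch.randn(10, 4)
size = ObjectSizeBranch()(points)                # branch 1: fixed box size
refined = MotionPathBranch()(coarse_traj, size)  # branch 2: smoothed trajectory
```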
As with previous companies we’ve reviewed, Waabi makes extensive use of simulation.
Training is conducted in a high-fidelity 3D simulation of the real world, which enables training on uncommon scenarios and eliminates the need for field data collection. In other words, there is no reason to drive "millions of miles" and create potentially dangerous situations or even collisions.
Furthermore, having a fleet of hundreds of vehicles on the road is prohibitively expensive, and it can be dangerous. Rather than that, Waabi employs an AI approach that is capable of learning from fewer examples and scaling.
Rather than requiring humans to design 3D assets by hand and engineers to implement rule-based simulation behavior, Waabi generates virtual worlds and multi-agent systems automatically based on observed human driving behavior.
Here are the components of their closed-loop simulation system:
The first component is the Virtual World Creation. Generating the background, cars, pedestrians and animating them are done in this component.
The next component is Scenario Generation, which automatically generates realistic and diverse scenarios:
Here are some super-realistic scenarios generated in simulation:
Additionally, Waabi models sensor noise using AI and physics, resulting in perception outputs that behave similarly in both simulation and the real world. The following videos show the simulated LiDAR and Camera sensor data in comparison to real data:
Some objects in the above video are fake and generated to make the scene more complex!
If you want to know more details about their LiDAR simulation, check their LiDARsim paper.
They then use the simulator to test both simple or frequently occurring scenarios and those that occur only rarely. They can also create safety-critical cases and test their models there.
That’s it for Waabi! Let’s go for the last company.
Thank you for taking the time to read my post. If you found it helpful or enjoyable, please consider giving it a like and sharing it with your friends. Your support means the world to me and helps me to continue creating valuable content for you.