Active Learning, Data Selection, Data Auto-Labeling, and Simulation in Autonomous Driving — Part 5
Let's go for another interesting company!
Cruise
Cruise also makes use of active learning. They refer to it as the Continuous Learning Machine (CLM).
Consider the task of prediction. The motion prediction model must be capable of handling both the nominal and longtail cases well. Here is the end-to-end motion prediction model that Cruise uses, which they recently presented at the Cruise Under the Hood event:
It is critical to note that while these longtail events do occur in the data collected on the road, they are extremely rare. As a result, we concentrate on finding the needle in the haystack of daily driving and use upsampling to teach the models about these events.
A naive approach to identifying rare events would be to manually engineer “detectors” for each of these longtail situations to assist in data sampling. For instance, we could create a “u-turn” detector that generates sample scenarios whenever it is triggered. This approach would enable us to collect targeted data, but quickly fails when scaling up, as it is impossible to write a detector that is sufficiently specific for each unique longtail situation.
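To make the naive approach concrete, here is a toy sketch of what such a hand-engineered detector might look like (my own illustration, not Cruise's code): it simply flags trajectories whose heading flips by roughly 180 degrees.

```python
import math

def is_u_turn(headings_rad, threshold_deg=150.0):
    """Naive hand-engineered detector: flag a trajectory whose heading
    changes by roughly 180 degrees between its start and end."""
    delta = math.degrees(headings_rad[-1] - headings_rad[0])
    delta = (delta + 180.0) % 360.0 - 180.0  # wrap into [-180, 180]
    return abs(delta) >= threshold_deg
```

Every new longtail case (K-turns, cut-ins, nudges, and so on) would need its own bespoke detector like this, which is exactly why the approach does not scale.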
Self-supervised learning is a viable option for the prediction task at hand. In each scenario, we can compare our model's prediction to the logged future trajectory of each car, and if they differ significantly, we can automatically mine that scenario and train our model on it. The labeling requires no human annotation, since the logged future trajectory itself serves as the ground truth. These longtail events are then upsampled. The auto-labeled approach ensures maximum coverage of the dataset by identifying and mining all model errors, so no valuable data is missed. It also keeps the dataset as lean as possible by not adding data for already-solved scenarios.
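A minimal sketch of this error-mining idea, assuming hypothetical `scenario` and `model` objects (my illustration, not Cruise's implementation):

```python
import numpy as np

def mine_errors(scenarios, model, ade_threshold=2.0, upsample_factor=10):
    """Self-supervised mining: the logged future trajectory is the ground
    truth, so no human annotation is needed."""
    mined = []
    for scenario in scenarios:
        predicted = model.predict(scenario.history)  # (T, 2) predicted x/y points
        actual = scenario.logged_future              # (T, 2) from the drive log
        # Average displacement error between the prediction and what really happened.
        ade = np.linalg.norm(predicted - actual, axis=1).mean()
        if ade > ade_threshold:
            mined.extend([scenario] * upsample_factor)  # hard case: keep and upsample
        # Easy, already-solved scenarios are dropped to keep the dataset lean.
    return mined
```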
- Drives: The CLM starts with the fleet navigating in the city.
- Error Mining: Active learning is used to automatically identify error cases, and only scenarios with a significant difference between prediction and reality are added to the dataset. This enables highly targeted data mining, ensuring that we add only valuable data and avoid bloating the datasets with easy and uninformative scenarios.
- Labeling: All of our data is automatically labeled by the self-supervised framework, which uses future perception output as the *ground truth* for all prediction scenarios. While the core CLM structure is applicable to other machine-learning problems where a human annotator is required, fully automating this step within prediction enables significant scale, cost, and speed improvements, allowing this approach to span the entire longtail.
- Model Training and Evaluation: The final step is to train a new model, subject it to rigorous testing, and finally deploy it to the road. The testing and metrics pipelines ensure that a new model outperforms its predecessor and generalizes well to the nearly infinite variety of scenarios found in the test suites. Cruise has made significant investments in machine-learning infrastructure, which automates a variety of time-consuming tasks. As a result, they can run the entire CLM loop without human intervention (a skeleton of the loop is sketched after this list).
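Putting the four stages together, one iteration of the loop might be orchestrated roughly like this, reusing the `mine_errors` sketch from above (a schematic of the stages described in the list, with assumed helper callables, not Cruise's infrastructure):

```python
def clm_iteration(scenarios, model, train, evaluate, deploy):
    """One CLM cycle: drive -> mine errors -> auto-label -> train, test, deploy."""
    mined = mine_errors(scenarios, model)                    # error mining (sketched earlier)
    labeled = [(s.history, s.logged_future) for s in mined]  # auto-labeling, no humans
    candidate = train(model, labeled)                        # train a new model...
    if evaluate(candidate) > evaluate(model):                # ...and ship it only if it wins
        deploy(candidate)
        return candidate
    return model
```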
Let’s review some examples.
The U-turn is a longtail scenario that happens very rarely. The following image shows different trajectories (the red ones) at an intersection, starting from the black point.
As demonstrated in the image, the majority of the dataset consists of drivers traveling straight with few left turns, even fewer lane changes, and only two U-turn trajectories. Another example of the uncommon mid-block U-turn can be seen below:
When the CLM principles are applied, the initially deployed model may underpredict U-turns. As a result, when we sample data, we frequently encounter error cases involving U-turns. Over time, the dataset's representation of U-turns gradually grows until the model predicts them sufficiently well and the AV can accurately navigate these scenarios.
The K-turn is another longtail scenario. It is a three-point maneuver that requires the driver to move forward and backward in order to end up facing the opposite direction. K-turns are uncommon and are most frequently used when the street is too narrow for a U-turn.
The cut-in is another rare scenario that we need to be able to predict in order to handle the situation and yield to the cutting-in car if needed.
Another interesting kind of scenario involves interaction between agents. For example:
Cruise employs an interaction-centric architecture with an agent-to-agent graph and an attention mechanism for detecting agent interactions. For instance, in the previous scenario, the ego car and a bicycle are driving alongside one another when a parked car starts to pull back out slightly. The model understands the interaction and anticipates that the cyclist will nudge to the left to avoid the car. As a result, the self-driving car slows down and yields to the cyclist.
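For a rough idea of what an agent-to-agent attention layer can look like, here is a standard self-attention block over per-agent feature vectors (the shapes and names are my assumptions; Cruise has not published this code):

```python
import torch
import torch.nn as nn

class AgentInteractionLayer(nn.Module):
    """Each agent attends over all other agents, so its embedding can absorb
    context such as 'the parked car next to me is pulling out'."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, agent_feats):
        # agent_feats: (batch, num_agents, dim) embedding of every agent in the scene.
        ctx, attn_weights = self.attn(agent_feats, agent_feats, agent_feats)
        # attn_weights hint at which agents are interacting with which.
        return agent_feats + ctx, attn_weights
```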
Additionally, they have an interaction auto-labeler that can determine whether a pair of agents interacts and, if so, who wins the interaction. As additional self-supervision, this auto-labeler can then mine scenarios and define auxiliary tasks for interaction detection and resolution.
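One plausible way to auto-label "do these two agents interact, and who wins?" is a simple heuristic over the logged trajectories, for example based on who passes the shared conflict point first (my illustrative heuristic, not necessarily Cruise's labeler):

```python
import numpy as np

def label_interaction(traj_a, traj_b, conflict_radius=2.0):
    """Auto-label a pair of logged (T, 2) position arrays. Returns None if the
    agents never conflict, otherwise 0 or 1 for the agent that went first."""
    # Distance between every time-step pair of the two trajectories.
    dists = np.linalg.norm(traj_a[:, None, :] - traj_b[None, :, :], axis=-1)
    if dists.min() > conflict_radius:
        return None  # paths never conflict: no interaction
    t_a, t_b = np.unravel_index(dists.argmin(), dists.shape)
    # Whoever reaches the closest-approach point earlier "won" the interaction.
    return 0 if t_a < t_b else 1
```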
Not only is the future uncertain, but so is the world behind occlusions. Therefore, they designed their AI system to understand which parts of the world are occluded and to proactively anticipate other agents before even seeing them. For example, when a door pops open, their system can anticipate a pedestrian coming out of the door, so the car slows down immediately and steers further away from it.
Another example is when a large garbage truck obscures the view; even though the sensors cannot see anything behind the truck, the system imagines a pedestrian attempting to cross the street.
A further example: prior to driving through an intersection, even if the system sees no cross-traffic because of occlusion, it imagines a car crossing from the right and slows down, so that if a car really does appear, the autonomous vehicle can stop in time.
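This "imagine a car you cannot see" behavior can be grounded in simple stopping-distance math: cap the speed so that, if a hypothesized agent did emerge from behind the occlusion, the AV could still stop before the conflict point. A back-of-the-envelope sketch (my numbers, purely illustrative):

```python
import math

def max_safe_speed(dist_to_conflict_m, reaction_time_s=0.4, max_decel_mps2=6.0):
    """Largest speed v such that reaction distance plus braking distance fits in
    the available gap: v * t + v**2 / (2 * a) <= d."""
    t, a, d = reaction_time_s, max_decel_mps2, dist_to_conflict_m
    # Positive root of v**2 + 2*a*t*v - 2*a*d = 0.
    return -a * t + math.sqrt((a * t) ** 2 + 2.0 * a * d)

# If a hidden car could emerge 20 m ahead, stay under ~13 m/s (~48 km/h).
print(round(max_safe_speed(20.0), 1))
```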
Thus, regardless of what occurs in the future, the car will always be prepared to make prudent choices. All of this is due to the high-quality data that was used to train the model.
Additionally, they developed a few-shot, actively learned classifier for mining specific behaviors. For instance, if we want to train a model to predict when bikes will make a U-turn and go looking for similar trajectories, a naive similarity search using embedding features would mostly return left-turn scenarios, because the two behaviors look somewhat similar and left turns are far more common than U-turns. However, with the assistance of human supervision, we can train a classifier with much less data and a higher degree of accuracy that returns a variety of true-positive U-turn examples.
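A few-shot classifier like this can be as simple as a logistic regression fitted on a handful of human-labeled trajectory embeddings and then run over the whole embedding store to mine more examples. A sketch with scikit-learn (the embedding model itself is assumed to exist):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mine_behavior(embeddings, labeled_idx, labels, top_k=100):
    """Few-shot mining: fit a tiny classifier on tens of labeled examples,
    then rank all embeddings by predicted probability."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embeddings[labeled_idx], labels)     # e.g. 1 = bike U-turn, 0 = not
    scores = clf.predict_proba(embeddings)[:, 1]
    return np.argsort(-scores)[:top_k]           # most likely U-turns first
```

The active-learning twist is the loop: a human verifies the top hits, the corrected labels are added to the training set, and the classifier is refit for the next round.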
Cruise also employs Reinforcement Learning (RL) to develop a safe policy. This application of RL was one of the aspects of the *Cruise Under the Hood* event that I enjoyed the most. Take the following example:
This scenario can occur on a regular basis, yet we may not have it in the dataset. To deal with such situations, they use reinforcement learning to train a policy offline that understands what happens when pedestrians are extremely close to the ego car. They simulate decades of driving in order to develop a cautious policy. As illustrated in the image, there are two visualizations of the learned policy. Let us begin with the left one. If the pedestrian is in the yellow zone, they are in a very dangerous state; they can run towards the vehicle or to the side, and if the vehicle is traveling at a high initial velocity, a collision is possible. Thus, the learned safe policy is to exercise caution whenever a pedestrian enters the yellow zone, as we have no idea what they will do.
Now, let us discuss the right one. They can train the policy offline with a simulated latency, in this case 400 milliseconds, and as the image shows, the yellow area is significantly larger and extends to the vehicle's sides. Because of the system's latency, the policy must be even more cautious.
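To see why the 400 ms of latency inflates the yellow zone, note that during the latency window the car keeps moving at its current speed before any braking command takes effect, so the minimum safe distance grows by speed times latency. Illustrative numbers only (the learned policy is far richer than this formula):

```python
def caution_zone_radius(speed_mps, latency_s=0.4, max_decel_mps2=6.0):
    """Distance at which a pedestrian should trigger caution: distance covered
    during the latency window plus the braking distance."""
    return speed_mps * latency_s + speed_mps**2 / (2.0 * max_decel_mps2)

for v in (5.0, 10.0, 15.0):  # m/s
    print(f"{v:4.1f} m/s: {caution_zone_radius(v, latency_s=0.0):5.1f} m zone with "
          f"zero latency, {caution_zone_radius(v):5.1f} m with 400 ms")
```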
Additionally, they use simulation to learn policies offline for complex interactions involving multiple actors. For instance, the following video demonstrates two vehicles attempting to park. The same technique can be applied to learning a wide variety of behaviors offline in a simulator, and it can be used to generate data that they do not have or that is difficult to obtain.
The other reason Cruise uses simulation (their simulation platform is called Morpheus) is safety. Simulation can be used to practice handling longtails and to gradually reduce the reliance on real-world testing. Because longtails occur only once every few thousand road miles, testing the model in those scenarios would take a long time and does not scale. Cruise has developed a system for exploring large-scale parameter spaces in order to generate test scenarios scalably. They can begin their simulation by searching for a specific location on the map. The following video demonstrates how they can generate a large and diverse test suite for a given situation: beginning 15 meters before a left turn, then adding a straight road intersecting with the turn, and then layering on a nearly unlimited number of other parameters. It's astounding!
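Conceptually, generating such a test suite is a combinatorial sweep over scenario parameters, something like the following (the parameter names are invented for illustration):

```python
from itertools import product

# Hypothetical parameter space for a left-turn test suite.
approach_dist_m    = [10, 15, 20]
oncoming_speed_mps = [5, 10, 15]
oncoming_gap_s     = [2, 3, 4, 5]
occluded           = [False, True]

test_suite = [
    dict(approach=d, speed=v, gap=g, occluded=o)
    for d, v, g, o in product(approach_dist_m, oncoming_speed_mps,
                              oncoming_gap_s, occluded)
]
print(len(test_suite), "scenario variations from just 4 parameters")  # 72
```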
The next step is to introduce additional agents into the scene. They accomplish this by converting real-world events into editable simulation scenarios, a process dubbed road to sim. It combines perception data with heuristics developed over millions of real-world miles to recreate a complete simulation environment from road data. The following video depicts a simulation of a recreated scenario (bottom) from the real world (top). Then, while the scenario runs in simulation, we can experiment with various parameters and attributes.
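The "editable" part is the key idea: once a road event has been lifted into a structured scenario description, any attribute can be varied independently. A sketch of what such a description might look like (a hypothetical schema of mine, not Cruise's format):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SimAgent:
    kind: str        # "car", "pedestrian", "cyclist", ...
    start_xy: tuple  # position reconstructed from the road log
    speed_mps: float

@dataclass(frozen=True)
class Scenario:
    map_location: str
    agents: tuple    # SimAgent instances recreated from perception data

# Recreated from a real drive, then edited: same scene, faster cut-in car.
base = Scenario("sf/market_street", (SimAgent("car", (12.0, 3.5), 4.0),))
variant = replace(base, agents=(replace(base.agents[0], speed_mps=8.0),))
```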
To increase the realism of the simulation, they use an artificial intelligence system called NPC AI (non-player character, a term borrowed from video games) to simulate the complex multi-agent behaviors of every agent in the scene except the autonomous car. The following are two variations of a single simulated environment in which NPC AI is used to bring the other agents to life:
Along with the previously mentioned technologies, Cruise employs another, called World-Gen, to expand their operations to new cities. It is capable of procedurally generating an entire city, with roads, curbs, sidewalks, lane markings, street lights, traffic signs, buildings, automobiles, and pedestrians. Here is the automatically generated Alamo Square Park in San Francisco:
Additionally, they create high-quality simulations of various sensors and use them to generate synthetic data for the perception module. This also lets them collect data for cases such as emergency vehicles, which are uncommon and difficult to capture in the real world yet must be detected extremely reliably:
Furthermore, simulation can be used to evaluate the comfort and safety of the driving algorithms.
There is a lot of cool stuff they announced at the event. I highly encourage you to watch the event below:
That was all for Cruise!
Thank you for taking the time to read my post. If you found it helpful or enjoyable, please consider giving it a like and sharing it with your friends. Your support means the world to me and helps me to continue creating valuable content for you.