Active Learning, Data Selection, Data Auto-Labeling, and Simulation in Autonomous Driving — Part 5

Isaac Kargar
10 min read · Feb 14, 2024

Let's go for another interesting company!


Cruise also uses active learning, in a pipeline it calls the Continuous Learning Machine (CLM).

Consider the task of prediction. The motion prediction model must handle both nominal and longtail cases well. Cruise presented the end-to-end motion prediction model it uses at the recent Cruise Under the Hood event.


It is critical to note that while these longtail events do occur in the data collected on the road, they are extremely rare. As a result, the focus is on finding the needle in the haystack of daily driving and then upsampling those events so the models learn from them.

A naive approach to identifying rare events would be to manually engineer “detectors” for each of these longtail situations to assist in data sampling. For instance, we could create a “u-turn” detector that samples a scenario whenever it is triggered. This approach would let us collect targeted data, but it quickly fails to scale, because it is impossible to write a sufficiently specific detector for every unique longtail situation.
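To make the naive approach concrete, here is a minimal sketch of what a hand-written “u-turn” detector could look like. It is purely illustrative and not Cruise's code: the function names, the heading-based rule, and the 150-degree threshold are all assumptions chosen for the example.

```python
import math


def heading_change_deg(headings_rad):
    """Total signed heading change along a trajectory, in degrees.

    `headings_rad` is a hypothetical list of vehicle headings (radians),
    one per timestep.
    """
    total = 0.0
    for prev, curr in zip(headings_rad, headings_rad[1:]):
        # Wrap each step into [-pi, pi] so crossing +/-180 degrees is handled.
        total += math.atan2(math.sin(curr - prev), math.cos(curr - prev))
    return math.degrees(total)


def is_u_turn(headings_rad, threshold_deg=150.0):
    """Hand-tuned rule: flag a u-turn if the heading swings past the threshold.

    The threshold is an assumed value, picked only for illustration.
    """
    return abs(heading_change_deg(headings_rad)) >= threshold_deg


# A trajectory whose heading sweeps from 0 to 180 degrees triggers the detector,
# while a 90-degree left turn does not.
u_turn = [math.pi * i / 9 for i in range(10)]
left_turn = [math.pi / 2 * i / 9 for i in range(10)]
print(is_u_turn(u_turn), is_u_turn(left_turn))
```

Even this toy rule needs a hand-picked threshold, and every other longtail situation (a cut-in, a double-parked truck, a K-turn) would need its own rule, which is why the approach does not scale.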

Self-supervised learning is a viable option for the prediction task at hand. In each scenario, we can compare our model’s prediction to the ground-truth future trajectory of each car, and if they differ, we can label that scenario and train the model on it. These error situations can be identified and mined automatically. The labeling requires no human annotation, because the label is simply the logged future trajectory of the car. These longtail events should then be upsampled. This auto-labeling approach maximizes coverage by identifying and mining all model errors, so that no valuable data is missed. It also keeps the dataset as lean as possible, because no additional data is added for scenarios the model already handles.
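The loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Cruise's implementation: the scenario layout (`history` and `future` lists of `(x, y)` points), the constant-velocity stand-in for the prediction model, the average-displacement error metric, the 2-meter threshold, and the upsampling factor are all hypothetical choices.

```python
import math


def constant_velocity_predict(history, horizon=3):
    """Stand-in 'model' (an assumption): extrapolate the last observed step."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def average_displacement_error(predicted, actual):
    """Mean Euclidean distance between matching (x, y) waypoints."""
    distances = [math.dist(p, a) for p, a in zip(predicted, actual)]
    return sum(distances) / len(distances)


def mine_errors(scenarios, predict, error_threshold_m=2.0):
    """Keep only scenarios where the prediction misses the logged future.

    The logged future trajectory serves as the label, so no human
    annotation is involved.
    """
    mined = []
    for scenario in scenarios:
        predicted = predict(scenario["history"])
        error = average_displacement_error(predicted, scenario["future"])
        if error > error_threshold_m:
            mined.append({"history": scenario["history"],
                          "label": scenario["future"],
                          "error": error})
    return mined


def upsample(base_dataset, mined, factor=3):
    """Repeat each mined example `factor` times in the training set."""
    return base_dataset + mined * factor


scenarios = [
    # Car keeps going straight: the stand-in model predicts this exactly.
    {"history": [(0, 0), (1, 0)], "future": [(2, 0), (3, 0), (4, 0)]},
    # Car turns back on itself: the stand-in model is badly wrong.
    {"history": [(0, 0), (1, 0)], "future": [(1, 1), (0, 1), (-1, 1)]},
]
mined = mine_errors(scenarios, constant_velocity_predict)
training_set = upsample([], mined)
print(len(mined), len(training_set))
```

Only the scenario the model gets wrong is mined, so already-solved cases add nothing to the dataset, while the mined case is repeated to give it more weight during training.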



Isaac Kargar

Co-Founder and CIO @ Resoniks | Ph.D. candidate at the Intelligent Robotics Group at Aalto University