Active Learning, Data Selection, Data Auto-Labeling, and Simulation in Autonomous Driving — Part 3

Isaac Kargar
4 min read · Feb 14, 2024

Let’s see how Waymo does active learning.

Waymo

Waymo uses active learning too. In this talk, Drago Anguelov describes the ML factory used at Waymo.


The lifecycle is almost the same as what we saw for NVIDIA. Most of the collected data comes from common scenarios and carries little new information for the model to learn from, so it is essential to know how to select data. Waymo has data mining and active learning pipelines that find rare cases and situations where the models are uncertain or inconsistent over time, and those cases are sent for labeling. The labeled data then goes into model training. They also use auto-labels in their system. When you collect data, you also see the future for many objects. Knowing both the past and the future of an object makes it possible to annotate the data better. Those annotations can then be used to train the onboard model, which does not know the future, so that it replicates the better result.
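To make the selection step more concrete, here is a minimal sketch in Python of ranking tracked objects by how uncertain the model is and how inconsistent its predictions are over time. This is not Waymo's pipeline, which the talk does not describe at this level of detail. The scoring rule (mean predictive entropy plus the fraction of frames where the predicted class flips), the function names, the weights, and the toy tracks are all assumptions made for illustration.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=-1)


def temporal_inconsistency(probs):
    """Fraction of consecutive frame pairs whose predicted class changes."""
    labels = probs.argmax(axis=-1)
    if len(labels) < 2:
        return 0.0
    return float(np.mean(labels[1:] != labels[:-1]))


def score_track(probs, w_entropy=1.0, w_flicker=1.0):
    """Score one tracked object; probs has shape (num_frames, num_classes).

    Higher means more worth labeling. The weights are hypothetical knobs.
    """
    uncertainty = float(predictive_entropy(probs).mean())
    return w_entropy * uncertainty + w_flicker * temporal_inconsistency(probs)


def select_for_labeling(tracks, budget):
    """Return the ids of the `budget` highest-scoring tracks."""
    ranked = sorted(tracks, key=lambda tid: score_track(tracks[tid]), reverse=True)
    return ranked[:budget]


# Toy data (made up): per-frame class probabilities for three tracked objects.
tracks = {
    "confident": np.array([[0.98, 0.01, 0.01]] * 4),
    "uncertain": np.array([[0.34, 0.33, 0.33]] * 4),
    "flickering": np.array(
        [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]
    ),
}

picked = select_for_labeling(tracks, budget=2)
print(picked)  # ['flickering', 'uncertain']
```

With a labeling budget of two, the confidently and consistently classified object is skipped, while the object whose class keeps flipping and the one with a nearly uniform prediction are sent to annotators. In a real system the score would more likely come from model ensembles or learned signals, but the ranking-under-a-budget structure stays the same.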

Waymo also released the Open Motion Dataset and held a competition on it at CVPR 2021. The dataset was labeled by a deep learning model running in offline mode, described in the CVPR 2021 paper Offboard 3D Object Detection from Point Cloud Sequences. Running the model offline frees it from the latency constraints of the vehicle, and it also benefits from seeing…


Isaac Kargar

Co-Founder and CIO @ Resoniks | Ph.D. candidate at the Intelligent Robotics Group at Aalto University | https://kargarisaac.github.io/