Active Learning, Data Selection, Data Auto-Labeling, and Simulation in Autonomous Driving — Part 2

Isaac Kargar
5 min read · Feb 14, 2024

Now, let’s look at what big companies do in the real world.

NVIDIA

NVIDIA proposes pool-based active learning with an acquisition function based on disagreement among several trained models, in order to select the frames that are most informative to the model. The core of their system is an ensemble of object detectors that provides candidate bounding boxes and probabilities for each class of interest. Their proposed approach consists of the following steps:

  1. Train: Train N models, each initialized with different random parameters, on all currently labeled training data.
  2. Query: Select examples from the unlabeled pool using the acquisition function.
  3. Annotate: Have a human annotator label the selected examples.
  4. Append: Append the newly labeled data to the training data.
  5. Go back to step 1.
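To make the loop concrete, here is a minimal sketch of the five steps above in plain Python. It is not NVIDIA's implementation: the function names (`active_learning_loop`, `toy_train_ensemble`, `toy_disagreement`, `toy_annotate`), the 1-D toy data, and the variance-based score are all assumptions chosen only so the example runs end to end.

```python
import random
from typing import Callable, List, Tuple

Sample = float                      # toy "frame": a single number in [0, 1]
Labeled = Tuple[Sample, int]        # (sample, label)
Model = Callable[[Sample], float]   # returns P(label == 1)


def active_learning_loop(labeled, pool, train_ensemble, acquisition, annotate,
                         n_models=3, batch_size=2, rounds=3):
    """Pool-based active learning: train, query, annotate, append, repeat."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        # 1. Train N models on all currently labeled data.
        models = train_ensemble(labeled, n_models)
        # 2. Query: rank the pool by acquisition score, highest first.
        pool.sort(key=lambda x: acquisition(models, x), reverse=True)
        selected, pool = pool[:batch_size], pool[batch_size:]
        # 3. Annotate and 4. Append.
        labeled.extend((x, annotate(x)) for x in selected)
        # 5. The for-loop goes back to step 1.
    return labeled, pool


# --- Hypothetical toy stand-ins, only to make the sketch runnable ---

def toy_train_ensemble(labeled: List[Labeled], n_models: int) -> List[Model]:
    """Threshold classifiers whose boundaries are jittered per ensemble member."""
    rng = random.Random(0)
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    boundary = (min(pos) + max(neg)) / 2
    thresholds = [boundary + rng.uniform(-0.1, 0.1) for _ in range(n_models)]
    return [lambda x, t=t: 1.0 if x > t else 0.0 for t in thresholds]


def toy_disagreement(models: List[Model], x: Sample) -> float:
    """Variance of the ensemble's predictions: 0 when all members agree."""
    preds = [m(x) for m in models]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)


def toy_annotate(x: Sample) -> int:
    """Stands in for the human annotator."""
    return int(x > 0.5)


initial = [(0.0, 0), (1.0, 1)]
unlabeled_pool = [i / 20 for i in range(1, 20)]
labeled, remaining = active_learning_loop(
    initial, unlabeled_pool, toy_train_ensemble, toy_disagreement, toy_annotate
)
print(len(labeled), len(remaining))
```

In a real system the annotation step is the expensive one, so `batch_size` and the number of rounds would be set by the labeling budget rather than fixed as they are here.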

They assume the object detector generates a 2D probability map for each class (bicycle, person, car, etc.). Each position in this map corresponds to a patch of pixels in the input image, and the probability indicates how likely it is that an object of that class has a bounding box centered there. This type of output map is common in one-stage object detectors like SSD…
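As an illustration of how ensemble disagreement could be measured on such probability maps, the sketch below scores a frame by the mutual information between the prediction and the choice of ensemble member, computed per map cell for a single class. This is one common disagreement measure, picked here as an assumption; the text above does not specify the scoring function, and the names `binary_entropy`, `mutual_information_map`, and `frame_score`, as well as the max-over-cells aggregation, are hypothetical.

```python
import math
from typing import List

# rows x cols map: P(an object of one class has its box centered at this cell)
ProbMap = List[List[float]]


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli variable, clamped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mutual_information_map(ensemble_maps: List[ProbMap]) -> List[List[float]]:
    """Per-cell disagreement: H(mean prediction) - mean of member entropies."""
    n = len(ensemble_maps)
    rows, cols = len(ensemble_maps[0]), len(ensemble_maps[0][0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            probs = [m[r][c] for m in ensemble_maps]
            mean_p = sum(probs) / n
            mean_entropy = sum(binary_entropy(p) for p in probs) / n
            row.append(binary_entropy(mean_p) - mean_entropy)
        result.append(row)
    return result


def frame_score(ensemble_maps: List[ProbMap]) -> float:
    """Aggregate the per-cell scores into one number per frame (max, by assumption)."""
    return max(max(row) for row in mutual_information_map(ensemble_maps))


# Three ensemble members, each with a tiny 1x2 map for a single class.
agreeing = [[[0.9, 0.1]], [[0.9, 0.1]], [[0.9, 0.1]]]
disagreeing = [[[0.9, 0.1]], [[0.1, 0.1]], [[0.9, 0.1]]]

score_agree = frame_score(agreeing)
score_disagree = frame_score(disagreeing)
print(score_agree < score_disagree)
```

The score is close to zero when all members output the same probabilities, even if those probabilities are uncertain, and grows when confident members contradict each other. With several classes, the per-class scores would need a further aggregation step, which is left out here.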


Isaac Kargar

Co-Founder and CIO @ Resoniks | Ph.D. candidate at the Intelligent Robotics Group at Aalto University | https://kargarisaac.github.io/