Active Learning, Data Selection, Data Auto-Labeling, and Simulation in Autonomous Driving — Part 4

Isaac Kargar
7 min read · Feb 14, 2024

It’s time for Tesla, which should be an interesting one!

Tesla

In this talk at Tesla Autonomy Day in 2019, Andrej Karpathy explains the active learning procedure at Tesla, which they call the Data Engine. For example, in an object detection task, when a bike is attached to the back of a car, the neural network should detect just one object (the car) for downstream tasks such as decision-making and planning. Check the following image:

source

To fix this problem, they find a few images that show this pattern and use a machine learning mechanism to search their fleet for similar examples. The images returned from the fleet can look like the following:

source

Human annotators then label these examples as single cars, and the neural network is trained on the new examples. From then on, the object detector understands that a bike attached to a car is part of that car and treats the pair as a single object. They repeat this process continuously for rare cases, so the model becomes more accurate over time.
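The talk does not spell out how the fleet search works, so the following is only a minimal sketch of one common way to do it: compare image embeddings by cosine similarity. The embedding vectors, the clip names, and the `find_similar` helper are all hypothetical and are not Tesla's actual mechanism.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(seed_embeddings, fleet_embeddings, top_k=2):
    """Return the ids of the fleet images closest to the seed examples.

    Assumption: every image has already been mapped to an embedding
    vector by some feature extractor (not shown here).
    """
    dim = len(seed_embeddings[0])
    # Average the seed embeddings into a single query vector.
    query = [
        sum(e[i] for e in seed_embeddings) / len(seed_embeddings)
        for i in range(dim)
    ]
    scored = [
        (cosine(query, emb), image_id)
        for image_id, emb in fleet_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:top_k]]


# Toy data: two seed images of "bike on the back of a car" (hypothetical).
seeds = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
fleet = {
    "clip_a": [0.85, 0.15, 0.05],
    "clip_b": [0.0, 0.1, 0.9],
    "clip_c": [0.7, 0.3, 0.0],
    "clip_d": [0.1, 0.9, 0.1],
}
matches = find_similar(seeds, fleet, top_k=2)
print(matches)
```

In this toy setup, the returned clips would be the ones sent to human annotators. A real system would likely use an approximate nearest-neighbor index rather than scoring every fleet image.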

The overall loop works as follows. After some initial data is collected, the models are trained. The models then run in shadow mode, and data is selected for human annotation wherever the model is uncertain, a human intervenes, or the model output disagrees with the human driver's behavior. The model is then retrained on that data. For example, if the lane line detection model does not work well in tunnels, these signals will reveal the problem. They then use the mechanism explained above to find similar images, annotate them, and train the model on them.

source
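The selection triggers described above (model uncertainty, human intervention, and shadow-mode disagreement) can be sketched as a simple filter. This is an illustrative sketch only: the `Frame` fields, the use of steering angle as the compared signal, and both thresholds are assumptions made for the example, not details from the talk.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    frame_id: str
    confidence: float        # model confidence in [0, 1] (assumed signal)
    driver_intervened: bool  # the human took over control
    model_steering: float    # shadow-mode prediction, in degrees (assumed)
    human_steering: float    # what the driver actually did, in degrees


def selection_reasons(frame, min_confidence=0.6, max_steering_gap=5.0):
    """List the reasons, if any, to send this frame to annotators."""
    reasons = []
    if frame.confidence < min_confidence:
        reasons.append("uncertain")
    if frame.driver_intervened:
        reasons.append("intervention")
    if abs(frame.model_steering - frame.human_steering) > max_steering_gap:
        reasons.append("disagreement")
    return reasons


frames = [
    Frame("f1", 0.95, False, 1.0, 1.5),   # nothing unusual
    Frame("f2", 0.40, False, 0.0, 0.5),   # low confidence
    Frame("f3", 0.90, True, 2.0, 2.0),    # driver took over
    Frame("f4", 0.85, False, -3.0, 6.0),  # model and human disagree
]

# Keep only frames that triggered at least one reason.
to_label = {}
for f in frames:
    reasons = selection_reasons(f)
    if reasons:
        to_label[f.frame_id] = reasons
print(to_label)
```

Recording the reason alongside each frame makes it easier to notice patterns, such as many "uncertain" frames coming from tunnels.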

Andrej also talks about Fleet Learning, their automated mechanism for labeling data, which they use in addition to expensive human annotation.


Isaac Kargar

Co-Founder and CIO @ Resoniks | Ph.D. candidate at the Intelligent Robotics Group at Aalto University | https://kargarisaac.github.io/