LLMs in Autonomous Driving — Part 3

Isaac Kargar
7 min read · Feb 18, 2024

Note: AI tools are used as assistants in this post!

In this part, we will review the DriveGPT4 paper. Let’s get started!

DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

Self-driving cars have gone from science fiction to a rapidly approaching reality. While much of the focus is on making these vehicles safe and reliable, what about making them interpretable? A new system called DriveGPT4 takes a significant step toward helping us understand how autonomous vehicles make their decisions.

What is DriveGPT4?

DriveGPT4 is an autonomous driving system built on large language models (LLMs). While typical chatbots specialize in text conversation, DriveGPT4 has also been trained to process videos. This means it can “see” what the car’s cameras see and interpret that visual information through a language-based model. In short, it is a multimodal LLM (MLLM).
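
To make this concrete, here is a minimal sketch of how a multimodal LLM of this kind is commonly assembled: a vision encoder turns camera input into features, a projection layer maps those features into the language model’s embedding space, and the LLM then treats them like ordinary tokens. Every module name and dimension below is an illustrative stand-in based on common MLLM designs (LLaVA-style systems), not DriveGPT4’s actual code.

```python
import torch
import torch.nn as nn

class MultiModalLLM(nn.Module):
    """Illustrative MLLM wiring; all module names are hypothetical stand-ins."""

    def __init__(self, vision_dim=128, llm_dim=256, vocab_size=1000):
        super().__init__()
        # Stand-in for a pretrained image/video encoder (e.g., a CLIP-style ViT).
        self.vision_encoder = nn.Identity()
        # Maps visual features into the LLM's token-embedding space.
        self.projector = nn.Linear(vision_dim, llm_dim)
        # Stand-in for a pretrained language model backbone (toy sizes here).
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, video_features, text_embeddings):
        # 1. Encode camera input and project it into the LLM's embedding space.
        visual_tokens = self.projector(self.vision_encoder(video_features))
        # 2. Prepend the visual tokens to the text prompt, so the model "sees"
        #    the scene and "reads" the question in one sequence.
        tokens = torch.cat([visual_tokens, text_embeddings], dim=1)
        # 3. The LLM predicts the next tokens: an answer plus an explanation,
        #    e.g. "braking because a pedestrian is crossing".
        return self.lm_head(self.llm(tokens))

# Smoke test with random tensors: 8 visual tokens + 16 text tokens.
model = MultiModalLLM()
logits = model(torch.randn(1, 8, 128), torch.randn(1, 16, 256))
print(logits.shape)  # torch.Size([1, 24, 1000])
```

The key design choice this sketch illustrates is that vision is adapted to language, not the other way around: the LLM backbone stays a text model, and the projector is what lets it treat camera frames as tokens.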

The core idea behind DriveGPT4 is to improve the interpretability of self-driving systems. Put simply, the goal is an autonomous vehicle that is not only capable of driving itself but also able to explain why it makes the choices it does.

How DriveGPT4 Works

  • Seeing: DriveGPT4 takes video footage from the car’s cameras, breaks it down, and converts it into the same kind of text-based data that its language model can process (a sketch of this step follows below).
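
In common video-LLM pipelines, “breaking it down” means sampling a fixed number of frames, encoding each frame with a pretrained image encoder, and projecting the features into the language model’s token space. Below is a hedged sketch of that step; the frame count, encoder, and dimensions are toy assumptions, not the paper’s exact pipeline.

```python
import torch
import torch.nn as nn

def video_to_llm_tokens(video, frame_encoder, projector, num_frames=8):
    """Convert a raw clip into token embeddings an LLM can consume.

    `video` is a (T, C, H, W) tensor; all other names are illustrative.
    """
    # Uniformly sample a fixed number of frames from the clip.
    idx = torch.linspace(0, video.shape[0] - 1, num_frames).long()
    frames = video[idx]
    # Encode each frame into a feature vector (stand-in for a CLIP-style encoder).
    features = frame_encoder(frames)   # (num_frames, vision_dim)
    # Project the features into the LLM's embedding space so each frame
    # behaves like a "word" in the prompt.
    return projector(features)         # (num_frames, llm_dim)

# Toy stand-ins for pretrained components, with small dimensions.
frame_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
projector = nn.Linear(128, 256)
clip = torch.randn(30, 3, 32, 32)      # a 30-frame camera clip
visual_tokens = video_to_llm_tokens(clip, frame_encoder, projector)
print(visual_tokens.shape)             # torch.Size([8, 256])
```

Once projected, the frame vectors sit in the prompt exactly where word embeddings would, which is what lets a language model reason over camera footage at all.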

