LLMs in Autonomous Driving — Part 2

Isaac Kargar
3 min read · Feb 17, 2024


Note: AI tools were used as an assistant in this post!

GPT-Driver: Learning to Drive with GPT

The GPT-Driver paper proposes a novel approach to motion planning for autonomous vehicles that leverages the power of large language models (LLMs). It reformulates motion planning as a language modeling problem, where the planner’s inputs and outputs are represented as language tokens. An LLM, in this case GPT-3.5, then generates driving trajectories as a language description of coordinate positions.
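To make the core idea concrete, here is a minimal sketch of what "trajectories as language tokens" could look like. The serialization format below is an illustrative assumption, not the paper's exact tokenization:

```python
import re

def trajectory_to_text(waypoints):
    """Serialize (x, y) waypoints into a language-model-friendly string.

    The "Trajectory: (x,y), ..." format is hypothetical; the point is
    that continuous coordinates become ordinary text the LLM can emit.
    """
    return "Trajectory: " + ", ".join(f"({x:.2f},{y:.2f})" for x, y in waypoints)

def text_to_trajectory(text):
    """Parse coordinate pairs back out of the model's text output."""
    pairs = re.findall(r"\(([-\d.]+),([-\d.]+)\)", text)
    return [(float(x), float(y)) for x, y in pairs]

waypoints = [(0.0, 0.0), (1.25, 0.1), (2.48, 0.3)]
text = trajectory_to_text(waypoints)
# Round-trips: the parsed trajectory matches the original waypoints.
assert text_to_trajectory(text) == waypoints
```

A round-trip like this is what lets a text-only model act as a planner: the LLM never sees raw tensors, only serialized coordinates it can read and write.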

The paper also proposes a novel prompting-reasoning-finetuning strategy to stimulate the numerical reasoning potential of the LLM. This strategy enables the LLM to forecast highly precise waypoint coordinates and also articulate its internal decision-making process in natural language.

The authors evaluated their approach on the large-scale nuScenes dataset and found that it outperformed state-of-the-art motion planners in terms of effectiveness, generalization ability, and interpretability.

Overall, this paper presents a promising new approach to motion planning for autonomous vehicles that has the potential to improve the safety and efficiency of self-driving cars.

Here are some of the key findings of the paper:

  • LLMs can be effectively used for motion planning in autonomous vehicles.
  • Reformulating motion planning as a language modeling problem is a novel and effective approach.
  • The prompting-reasoning-finetuning strategy can stimulate the numerical reasoning potential of LLMs.
  • The proposed approach outperforms state-of-the-art motion planners in terms of effectiveness, generalization ability, and interpretability.

Here’s a concise summary of the core ideas behind the “prompting-reasoning-finetuning” strategy:

Problem: Traditional motion planners use different input data types than language models, and they often lack explainability in their decision-making.

Solution: The paper proposes a three-stage approach to bridge this gap and enhance interpretability:

  1. Prompting: heterogeneous planner inputs (perception results, ego-vehicle states) are converted into unified natural-language prompts the LLM can consume.
  2. Reasoning: the LLM performs chain-of-thought reasoning, articulating notable objects and their potential effects before producing waypoint coordinates.
  3. Fine-tuning: the model is fine-tuned on human driving trajectories so that its predicted waypoints align with realistic driving behavior.
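The three stages above can be sketched as follows. All prompt wording, field names, and the `to_finetune_example` helper are my illustrative assumptions, not the paper's exact templates:

```python
def build_prompt(ego_state, detections):
    """Stage 1 (prompting): serialize planner inputs as natural language."""
    obs = "; ".join(f"{d['type']} at ({d['x']:.1f},{d['y']:.1f})" for d in detections)
    return (
        f"Ego speed: {ego_state['speed']:.1f} m/s. Objects: {obs}. "
        "First reason step by step about notable objects and their potential "
        "effects, then output a trajectory as (x,y) waypoints."
    )

def to_finetune_example(prompt, reasoning, human_trajectory):
    """Stage 3 (fine-tuning): pair the prompt with chain-of-thought
    reasoning (stage 2) and a human driving trajectory as the target."""
    target = reasoning + " Trajectory: " + ", ".join(
        f"({x:.2f},{y:.2f})" for x, y in human_trajectory
    )
    return {"prompt": prompt, "completion": target}

example = to_finetune_example(
    build_prompt({"speed": 5.0}, [{"type": "car", "x": 12.0, "y": -1.5}]),
    "The car ahead is slowing, so keep a safe gap.",
    [(0.0, 0.0), (4.9, 0.0), (9.7, 0.1)],
)
```

Because the fine-tuning target contains the reasoning text before the waypoints, the model learns to explain its decision and then commit to precise coordinates, which is where the interpretability benefit comes from.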

Benefits:

  • Bridging the Gap: Addresses the data-type mismatch between motion planning and LLMs.
  • Explainability: The chain-of-thought reasoning makes the LLM’s decision-making more transparent and understandable.
  • Human-like Driving: Fine-tuning aligns the model’s output with realistic driving behaviors.

We will continue in the next post with more interesting papers.

Thank you for taking the time to read my post. If you found it helpful or enjoyable, please consider giving it a clap and subscribing to my newsletter.


Isaac Kargar

Co-Founder and CIO @ Resoniks | Ph.D. candidate at the Intelligent Robotics Group at Aalto University | https://kargarisaac.github.io/