LLMs in Autonomous Driving — Part 5: LLaDA by NVIDIA DRIVE Labs

Isaac Kargar
4 min read · Jun 16, 2024

In this blog post, I review a work from NVIDIA DRIVE Labs, presented in the CVPR 2024 paper “Driving Everywhere with Large Language Model Policy Adaptation.” The paper introduces the Large Language Driving Assistant (LLaDA), a simple but powerful tool that enables human drivers and autonomous vehicles to drive everywhere by adapting their tasks and motion plans to the traffic rules of new locations. LLaDA achieves this by leveraging the impressive zero-shot generalizability of large language models (LLMs) to interpret the traffic rules in the local driver handbook.

One of the main bottlenecks of current deep learning-based solutions is their limited generalization. This is especially problematic in autonomous driving, where a change of environment can change the traffic rules, the input data distribution, and more. Because the field is safety-critical, mitigating this limitation would go a long way toward pushing it forward. LLaDA tackles the problem by using large language models (LLMs) to make driving policies, for both human drivers and autonomous vehicles, more adaptable to local traffic rules and unexpected situations in different regions. The approach requires no additional training, helps human drivers and autonomous vehicles adjust to new environments, and can be integrated into any autonomous driving system to enhance its performance in locations with different traffic laws.

LLaDA takes in four types of information, all in plain language: (i) a basic driving plan, (ii) the traffic rules for the current location, (iii) a description of the current scene from the driver’s perspective, and (iv) a description of any unexpected situation. Using these inputs, LLaDA produces a motion plan, also in plain language, that follows the local traffic rules. The basic driving plan can be created by a human driver, and both the scene description and the unexpected-situation description can be provided by either a human or a vision-language model (VLM). The traffic rules are taken from the local driver handbook. If nothing unusual is happening, the unexpected-situation input is simply “normal status.” If something unusual does happen, such as the vehicle being honked at or an animal appearing on the road, a description of that scenario can be given to LLaDA.
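To make the interface concrete, here is a minimal sketch of how these four text inputs could be assembled into a single LLM prompt. The function name and prompt wording are my own assumptions based on the paper’s description, not the authors’ code:

```python
# A minimal sketch (assumption, not the paper's implementation) of how
# LLaDA's four plain-language inputs could be combined into one prompt.

def build_llada_prompt(nominal_plan: str,
                       local_traffic_rules: str,
                       scene_description: str,
                       unexpected_situation: str = "normal status") -> str:
    """Combine the four inputs into a single instruction for the LLM."""
    return (
        "You are a driving assistant. Adapt the nominal motion plan so that "
        "it complies with the local traffic rules.\n\n"
        f"Local traffic rules:\n{local_traffic_rules}\n\n"
        f"Current scene:\n{scene_description}\n\n"
        f"Unexpected situation:\n{unexpected_situation}\n\n"
        f"Nominal motion plan:\n{nominal_plan}\n\n"
        "Respond with the adapted motion plan in plain language."
    )
```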

To make the role of LLaDA more concrete, consider an example: an AV is operating in New York City (NYC), and its nominal motion plan is to turn right at a signalized intersection with a red light. The AV is honked at by cross traffic, which is unexpected. LLaDA takes these inputs along with NYC’s driver manual and adapts the motion plan to “no right turn on red,” because NYC traffic law prohibits right turns on red.
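Under the same assumptions as the sketch above, this scenario would translate into a call like the following (the exact wording is hypothetical):

```python
prompt = build_llada_prompt(
    nominal_plan="Turn right at the signalized intersection; the light is red.",
    local_traffic_rules=("NYC driver manual: right turns on red are "
                         "prohibited unless a sign permits them."),
    scene_description=("Stopped at a signalized intersection with a red "
                       "light; cross traffic is approaching from the left."),
    unexpected_situation="The vehicle was honked at by cross traffic.",
)
# The adapted plan returned by the LLM should read roughly like:
# "Do not turn right on red; wait for the green light before turning."
```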


The method consists of several steps:

  • Initial Policy Generation: An initial executable policy is generated using existing methods.
  • Traffic Rule Extraction: When faced with an unexpected driving situation, LLaDA uses a Traffic Rule Extractor (TRE) to identify and extract relevant traffic rules from the local traffic code.
  • Plan Adaptation: The TRE’s output and the initial plan are fed into a pre-trained LLM (GPT-4 in this paper), which adapts the plan to align with local traffic rules.
  • User Interaction: Human drivers or AV systems can describe the current scene and unexpected scenarios in natural language. LLaDA processes these inputs to generate an appropriate motion plan that adheres to local traffic regulations.
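Putting these steps together, here is a rough end-to-end sketch of the pipeline wired to GPT-4 through the OpenAI API. The prompts and helper names (call_llm, extract_relevant_rules, adapt_plan) are my own assumptions; the paper only specifies the inputs, the TRE step, and that a pre-trained LLM performs the adaptation:

```python
# A rough sketch of the adaptation pipeline described above (assumptions,
# not the authors' code): TRE first filters the handbook, then the LLM
# adapts the nominal plan to the extracted rules.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def call_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def extract_relevant_rules(handbook: str, scene: str, surprise: str) -> str:
    """Traffic Rule Extractor (TRE): keep only the handbook rules relevant
    to the current scene and unexpected situation."""
    return call_llm(
        "Quote only the traffic rules from the handbook below that are "
        f"relevant to this scene: {scene}\n"
        f"Unexpected situation: {surprise}\n\n"
        f"Handbook:\n{handbook}"
    )

def adapt_plan(nominal_plan: str, handbook: str, scene: str,
               surprise: str = "normal status") -> str:
    """Adapt the nominal motion plan to the locally relevant rules."""
    rules = extract_relevant_rules(handbook, scene, surprise)
    return call_llm(
        "Adapt the nominal motion plan so that it complies with these "
        f"local traffic rules:\n{rules}\n\n"
        f"Current scene: {scene}\n"
        f"Unexpected situation: {surprise}\n"
        f"Nominal motion plan: {nominal_plan}\n"
        "Respond with the adapted motion plan in plain language."
    )
```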

Applications of LLaDA

1. Traffic Rule Assistance for Tourists: LLaDA can help human drivers understand and follow local traffic rules when driving in unfamiliar areas. Drivers can describe unexpected scenarios in natural language, and LLaDA will provide updated driving instructions based on local traffic laws.

2. AV Motion Plan Adaptation: LLaDA can be integrated into autonomous vehicle (AV) systems to adapt their motion plans to the traffic rules of new geographical locations. This enables AVs to operate more safely and effectively in areas with different driving norms by providing country-specific guidelines and adapting the motion plans accordingly. The following figure shows how this is possible:

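Assuming the adapt_plan sketch from the previous section, the integration point for an AV stack is essentially just swapping in the handbook text of the new region (again, a hypothetical illustration rather than the paper’s code):

```python
# Hypothetical handbook snippets; in practice these would be the relevant
# passages of each region's driver manual.
nyc_handbook = ("Right turns on red are prohibited in New York City "
                "unless a sign permits them.")
california_handbook = ("You may turn right on a red light after a complete "
                       "stop, unless a sign prohibits it.")

nominal = "Turn right at the signalized intersection; the light is red."
scene = "Ego vehicle stopped at a red light in the rightmost lane."

plan_nyc = adapt_plan(nominal, nyc_handbook, scene)        # should keep the AV waiting
plan_ca = adapt_plan(nominal, california_handbook, scene)  # may allow the turn after a full stop
```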

The solution is tested on the nuScenes dataset, showing improvements in motion planning under novel scenarios compared to baseline approaches.


The following figures also show that LLaDA works on random videos of diverse scenarios, achieving driving everywhere with a language policy.


If you are interested, you can check the original paper and the NVIDIA DRIVE Labs post for more details.

