Advancements in Autonomous Driving with Vision-Language Models

The field of autonomous driving is advancing rapidly through the integration of vision-language models (VLMs). These models are being augmented with reasoning and self-reflection capabilities, improving the interpretability and coherence of driving decisions. Chain-of-thought processing combined with reinforcement learning is increasingly used to make trajectory planning more effective and reliable, while knowledge-enhanced prediction and thought-centric preference optimization are pushing toward more accurate and trustworthy autonomous driving systems.
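
As a rough illustration of the chain-of-thought plus reinforcement-learning pattern described above, the sketch below shows a VLM-style planner that first emits textual reasoning and then waypoints, paired with a toy reward that could drive fine-tuning. All names here (`PlanResult`, `query_vlm`, `trajectory_reward`) are hypothetical stand-ins and do not come from any of the cited papers.

```python
from dataclasses import dataclass


@dataclass
class PlanResult:
    reasoning: str                          # the model's chain-of-thought explanation
    waypoints: list[tuple[float, float]]    # (x, y) positions in the ego frame, metres


def query_vlm(front_view_frames: list[bytes], instruction: str) -> PlanResult:
    """Stand-in for a vision-language model call; returns a canned plan."""
    return PlanResult(
        reasoning="Lead vehicle is braking; keep lane and decelerate smoothly.",
        waypoints=[(0.0, 2.0), (0.0, 3.8), (0.0, 5.4), (0.0, 6.8)],
    )


def trajectory_reward(waypoints: list[tuple[float, float]], max_step: float = 2.5) -> float:
    """Toy reward: penalise physically implausible jumps between consecutive waypoints."""
    penalty = 0.0
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        step = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        penalty += max(0.0, step - max_step)
    return 1.0 - penalty


plan = query_vlm(front_view_frames=[], instruction="Plan the next 4 waypoints.")
print(plan.reasoning)
print("reward:", trajectory_reward(plan.waypoints))
```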

Noteworthy papers include AutoDrive-R$^2$, which proposes a novel vision-language-action (VLA) framework that strengthens the reasoning and self-reflection capabilities of autonomous driving systems; KEPT, a knowledge-enhanced VLM framework that predicts ego trajectories directly from consecutive front-view driving frames and achieves state-of-the-art performance on the nuScenes dataset; and TCPO, a stepwise preference-based optimization approach for effective embodied decision-making that reports an average success rate of 26.67% in the ALFWorld environment.
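
To make the idea of stepwise preference optimization more concrete, the following sketch implements a generic pairwise preference loss applied at a single decision step, in the spirit of DPO-style objectives. This is a simplified illustration under stated assumptions, not the TCPO objective itself; the log-probabilities are placeholder numbers rather than outputs of a real policy or reference model.

```python
import math


def stepwise_preference_loss(
    logp_pref: float,
    logp_rej: float,
    ref_logp_pref: float,
    ref_logp_rej: float,
    beta: float = 0.1,
) -> float:
    """Pairwise loss that pushes the policy toward the preferred step.

    The margin compares how much the policy (relative to a frozen reference)
    favours the preferred action or thought over the rejected one at this step.
    """
    margin = beta * ((logp_pref - ref_logp_pref) - (logp_rej - ref_logp_rej))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)


# Example with placeholder values: the policy already slightly favours the preferred step.
loss = stepwise_preference_loss(
    logp_pref=-1.2, logp_rej=-2.0, ref_logp_pref=-1.5, ref_logp_rej=-1.8
)
print(f"loss = {loss:.4f}")
```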

Sources

AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving

2nd Place Solution for CVPR2024 E2E Challenge: End-to-End Autonomous Driving Using Vision Language Model

KEPT: Knowledge-Enhanced Prediction of Trajectories from Consecutive Driving Frames with Vision-Language Models

TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making
