Advancements in Autonomous Driving with Vision-Language Models

The field of autonomous driving is advancing rapidly through the integration of vision-language models (VLMs). These models are being augmented with reasoning and self-reflection capabilities, improving the interpretability and coherence of driving decisions. Chain-of-thought processing combined with reinforcement learning is increasingly used to make trajectory planning more effective and reliable, while knowledge-enhanced prediction and thought-centric preference optimization are pushing toward more accurate and trustworthy autonomous driving systems.
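
As a rough illustration of the chain-of-thought plus reinforcement-learning pattern described above, the sketch below shows a VLM-style planner that first emits textual reasoning and then waypoints, paired with a toy reward that could drive fine-tuning. All names here (`PlanResult`, `query_vlm`, `trajectory_reward`) are hypothetical stand-ins and do not come from any of the cited papers.

```python
from dataclasses import dataclass


@dataclass
class PlanResult:
    reasoning: str                          # the model's chain-of-thought explanation
    waypoints: list[tuple[float, float]]    # (x, y) positions in the ego frame, metres


def query_vlm(front_view_frames: list[bytes], instruction: str) -> PlanResult:
    """Stand-in for a vision-language model call; returns a canned plan."""
    return PlanResult(
        reasoning="Lead vehicle is braking; keep lane and decelerate smoothly.",
        waypoints=[(0.0, 2.0), (0.0, 3.8), (0.0, 5.4), (0.0, 6.8)],
    )


def trajectory_reward(waypoints: list[tuple[float, float]], max_step: float = 2.5) -> float:
    """Toy reward: penalise physically implausible jumps between consecutive waypoints."""
    penalty = 0.0
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        step = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        penalty += max(0.0, step - max_step)
    return 1.0 - penalty


plan = query_vlm(front_view_frames=[], instruction="Plan the next 4 waypoints.")
print(plan.reasoning)
print("reward:", trajectory_reward(plan.waypoints))
```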

Noteworthy papers include AutoDrive-R$^2$, which proposes a novel vision-language-action (VLA) framework that strengthens the reasoning and self-reflection capabilities of autonomous driving systems; KEPT, a knowledge-enhanced VLM framework that predicts ego trajectories directly from consecutive front-view driving frames and achieves state-of-the-art performance on the nuScenes dataset; and TCPO, a stepwise preference-based optimization approach for effective embodied decision-making that reports an average success rate of 26.67% in the ALFWorld environment.
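
To make the idea of stepwise preference optimization more concrete, the following sketch implements a generic pairwise preference loss applied at a single decision step, in the spirit of DPO-style objectives. This is a simplified illustration under stated assumptions, not the TCPO objective itself; the log-probabilities are placeholder numbers rather than outputs of a real policy or reference model.

```python
import math


def stepwise_preference_loss(
    logp_pref: float,
    logp_rej: float,
    ref_logp_pref: float,
    ref_logp_rej: float,
    beta: float = 0.1,
) -> float:
    """Pairwise loss that pushes the policy toward the preferred step.

    The margin compares how much the policy (relative to a frozen reference)
    favours the preferred action or thought over the rejected one at this step.
    """
    margin = beta * ((logp_pref - ref_logp_pref) - (logp_rej - ref_logp_rej))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)


# Example with placeholder values: the policy already slightly favours the preferred step.
loss = stepwise_preference_loss(
    logp_pref=-1.2, logp_rej=-2.0, ref_logp_pref=-1.5, ref_logp_rej=-1.8
)
print(f"loss = {loss:.4f}")
```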

Sources

AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving

2nd Place Solution for CVPR2024 E2E Challenge: End-to-End Autonomous Driving Using Vision Language Model

KEPT: Knowledge-Enhanced Prediction of Trajectories from Consecutive Driving Frames with Vision-Language Models

TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making
