Advances in Autonomous Driving through Multi-Modal Perception and Explainability

Autonomous driving research is advancing through the integration of multi-modal perception and explainability techniques. Researchers are exploring vision-language models, large language models, and sensor fusion methods to make autonomous vehicles more robust and adaptable. One key direction is the development of frameworks that generate accurate, contextually relevant explanations for complex driving scenarios, so that drivers can better understand the vehicle's decision-making process. Another is the use of natural language for communication between vehicles, with the aim of improving collective perception and decision making. Noteworthy papers in this area include:

DriveBLIP2, which introduces an attention-guided explanation generation framework for complex driving scenarios;
Where, What, Why, which proposes a novel task paradigm for explainable driver attention prediction;
VLAD, which presents a VLM-augmented autonomous driving framework with hierarchical planning and an interpretable decision process.

These approaches point toward more transparent, trustworthy, and efficient autonomous driving systems.
