Autonomous driving research is advancing through the integration of multi-modal perception and explainability techniques. Researchers are exploring how vision-language models, large language models, and sensor fusion methods can make autonomous vehicles more robust and adaptable. A key direction is the development of frameworks that generate accurate, contextually relevant explanations for complex driving scenarios, helping drivers understand the vehicle's decision-making process. Another trend is the use of natural language as a medium for communication between vehicles, with the aim of improving collective perception and decision-making. Noteworthy papers in this area include:
- DriveBLIP2, which introduces an attention-guided explanation generation framework for complex driving scenarios,
- Where, What, Why, which proposes a novel task paradigm for explainable driver attention prediction,
- VLAD, which presents a VLM-augmented autonomous driving framework with hierarchical planning and an interpretable decision process.

Together, these approaches point toward autonomous driving systems that are more transparent, trustworthy, and efficient.