The field of autonomous systems is advancing rapidly, with a strong focus on robust and reliable multimodal perception and fusion. Recent research has explored the integration of sensors such as radar, lidar, cameras, and GPS to improve the accuracy and robustness of object detection, tracking, and scene understanding. Notably, cooperative perception frameworks, which share sensor data between multiple vehicles, have shown significant promise in improving detection robustness beyond what any single vehicle's sensors can achieve. Furthermore, novel fusion methods, such as attentive depth-based blending schemes and graph-based uncertainty modeling, have improved how complementary modalities are combined, weighting each sensor's contribution according to cues such as depth and measurement uncertainty. These advances have far-reaching implications for autonomous driving, robotics, and surveillance.
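To make the idea of attentive blending concrete, the sketch below shows a minimal per-pixel attention gate that weights camera and lidar feature maps before summing them, so a degraded modality (e.g., a camera branch in fog or heavy rain) can be downweighted automatically. This is an illustrative example under assumed names and dimensions, not the implementation of any paper discussed here; the `AttentiveFusion` module, the channel count, and the two-branch setup are all hypothetical.

```python
# Illustrative sketch (not SAMFusion or any other paper's method): attention-weighted
# blending of two modality feature maps. Names and shapes are assumptions.
import torch
import torch.nn as nn


class AttentiveFusion(nn.Module):
    """Blend camera and lidar feature maps with learned per-pixel modality weights."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # Predict one logit per modality from the concatenated features.
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, cam_feat: torch.Tensor, lidar_feat: torch.Tensor) -> torch.Tensor:
        # cam_feat, lidar_feat: (B, C, H, W) feature maps from each sensor branch.
        logits = self.gate(torch.cat([cam_feat, lidar_feat], dim=1))  # (B, 2, H, W)
        weights = torch.softmax(logits, dim=1)                        # per-pixel modality weights
        # Weighted sum: the gate can suppress whichever branch is less reliable.
        return weights[:, 0:1] * cam_feat + weights[:, 1:2] * lidar_feat


if __name__ == "__main__":
    fusion = AttentiveFusion(channels=64)
    cam = torch.randn(1, 64, 32, 32)
    lidar = torch.randn(1, 64, 32, 32)
    print(fusion(cam, lidar).shape)  # torch.Size([1, 64, 32, 32])
```

The same gating pattern generalizes to more than two modalities by predicting one logit per branch and normalizing across them with the softmax.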
Noteworthy papers in this area include SAMFusion, which introduces a multi-sensor fusion approach tailored to adverse weather conditions, and CoVeRaP, which establishes a reproducible benchmark for multi-vehicle FMCW-radar perception. Additionally, OpenM3D presents an open-vocabulary multi-view indoor 3D object detector trained without human annotations, demonstrating superior accuracy and speed on indoor benchmarks.