Advances in Multimodal Fusion and Perception

The field of multimodal fusion and perception is advancing rapidly, with a focus on methods that integrate and process data from diverse sensors and sources. A central direction is the design of robust, efficient fusion techniques that remain reliable under noise, missing modalities, and varying sensor quality. Recent work proposes architectures and frameworks built on deep learning, diffusion models, and related techniques to improve the accuracy and robustness of multimodal perception systems.

Notable papers in this area include the Generative Diffusion Contrastive Network for multi-view clustering, DGFusion for depth-guided sensor fusion, and MSGFusion for multimodal scene graph-guided infrared and visible image fusion. These works report state-of-the-art results on clustering, segmentation, and fusion tasks, underscoring the relevance of multimodal fusion and perception to autonomous driving, robotics, and computer vision. Other noteworthy papers include TUNI for real-time RGB-T semantic segmentation, CaR1 for camera-radar BEV vehicle segmentation, and 4DRadar-GS for self-supervised dynamic driving scene reconstruction. Overall, the field is experiencing significant growth and innovation, with an emphasis on practical, effective solutions for real-world applications.
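
To make the "varying sensor reliability and missing data" theme concrete, the sketch below shows one common pattern: predicting a per-modality reliability score and renormalizing fusion weights over whichever modalities are actually present. This is a minimal illustration of the general idea, not the method of any paper listed under Sources; the module name, arguments, and dictionary-based interface (ReliabilityWeightedFusion, feat_dim, modality names) are illustrative assumptions.

```python
# Minimal sketch (assumed design, not from any cited paper): reliability-weighted
# fusion of per-modality features that tolerates missing modalities at inference.
import torch
import torch.nn as nn


class ReliabilityWeightedFusion(nn.Module):
    def __init__(self, modalities, feat_dim):
        super().__init__()
        # One scalar "reliability" head per modality, predicted from its own features.
        self.reliability = nn.ModuleDict(
            {m: nn.Linear(feat_dim, 1) for m in modalities}
        )

    def forward(self, feats):
        # feats: dict mapping modality name -> (batch, feat_dim) tensor;
        # absent sensors are simply omitted from the dict.
        names = [m for m in feats if m in self.reliability]
        scores = torch.stack(
            [self.reliability[m](feats[m]).squeeze(-1) for m in names], dim=-1
        )                                        # (batch, num_present)
        weights = torch.softmax(scores, dim=-1)  # renormalize over present modalities
        stacked = torch.stack([feats[m] for m in names], dim=-1)  # (batch, feat_dim, n)
        return (stacked * weights.unsqueeze(1)).sum(dim=-1)       # (batch, feat_dim)


if __name__ == "__main__":
    fusion = ReliabilityWeightedFusion(["rgb", "thermal", "radar"], feat_dim=64)
    batch = {"rgb": torch.randn(2, 64), "radar": torch.randn(2, 64)}  # thermal missing
    print(fusion(batch).shape)  # torch.Size([2, 64])
```

Because the softmax is taken only over the modalities supplied at runtime, the fused feature degrades gracefully when a sensor drops out rather than requiring imputation.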

Sources

Generative Diffusion Contrastive Network for Multi-View Clustering

DGFusion: Depth-Guided Sensor Fusion for Robust Semantic Perception

TUNI: Real-time RGB-T Semantic Segmentation with Unified Multi-Modal Feature Extraction and Cross-Modal Feature Fusion

Ruggedized Ultrasound Sensing in Harsh Conditions: eRTIS in the wild

CaR1: A Multi-Modal Baseline for BEV Vehicle Segmentation via Camera-Radar Fusion

Multimodal SAM-adapter for Semantic Segmentation

Towards Foundational Models for Single-Chip Radar

RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation

MSGFusion: Multimodal Scene Graph-Guided Infrared and Visible Image Fusion

4DRadar-GS: Self-Supervised Dynamic Driving Scene Reconstruction with 4D Radar

MSDNet: Efficient 4D Radar Super-Resolution via Multi-Stage Distillation

TRUST-FS: Tensorized Reliable Unsupervised Multi-View Feature Selection for Incomplete Data

A Software-Defined Radio Testbed for Distributed LiDAR Point Cloud Sharing with IEEE 802.11p in V2V Networks

Adaptive and Iterative Point Cloud Denoising with Score-Based Diffusion Model

One-step Multi-view Clustering With Adaptive Low-rank Anchor-graph Learning

Lightweight and Accurate Multi-View Stereo with Confidence-Aware Diffusion Model
