Advances in Autonomous Perception and 3D Scene Understanding

The field of autonomous perception and 3D scene understanding is advancing rapidly, with a focus on more efficient and accurate methods for 3D object detection, scene completion, and semantic segmentation. Recent research has explored multimodal fusion, sparse representations, and self-supervised learning to extend perception range and improve accuracy. Notable developments include Doppler-guided sparse queries for bandwidth-efficient cooperative 3D perception, cross-modal knowledge distillation for efficient online HD map construction, and the integration of semantic and geometric priors for 3D scene completion. These innovations have the potential to significantly improve the performance and safety of autonomous vehicles and robots.
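To make the knowledge-distillation idea concrete, here is a minimal, generic sketch of a distillation loss: a temperature-softened KL term that pulls the student's predictions toward the teacher's, blended with a standard cross-entropy term on ground-truth labels. This is the textbook formulation, not the specific loss used by MapKD or any paper above; in a cross-modal setting the teacher logits would come from a model operating on a different modality (e.g. LiDAR teacher, camera student). All names and parameter choices here are illustrative.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of a soft-target KD term and a hard-target CE term.

    T > 1 softens both distributions; the T*T factor keeps the
    soft-target gradient magnitude comparable across temperatures.
    """
    # soft-target term: KL(teacher || student) at temperature T
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean() * T * T
    # hard-target term: cross-entropy against ground-truth labels
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kl + (1 - alpha) * ce
```

When the student matches the teacher exactly, the KL term vanishes and only the supervised cross-entropy remains, which is why distillation can be viewed as a regularizer that transfers the teacher's "dark knowledge" about class similarities.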

Noteworthy papers include: Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction, which proposes a novel approach to collaborative 3D semantic occupancy prediction using sparse 3D semantic Gaussian splatting; CMF-IoU: Multi-Stage Cross-Modal Fusion 3D Object Detection with IoU Joint Prediction, which introduces a multi-stage cross-modal fusion framework for 3D object detection that effectively aligns 3D spatial and 2D semantic information; and Unleashing Semantic and Geometric Priors for 3D Scene Completion, which proposes a framework that performs dual decoupling at both the source and pathway levels to improve 3D scene completion.

Sources

Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction

A CLIP-based Uncertainty Modal Modeling (UMM) Framework for Pedestrian Re-Identification in Autonomous Driving

G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

Deep Learning For Point Cloud Denoising: A Survey

Enhancing 3D point accuracy of laser scanner through multi-stage convolutional neural network for applications in construction

DoppDrive: Doppler-Driven Temporal Aggregation for Improved Radar Object Detection

Cross-Modal Knowledge Distillation with Multi-Level Data Augmentation for Low-Resource Audio-Visual Sound Event Localization and Detection

CMF-IoU: Multi-Stage Cross-Modal Fusion 3D Object Detection with IoU Joint Prediction

SlimComm: Doppler-Guided Sparse Queries for Bandwidth-Efficient Cooperative 3-D Perception

CORENet: Cross-Modal 4D Radar Denoising Network with LiDAR Supervision for Autonomous Driving

Unleashing Semantic and Geometric Priors for 3D Scene Completion

Self-Supervised Sparse Sensor Fusion for Long Range Perception

RCDINO: Enhancing Radar-Camera 3D Object Detection with DINOv2 Semantic Features

MapKD: Unlocking Prior Knowledge with Cross-Modal Distillation for Efficient Online HD Map Construction
