Advancements in 3D Perception and Scene Understanding

The field of 3D perception and scene understanding is advancing rapidly, with a focus on efficient and accurate methods for point cloud segmentation, 3D object detection, and instance segmentation. Recent research has explored visual foundation models and 2D-centric pipelines to improve performance and reduce computational cost; in particular, applying pre-trained 2D models to 3D tasks has shown promising results, enabling fast and accurate predictions. Novel frameworks and architectures have also pushed state-of-the-art performance on various benchmarks. Notable papers in this area include RangeSAM, which leverages visual foundation models for range-view represented LiDAR segmentation and achieves competitive performance on SemanticKITTI; Sparse Multiview Open-Vocabulary 3D Detection, which establishes a strong baseline for open-vocabulary 3D object detection in sparse-view settings; and SegDINO3D, which reaches state-of-the-art results on 3D instance segmentation benchmarks by leveraging both image-level and object-level 2D features.
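The range-view representation behind work like RangeSAM typically starts from a spherical projection of the LiDAR point cloud into a 2D range image, which is what makes pre-trained 2D models applicable. A minimal sketch of that projection is below; the function name is illustrative, and the field-of-view values are assumptions that roughly match a 64-beam sensor such as the one used for SemanticKITTI.

```python
import numpy as np

def points_to_range_image(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud to an (h, w) range image
    via spherical projection -- the standard range-view representation."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                      # range per point
    yaw = np.arctan2(y, x)                                  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Map azimuth to image columns and elevation to image rows.
    u = ((1.0 - (yaw + np.pi) / (2.0 * np.pi)) * w).astype(int) % w
    v = np.clip((fov_up - pitch) / fov * h, 0, h - 1).astype(int)

    img = np.zeros((h, w), dtype=np.float32)
    # Write points far-to-near so the nearest return wins each pixel.
    order = np.argsort(-r)
    img[v[order], u[order]] = r[order]
    return img
```

A 2D backbone (e.g. a segmentation foundation model) can then consume this image, and per-pixel predictions map back to the original points through the same (v, u) indices.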

Sources

RangeSAM: Leveraging Visual Foundation Models for Range-View represented LiDAR segmentation

Sparse Multiview Open-Vocabulary 3D Detection

SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D Features

Category-Level Object Shape and Pose Estimation in Less Than a Millisecond

VIMD: Monocular Visual-Inertial Motion and Depth Estimation

Geometric Interpretation of 3-SAT and Phase Transition

DB-TSDF: Directional Bitmask-based Truncated Signed Distance Fields for Efficient Volumetric Mapping
