Advances in 3D Perception and Semantic Segmentation

The field of 3D perception and semantic segmentation is advancing rapidly, with a focus on improving accuracy, efficiency, and robustness. Researchers are exploring new architectures and techniques to address challenges such as domain heterogeneity, limited training data, and adverse weather conditions. Notably, approaches such as mixture-of-experts architectures and cross-modal knowledge distillation are enabling more effective use of multi-modal data and improving performance in applications including 3D object detection, semantic segmentation, and scene understanding. Noteworthy papers in this area include Point-MoE, which proposes a Mixture-of-Experts architecture for cross-domain generalization in 3D semantic segmentation, and SR3D, which introduces a training-free framework for single-view 3D reconstruction and grasping of transparent and specular objects. Other notable works, such as CroDiNo-KD and BiXFormer, improve performance and robustness by leveraging disentangled representations, contrastive learning, and modality-agnostic matching.

Sources

Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts

SR3D: Unleashing Single-view 3D Reconstruction for Transparent and Specular Object Grasping

Revisiting Cross-Modal Knowledge Distillation: A Disentanglement Approach for RGBD Semantic Segmentation

SPPSFormer: High-quality Superpoint-based Transformer for Roof Plane Instance Segmentation from Point Clouds

NUC-Net: Non-uniform Cylindrical Partition Network for Efficient LiDAR Semantic Segmentation

Bi-Manual Joint Camera Calibration and Scene Representation

Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather

BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations

DiagNet: Detecting Objects using Diagonal Constraints on Adjacency Matrix of Graph Neural Network

BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation

FSHNet: Fully Sparse Hybrid Network for 3D Object Detection

FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices

VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection
