Advances in 3D Vision and Object Perception

Computer vision research is moving towards more accurate and robust 3D object perception and scene understanding. Recent work incorporates temporal dynamics and canonical representations to improve the segmentation and detection of articulated objects, and there is growing interest in strengthening monocular 3D object detection and semantic scene completion for autonomous driving. New approaches target the core challenges of occlusion, limited visibility, and geometric ambiguity in these tasks.

Noteworthy papers include: MonoCLUE, which uses object-aware clustering to improve monocular 3D object detection; EAGLE, which builds an episodic appearance- and geometry-aware memory for unified 2D-3D visual query localization in egocentric vision; HD$^2$-SSC, a high-dimension high-density semantic scene completion framework that bridges the dimension and density gaps in existing SSC methods; GECO2, a generalized-scale object counting method with gradual query aggregation that addresses object-scale issues in few-shot detection-based counters; and the Shadow-informed Pose Feature with Rotation-invariant Attention Convolution, which improves rotation-invariant 3D learning by preserving global pose awareness and enhancing spatial discrimination.
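
As a purely illustrative aside on the last item, the sketch below shows the generic idea behind rotation-invariant point-cloud features: describing each point by quantities (here, distances) that do not change under rigid rotation, rather than by raw xyz coordinates. This is a minimal, hypothetical example of the general concept, not the Shadow-informed Pose Feature or Rotation-invariant Attention Convolution from the cited paper.

```python
import numpy as np

def rotation_invariant_features(points: np.ndarray) -> np.ndarray:
    """Toy rotation-invariant descriptors for an (N, 3) point cloud.

    Returns an (N, 2) array: distance of each point to the cloud centroid
    and distance to its nearest neighbor. Both are preserved by any rigid
    rotation (or reflection) of the cloud.
    """
    centroid = points.mean(axis=0)
    d_centroid = np.linalg.norm(points - centroid, axis=1)

    # Pairwise distances; mask the diagonal so a point is not its own neighbor.
    diff = points[:, None, :] - points[None, :, :]
    pair = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(pair, np.inf)
    d_nn = pair.min(axis=1)

    return np.stack([d_centroid, d_nn], axis=1)

# Sanity check: the features are unchanged by an arbitrary rigid rotation.
rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
print(np.allclose(rotation_invariant_features(pts),
                  rotation_invariant_features(pts @ Q.T)))
```

Distance-based descriptors like these achieve invariance but discard global pose information; the cited work is described as addressing exactly that gap by preserving global pose awareness.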

Sources

Canonical Space Representation for 4D Panoptic Segmentation of Articulated Objects

MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection

HD$^2$-SSC: High-Dimension High-Density Semantic Scene Completion for Autonomous Driving

EAGLE: Episodic Appearance- and Geometry-aware Memory for Unified 2D-3D Visual Query Localization in Egocentric Vision

Generalized-Scale Object Counting with Gradual Query Aggregation

Enhancing Rotation-Invariant 3D Learning with Global Pose Awareness and Attention Mechanisms
