Efficient 3D Scene Understanding

The field of 3D scene understanding is moving toward more efficient and effective methods for occupancy prediction, object detection, and scene flow estimation. Researchers are exploring new representations, such as sparse Gaussians, superquadrics, and dynamic queries, to improve both the accuracy and the speed of these tasks. These representations capture scene geometry and semantics more effectively, and they are being integrated into existing models to improve their performance. Notable papers in this area include S2GO, which achieves state-of-the-art results on occupancy benchmarks with a streaming sparse Gaussian approach to occupancy prediction. VoxelSplat proposes a regularization framework that uses dynamic Gaussian splatting as a loss, improving model performance in both occupancy and flow prediction. QuadricFormer uses geometrically expressive superquadrics as scene primitives, which lets it represent complex structures efficiently.
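To make the difference between these primitives concrete, the sketch below queries two kinds of primitive for occupancy at voxel centers. This is not the method of S2GO, ODG, or QuadricFormer. It uses the textbook superquadric inside-outside function and an unnormalized Gaussian density, both assumed axis-aligned (no rotation). The class names, the 0.5 density threshold, and the 20^3 grid over [-1, 1]^3 are hypothetical choices for illustration. Setting both shape exponents to 1.0 reduces the superquadric to an ellipsoid, while small exponents make it box-like. This suggests why a single superquadric can cover shapes that would otherwise need many Gaussians.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Superquadric:
    """Hypothetical axis-aligned superquadric primitive (rotation omitted)."""
    center: Vec3
    scale: Vec3  # semi-axis lengths (a1, a2, a3)
    eps1: float  # shape exponent along z: 1.0 is round, values near 0 are boxy
    eps2: float  # shape exponent in the xy-plane

    def inside_outside(self, p: Vec3) -> float:
        # Inside-outside function: < 1 inside, == 1 on the surface, > 1 outside.
        x, y, z = (abs(p[i] - self.center[i]) / self.scale[i] for i in range(3))
        xy = (x ** (2.0 / self.eps2) + y ** (2.0 / self.eps2)) ** (self.eps2 / self.eps1)
        return xy + z ** (2.0 / self.eps1)


@dataclass
class Gaussian:
    """Hypothetical axis-aligned 3D Gaussian primitive (diagonal covariance)."""
    mean: Vec3
    sigma: Vec3  # per-axis standard deviations

    def density(self, p: Vec3) -> float:
        # Unnormalized density: 1.0 at the mean, decaying with Mahalanobis distance.
        d2 = sum(((p[i] - self.mean[i]) / self.sigma[i]) ** 2 for i in range(3))
        return math.exp(-0.5 * d2)


def voxel_centers(n: int, lo: float, hi: float) -> Iterator[Vec3]:
    # Yield the centers of an n x n x n voxel grid spanning [lo, hi]^3.
    step = (hi - lo) / n
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield (lo + (i + 0.5) * step, lo + (j + 0.5) * step, lo + (k + 0.5) * step)


def occupied_fraction(is_occupied: Callable[[Vec3], bool], n: int = 20) -> float:
    # Fraction of voxel centers in [-1, 1]^3 that the predicate marks as occupied.
    centers = list(voxel_centers(n, -1.0, 1.0))
    return sum(1 for c in centers if is_occupied(c)) / len(centers)


origin = (0.0, 0.0, 0.0)
unit = (1.0, 1.0, 1.0)
ellipsoid = Superquadric(origin, unit, eps1=1.0, eps2=1.0)
boxlike = Superquadric(origin, unit, eps1=0.1, eps2=0.1)
blob = Gaussian(origin, (0.5, 0.5, 0.5))

fill_ellipsoid = occupied_fraction(lambda p: ellipsoid.inside_outside(p) <= 1.0)
fill_box = occupied_fraction(lambda p: boxlike.inside_outside(p) <= 1.0)
fill_gauss = occupied_fraction(lambda p: blob.density(p) >= 0.5)

print(f"ellipsoid: {fill_ellipsoid:.3f}, box-like: {fill_box:.3f}, gaussian: {fill_gauss:.3f}")
```

With the same unit scale, the box-like superquadric marks every voxel occupied except the eight corner voxels, whereas the ellipsoid covers roughly half of the grid. A thresholded Gaussian behaves like an ellipsoid whose extent depends on the chosen threshold.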

Sources

S2GO: Streaming Sparse Gaussian Occupancy Prediction

VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction

Aerial Multi-View Stereo via Adaptive Depth Range Inference and Normal Cues

ODG: Occupancy Prediction Using Dual Gaussians

3DGeoDet: General-purpose Geometry-aware Image-based 3D Object Detection

DySS: Dynamic Queries and State-Space Learning for Efficient 3D Object Detection from Multi-Camera Videos

LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System

QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction
