The field of 3D scene understanding is moving toward geometry-aware semantic features, which improve the accuracy and robustness of tasks such as object localization, pose estimation, and scene graph prediction. Recent work has focused on frameworks that combine visual, semantic, and geometric features to reach state-of-the-art performance. In particular, geometry-grounding and uncertainty-aware neural feature fields have shown promise for improving the reliability and generalizability of 3D scene understanding models.

Noteworthy papers include:
- Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields, which investigates the potential benefits of geometry-grounding in distilled fields and proposes a novel framework for inverting radiance fields.
- Object-Centric Representation Learning for Enhanced 3D Scene Graph Prediction, which demonstrates that object feature quality largely determines scene graph accuracy and proposes a highly discriminative object feature encoder.
- Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction, which achieves state-of-the-art performance in open-vocabulary 3D occupancy prediction using a progressive Gaussian transformer framework.
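To make the idea of combining visual, semantic, and geometric features concrete, a common pattern is to concatenate per-point feature vectors from each modality and fuse them with a small MLP. The sketch below is purely illustrative and is not taken from any of the papers above; all dimensions, weights, and names are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_features(visual, semantic, geometric, w1, b1, w2, b2):
    """Fuse per-point multi-modal features with a 2-layer MLP.

    visual:    (N, dv) appearance features (e.g. from a 2D backbone)
    semantic:  (N, ds) distilled semantic features
    geometric: (N, dg) geometry features (e.g. normals, local shape codes)
    Returns fused features of shape (N, dout). Illustrative only.
    """
    x = np.concatenate([visual, semantic, geometric], axis=-1)  # (N, dv+ds+dg)
    h = np.maximum(x @ w1 + b1, 0.0)                            # ReLU hidden layer
    return h @ w2 + b2                                          # fused output

# Hypothetical sizes: 32-d visual, 16-d semantic, 8-d geometric -> 64-d fused.
N, dv, ds, dg, dh, dout = 5, 32, 16, 8, 128, 64
w1 = rng.standard_normal((dv + ds + dg, dh)) * 0.02
b1 = np.zeros(dh)
w2 = rng.standard_normal((dh, dout)) * 0.02
b2 = np.zeros(dout)

fused = fuse_features(rng.standard_normal((N, dv)),
                      rng.standard_normal((N, ds)),
                      rng.standard_normal((N, dg)),
                      w1, b1, w2, b2)
print(fused.shape)  # (5, 64)
```

In practice the fused features would feed a downstream head (e.g. a scene-graph relation classifier or an occupancy decoder), and the fusion weights would be learned end-to-end rather than sampled randomly.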