The field of 3D scene representation and reconstruction is evolving rapidly, with a focus on more efficient, scalable, and accurate methods. Recent research explores Gaussian-based methods, neural representations, and self-supervised learning to improve the reconstruction and understanding of dynamic scenes, with direct applications in robotics, autonomous systems, and computer vision. Notably, more compact and geometrically meaningful 3D representations have shown promise in reducing memory overhead while improving feature fidelity, and the integration of high-level semantic features and language-based priors is enabling more comprehensive 3D scene understanding and embodied intelligence. Looking ahead, the field is expected to be shaped by the emergence of foundation models, which could replace current task-specific methods as a unified solution for robotic applications.

Noteworthy papers in this area include Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes; ShelfGaussian, an open-vocabulary multi-modal Gaussian-based 3D scene understanding framework supervised by off-the-shelf vision foundation models; and Gamma-from-Mono, a lightweight monocular geometry estimation method that resolves the projective ambiguity in single-camera reconstruction.
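To make the memory-overhead point concrete, the sketch below shows what a generic Gaussian-based scene representation typically stores per primitive: a mean, an anisotropic covariance factored into scale and rotation, an opacity, and a color. This is a minimal illustration in the spirit of 3D Gaussian splatting; all class and field names are assumptions for exposition and do not reflect the parameterization used by Flux4D, ShelfGaussian, or any other cited paper.

```python
import numpy as np

# Illustrative sketch of a Gaussian-based 3D scene representation.
# Field choices are assumptions, not any specific paper's layout.
class GaussianScene:
    def __init__(self, n: int):
        self.means = np.zeros((n, 3), dtype=np.float32)       # 3D centers
        self.log_scales = np.zeros((n, 3), dtype=np.float32)  # per-axis scales (log-space)
        self.quats = np.tile(
            [1.0, 0.0, 0.0, 0.0], (n, 1)                      # identity rotations
        ).astype(np.float32)
        self.opacities = np.zeros((n, 1), dtype=np.float32)   # pre-activation opacity
        self.colors = np.zeros((n, 3), dtype=np.float32)      # RGB (no SH, for brevity)

    def covariances(self) -> np.ndarray:
        """Per-Gaussian covariances Sigma = R S S^T R^T from quaternions and scales."""
        w, x, y, z = self.quats.T
        # Standard rotation matrices from unit quaternions, batched over n.
        R = np.stack([
            np.stack([1 - 2*(y**2 + z**2), 2*(x*y - w*z),       2*(x*z + w*y)], -1),
            np.stack([2*(x*y + w*z),       1 - 2*(x**2 + z**2), 2*(y*z - w*x)], -1),
            np.stack([2*(x*z - w*y),       2*(y*z + w*x),       1 - 2*(x**2 + y**2)], -1),
        ], axis=1)
        S = np.exp(self.log_scales)        # (n, 3) axis lengths
        RS = R * S[:, None, :]             # scale the columns of R, i.e. R @ diag(S)
        return RS @ RS.transpose(0, 2, 1)  # (n, 3, 3), positive semi-definite

    def nbytes(self) -> int:
        """Raw parameter footprint: 56 bytes per Gaussian with this layout."""
        return sum(a.nbytes for a in (self.means, self.log_scales,
                                      self.quats, self.opacities, self.colors))
```

Under these assumptions, a scene with one million Gaussians already occupies roughly 56 MB of raw parameters before any densification, which is the kind of overhead that the more compact representations mentioned above aim to reduce.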