The field of 3D vision and spatial reasoning is advancing rapidly, driven by progress in deep learning and computer vision. Recent work focuses on improving models' ability to understand and reason about 3D space, with applications in robotics, autonomous vehicles, and virtual reality. One key direction is models that learn to think in space and time, enabling them to better understand and navigate complex environments. Another is multimodal learning, in which models are trained on multiple data sources, such as images, videos, and text, to strengthen their reasoning about the world.

Notable papers in this area include SPIDER, which introduces a universal feature-matching framework for robust calibration, and C3Po, which presents a new dataset and model for cross-view, cross-modality correspondence. The Disc3D pipeline has shown promising results in generating high-quality 3D dialogue data, while LAST demonstrates the effectiveness of learning to think in space and time for generalist vision-language models. MapFormer introduces a new architecture for learning cognitive maps, and Ref-SAM3D extends SAM3D to text-guided 3D reconstruction. Other noteworthy papers include VLM^2, LocateAnything3D, and G^2VLM, all of which make significant contributions to 3D vision and spatial reasoning.