Computer vision research in visual geometry and tracking is making significant progress, with a focus on robust and efficient methods for a range of applications. To improve the accuracy and generalizability of models, researchers are exploring foundation models, transformers, and causal attention mechanisms. Online long-term point tracking, object re-identification, and aerial object detection are being tackled with new network architectures and adaptive query processing. There is also growing interest in methods that handle real-world transformations, such as rotation and illumination changes, to make perception models more robust. In parallel, researchers are building scalable, interactive 4D vision systems that support real-time applications. Two papers are particularly noteworthy: TRACER, which achieves efficient object re-identification through adaptive query processing, and SpatialTrackerV2, which presents a feed-forward 3D point tracking method for monocular videos. Together, these advances could benefit fields ranging from robotics and augmented reality to healthcare and surveillance.