The field of endoscopic perception and navigation is advancing rapidly, with a focus on more accurate and efficient methods for scene understanding and reconstruction. Recent work highlights the value of integrating multiple cues and modalities, such as optical flow, appearance flow, and intrinsic image decomposition, to improve the robustness of depth estimation and camera pose tracking in endoscopic scenes. Notable advances include frameworks that combine several techniques, such as end-to-end multi-step self-supervised training and cue-aware monocular depth estimation, to reach state-of-the-art performance on benchmark datasets. New methods also address challenges such as tissue deformation recovery, photometric inconsistency, and dynamic motion in endoscopic scenes. Noteworthy papers include EndoMUST, which proposes a framework with multistep efficient finetuning for self-supervised depth estimation and achieves state-of-the-art performance on the SCARED dataset, and EndoFlow-SLAM, which introduces an optical flow loss as a geometric constraint for SLAM in endoscopic scenes and performs well in both static and dynamic surgical settings.
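To make the idea of an optical-flow geometric constraint more concrete, the sketch below compares the rigid flow implied by a predicted depth map and camera pose with an independently estimated optical flow. This is a minimal, assumption-laden illustration of the general technique, not the EndoFlow-SLAM or EndoMUST implementation; the function names, tensor shapes, and the `valid_mask` argument are hypothetical.

```python
# Illustrative sketch (not the papers' code): penalize disagreement between the
# flow induced by predicted depth + camera motion and an estimated optical flow.
import torch


def rigid_flow_from_depth_pose(depth, K, K_inv, T):
    """Pixel flow induced by predicted depth and camera motion (static-scene model).

    depth: (B, 1, H, W) predicted depth of the target frame
    K, K_inv: (B, 3, 3) camera intrinsics and their inverse
    T: (B, 3, 4) relative pose [R | t] from target to source frame
    Returns pixel displacements of shape (B, 2, H, W).
    """
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    # Homogeneous pixel coordinates, shape (B, 3, H*W).
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D, apply the relative pose, and re-project.
    cam = (K_inv @ pix) * depth.reshape(B, 1, -1)
    cam_h = torch.cat(
        [cam, torch.ones(B, 1, cam.shape[-1], dtype=cam.dtype, device=cam.device)], dim=1
    )
    proj = K @ (T @ cam_h)
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

    return (uv - pix[:, :2]).reshape(B, 2, H, W)


def flow_consistency_loss(depth, K, K_inv, T, optical_flow, valid_mask=None):
    """L1 discrepancy between rigid flow (from depth + pose) and estimated optical flow."""
    rigid = rigid_flow_from_depth_pose(depth, K, K_inv, T)
    diff = (rigid - optical_flow).abs()
    if valid_mask is not None:
        # e.g. down-weight deforming tissue or specular regions where the
        # static-scene assumption breaks down.
        diff = diff * valid_mask
    return diff.mean()
```

In a full self-supervised pipeline, a term like this would typically be weighted and combined with the usual photometric reconstruction and smoothness losses; masking out regions flagged as dynamic or deforming is where such a constraint tends to matter most in surgical scenes.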