The field of vision-and-language navigation is moving toward more effective integration of external knowledge and a better understanding of visual scenes. Researchers are exploring ways to build common-sense reasoning and spatial awareness into navigation systems, so that agents can better understand and follow natural-language instructions in complex environments. Notable advances include methods that disentangle foreground and background information, and the use of spatiotemporal knowledge graphs to improve scene understanding and the identification of navigation goals.

Noteworthy papers include VL-KnG, which presents a visual scene understanding system that uses spatiotemporal knowledge graph construction to tackle fundamental limitations of vision-language models, and Landmark-Guided Knowledge, which introduces an external knowledge base to assist navigation, addressing the misjudgments that traditional methods make when they lack sufficient common sense.