The field of embodied visual navigation is moving towards more adaptive decision-making frameworks. Recent work combines data-driven semantics, Pareto-optimal decision-making, and visual servoing for real-time navigation. Zero-shot approaches are being explored to improve long-horizon planning, with an emphasis on leveraging frontier information and potential-based exploration. Vision-Language Models (VLMs) are increasingly used to guide navigation agents toward more informed, goal-relevant decisions. Notable papers include:
- Expand Your SCOPE, which proposes a zero-shot framework that explicitly leverages frontier information to drive potential-based exploration.
- Think, Remember, Navigate, which outsources high-level planning to a VLM, using its contextual understanding to guide a frontier-based exploration agent (a minimal sketch of the frontier-selection loop shared by both papers follows this list).
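Both papers build on the classic frontier-exploration loop: detect frontier cells on an occupancy map, score them, and navigate to the most promising one. The sketch below is a minimal, hypothetical illustration of that loop, not an implementation of either paper. It assumes a 2D occupancy grid with free/occupied/unknown labels, a hand-written potential function (information gain minus travel cost), and an optional `vlm_score` hook standing in for where a VLM-based relevance judgment or a learned potential would plug in; all function names and weights are illustrative.

```python
import numpy as np

FREE, OCCUPIED, UNKNOWN = 0, 1, -1

def find_frontiers(grid):
    """Return (row, col) cells that are free and border at least one unknown cell."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            neighbors = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if (neighbors == UNKNOWN).any():
                frontiers.append((r, c))
    return frontiers

def frontier_potential(frontier, agent_pos, grid, w_gain=1.0, w_cost=0.5):
    """Toy potential: expected information gain minus weighted travel cost (straight-line proxy)."""
    r, c = frontier
    window = grid[max(r - 3, 0):r + 4, max(c - 3, 0):c + 4]
    info_gain = float((window == UNKNOWN).sum())            # unknown cells likely revealed if visited
    travel_cost = float(np.hypot(r - agent_pos[0], c - agent_pos[1]))
    return w_gain * info_gain - w_cost * travel_cost

def select_frontier(grid, agent_pos, vlm_score=None):
    """Pick the frontier with the highest potential; optionally blend in an external (e.g. VLM) score."""
    frontiers = find_frontiers(grid)
    if not frontiers:
        return None  # no frontiers left: exploration is finished

    def score(f):
        s = frontier_potential(f, agent_pos, grid)
        if vlm_score is not None:
            s += vlm_score(f)  # hypothetical hook: goal relevance judged from the frontier's observation
        return s

    return max(frontiers, key=score)

if __name__ == "__main__":
    grid = np.full((20, 20), UNKNOWN)
    grid[5:15, 5:15] = FREE        # explored free region
    grid[9, 5:8] = OCCUPIED        # a wall fragment
    print("next frontier:", select_frontier(grid, agent_pos=(10, 10)))
```

Under these assumptions, a potential-based method shapes the scoring function itself, while a VLM-guided planner replaces or augments that score with contextual, goal-conditioned reasoning; the surrounding detect-score-navigate loop stays the same.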