Embodied AI for Autonomous UAV Navigation

The field of autonomous UAV navigation is advancing rapidly under the banner of embodied AI, which brings together human-robot interaction, 3D spatial reasoning, and real-world deployment. Researchers are applying large language models (LLMs) and vision-language models (VLMs) to improve the navigation capabilities of unmanned aerial vehicles (UAVs) in complex urban environments. A key challenge is enabling UAVs to interpret natural language instructions and traverse unstructured environments with minimal human supervision. To address it, researchers are developing hierarchical semantic planning modules, global memory modules, and reactive thinking loops; a minimal sketch of how these pieces fit together follows below.

Noteworthy papers include CityNavAgent, which proposes an LLM-empowered agent for aerial vision-and-language navigation, and UAV-CodeAgents, which presents a scalable multi-agent framework for autonomous UAV mission generation. In addition, the CityAVOS benchmark dataset and the PRPSearcher method have been introduced for autonomous visual object search in city spaces, and a system for air-ground collaboration on language-specified missions in unknown environments has been developed.
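To make the architectural pattern concrete, here is a minimal, hypothetical sketch of an agent that combines a hierarchical semantic planner, a global memory of visited landmarks, and a reactive replanning loop. All names (query_llm, GlobalMemory, etc.) are illustrative assumptions, not APIs from the papers above, and the LLM call is stubbed with a canned response so the sketch runs standalone.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalMemory:
    """Hypothetical global memory: tracks visited landmarks to avoid revisits."""
    visited: set = field(default_factory=set)

    def mark(self, landmark: str) -> None:
        self.visited.add(landmark)

    def seen(self, landmark: str) -> bool:
        return landmark in self.visited


def query_llm(prompt: str) -> list[str]:
    """Stubbed LLM call. A real system would query an LLM/VLM here;
    the canned response keeps this sketch self-contained and runnable."""
    return ["cross the plaza", "follow the main road", "stop at the clock tower"]


def hierarchical_plan(instruction: str, memory: GlobalMemory) -> list[str]:
    """High level: ask the (stubbed) LLM to decompose the instruction into
    semantic subgoals, skipping landmarks already in global memory."""
    subgoals = query_llm(f"Decompose into landmark subgoals: {instruction}")
    return [g for g in subgoals if not memory.seen(g)]


def execute_subgoal(subgoal: str) -> bool:
    """Low level: a real agent would run perception and local control here.
    The stub always reports success."""
    print(f"  executing: {subgoal}")
    return True


def reactive_loop(instruction: str, max_replans: int = 3) -> None:
    """Reactive thinking loop: plan, act, and replan on failure."""
    memory = GlobalMemory()
    for attempt in range(max_replans):
        plan = hierarchical_plan(instruction, memory)
        if not plan:
            print("mission complete")
            return
        print(f"plan (attempt {attempt + 1}): {plan}")
        for subgoal in plan:
            if execute_subgoal(subgoal):
                memory.mark(subgoal)  # record progress in global memory
            else:
                break  # failed subgoal: fall through to replanning
    print("replan budget exhausted")


if __name__ == "__main__":
    reactive_loop("Fly to the clock tower on the far side of the plaza")
```

The design point this illustrates is the separation of concerns: the planner only reasons over semantic landmarks, the memory persists progress across replans, and the outer loop reacts to execution failures by replanning from the updated memory rather than restarting from scratch.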

Sources

CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory

UAV-CodeAgents: Scalable UAV Mission Planning via Multi-Agent ReAct and Vision-Language Reasoning

Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology

Air-Ground Collaboration for Language-Specified Missions in Unknown Environments

Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities
