The field of autonomous UAV navigation is advancing rapidly toward embodied AI, integrating human-robot interaction, 3D spatial reasoning, and real-world deployment. Researchers are applying large language models (LLMs) and vision-language models (VLMs) to improve how unmanned aerial vehicles (UAVs) navigate complex urban environments. A key challenge is enabling UAVs to interpret natural language instructions and traverse unstructured environments with minimal human supervision. To address it, recent work introduces hierarchical semantic planning modules, global memory modules, and reactive thinking loops into the navigation pipeline. Noteworthy papers include CityNavAgent, which proposes an LLM-empowered agent for aerial vision-and-language navigation, and UAV-CodeAgents, which presents a scalable multi-agent framework for autonomous UAV mission generation. In addition, the CityAVOS benchmark dataset and the PRPSearcher method have been introduced for autonomous visual object search in city spaces, and an air-ground collaboration system has been developed for language-specified missions in unknown environments.
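To make the three components named above more concrete, the sketch below shows how a hierarchical semantic planner, a global memory, and a reactive re-planning loop can fit together in a single agent skeleton. It is purely illustrative: every class and function (GlobalMemory, llm_decompose, try_reach) is an assumption introduced for exposition, not code or an API from CityNavAgent, UAV-CodeAgents, or the other cited systems, and the LLM and perception calls are stubbed out.

```python
# Illustrative sketch of an LLM-driven aerial navigation agent:
# hierarchical subgoal decomposition + global memory + reactive re-planning.
# All names and behaviors here are assumptions, not the cited papers' code.
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Waypoint:
    name: str
    position: tuple  # (x, y, z) in a local map frame


@dataclass
class GlobalMemory:
    """Stores landmarks the agent has already grounded or visited."""
    visited: list = field(default_factory=list)

    def remember(self, wp: Waypoint) -> None:
        self.visited.append(wp)


def llm_decompose(instruction: str) -> list[str]:
    """Placeholder for the hierarchical semantic planner: an LLM/VLM call
    that splits a natural-language instruction into ordered landmark subgoals.
    Here it returns a canned decomposition for the example instruction."""
    return ["leave the parking lot", "follow the main road", "land near the blue tower"]


def try_reach(subgoal: str, memory: GlobalMemory) -> Waypoint | None:
    """Placeholder for perception and local control: returns a waypoint if
    the subgoal's landmark could be grounded in the scene, else None."""
    if "unknown" in subgoal:  # simulate a grounding failure
        return None
    return Waypoint(name=subgoal, position=(len(memory.visited) * 10.0, 0.0, 30.0))


def navigate(instruction: str, max_replans: int = 3) -> GlobalMemory:
    memory = GlobalMemory()
    for subgoal in llm_decompose(instruction):
        attempts = 0
        while attempts <= max_replans:
            wp = try_reach(subgoal, memory)
            if wp is not None:
                memory.remember(wp)  # update the global memory with the new landmark
                break
            # Reactive thinking loop: on failure, re-query the planner
            # (stubbed here as a simple retry of the same subgoal).
            attempts += 1
        else:
            print(f"Failed to ground subgoal: {subgoal}")
    return memory


if __name__ == "__main__":
    mem = navigate("Fly out of the parking lot, follow the road, and land by the blue tower.")
    print([wp.name for wp in mem.visited])
```

The design point the sketch tries to capture is the separation of concerns: the planner reasons over language, the memory persists spatial context across subgoals, and the reactive loop keeps the agent from stalling when a subgoal cannot be grounded.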