Computer vision for unmanned aerial vehicles (UAVs) is advancing rapidly, with current work focused on navigation, object detection, and environmental monitoring. Recent results highlight the potential of hyperspectral imaging (HSI) and multispectral imagery to make objects easier to discriminate and scenes easier to interpret. To integrate HSI into UAV perception systems, researchers are exploring deep learning architectures such as bi-directional cross-attention mechanisms and multi-scale context networks. There is also a growing emphasis on comprehensive benchmarks and datasets, including ones tailored to drone-based multispectral multi-object tracking and urban scene understanding.

Notable papers in this area include the following. SpectralCA proposes a deep learning architecture for UAV-based HSI perception and reports state-of-the-art results on navigation and object detection tasks. TCMA introduces a text-conditioned multi-granularity alignment framework for drone cross-modal text-video retrieval and demonstrates significant improvements in retrieval performance. MMOT presents the first challenging benchmark for drone-based multispectral multi-object tracking, with large-scale annotations and a comprehensive set of challenges. MCOP develops a multi-UAV collaborative occupancy prediction framework that reports state-of-the-art accuracy while reducing communication overhead. FlyAwareV2 introduces a multimodal cross-domain UAV dataset intended as a resource for research on UAV-based 3D urban scene understanding.