Advances in Remote Sensing Object Detection and Vision-Language Modeling

Remote sensing research is advancing in both object detection and vision-language modeling. In detection, researchers are fusing optical and synthetic aperture radar (SAR) images to improve accuracy, particularly in complex environments. Large-scale, standardized datasets and benchmarking toolkits are making it easier to evaluate and compare methods. Vision-language models are also being applied to remote sensing tasks such as image-text retrieval and visual question answering, with a focus on learning image-language alignment from large datasets. Multi-modal and multi-resolution approaches are increasingly common because they extract complementary information from different image modalities. Notable papers in this area include:

  • M4-SAR, which introduces a comprehensive dataset for optical-SAR fusion object detection and proposes a novel end-to-end multi-source fusion detection framework.
  • Vision-Language Modeling Meets Remote Sensing, which provides a comprehensive review of vision-language modeling in remote sensing and discusses future research directions.
  • Visual Question Answering on Multiple Remote Sensing Image Modalities, which proposes a new VQA dataset and model for effectively combining multiple image modalities and text.
  • InstructSAM, which introduces a training-free framework for instruction-driven object recognition in remote sensing imagery.

Sources

M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection

Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing

Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives

Visual Question Answering on Multiple Remote Sensing Image Modalities

InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition

Detailed Evaluation of Modern Machine Learning Approaches for Optic Plastics Sorting

Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
