Advances in Object Detection and Segmentation

Current work in object detection and segmentation focuses on challenges such as feature confusion, complex user queries, and weakly supervised localization. To address them, researchers are exploring new architectures and approaches, including vision-language models, zero-shot learning, and attention mechanisms. Notable developments include cross-domain few-shot object detection transformers, language-instructed segmentation assistants, and attribute prompting for arbitrary referring segmentation.

Some noteworthy papers:

CDFormer tackles feature confusion with two modules: one that distinguishes objects from the background and one that distinguishes objects from each other.

LISAT is a vision-language model designed to describe complex remote-sensing scenes and segment objects of interest. It is reported to outperform existing geospatial foundation models.

RESAnything is an open-vocabulary, zero-shot method for arbitrary referring expression segmentation that leverages Chain-of-Thought reasoning and attribute prompting.

Pro2SAM uses the zero-shot generalization and fine-grained segmentation capabilities of the Segment Anything Model (SAM) to boost the activation of integral object regions.

Split Matching is an assignment strategy that decouples Hungarian matching into two components, one for seen classes and one for unseen classes. It is reported to achieve state-of-the-art performance on standard benchmarks.
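To make the idea of decoupled matching concrete, here is a minimal sketch. It is not the Split Matching implementation: it only illustrates running a minimum-cost one-to-one assignment separately over two groups of queries and targets instead of once over all of them. The function names, the way queries and targets are partitioned, and the cost values are all assumptions made for illustration. A brute-force search stands in for the Hungarian algorithm; it finds the same optimum on small inputs but does not scale.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Return (total_cost, pairs) for the cheapest one-to-one assignment.

    cost[i][j] is the cost of assigning query i to target j.
    Brute-force stand-in for the Hungarian algorithm; assumes there are
    at least as many queries (rows) as targets (columns).
    """
    n_queries = len(cost)
    n_targets = len(cost[0]) if cost else 0
    best_total, best_pairs = float("inf"), []
    # Try every way of choosing one distinct query per target.
    for chosen in permutations(range(n_queries), n_targets):
        total = sum(cost[q][t] for t, q in enumerate(chosen))
        if total < best_total:
            best_total = total
            best_pairs = [(q, t) for t, q in enumerate(chosen)]
    return best_total, sorted(best_pairs)


def split_matching(cost, seen_queries, unseen_queries, seen_targets, unseen_targets):
    """Match seen and unseen groups independently (hypothetical partition).

    Each group gets its own assignment problem, so a query in one group
    can never be matched to a target in the other group.
    """
    pairs = []
    groups = ((seen_queries, seen_targets), (unseen_queries, unseen_targets))
    for q_idx, t_idx in groups:
        # Slice the full cost matrix down to this group's rows and columns.
        sub = [[cost[q][t] for t in t_idx] for q in q_idx]
        _, local = min_cost_assignment(sub)
        # Map local indices back to indices in the full cost matrix.
        pairs += [(q_idx[q], t_idx[t]) for q, t in local]
    return sorted(pairs)


# Illustrative costs: 4 queries (rows) against 3 targets (columns).
cost = [
    [0.1, 0.9, 0.2],
    [0.8, 0.3, 0.7],
    [0.5, 0.6, 0.05],
    [0.4, 0.4, 0.6],
]
matches = split_matching(
    cost,
    seen_queries=[0, 1],
    unseen_queries=[2, 3],
    seen_targets=[0, 1],
    unseen_targets=[2],
)
print(matches)  # [(0, 0), (1, 1), (2, 2)]
```

Query 3 stays unmatched because its group has only one target. In practice the brute-force search would be replaced by a proper Hungarian solver.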

Sources

CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion

LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery

RESAnything: Attribute Prompting for Arbitrary Referring Segmentation

From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection

Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization

Split Matching for Inductive Zero-shot Semantic Segmentation
