Advances in Open-Set Segmentation and Robustness

Computer vision research is moving toward more robust and generalizable models, with particular attention to open-set segmentation and reliability. Recent work adapts pre-trained models, including the Segment Anything Model (SAM) and SAM 2, to new tasks and datasets, and reports state-of-the-art results on several benchmarks. Geometric features, contrastive learning, and adaptive augmentation strategies have been used to improve segmentation accuracy and robustness. In parallel, new benchmarking tools and datasets allow a more comprehensive evaluation of model performance and reliability. Overall, the emphasis is shifting toward efficient, scalable, and robust models that can handle complex and diverse datasets. Notable papers in this area include Segment Anyword, which proposes a training-free approach to open-set, language-grounded segmentation, and SemSegBench & DetecBench, which provide benchmarking tools for evaluating the reliability and generalization of semantic segmentation and object detection models.

Sources

Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation

SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification

Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking

REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders

Geometric Feature Prompting of Image Segmentation Models

SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation

CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation

InfoSAM: Fine-Tuning the Segment Anything Model from An Information-Theoretic Perspective

Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting

Adapting Segment Anything Model for Power Transmission Corridor Hazard Segmentation

A Survey on Training-free Open-Vocabulary Semantic Segmentation

On Geometry-Enhanced Parameter-Efficient Fine-Tuning for 3D Scene Segmentation

Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation

TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models