Advances in Image Segmentation

The field of image segmentation is advancing rapidly, with a focus on more efficient and effective methods. Recent research has explored text prompts and semantic conditioning to improve segmentation performance, particularly in low-data scenarios, and the integration of large language models and multimodal learning has shown promise for pixel-level perceptual understanding. Notably, novel frameworks and architectures such as X-SAM and MLLMSeg have achieved state-of-the-art performance on various image segmentation benchmarks.

Some noteworthy papers in this area include:

SAM-PTx, which introduces a parameter-efficient approach for adapting SAM, using frozen CLIP-derived text embeddings as class-level semantic guidance (a minimal sketch of this style of text-conditioned adapter follows below).

X-SAM, which presents a streamlined Multimodal Large Language Model (MLLM) framework that extends the segmentation paradigm from "segment anything" to "any segmentation".

MLLMSeg, which proposes a framework that fully exploits the visual detail features already encoded by the MLLM vision encoder, pairing them with a lightweight mask decoder rather than introducing an extra visual encoder (see the second sketch below).
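To make the text-conditioning idea concrete, here is a minimal PyTorch sketch of a parallel adapter gated by a frozen class-level text embedding. This is not the SAM-PTx implementation: the module name (ParallelTextAdapter), the dimensions, and the sigmoid-gated low-rank bottleneck are illustrative assumptions, and the CLIP text embedding is stood in by a random tensor.

```python
import torch
import torch.nn as nn

class ParallelTextAdapter(nn.Module):
    """Hypothetical parallel adapter: a frozen class-level text embedding
    modulates a trainable low-rank branch added alongside a frozen block."""
    def __init__(self, dim: int, text_dim: int = 512, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, rank)             # trainable down-projection
        self.up = nn.Linear(rank, dim)               # trainable up-projection
        self.text_proj = nn.Linear(text_dim, rank)   # maps text embedding -> bottleneck
        nn.init.zeros_(self.up.weight)               # adapter starts as a no-op
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) image tokens; text_emb: (B, text_dim), frozen CLIP output
        gate = torch.sigmoid(self.text_proj(text_emb)).unsqueeze(1)  # (B, 1, rank)
        return x + self.up(self.down(x) * gate)      # parallel residual branch

# Usage sketch: wrap each frozen encoder block with an adapter like this one.
if __name__ == "__main__":
    adapter = ParallelTextAdapter(dim=768)
    tokens = torch.randn(2, 196, 768)   # stand-in for SAM image tokens
    text = torch.randn(2, 512)          # stand-in for a frozen CLIP text embedding
    print(adapter(tokens, text).shape)  # torch.Size([2, 196, 768])
```

Only the adapter parameters train here; zero-initializing the up-projection means fine-tuning starts from the frozen model's behavior, a common choice for parameter-efficient adaptation.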
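Similarly, the idea of decoding masks directly from an MLLM's existing vision features can be sketched as below. This is an assumption-laden illustration, not the MLLMSeg decoder: the shapes, the single transposed-convolution upsampling step, and the use of a special segmentation-token hidden state as a query are all hypothetical.

```python
import torch
import torch.nn as nn

class LightweightMaskDecoder(nn.Module):
    """Hypothetical decoder: a segmentation token from the LLM is projected
    and dotted against upsampled vision-encoder features to form mask logits."""
    def __init__(self, vis_dim: int = 1024, llm_dim: int = 4096, hidden: int = 256):
        super().__init__()
        self.vis_proj = nn.Sequential(
            nn.Conv2d(vis_dim, hidden, kernel_size=1),
            nn.GELU(),
            nn.ConvTranspose2d(hidden, hidden, kernel_size=2, stride=2),  # 2x upsample
        )
        self.tok_proj = nn.Linear(llm_dim, hidden)

    def forward(self, vis_feats: torch.Tensor, seg_token: torch.Tensor) -> torch.Tensor:
        # vis_feats: (B, vis_dim, H, W) from the frozen MLLM vision encoder
        # seg_token: (B, llm_dim) hidden state of a segmentation token
        f = self.vis_proj(vis_feats)               # (B, hidden, 2H, 2W)
        q = self.tok_proj(seg_token)               # (B, hidden)
        return torch.einsum("bchw,bc->bhw", f, q)  # (B, 2H, 2W) mask logits

if __name__ == "__main__":
    dec = LightweightMaskDecoder()
    feats = torch.randn(2, 1024, 16, 16)  # stand-in for vision-encoder features
    seg = torch.randn(2, 4096)            # stand-in for the segmentation-token state
    print(dec(feats, seg).shape)          # torch.Size([2, 32, 32])
```

The point of such a design is that the heavy vision encoder is reused as-is; only the small projection and upsampling layers are new, which keeps the added parameter count low.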

Sources

SAM-PTx: Text-Guided Fine-Tuning of SAM with Parameter-Efficient, Parallel-Text Adapters

SAMSA 2.0: Prompting Segment Anything with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation

MAUP: Training-free Multi-center Adaptive Uncertainty-aware Prompting for Cross-domain Few-shot Medical Image Segmentation

A Scalable Machine Learning Pipeline for Building Footprint Detection in Historical Maps

Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder

What Holds Back Open-Vocabulary Segmentation?

Boosting Visual Knowledge-Intensive Training for LVLMs Through Causality-Driven Visual Object Completion

X-SAM: From Segment Anything to Any Segmentation

SMOL-MapSeg: Show Me One Label
