The field of semantic segmentation is increasingly leveraging reinforcement learning and multimodal approaches to improve both performance and efficiency. Researchers are exploring reward-based training, gaze tracking, and continual learning to enhance the accuracy and robustness of segmentation models. Notably, combining fine-grained segmentation settings with task-specific rewards enables multimodal large models to perform fine-grained reasoning in image understanding tasks. Advances in motion-aware calibration and efficient chain-of-pixel reasoning are likewise addressing dynamic conditions in mobile gaze tracking and overthinking in multimodal understanding. Noteworthy papers include:
- RSS, which proposes a practical application of reward-based reinforcement learning to pure semantic segmentation.
- GradTrack, which leverages physicians' gaze tracks to enhance weakly supervised semantic segmentation performance.
- SAM-R1, which enables multimodal large models to perform fine-grained reasoning in image understanding tasks.
- MAC-Gaze, which presents a motion-aware continual calibration approach for mobile gaze tracking.
- PixelThink, which proposes a simple yet effective scheme to regulate reasoning generation within a reinforcement learning paradigm.
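To make the reward-based direction above concrete, here is a minimal sketch of how a segmentation-quality reward might be computed and centered for a policy-gradient update. This is an illustrative assumption in the spirit of reward-based approaches such as RSS, not the actual method of any listed paper; the function names and the group-baseline shaping are hypothetical.

```python
# Hedged sketch: mask IoU as a scalar reward for RL-style fine-tuning
# of a segmentation model. All names and reward shaping are illustrative.

def iou_reward(pred_mask, gt_mask):
    """Intersection-over-union between two binary masks (flat lists of 0/1)."""
    inter = sum(p and g for p, g in zip(pred_mask, gt_mask))
    union = sum(p or g for p, g in zip(pred_mask, gt_mask))
    return inter / union if union else 1.0  # empty-vs-empty counts as perfect

def advantages(rewards):
    """Center rewards within a sampled group, a common variance-reduction baseline."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Toy example: three sampled predictions scored against one ground-truth mask.
gt = [1, 1, 0, 0]
samples = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
rewards = [iou_reward(s, gt) for s in samples]   # -> [1.0, 0.5, 0.0]
adv = advantages(rewards)                        # -> [0.5, 0.0, -0.5]
```

The centered advantages would then weight a policy-gradient update toward the best-scoring sampled segmentation, which is the basic mechanism by which a task-specific reward steers the model without dense per-pixel supervision.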