Computer vision research is moving toward more efficient and effective methods for object segmentation and perception. Recent work has focused on improving model accuracy and adaptability, particularly in challenging environments. One notable direction uses dynamic local priors and mixture-of-experts approaches to enhance fine-grained perception. Another integrates global and local features and applies attention mechanisms to improve performance on tasks such as semantic segmentation and simultaneous localization and mapping (SLAM).
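To make the mixture-of-experts idea concrete, the following is a minimal sketch of soft gating in plain Python. It assumes each expert has already produced an output vector and that a gate has produced one score per expert; the names `softmax` and `moe_combine` are hypothetical and are not taken from any of the papers summarized here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_combine(gate_scores, expert_outputs):
    """Blend expert output vectors using softmax gate weights.

    gate_scores: one raw score per expert (assumed to come from a gate network).
    expert_outputs: one equal-length output vector per expert.
    """
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


# Two experts with equal gate scores contribute equally to the result.
blended = moe_combine([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Practical systems often use sparse (top-k) gating instead of the dense weighting shown here, so this sketch should be read as an illustration of the general principle only.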
Noteworthy papers include the following. Controllable-LPMoE proposes a novel fine-tuning paradigm based on dynamic priors for object segmentation tasks. "Diffusion-Driven Two-Stage Active Learning" leverages a pre-trained diffusion model to extract rich multi-scale features for low-budget semantic segmentation. HyPerNav uses vision-language models to jointly perceive local and global information for object-oriented navigation in unknown environments. Region-CAM generates activation maps by extracting semantic information maps and propagating semantic information for weakly supervised learning tasks. "Classifier Enhancement Using Extended Context and Domain Experts" dynamically adjusts the classifier using global and local contextual information for semantic segmentation. "Self-localization on a 3D map by fusing global and local features from a monocular camera" combines a CNN with a Vision Transformer to extract global features and improve self-localization accuracy.
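Several of these papers combine global and local information. One simple way to picture that kind of fusion is sketched below: a global descriptor is obtained by mean-pooling the local feature vectors and is then concatenated onto each local vector. This is an illustrative assumption rather than the method of any listed paper, and the function name `fuse_global_local` is hypothetical.

```python
def fuse_global_local(local_features):
    """Append a mean-pooled global vector to every local feature vector.

    local_features: list of equal-length feature vectors, for example one
    per image patch (an assumption made for this illustration).
    """
    count = len(local_features)
    dim = len(local_features[0])
    # Global descriptor: element-wise mean over all local vectors.
    global_vec = [sum(f[i] for f in local_features) / count for i in range(dim)]
    # Each fused vector carries its own local values plus the shared global context.
    return [list(f) + global_vec for f in local_features]


fused = fuse_global_local([[1.0, 2.0], [3.0, 4.0]])
```

Each fused vector here has twice the original dimension. Learned alternatives such as attention-based weighting would replace the plain mean and concatenation used in this sketch.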