Advances in Open-Vocabulary 3D Detection and Semantic Segmentation

Computer vision research is moving toward open-vocabulary and semi-supervised approaches, particularly in 3D detection and semantic segmentation. Much of the recent work aims to improve the quality of pseudo-labels and feature representations so that models become more accurate and robust. Techniques applied to these problems include diffusion models, vision-language models, and graph pre-training. Application areas that stand to benefit include autonomous driving, remote sensing, and medical image analysis. Noteworthy papers include HQ-OV3D, which proposes a framework for generating high-quality pseudo-labels for open-vocabulary 3D detection; DeCLIP, which enhances vision-language models for open-vocabulary dense perception; and VG-DETR, a semi-supervised framework for source-free object detection in remote sensing images.

Sources

HQ-OV3D: A High Box Quality Open-World 3D Detection Framework based on Diffusion Model

VFM-Guided Semi-Supervised Detection Transformer for Source-Free Object Detection in Remote Sensing Images

Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception

Generalize across Homophily and Heterophily: Hybrid Spectral Graph Pre-Training and Prompt Tuning

Leveraging the RETFound foundation model for optic disc segmentation in retinal images

Separating Knowledge and Perception with Procedural Data

Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability

Splat Feature Solver

C2PSA-Enhanced YOLOv11 Architecture: A Novel Approach for Small Target Detection in Cotton Disease Diagnosis

S5: Scalable Semi-Supervised Semantic Segmentation in Remote Sensing

WP-CLIP: Leveraging CLIP to Predict Wölfflin's Principles in Visual Art

Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer

Inter-Class Relational Loss for Small Object Detection: A Case Study on License Plates

Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models

A Curated Dataset and Deep Learning Approach for Minor Dent Detection in Vehicles