Computer vision research is moving toward open-vocabulary and semi-supervised approaches, particularly in 3D detection and semantic segmentation. A common thread is improving the quality of pseudo-labels and feature representations so that models stay accurate and robust when labeled data is scarce or the set of categories is not fixed in advance. Recent work draws on diffusion models, vision-language models, and graph pre-training to this end, with potential applications in autonomous driving, remote sensing, and medical image analysis. Noteworthy papers include HQ-OV3D, which proposes a framework for generating high-quality pseudo-labels for open-vocabulary 3D detection; DeCLIP, which enhances vision-language models for open-vocabulary dense perception; and VG-DETR, a semi-supervised framework for source-free object detection in remote sensing images.