Research in video understanding and retrieval is advancing rapidly, with an emphasis on more effective and efficient methods for analyzing and retrieving video content. Recent work covers video denoising, sports video analysis, and referring video object segmentation (RVOS), among other areas. A key trend is the use of diffusion-based models and transformer-based architectures to improve the accuracy and robustness of video analysis and retrieval systems.
Notable papers in this area include:
Denoise-then-Retrieve Network, which introduces a denoise-then-retrieve paradigm for video moment retrieval.
TrajSV, a trajectory-based framework for sports video representations and applications, which reports state-of-the-art performance in sports video retrieval.
SAMDWICH, a moment-aware RVOS framework that uses aligned text-to-clip pairs to guide training and improve referential understanding.
Generic Event Boundary Detection via Denoising Diffusion, which introduces a diffusion-based boundary detection model that approaches generic event boundary detection (GEBD) from a generative perspective.
Bridging the Gap, which transfers models trained on singles matches to doubles analysis in badminton.
Temporal-Conditional Referring Video Object Segmentation, which integrates existing segmentation methods to improve boundary segmentation.
Beyond Simple Edits, which introduces a dataset and model for composed video retrieval with dense modifications.
Repeating Words for Video-Language Retrieval, which proposes a framework for learning fine-grained features for better alignment, along with an inference pipeline that improves performance without additional training.
Aligning Moments in Time using Video Queries, which introduces a transformer-based model designed to capture the semantic context and temporal detail needed for precise moment localization.