Research on object tracking and detection is moving toward more accurate and efficient methods. One recent shift is toward multi-view camera systems, which aggregate spatio-temporal information across views to produce more complete and reliable trajectory estimates. There is also growing interest in using video diffusion models to learn motion representations suitable for tracking without task-specific training. A third line of work explores weakly-supervised and calibration-free methods for crowd counting, which remove the need for expensive image-level crowd annotations and camera calibration. Noteworthy papers include TransientTrack, a deep learning-based framework for cell tracking in multi-channel microscopy video data, and Video Diffusion Models Excel at Tracking Similar-Looking Objects Without Supervision, which demonstrates that pre-trained video diffusion models are effective at tracking visually similar objects. Other notable papers are MV-TAP, which proposes a point tracker for multi-view videos, and FDTA, which introduces an explicit feature refinement framework to make object features more discriminative for multi-object tracking.