Video object segmentation and tracking is evolving rapidly, with current work focused on improving the accuracy and efficiency of existing methods. Recent work introduces novel memory mechanisms, such as dynamic smart memory and hierarchical memory architectures, that handle complex object variations and long video sequences more effectively. There has also been a shift towards foundation models, such as SAM and SAM2, to improve generalization and enable prompt-driven segmentation. Noteworthy papers in this area include HQ-SMem, which introduces a smart-memory method for high-quality video segmentation and tracking and achieves state-of-the-art performance on multiple public datasets; Local2Global query Alignment, which proposes an online framework for video instance segmentation that reaches state-of-the-art performance with a simple baseline trained purely in an online fashion; and Shallow Features Matter, which presents a hierarchical memory architecture that stores both shallow and high-level features and achieves state-of-the-art performance across all UVOS and video saliency detection benchmarks.