Advancements in Multi-Modal Learning and Object Tracking

Computer vision research is advancing quickly in multi-modal learning and object tracking. Recent work combines multiple sources of information, such as images, text, and skeletons, to improve accuracy and robustness in tasks such as action recognition, person re-identification, and object tracking. Frameworks that adaptively fuse modalities and handle partial or incomplete data are gaining traction. Contrastive learning, hierarchical prompt modeling, and domain adaptation are increasingly used to address the challenges of multi-modal learning. Together, these directions point toward more effective and efficient computer vision systems.

Noteworthy papers in this area include ViCoKD, which proposes a view-aware cross-modal distillation framework for multi-view action recognition and reports significant gains on the MultiSensor-Home dataset, and PlugTrack, a multi-object tracking framework that adaptively fuses Kalman filter and data-driven motion predictors and reports state-of-the-art performance on MOT17, MOT20, and DanceTrack.
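
To make the idea of adaptively fusing two motion predictors concrete, here is a minimal sketch. It is not PlugTrack's actual method, which this summary does not describe. It assumes the two predictions (one from a Kalman filter, one from a learned model) are already available as coordinate lists, and that each predictor has a recent error estimate. The function names `fusion_weight` and `fuse_predictions` and the inverse-error weighting rule are illustrative assumptions.

```python
from typing import List, Sequence


def fusion_weight(kalman_err: float, learned_err: float, eps: float = 1e-6) -> float:
    """Return the weight given to the Kalman prediction, in [0, 1].

    Hypothetical rule: weight each predictor by the inverse of its recent
    error, so the predictor that has been more accurate gets more influence.
    """
    inv_k = 1.0 / (kalman_err + eps)
    inv_l = 1.0 / (learned_err + eps)
    return inv_k / (inv_k + inv_l)


def fuse_predictions(
    kalman_pred: Sequence[float],
    learned_pred: Sequence[float],
    kalman_err: float,
    learned_err: float,
) -> List[float]:
    """Blend two predicted states (e.g. box center x, y) coordinate by coordinate."""
    if len(kalman_pred) != len(learned_pred):
        raise ValueError("predictions must have the same length")
    w = fusion_weight(kalman_err, learned_err)
    return [w * k + (1.0 - w) * l for k, l in zip(kalman_pred, learned_pred)]


# Example: the learned predictor has been less accurate lately,
# so the fused position stays closer to the Kalman prediction.
fused = fuse_predictions([100.0, 50.0], [108.0, 58.0], kalman_err=1.0, learned_err=3.0)
```

With linear, smooth motion the Kalman error tends to stay low and dominates the blend; under abrupt, non-linear motion a data-driven predictor may earn the larger weight. A real tracker would likely learn this weighting rather than use a fixed rule.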

Sources

View-aware Cross-modal Distillation for Multi-view Action Recognition

PlugTrack: Multi-Perceptive Motion Analysis for Adaptive Fusion in Multi-Object Tracking

Skeletons Speak Louder than Text: A Motion-Aware Pretraining Paradigm for Video-Based Person Re-Identification

Hierarchical Prompt Learning for Image- and Text-Based Person Re-Identification

DoGCLR: Dominance-Game Contrastive Learning Network for Skeleton-Based Action Recognition

DIR-TIR: Dialog-Iterative Refinement for Text-to-Image Retrieval

CKDA: Cross-modality Knowledge Disentanglement and Alignment for Visible-Infrared Lifelong Person Re-identification

Domain-Shared Learning and Gradual Alignment for Unsupervised Domain Adaptation Visible-Infrared Person Re-Identification

SwiTrack: Tri-State Switch for Cross-Modal Object Tracking
