Advances in Knowledge Distillation and Vision Transformers

The field of computer vision is moving toward more efficient and effective knowledge transfer between models, with particular attention to vision transformers and knowledge distillation. Recent work shows that fine-tuning pretrained vision transformers with mutual information-aware optimization makes them better teachers, allowing small student models to benefit more from strong pretrained models. In parallel, novel strategies such as competitive distillation and meta-token learning have been proposed for visual classification and audio-visual adaptation, with reported reductions in memory usage and training time at competitive accuracy. Notable papers include ReMem, which proposes mutual information-aware fine-tuning of pretrained vision transformers for more effective distillation; Mettle, a simple, memory-efficient method for adapting large-scale pretrained transformer models to downstream audio-visual tasks; and FADRM, whose fast and accurate data residual matching achieves state-of-the-art performance on multiple dataset-distillation benchmarks.
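
To make the student-teacher transfer concrete, here is a minimal sketch of the standard knowledge-distillation objective that such setups build on. This is generic textbook KD, not the ReMem objective itself; the temperature `T` and blend weight `alpha` are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic KD loss: soften teacher and student distributions with
    temperature T, then blend the KL term with cross-entropy on hard labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales gradients so the soft term stays comparable across temperatures
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

In a typical training loop, the teacher's logits are computed under `torch.no_grad()` and only the student's parameters are updated.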
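
The meta-token idea follows the broader pattern of token-based, parameter-efficient adaptation: keep the pretrained backbone frozen and learn only a small set of extra tokens plus a light head, which is where the memory savings come from. The sketch below illustrates that general pattern under assumed shapes; `TokenAdapter`, its arguments, and the backbone interface are hypothetical and not Mettle's actual API.

```python
import torch
import torch.nn as nn

class TokenAdapter(nn.Module):
    """Hypothetical sketch: adapt a frozen pretrained transformer by learning
    a few extra tokens and a small head; backbone weights never change."""
    def __init__(self, backbone, embed_dim, num_tokens, num_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # freeze the pretrained model
        self.meta_tokens = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))
        nn.init.trunc_normal_(self.meta_tokens, std=0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_embeddings):
        # patch_embeddings: (B, N, D) token sequence from the frozen patch embed
        tokens = self.meta_tokens.expand(patch_embeddings.size(0), -1, -1)
        x = torch.cat([tokens, patch_embeddings], dim=1)
        x = self.backbone(x)  # assumed to map (B, N+k, D) -> (B, N+k, D)
        # pool the learned tokens and classify
        return self.head(x[:, : tokens.size(1)].mean(dim=1))
```

Only `meta_tokens` and `head` receive gradients, so optimizer state and activation memory for the backbone's parameters are largely avoided.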

Sources

ReMem: Mutual Information-Aware Fine-tuning of Pretrained Vision Transformers for Effective Knowledge Distillation

Mettle: Meta-Token Learning for Memory-Efficient Audio-Visual Adaptation

Competitive Distillation: A Simple Learning Strategy for Improving Visual Classification

FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation

Towards Undistillable Models by Minimizing Conditional Mutual Information

Learning an Ensemble Token from Task-driven Priors in Facial Analysis
