Advances in Knowledge Distillation and Vision Transformers

The field of computer vision is moving toward more efficient and effective knowledge transfer between models, with particular attention to vision transformers and knowledge distillation. Recent work shows that fine-tuning pretrained vision transformers with mutual information-aware optimization makes them better teachers, allowing small student models to benefit more from strong pretrained models. In parallel, novel strategies such as competitive distillation and meta-token learning have been proposed for visual classification and audio-visual adaptation, with reported reductions in memory usage and training time at competitive accuracy. Notable papers include ReMem, which proposes mutual information-aware fine-tuning of pretrained vision transformers for more effective distillation; Mettle, a simple, memory-efficient method for adapting large-scale pretrained transformer models to downstream audio-visual tasks; and FADRM, whose fast and accurate data residual matching achieves state-of-the-art performance on multiple dataset-distillation benchmarks.
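
To make the student-teacher transfer concrete, here is a minimal sketch of the standard knowledge-distillation objective that such setups build on. This is generic textbook KD, not the ReMem objective itself; the temperature `T` and blend weight `alpha` are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic KD loss: soften teacher and student distributions with
    temperature T, then blend the KL term with cross-entropy on hard labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales gradients so the soft term stays comparable across temperatures
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

In a typical training loop, the teacher's logits are computed under `torch.no_grad()` and only the student's parameters are updated.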
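
The meta-token idea follows the broader pattern of token-based, parameter-efficient adaptation: keep the pretrained backbone frozen and learn only a small set of extra tokens plus a light head, which is where the memory savings come from. The sketch below illustrates that general pattern under assumed shapes; `TokenAdapter`, its arguments, and the backbone interface are hypothetical and not Mettle's actual API.

```python
import torch
import torch.nn as nn

class TokenAdapter(nn.Module):
    """Hypothetical sketch: adapt a frozen pretrained transformer by learning
    a few extra tokens and a small head; backbone weights never change."""
    def __init__(self, backbone, embed_dim, num_tokens, num_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # freeze the pretrained model
        self.meta_tokens = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))
        nn.init.trunc_normal_(self.meta_tokens, std=0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_embeddings):
        # patch_embeddings: (B, N, D) token sequence from the frozen patch embed
        tokens = self.meta_tokens.expand(patch_embeddings.size(0), -1, -1)
        x = torch.cat([tokens, patch_embeddings], dim=1)
        x = self.backbone(x)  # assumed to map (B, N+k, D) -> (B, N+k, D)
        # pool the learned tokens and classify
        return self.head(x[:, : tokens.size(1)].mean(dim=1))
```

Only `meta_tokens` and `head` receive gradients, so optimizer state and activation memory for the backbone's parameters are largely avoided.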

Sources

ReMem: Mutual Information-Aware Fine-tuning of Pretrained Vision Transformers for Effective Knowledge Distillation

Mettle: Meta-Token Learning for Memory-Efficient Audio-Visual Adaptation

Competitive Distillation: A Simple Learning Strategy for Improving Visual Classification

FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation

Towards Undistillable Models by Minimizing Conditional Mutual Information

Learning an Ensemble Token from Task-driven Priors in Facial Analysis
