The field of knowledge distillation is moving toward more effective methods for transferring knowledge from large teacher models to smaller student models. Recent work has focused on narrowing the capacity gap between teacher and student, with techniques such as residual knowledge decomposition, teacher weight integration, and adaptive temperature scheduling showing promising results. These advances have improved performance across a range of tasks, including image classification, object detection, and medical image classification. The use of foundation models and self-knowledge distillation has also been explored, demonstrating potential for efficient and accurate knowledge transfer. Noteworthy papers include the Expandable Residual Approximation method, which achieves state-of-the-art performance on several benchmarks, and the Dual-Model Weight Selection and Self-Knowledge Distillation approach, which overcomes limitations of conventional methods and demonstrates superior performance on medical image classification tasks.
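For context, the sketch below shows the classical distillation objective that these techniques build on: a student is trained against a weighted combination of the hard-label cross-entropy and the KL divergence to the teacher's temperature-softened outputs. This is a minimal PyTorch sketch, not the method of any paper cited above; the linear temperature schedule is a hypothetical illustration of what "adaptive temperature scheduling" can look like, and the names `distillation_loss`, `scheduled_temperature`, `alpha`, `t_start`, and `t_end` are assumptions introduced here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Classical KD loss: alpha * soft (teacher) term + (1 - alpha) * hard-label term."""
    # Soften both distributions with the same temperature.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradient magnitude stays comparable
    # to the cross-entropy term as the temperature changes.
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

def scheduled_temperature(epoch, total_epochs, t_start=8.0, t_end=2.0):
    """Hypothetical linear schedule: start with a high temperature (smoother
    teacher targets), then anneal toward a lower one as training progresses."""
    frac = epoch / max(total_epochs - 1, 1)
    return t_start + (t_end - t_start) * frac
```

In a training loop, `scheduled_temperature(epoch, total_epochs)` would be passed as the `temperature` argument each epoch, so early epochs emphasize the teacher's full output distribution while later epochs sharpen the targets.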