Advances in Large Language Model Efficiency and Adaptability

The field of large language models (LLMs) is moving toward greater efficiency and adaptability. Researchers are exploring novel approaches to enhance knowledge distillation, mitigate catastrophic forgetting, and fine-tune models more effectively. Notable advances include curriculum learning frameworks, adaptive transformer block expansion, and null-space constraints, all aimed at improving model performance while reducing computational overhead. Other studies investigate the emergence of abstract thought in LLMs, language-agnostic parameter spaces, and universal tokenizers for multilingual models. Noteworthy papers include: Being Strong Progressively, which proposes a curriculum learning framework for knowledge distillation; Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning, which introduces a dynamic trainable-block allocation strategy; and One Tokenizer To Rule Them All, which presents a universal tokenizer for multilingual models.
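To make the curriculum idea concrete, here is a minimal PyTorch sketch of curriculum-style knowledge distillation. It is illustrative only: the difficulty criterion (teacher loss), the staged data schedule, and all function names are assumptions for this sketch, not the actual method of Being Strong Progressively.

```python
# Minimal sketch of curriculum-style knowledge distillation (illustrative;
# not the exact method of "Being Strong Progressively"). Batches are ranked
# easy-to-hard by the teacher's own loss, and the student trains on
# progressively harder slices of the data.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Standard temperature-scaled KL distillation loss.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def curriculum_order(teacher, batches):
    # Rank batches easy-to-hard using the teacher's loss as a difficulty
    # proxy (one common choice; the paper's criterion may differ).
    scores = []
    with torch.no_grad():
        for inputs, labels in batches:
            scores.append(F.cross_entropy(teacher(inputs), labels).item())
    return [batch for _, batch in sorted(zip(scores, batches), key=lambda pair: pair[0])]

def train_with_curriculum(student, teacher, batches, optimizer, stages=3):
    ordered = curriculum_order(teacher, batches)
    for stage in range(1, stages + 1):
        # Each stage unlocks a larger, harder prefix of the curriculum.
        for inputs, _ in ordered[: len(ordered) * stage // stages]:
            with torch.no_grad():
                teacher_logits = teacher(inputs)  # teacher stays frozen
            loss = distillation_loss(student(inputs), teacher_logits)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```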
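The null-space constraint mentioned above can likewise be sketched for a single linear layer: gradient updates are projected into the (approximate) null space of features collected on earlier tasks, so outputs for old inputs stay nearly unchanged. This is a generic sketch under those assumptions, not the formulation of any specific paper; the threshold `eps` and the plain SGD step are placeholders.

```python
# Minimal sketch of null-space-constrained fine-tuning for one linear layer
# (illustrative; details differ across the surveyed papers).
import torch

def null_space_projector(old_features, eps=1e-3):
    # Build P that projects onto the null space of the old-task feature
    # covariance. old_features: (n_samples, d) activations from prior data.
    cov = old_features.T @ old_features / old_features.shape[0]
    eigvals, eigvecs = torch.linalg.eigh(cov)   # eigenvalues in ascending order
    null_basis = eigvecs[:, eigvals < eps]      # directions old data barely spans
    return null_basis @ null_basis.T            # (d, d) projection matrix

def constrained_step(weight, projector, lr=1e-3):
    # One SGD step with the gradient projected into the null space.
    # weight: (out, d) parameter whose .grad was populated by backward().
    with torch.no_grad():
        weight -= lr * weight.grad @ projector
```

The intuition: for an old input x lying in the span of previous features, `projector @ x` is near zero, so the projected update barely changes the layer's output on x while still allowing learning in unused directions.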
Sources
Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework
Cross-lingual Collapse: How Language-Centric Foundation Models Shape Reasoning in Large Language Models
Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning
Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations
Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages