Advances in Large Language Model Efficiency and Adaptability

The field of large language models (LLMs) is moving toward greater efficiency and adaptability. Researchers are exploring novel approaches to enhance knowledge distillation, mitigate catastrophic forgetting, and develop more effective fine-tuning methods. Notable advances include curriculum learning frameworks for distillation, adaptive transformer block expansion, and null-space constraints, all aimed at improving model performance while reducing computational overhead. Other studies investigate the emergence of abstract thought in LLMs beyond any single language, language-agnostic parameter spaces, and universal tokenizers for multilingual models. Noteworthy papers include: Being Strong Progressively, which proposes a curriculum learning framework for knowledge distillation of LLMs; Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning, which introduces a dynamic trainable-block allocation strategy; and One Tokenizer To Rule Them All, which presents a universal tokenizer that improves language plasticity in multilingual models.
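
As a rough illustration of the distillation-with-curriculum idea mentioned above, the sketch below blends a soft-target KL term (teacher guidance) with a hard-label cross-entropy term and gradually shifts the blend weight over training. The schedule, weighting, and function names (`distillation_loss`, `curriculum_alpha`) are illustrative assumptions under a generic PyTorch setup, not the specific method from Being Strong Progressively.

```python
# Minimal sketch of curriculum-style knowledge distillation (illustrative only).
# The schedule and loss weighting are assumptions, not the paper's exact method.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, temperature, alpha):
    """Blend a soft-target KL term (teacher guidance) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def curriculum_alpha(step, total_steps, start=0.9, end=0.3):
    """Anneal the distillation weight so the student relies less on the teacher over time."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac


if __name__ == "__main__":
    # Toy usage with random logits for a 4-class problem.
    student_logits = torch.randn(8, 4, requires_grad=True)
    teacher_logits = torch.randn(8, 4)
    labels = torch.randint(0, 4, (8,))

    alpha = curriculum_alpha(step=100, total_steps=1000)
    loss = distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=alpha)
    loss.backward()
    print(f"alpha={alpha:.2f}, loss={loss.item():.4f}")
```

The design choice being sketched is the curriculum itself: early in training the student leans heavily on the teacher's soft targets, and the weight shifts toward the hard labels as the student grows stronger.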

Sources

Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework

Cross-lingual Collapse: How Language-Centric Foundation Models Shape Reasoning in Large Language Models

Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning

Dealing with the Evil Twins: Improving Random Augmentation by Addressing Catastrophic Forgetting of Diverse Augmentations

Low-resource domain adaptation while minimizing energy and hardware resource consumption

Improved Supervised Fine-Tuning for Large Language Models to Mitigate Catastrophic Forgetting

Is Fine-Tuning an Effective Solution? Reassessing Knowledge Editing for Unstructured Data

The Emergence of Abstract Thought in Large Language Models Beyond Any Language

Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages

One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers

Mitigating Negative Interference in Multilingual Sequential Knowledge Editing through Null-Space Constraints

Slimming Down LLMs Without Losing Their Minds

Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training
