The field of Large Language Models (LLMs) is moving toward more efficient fine-tuning, with a focus on reducing computational cost while preserving model performance. Recent work has centered on Low-Rank Adaptation (LoRA), which adapts LLMs to specific downstream tasks while updating only a small fraction of their parameters. Noteworthy papers in this area include:
- The paper proposing EffiLoRA, which shares a single A matrix across all transformer layers and selectively updates the B matrices at runtime to trade off system resource budget against model performance.
- The paper proposing SmartFed, which reuses knowledge embedded in existing LoRA modules and introduces the Mixture of Rank-Wise Experts (MoRE), which selectively activates and combines experts based on input semantics and resource budgets.
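Both papers build on the standard LoRA parameterization, in which a frozen pretrained weight W is augmented with a trainable low-rank update BA. The sketch below shows that mechanism in plain numpy; the dimensions, scaling factor, and function names are illustrative assumptions, not details from either paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 64, 64, 4  # layer sizes and LoRA rank (illustrative values)

# Frozen pretrained weight: never updated during fine-tuning.
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors. B starts at zero so the adapter is a
# no-op at initialization, as in the original LoRA formulation.
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))

def lora_forward(x, alpha=8.0):
    """y = W x + (alpha / r) * B A x; only A and B are trained."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)

# With B = 0 the adapted layer reproduces the frozen model exactly.
assert np.allclose(lora_forward(x), W @ x)

# Parameter savings: a full update trains d_in * d_out values,
# while LoRA trains only r * (d_in + d_out).
full_params = d_in * d_out        # 4096
lora_params = r * (d_in + d_out)  # 512
print(full_params, lora_params)
```

The rank r controls the trade-off both papers exploit: a smaller r means fewer trainable parameters and lower resource use, at the cost of adaptation capacity.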