Efficient Adaptation and Fine-Tuning of Large Language Models

The field of large language models (LLMs) is moving towards more efficient adaptation and fine-tuning techniques. Recent research focuses on methods that reduce the computational cost and memory requirements of fine-tuning while maintaining or improving performance. One direction combines low-rank adaptation (LoRA), which trains small low-rank updates on top of frozen pre-trained weights, with mixture-of-experts (MoE) models, which scale capacity by routing each input to only a small subset of experts. Another line of work develops continuous fine-tuning strategies that mitigate the limitations of existing fine-tuning methods and remain efficient in privacy-preserving settings. There is also growing interest in task-aware expert merging and online MoE inference, which enable efficient and reliable deployment of LLMs in resource-constrained edge networks. Noteworthy papers include TsqLoRA, which integrates data-quality-driven selection with sensitivity-aware low-rank adaptation; DEAL, a framework that combines LoRA with a continuous fine-tuning strategy; and Symphony-MoE, a two-stage framework for constructing MoE models from experts sourced from multiple disparate pre-trained models.
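
To make the LoRA idea concrete, below is a minimal sketch of a low-rank adapter wrapped around a frozen linear layer, assuming PyTorch. The class name `LoRALinear` and the rank/alpha defaults are illustrative only and are not drawn from TsqLoRA, DEAL, or any other paper listed here.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pre-trained weights; only the low-rank factors A and B are trained.
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base path plus the scaled low-rank correction.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
    out = layer(torch.randn(2, 768))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"trainable params: {trainable} / {total}")  # ~12k trainable vs ~603k total
```

The parameter count printed at the end illustrates why this style of adaptation is attractive: only the two small rank-r factors are updated during fine-tuning, while the full pre-trained weight matrix stays fixed.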

Sources

Small LLMs with Expert Blocks Are Good Enough for Hyperparameter Tuning

Inference Offloading for Cost-Sensitive Binary Classification at the Edge

On Optimal Steering to Achieve Exact Fairness

Distribution-Aligned Decoding for Efficient LLM Task Adaptation

BEFT: Bias-Efficient Fine-Tuning of Language Models

DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning

Self-Evolving LLMs via Continual Instruction Tuning

LoRALib: A Standardized Benchmark for Evaluating LoRA-MoE Methods

Rank-Induced PL Mirror Descent: A Rank-Faithful Second-Order Algorithm for Sleeping Experts

Optimal Service Mode Assignment in a Simple Computation Offloading System: Extended Version

Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts

TsqLoRA: Towards Sensitivity and Quality Low-Rank Adaptation for Efficient Fine-Tuning

When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models

Data Efficient Adaptation in Large Language Models via Continuous Low-Rank Fine-Tuning

Faster, Smaller, and Smarter: Task-Aware Expert Merging for Online MoE Inference