Model Merging in Large Language Models

The field of large language models is moving toward more efficient and effective methods for combining multiple models to improve performance. Model merging techniques are being explored as a way to create models with tunable reasoning capabilities, balancing reasoning depth against computational cost. This approach has shown promise for producing a spectrum of models with fine-grained control over reasoning ability, so that models can be tailored to specific reasoning profiles and diverse application demands. Notably, methods such as Expert Merging have been introduced that learn a small set of layer-wise merging coefficients using only unlabeled calibration data, aligning the merged model's hidden states and logits with those of the corresponding experts. Noteworthy papers include "The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging," a large-scale empirical study evaluating a range of merging techniques across multiple reasoning benchmarks, and "Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking," which introduces this training-light, layer-wise coefficient approach.
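To make the layer-wise coefficient idea concrete, the following is a minimal sketch, not the authors' implementation: toy two-layer MLPs stand in for transformer experts, MSE losses on hidden states and outputs stand in for the paper's exact alignment objectives, and the importance-guided layer chunking is omitted. All names (make_expert, merged_forward, DIM, etc.) are illustrative.

```python
# Minimal sketch of layer-wise coefficient merging with unsupervised alignment
# on unlabeled calibration data. Assumptions: toy MLP "experts", MSE alignment,
# no layer chunking or importance weighting.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, N_LAYERS, N_EXPERTS = 16, 2, 2

def make_expert(seed):
    # Stand-in for a fine-tuned expert model (e.g., a reasoning-tuned checkpoint).
    torch.manual_seed(seed)
    return nn.ModuleList([nn.Linear(DIM, DIM) for _ in range(N_LAYERS)])

experts = [make_expert(s) for s in range(N_EXPERTS)]
for expert in experts:
    expert.requires_grad_(False)  # expert weights stay frozen; only coefficients are learned

# One learnable merging coefficient per (layer, expert), softmax-normalised per layer.
coeffs = nn.Parameter(torch.zeros(N_LAYERS, N_EXPERTS))

def merged_forward(x):
    """Forward pass through layers whose weights are convex combinations of the experts'."""
    hiddens = []
    for l in range(N_LAYERS):
        w = F.softmax(coeffs[l], dim=0)
        weight = sum(w[e] * experts[e][l].weight for e in range(N_EXPERTS))
        bias = sum(w[e] * experts[e][l].bias for e in range(N_EXPERTS))
        x = torch.relu(F.linear(x, weight, bias))
        hiddens.append(x)
    return x, hiddens

def expert_forward(expert, x):
    hiddens = []
    for layer in expert:
        x = torch.relu(layer(x))
        hiddens.append(x)
    return x, hiddens

# Unlabeled calibration batch (random here; in practice, task inputs without labels).
calib = torch.randn(64, DIM)
opt = torch.optim.Adam([coeffs], lr=1e-2)

for step in range(200):
    out_m, hid_m = merged_forward(calib)
    loss = torch.zeros(())
    for expert in experts:
        with torch.no_grad():
            out_e, hid_e = expert_forward(expert, calib)
        # Align the merged model's hidden states and final outputs with each expert.
        loss = loss + F.mse_loss(out_m, out_e)
        loss = loss + sum(F.mse_loss(hm, he) for hm, he in zip(hid_m, hid_e))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Fixing the coefficients to a single global value instead of learning them per layer recovers plain weight interpolation between two checkpoints, which is the kind of knob the tunable-reasoning line of work sweeps to trade reasoning depth for cost.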

Sources

Effect of Model Merging in Domain-Specific Ad-hoc Retrieval

The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging

Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
