Advances in Language Model Distillation and Retrieval

Natural language processing is moving toward more efficient methods for distilling knowledge from large language models into smaller, more deployable ones. Recent work has introduced novel distillation strategies, such as personalized data synthesis and contrastive reasoning self-distillation, to improve the performance of student models. There is also growing interest in strengthening retrieval models, including more effective projection variants and multimodal embedding models.

Noteworthy papers in this area include Find Your Optimal Teacher, which proposes a synthesis strategy that creates personalized training data for each student model, and Simple Projection Variants Improve ColBERT Performance, which examines how alternative feedforward projection layers affect ColBERT models. Other notable works include UniME-V2, which leverages the advanced understanding capabilities of MLLMs to enhance representation learning, and Retrofitting Small Multilingual Models for Retrieval, which investigates the key factors behind effective multilingual embeddings. Together, these advances stand to improve both the performance and the efficiency of language models, enabling broader deployment.
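For readers unfamiliar with the distillation objective underlying this line of work: a student model is typically trained to match the teacher's temperature-softened output distribution via a KL-divergence loss. The sketch below is illustrative only; the function names and the plain NumPy implementation are assumptions, not code from any of the papers above.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Classic soft-label distillation: KL(teacher || student) over
    temperature-softened distributions, scaled by T^2 to keep gradient
    magnitudes comparable across temperatures."""
    p = softmax(teacher_logits, T)  # teacher "soft targets"
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

When the student's logits match the teacher's exactly, the loss is zero; the further the student's distribution drifts, the larger the penalty.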
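On the retrieval side, ColBERT scores a query against a document by late interaction: each query token embedding is matched to its most similar document token embedding (MaxSim), and those maxima are summed. A minimal NumPy sketch of that scoring step, with illustrative identifiers (the projection variants studied in the paper above would sit upstream, producing these embeddings):

```python
import numpy as np

def maxsim_score(query_embs, doc_embs):
    """ColBERT-style late interaction: for each query token embedding,
    take the maximum cosine similarity over all document token
    embeddings, then sum over query tokens."""
    # Normalize rows to unit length so dot products are cosine similarities.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sim = q @ d.T                 # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim per query token, summed
```

Because the score decomposes per query token, document token embeddings can be indexed offline and ranked efficiently at query time, which is what makes small ColBERT-style retrievers attractive for deployment.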

Sources

Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation

From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance

Simple Projection Variants Improve ColBERT Performance

UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness

Retrofitting Small Multilingual Models for Retrieval: Matching 7B Performance with 300M Parameters

Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking

Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report
