The field of large language models (LLMs) is evolving rapidly, with a growing focus on efficient, scalable methods for representing and routing these models. Recent research has explored how to select the best-performing LLM for a given task, including training-free methods that represent LLMs as linear operators within the prompts' semantic task space. Another key direction is collaborative device-cloud LLM inference, which aims to combine the efficiency of lightweight on-device inference with the superior performance of powerful cloud LLMs. Noteworthy papers in this area include Representing LLMs in Prompt Semantic Task Space, which presents an efficient and interpretable approach to representing LLMs, and Collaborative Device-Cloud LLM Inference through Reinforcement Learning, which makes the routing decision at the end of the on-device LLM's solving process. In addition, RADAR and LLMRank contribute a reasoning-ability- and difficulty-aware routing framework and a prompt-aware routing framework, respectively.
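To make the linear-operator idea concrete, here is a minimal, hypothetical sketch of prompt-aware routing: each candidate LLM is represented as a weight vector (a linear operator) over the prompt's embedding in a shared semantic task space, the router predicts each model's performance with a dot product, and the highest-scoring model is selected. All names, embeddings, and weights below are invented for illustration and are not taken from any of the cited papers.

```python
def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def route(prompt_embedding, model_operators):
    """Return (model_name, predicted_score) for the best-scoring LLM.

    Each model's linear operator is applied to the prompt embedding to
    predict that model's performance on the prompt; the argmax wins.
    """
    scores = {name: dot(w, prompt_embedding)
              for name, w in model_operators.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy 3-d "semantic task space" embedding of a prompt (invented).
prompt = [0.2, 0.9, 0.1]

# Invented linear operators for three candidate models.
operators = {
    "small-on-device": [0.5, 0.1, 0.3],
    "mid-cloud":       [0.2, 0.7, 0.2],
    "large-cloud":     [0.1, 0.8, 0.6],
}

best_model, score = route(prompt, operators)
print(best_model)  # → large-cloud
```

In a device-cloud setting, the same predicted scores could also drive an escalation decision, for example falling back to a cloud model only when the on-device model's predicted score drops below a threshold.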