Efficient and Effective Large Language Models

The field of large language models (LLMs) is moving towards more efficient and effective architectures, with a focus on improving decoding processes, reducing computational costs, and strengthening ranking and retrieval capabilities. Recent studies have explored decoder-only models, adaptive blockwise search strategies, and lightweight ranking frameworks that reach state-of-the-art results while minimizing computational overhead. There is also growing interest in more flexible and extensible LLM serving systems that can accommodate increasingly complex applications.

Noteworthy papers in this area include:

Language Ranker introduces a lightweight framework that reranks candidate responses using features the base model has already extracted, matching large-scale reward models at a fraction of the computational cost (a rough sketch follows below).

Do Stop Me Now proposes a simple yet effective method for detecting boilerplate responses after a single generation step, enabling early termination or redirection to a smaller model and yielding significant savings in computational cost (sketched below).

E2Rank presents a unified framework in which a single text embedding model performs both high-quality retrieval and listwise reranking, combining strong effectiveness with remarkable efficiency.

AutoDeco enables truly end-to-end generation by learning to control its own decoding strategy: the model self-regulates its sampling parameters within a single forward pass and performs comparably to an oracle-tuned baseline (sketched below).
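To make the Language Ranker idea concrete, here is a minimal sketch of a lightweight reranking head scored over hidden states the base model has already produced. The mean-pooling choice, the head architecture, and the names (`LightweightRanker`, `score_head`) are illustrative assumptions, not the paper's actual components.

```python
import torch
import torch.nn as nn

class LightweightRanker(nn.Module):
    """Scores candidate responses from hidden states the base model has
    already produced, so no separate large reward model is needed."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.score_head = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 4),
            nn.Tanh(),
            nn.Linear(hidden_size // 4, 1),
        )

    def forward(self, candidate_states: torch.Tensor) -> torch.Tensor:
        # candidate_states: [num_candidates, seq_len, hidden_size], taken
        # from the base model while it generated each candidate, so
        # scoring adds almost no extra compute.
        pooled = candidate_states.mean(dim=1)        # mean-pool over tokens
        return self.score_head(pooled).squeeze(-1)   # one score per candidate

# Usage: best = candidates[ranker(candidate_states).argmax().item()]
```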
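The single-step detection idea from Do Stop Me Now can be pictured as a simple router: run one forward pass of the large model, classify whether the response is starting out as boilerplate, and hand the request to a cheaper model if so. The classifier, the threshold, and the Hugging Face-style model interface below are assumptions for illustration, not the paper's implementation.

```python
import torch

def route_after_one_step(input_ids, large_model, small_model,
                         boilerplate_clf, threshold=0.9):
    """Run a single forward pass on the large model; if the last hidden
    state looks like the start of a boilerplate reply, redirect the
    request to the smaller model. Assumes batch size 1 and models with a
    Hugging Face-style interface (an assumption, not from the paper)."""
    with torch.no_grad():
        out = large_model(input_ids, output_hidden_states=True)
        # Hidden state at the last position after one generation step.
        first_step_state = out.hidden_states[-1][:, -1, :]
        p_boilerplate = torch.sigmoid(boilerplate_clf(first_step_state))
    if p_boilerplate.item() > threshold:
        return small_model.generate(input_ids)  # early redirection
    return large_model.generate(input_ids)      # keep the large model
```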
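AutoDeco's self-regulated decoding can be sketched as small heads that predict a temperature and a top-p value from the current hidden state at every step, so the sampling strategy is set inside the forward pass instead of by hand. The head shapes, the clamping ranges, and the function name below are assumptions; this is a sketch of the technique, not the paper's code.

```python
import torch
import torch.nn.functional as F

def autodeco_step(hidden, lm_head, temp_head, top_p_head):
    """One decoding step where the model sets its own temperature and
    top-p. hidden: [batch, hidden_size] state at the last position."""
    logits = lm_head(hidden)  # [batch, vocab_size]
    # Squash head outputs into sensible ranges (assumed, not from the paper).
    temperature = 0.1 + 1.9 * torch.sigmoid(temp_head(hidden))  # in (0.1, 2.0)
    top_p = torch.sigmoid(top_p_head(hidden))                   # in (0, 1)
    probs = F.softmax(logits / temperature, dim=-1)
    # Nucleus (top-p) filtering with the per-step predicted top_p.
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    keep = (cumulative - sorted_probs) < top_p  # always keeps the top token
    sorted_probs = sorted_probs * keep
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    next_in_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, next_in_sorted)  # sampled token ids, [batch, 1]
```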

Sources

asLLR: LLM based Leads Ranking in Auto Sales

Language Ranker: A Lightweight Ranking Framework for LLM Decoding

Do Stop Me Now: Detecting Boilerplate Responses with a Single Iteration

E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

Adaptive Blockwise Search: Inference-Time Alignment for Large Language Models

LimRank: Less is More for Reasoning-Intensive Information Reranking

Serve Programs, Not Prompts

Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model

The End of Manual Decoding: Towards Truly End-to-End Language Models
