Advances in Retrieval-Augmented Generation

The field of retrieval-augmented generation (RAG) is moving toward improving the efficiency and effectiveness of large language models (LLMs) by optimizing how context is retrieved and compressed. Researchers are tackling the challenges of retrieving relevant context, managing context size, and reducing latency. Notable directions include dynamic context optimization mechanisms, frameworks that combine backward and forward lookup, and attention-based understanding tasks for context compression. These advances have the potential to significantly improve the performance of LLMs and RAG systems.

Noteworthy papers: FB-RAG presents a framework that enhances the RAG pipeline by combining backward and forward lookup to retrieve specific context chunks. QwenLong-CPRS introduces a context compression framework that performs multi-granularity compression guided by natural language instructions. Sentinel contributes a lightweight sentence-level compression framework that reframes context filtering as an attention-based understanding task. Finally, Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs proposes a methodology that reduces the training data required by hallucination detection frameworks.
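To make the context-compression idea concrete, here is a minimal, self-contained sketch of sentence-level filtering: each context sentence is scored against the query and only the top-scoring sentences are kept before prompting an LLM. This toy version uses simple lexical overlap as the score; it is an illustration of the general pattern, not the attention-based probing of Sentinel or the instruction-guided compression of QwenLong-CPRS.

```python
def compress_context(query: str, sentences: list[str], keep: int = 2) -> list[str]:
    """Keep the `keep` sentences with the largest word overlap with the query.

    A stand-in for learned relevance scoring (e.g., attention-based probing);
    real systems would use a proxy model or retriever here.
    """
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(s.lower().split())), i, s)
              for i, s in enumerate(sentences)]
    # Highest score first; break ties by original position.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:keep]
    # Preserve the original sentence order in the compressed context.
    return [s for _, i, s in sorted(top, key=lambda t: t[1])]

sentences = [
    "The Eiffel Tower is in Paris.",
    "Bananas are a good source of potassium.",
    "Paris is the capital of France.",
]
print(compress_context("What city is the capital of France?", sentences))
```

Swapping the overlap score for per-sentence attention scores from a small proxy model yields the kind of lightweight, query-aware filtering these compression frameworks pursue, shrinking the prompt and hence latency without retraining the main LLM.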

Sources

FB-RAG: Improving RAG with Forward and Backward Lookup

QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization

Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis

DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers

Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective

Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs
