Advances in Multi-Hop Question Answering and Retrieval-Augmented Generation

The field of natural language processing is moving toward more complex and realistic question answering tasks, with a focus on multi-hop reasoning and retrieval-augmented generation (RAG). Recent work has produced benchmarks and datasets that challenge models to integrate information across multiple sources and to generate long-form responses. There is also growing interest in making RAG models more efficient and effective, through techniques such as prompt compression, lossless context compression, and context-adaptive synthesis. Noteworthy papers in this area include DocHop-QA, a large-scale benchmark for multimodal, multi-document, multi-hop question answering, and CORE, a method for lossless compression of retrieved documents using reinforcement learning. Also notable are SCOPE, which introduces a generative approach to prompt compression, and CASC, a context-adaptive synthesis and compression framework for enhancing RAG in complex domains.
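The retrieve-compress-generate loop underlying these systems can be sketched minimally as follows. Everything here is an illustrative stand-in: the toy corpus, the bag-of-words overlap scorer, and the word-budget "compressor" are assumptions for demonstration, not the learned methods of SCOPE, CORE, or CASC.

```python
from collections import Counter

# Toy corpus standing in for a retrieval index (hypothetical data).
DOCS = [
    "DocHop-QA benchmarks multi-hop question answering over multiple documents.",
    "Prompt compression shortens retrieved context before generation.",
    "Retrieval-augmented generation grounds answers in retrieved passages.",
]

def score(query: str, doc: str) -> int:
    """Bag-of-words overlap between query and document (toy relevance score)."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents by overlap score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def compress(passages: list[str], budget: int = 12) -> list[str]:
    """Naive 'prompt compression': keep only the first `budget` words of each
    passage. A learned compressor (e.g. as in SCOPE or CORE) would instead
    select or rewrite content to preserve answer-relevant information."""
    return [" ".join(p.split()[:budget]) for p in passages]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the compressed retrieved context plus the question into a
    prompt that would be handed to a generator model."""
    context = "\n".join(compress(retrieve(query, docs)))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("How does prompt compression help generation?", DOCS))
```

In a real system, `retrieve` would be a dense or hybrid retriever, `compress` a learned model, and the prompt would go to an LLM; the shape of the pipeline, however, is the same.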
Sources
Agri-Query: A Case Study on RAG vs. Long-Context LLMs for Cross-Lingual Technical Question Answering
Improving End-to-End Training of Retrieval-Augmented Generation Models via Joint Stochastic Approximation
Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index