The field of retrieval-augmented generation (RAG) is moving toward more effective and efficient methods for incorporating external knowledge into large language models (LLMs). Recent research has highlighted the importance of considering the utility of retrieved passages, rather than just their relevance, and has introduced new methods for evaluating and improving RAG systems. One key area of focus is the development of more robust and realistic benchmarks for evaluating RAG systems, particularly for multi-hop questions and out-of-scope queries. Another is improving the intermediate reasoning capabilities of RAG systems, including new architectures and evaluation metrics.

Notable papers in this area include:

- Evaluating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries, which presents a pipeline for the automatic creation of challenging queries.
- BRIEF-Pro: Universal Context Compression with Short-to-Long Synthesis for Fast and Accurate Multi-Hop Reasoning, which introduces a universal compressor for distilling relevant evidence from retrieved documents.
- RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems, which proposes a capability-oriented benchmark for fine-grained evaluation of intermediate tasks in agentic RAG workflows.
- PRISM: Agentic Retrieval with LLMs for Multi-Hop Question Answering, which introduces an agentic retrieval system that leverages LLMs in a structured loop to retrieve relevant evidence (a loop of this general shape is sketched after this list).
- Stop-RAG: Value-Based Retrieval Control for Iterative RAG, which introduces a value-based controller that adaptively decides when to stop retrieving.
- PluriHop: Exhaustive, Recall-Sensitive QA over Distractor-Rich Corpora, which formalizes pluri-hop questions and proposes a RAG architecture that follows a "check all documents individually, filter cheaply" approach (see the second sketch below).
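
To make the iterative, agentic retrieval pattern concrete, here is a minimal sketch of a retrieval loop that adaptively decides when to stop, in the spirit of PRISM's structured loop and Stop-RAG's value-based stopping controller. All interfaces here (Retriever, Generator, estimate_stop_value) are hypothetical placeholders for illustration, not the papers' actual components.

```python
# Sketch of an iterative RAG loop with an adaptive stopping decision.
# The Retriever/Generator classes are hypothetical stand-ins.

from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    text: str
    score: float


class Retriever:
    """Placeholder dense/sparse retriever."""
    def search(self, query: str, k: int = 5) -> List[Passage]:
        raise NotImplementedError


class Generator:
    """Placeholder LLM wrapper."""
    def propose_followup_query(self, question: str, evidence: List[Passage]) -> str:
        raise NotImplementedError

    def answer(self, question: str, evidence: List[Passage]) -> str:
        raise NotImplementedError


def estimate_stop_value(evidence: List[Passage]) -> float:
    """Toy stand-in for a learned value function: here, simply the mean
    retrieval score of the evidence gathered so far."""
    if not evidence:
        return 0.0
    return sum(p.score for p in evidence) / len(evidence)


def iterative_rag(question: str, retriever: Retriever, generator: Generator,
                  max_hops: int = 4, stop_threshold: float = 0.8) -> str:
    evidence: List[Passage] = []
    query = question
    for _ in range(max_hops):
        evidence.extend(retriever.search(query))
        # Adaptive stop: halt once the estimated value of answering now
        # exceeds a threshold, instead of always running max_hops rounds.
        if estimate_stop_value(evidence) >= stop_threshold:
            break
        # Otherwise, let the LLM reformulate a follow-up query for the next hop.
        query = generator.propose_followup_query(question, evidence)
    return generator.answer(question, evidence)
```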
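
The "check all documents individually, filter cheaply" pattern attributed to PluriHop can likewise be sketched in a few lines. This is only an illustrative approximation under assumed interfaces; cheap_relevance_probe is a hypothetical stand-in for whatever inexpensive per-document check the paper actually uses.

```python
# Sketch of an exhaustive check-then-filter pass over a corpus:
# score every document independently with a cheap probe, keep survivors.

from typing import Iterable, List


def cheap_relevance_probe(question: str, document: str) -> float:
    """Toy stand-in for an inexpensive per-document check,
    e.g. keyword overlap or a small cross-encoder score."""
    q_terms = set(question.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)


def exhaustive_filter(question: str, corpus: Iterable[str],
                      threshold: float = 0.3) -> List[str]:
    """Check every document individually, then filter cheaply,
    trading per-document cost for recall on distractor-rich corpora."""
    return [doc for doc in corpus
            if cheap_relevance_probe(question, doc) >= threshold]
```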