Advances in Retrieval-Augmented Generation Security

Research in Retrieval-Augmented Generation (RAG) is increasingly focused on security: detecting and mitigating attacks such as corpus poisoning and contamination that can compromise the integrity of RAG systems, and thereby improving the reliability of the large language models (LLMs) they support. Noteworthy papers in this area include SeCon-RAG, which proposes a two-stage semantic filtering and conflict-free framework for trustworthy RAG, and RIPRAG, a black-box attack framework that uses reinforcement learning to optimize the generation of poisoned documents. RefusalBench presents a generative methodology for evaluating selective refusal in grounded language models, while SafeRAG-Steering applies a model-centric embedding intervention to mitigate over-refusals in contaminated RAG pipelines. Other notable works include RAG-Pull, which introduces a new class of black-box attack that inserts hidden UTF characters into queries or external code repositories, and ADMIT, a few-shot knowledge poisoning attack that flips fact-checking decisions and induces deceptive justifications.
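
To illustrate the attack surface that RAG-Pull exploits, the minimal sketch below shows how invisible Unicode format characters (such as zero-width spaces) can be flagged and stripped from a query before it reaches the retriever. This is not taken from any of the papers above; the function names and the simple category-based filter are our own assumptions about one possible defensive check.

```python
import unicodedata

def find_hidden_chars(text: str) -> list[tuple[int, str]]:
    """Flag invisible format-control code points (Unicode category 'Cf'),
    e.g. zero-width spaces and bidi overrides, that can smuggle payloads."""
    return [(i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Cf"]

def sanitize(text: str) -> str:
    """Drop format-control characters before the query reaches the retriever."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

if __name__ == "__main__":
    # Hypothetical poisoned query containing a hidden zero-width space.
    poisoned = "How do I install the\u200b requests package?"
    print(find_hidden_chars(poisoned))  # [(20, 'ZERO WIDTH SPACE')]
    print(sanitize(poisoned) == "How do I install the requests package?")  # True
```

A check of this kind only addresses character-level obfuscation; it does not defend against semantic poisoning of the corpus itself, which is the concern of frameworks such as SeCon-RAG and attacks such as RIPRAG and ADMIT.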

Sources

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

RIPRAG: Hack a Black-box Retrieval-Augmented Generation Question-Answering System with Reinforcement Learning

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

Steering Over-refusals Towards Safety in Retrieval Augmented Generation

RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation

Attacks by Content: Automated Fact-checking is an AI Security Issue

Beating Harmful Stereotypes Through Facts: RAG-based Counter-speech Generation

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking
