Advances in Retrieval-Augmented Generation Security

Research in Retrieval-Augmented Generation (RAG) is increasingly focused on security: detecting and mitigating attacks such as corpus poisoning and contamination that can compromise the integrity of RAG systems, and thereby improving the reliability of the large language models (LLMs) they support. Noteworthy papers in this area include SeCon-RAG, which proposes a two-stage semantic filtering and conflict-free framework for trustworthy RAG, and RIPRAG, a black-box attack framework that uses reinforcement learning to optimize the generation of poisoned documents. RefusalBench presents a generative methodology for evaluating selective refusal in grounded language models, while SafeRAG-Steering applies a model-centric embedding intervention to mitigate over-refusals in contaminated RAG pipelines. Other notable works include RAG-Pull, which introduces a new class of black-box attack that inserts hidden UTF characters into queries or external code repositories, and ADMIT, a few-shot knowledge poisoning attack that flips fact-checking decisions and induces deceptive justifications.
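
To illustrate the attack surface that RAG-Pull exploits, the minimal sketch below shows how invisible Unicode format characters (such as zero-width spaces) can be flagged and stripped from a query before it reaches the retriever. This is not taken from any of the papers above; the function names and the simple category-based filter are our own assumptions about one possible defensive check.

```python
import unicodedata

def find_hidden_chars(text: str) -> list[tuple[int, str]]:
    """Flag invisible format-control code points (Unicode category 'Cf'),
    e.g. zero-width spaces and bidi overrides, that can smuggle payloads."""
    return [(i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Cf"]

def sanitize(text: str) -> str:
    """Drop format-control characters before the query reaches the retriever."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

if __name__ == "__main__":
    # Hypothetical poisoned query containing a hidden zero-width space.
    poisoned = "How do I install the\u200b requests package?"
    print(find_hidden_chars(poisoned))  # [(20, 'ZERO WIDTH SPACE')]
    print(sanitize(poisoned) == "How do I install the requests package?")  # True
```

A check of this kind only addresses character-level obfuscation; it does not defend against semantic poisoning of the corpus itself, which is the concern of frameworks such as SeCon-RAG and attacks such as RIPRAG and ADMIT.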

Sources

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

RIPRAG: Hack a Black-box Retrieval-Augmented Generation Question-Answering System with Reinforcement Learning

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models

Steering Over-refusals Towards Safety in Retrieval Augmented Generation

RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation

Attacks by Content: Automated Fact-checking is an AI Security Issue

Beating Harmful Stereotypes Through Facts: RAG-based Counter-speech Generation

ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking
