The field of large language models (LLMs) is rapidly evolving, with a growing focus on security and evaluation. Recent studies have highlighted the vulnerability of LLMs to various types of attacks, including jailbreak attacks, poisoning attacks, and leakage attacks. In response, researchers have proposed novel approaches to improve the security and robustness of LLMs, such as the use of explainable AI, neuron relearning, and targeted noise injection. Another area of research is the development of benchmarks and evaluation methodologies for LLMs, including the creation of datasets and platforms for testing and comparing the performance of different models. Notable papers in this area include those proposing the Graph of Attacks framework, the DREAM approach, and the QuantBench platform. Overall, the field is moving towards the development of more secure, robust, and transparent LLMs, with a focus on addressing the challenges and risks associated with their deployment in real-world applications. Noteworthy papers include Unsupervised Corpus Poisoning Attacks, which proposes a novel method for corpus poisoning attacks, and RAG LLMs are Not Safer, which highlights the safety risks associated with retrieval-augmented generation frameworks.
Advances in Security and Evaluation of Large Language Models
Sources
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization
Enhancing Leakage Attacks on Searchable Symmetric Encryption Using LLM-Based Synthetic Data Generation
Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression