Advances in Security and Evaluation of Large Language Models

The field of large language models (LLMs) is evolving rapidly, with a growing focus on security and evaluation. Recent studies have highlighted the vulnerability of LLMs to a range of attacks, including jailbreak, poisoning, and leakage attacks. In response, researchers have proposed approaches to improve the security and robustness of LLMs, such as explainable-AI-guided analysis, neuron relearning, and targeted noise injection.

A second line of work develops benchmarks and evaluation methodologies for LLMs, including datasets and platforms for testing and comparing different models. Notable contributions here include the Graph of Attacks framework for black-box, interpretable jailbreaks, the DREAM approach to safety alignment in multimodal models, and the QuantBench platform for benchmarking AI methods in quantitative investment.

Overall, the field is moving toward more secure, robust, and transparent LLMs, with particular attention to the risks of deploying them in real-world applications. Noteworthy papers include Unsupervised Corpus Poisoning Attacks, which introduces an unsupervised, continuous-space poisoning attack against dense retrieval corpora, and RAG LLMs are Not Safer, which shows that retrieval-augmented generation frameworks can introduce safety risks of their own.
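To make the corpus-poisoning threat concrete, the toy sketch below injects an adversarial passage into a small dense-retrieval index and checks whether it displaces legitimate documents from the top-k results. This is a minimal, self-contained illustration under simplifying assumptions, not the method from Unsupervised Corpus Poisoning Attacks: the bag-of-words embeddings, the example corpus, the query, and the poison_text string are all hypothetical placeholders for a real dense encoder and index.

```python
# Toy illustration of corpus poisoning against a dense retriever.
# Embeddings are plain bag-of-words vectors (a stand-in for a real dense
# encoder); the corpus, query, and poisoned passage are hypothetical.
import numpy as np

def embed(text, vocab):
    """Map text to a normalized bag-of-words vector over a fixed vocabulary."""
    vec = np.zeros(len(vocab))
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents with highest cosine similarity."""
    scores = doc_vecs @ query_vec
    return list(np.argsort(-scores)[:k])

corpus = [
    "reset your account password from the settings page",
    "contact support to recover a locked account",
    "weather forecast for the coming weekend",
]
query = "how do i reset my account password"

# Adversarial passage stuffed with query-like tokens plus an attacker payload.
poison_text = "reset account password password account visit attacker-site example"

vocab = {tok: i for i, tok in enumerate(
    sorted({t for doc in corpus + [query, poison_text] for t in doc.lower().split()}))}

doc_vecs = np.stack([embed(d, vocab) for d in corpus])
q_vec = embed(query, vocab)
print("top-k before poisoning:", top_k(q_vec, doc_vecs))

poisoned_vecs = np.vstack([doc_vecs, embed(poison_text, vocab)])
print("top-k after poisoning: ", top_k(q_vec, poisoned_vecs))
# If the injected passage (index len(corpus)) appears in the top-k, it has
# displaced a legitimate document and would be fed to the RAG generator.
```

In this toy setup the injected passage outranks a legitimate document simply by overloading query terms; the surveyed attacks operate on real dense embeddings, but the evaluation pattern (compare top-k retrieval before and after injection) is the same kind of check a defender or benchmark would run.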

Sources

Unsupervised Corpus Poisoning Attacks in Continuous Space for Dense Retrieval

SMARTFinRAG: Interactive Modularized Financial RAG Benchmark

RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization

QuantBench: Benchmarking AI Methods for Quantitative Investment

A Gradient-Optimized TSK Fuzzy Framework for Explainable Phishing Detection

Latent Adversarial Training Improves the Representation of Refusal

Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs

Unified Multi-Task Learning & Model Fusion for Efficient Language Model Guardrailing

A model and package for German ColBERT

Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets

Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems

Enhancing Leakage Attacks on Searchable Symmetric Encryption Using LLM-Based Synthetic Data Generation

Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression

Information Retrieval in the Age of Generative AI: The RGB Model

Prefill-Based Jailbreak: A Novel Approach of Bypassing LLM Safety Boundary

NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models

Traceback of Poisoning Attacks to Retrieval-Augmented Generation

Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs

XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs

Improving Phishing Email Detection Performance of Small Large Language Models

EnronQA: Towards Personalized RAG over Private Documents
