Advancements in Large Language Model Security and Robustness

The field of large language models (LLMs) is rapidly advancing, with a focus on improving security and robustness. Recent research has highlighted various vulnerabilities in LLMs, including prompt injection attacks, jailbreak attacks, and causal manipulation. To address these issues, researchers have proposed innovative solutions such as IntentGuard, a defense framework based on instruction-following intent analysis, and Context-Aware Hierarchical Learning (CAHL), a mechanism that dynamically balances semantic comprehension with role-specific instruction constraints. Noteworthy papers include 'Assertion-Conditioned Compliance: A Provenance-Aware Vulnerability in Multi-Turn Tool-Calling Agents' and 'ARCADIA: Scalable Causal Discovery for Corporate Bankruptcy Analysis Using Agentic AI'. These studies showcase the ongoing efforts to develop more secure and reliable LLMs, and demonstrate the need for continued research in this area.

Sources

Assertion-Conditioned Compliance: A Provenance-Aware Vulnerability in Multi-Turn Tool-Calling Agents

ARCADIA: Scalable Causal Discovery for Corporate Bankruptcy Analysis Using Agentic AI

Minimal neuron ablation triggers catastrophic collapse in the language core of Large Vision-Language Models

Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis

Securing Large Language Models (LLMs) from Prompt Injection Attacks

A Wolf in Sheep's Clothing: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search

Invasive Context Engineering to Control Large Language Models

Identifying attributions of causality in political text

Immunity memory-based jailbreak detection: multi-agent adaptive guard for large language models

Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs

In-Context Representation Hijacking

Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models

SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security

Are Your Agents Upward Deceivers?