Advances in Secure and Trustworthy Large Language Models

The field of large language models (LLMs) is evolving rapidly, with a growing focus on security, trustworthiness, and reliability. Recent research highlights the importance of LLMs that operate securely and transparently, particularly in high-stakes settings such as finance, healthcare, and industrial automation. A central challenge is balancing the benefits of LLMs, notably their ability to process and generate human-like language, against the risks that accompany their deployment, including bias, misuse, and exploitation.

To address these challenges, researchers are pursuing a range of directions, including graph-based and attention-based mechanisms for monitoring and trust management in multi-agent systems, as well as reinforcement learning and mechanistic interpretability for aligning and explaining model behavior. Noteworthy papers in this area include 'ADA: Automated Moving Target Defense for AI Workloads via Ephemeral Infrastructure-Native Rotation in Kubernetes', which secures AI workloads by continuously rotating the ephemeral infrastructure they run on, and 'SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems', which presents a system-level framework for detecting anomalous interactions among LLM agents.
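
To make the graph-based monitoring idea concrete, the sketch below models agents as nodes and message flows as directed edges, then flags edges whose message volume in a new run deviates sharply from a historical baseline. This is a deliberately simplified illustration, not SentinelAgent's actual method: the MessageEvent record, function names, and z-score rule are assumptions made for the example, and at least one baseline run is assumed.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class MessageEvent:
    """One message passed between two agents during a run (illustrative record)."""
    sender: str
    receiver: str


def edge_counts(events):
    """Count messages per directed (sender, receiver) edge in one run."""
    counts = defaultdict(int)
    for e in events:
        counts[(e.sender, e.receiver)] += 1
    return counts


def flag_anomalous_edges(baseline_runs, new_run, z_threshold=3.0):
    """Flag interaction edges whose volume in `new_run` deviates sharply
    from the per-edge baseline built from earlier runs (simple z-score rule)."""
    baseline = [edge_counts(run) for run in baseline_runs]
    observed = edge_counts(new_run)
    all_edges = set().union(*baseline, observed)
    flagged = []
    for edge in all_edges:
        history = [counts.get(edge, 0) for counts in baseline]
        mu, sigma = mean(history), pstdev(history)
        count = observed.get(edge, 0)
        if sigma == 0:
            # Edge volume never varied in the baseline; any change is suspect.
            anomalous = count != mu
        else:
            anomalous = abs(count - mu) / sigma > z_threshold
        if anomalous:
            flagged.append((edge, count, mu))
    return flagged


# Example: the "planner" agent suddenly floods the "executor" agent with messages.
baseline_runs = [
    [MessageEvent("planner", "executor")] * 2,
    [MessageEvent("planner", "executor")] * 3,
    [MessageEvent("planner", "executor")] * 2,
]
suspicious_run = [MessageEvent("planner", "executor")] * 40
print(flag_anomalous_edges(baseline_runs, suspicious_run))
# [(('planner', 'executor'), 40, 2.333...)]
```

A real monitor would track richer edge features (tool calls, resource access, message content) and a more robust detector, but the graph-plus-baseline structure above captures the core idea.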

Sources

ADA: Automated Moving Target Defense for AI Workloads via Ephemeral Infrastructure-Native Rotation in Kubernetes

System Prompt Extraction Attacks and Defenses in Large Language Models

Scalable, Symbiotic, AI and Non-AI Agent Based Parallel Discrete Event Simulations

Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems

LLM Agents Should Employ Security Principles

SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems

An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring

Beyond the Black Box: Interpretability of LLMs in Finance

So, I climbed to the top of the pyramid of pain -- now what?

STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds

Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges

Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol Ecosystem

Docker under Siege: Securing Containers in the Modern Era

Composable Building Blocks for Controllable and Transparent Interactive AI Systems

Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems

ATAG: AI-Agent Application Threat Assessment with Attack Graphs

Feedstack: Layering Structured Representations over Unstructured Feedback to Scaffold Human AI Conversation

Sampling Preferences Yields Simple Trustworthiness Scores

Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks

From Theory to Practice: Real-World Use Cases on Trustworthy LLM-Driven Process Modeling, Prediction and Automation

Privacy and Security Threat for OpenAI GPTs

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

HADA: Human-AI Agent Decision Alignment Architecture

Demonstrations of Integrity Attacks in Multi-Agent Systems

Agentic AI for Intent-Based Industrial Automation

SECNEURON: Reliable and Flexible Abuse Control in Local LLMs via Hybrid Neuron Encryption

Control Tax: The Price of Keeping AI in Check
