Advances in Secure and Reliable Large Language Models

The field of Large Language Models (LLMs) is moving towards developing more secure and reliable models. Researchers are exploring new methods to prevent prompt injections and ensure the integrity of LLM-powered agents. One notable direction is the use of iterative prompt sanitization loops and formal specifications to validate LLM outputs. Another area of focus is the development of frameworks that can automate the generation of test cases and data stream processing pipelines. The integration of LLMs with Model-Driven Engineering and Domain Specific Languages is also being investigated to improve the reliability and trustworthiness of LLM-based software. Noteworthy papers include: CompressionAttack, which presents a novel attack surface in LLM-powered agents by exploiting prompt compression. PRISM, which unifies LLMs with Model-Driven Engineering to generate regulator-ready artifacts and machine-checkable evidence for safety- and compliance-critical domains.

Sources

Soft Instruction De-escalation Defense

CompressionAttack: Exploiting Prompt Compression as a New Attack Surface in LLM-Powered Agents

From Online User Feedback to Requirements: Evaluating Large Language Models for Classification and Specification Tasks

Validating Formal Specifications with LLM-generated Test Cases

AutoStreamPipe: LLM Assisted Automatic Generation of Data Stream Processing Pipelines

A Roadmap for Tamed Interactions with Large Language Models

PRISM: Proof-Carrying Artifact Generation through LLM x MDE Synergy and Stratified Constraints

Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

Built with on top of