Advancements in Agentic Systems and Large Language Models

The field of agentic systems and large language models (LLMs) is evolving rapidly, with work focused on making these systems more robust, more personalized, and more secure. Researchers are developing new taxonomies and benchmarks for evaluating agentic systems. There is also a growing emphasis on safety and reliability, particularly on detecting and preventing harmful or covert behavior. Applications are broadening as well, spanning science and high-performance computing, virtual and mixed reality, scam detection in digital payments, and root cause analysis in microservice systems.

Notable papers in this area include Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms, which proposes a novel benchmark for evaluating the security of agentic systems; PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration, which introduces a framework for personalizing vision-language-model (VLM) based mobile agents through memory and exploration; Reliable Weak-to-Strong Monitoring of LLM Agents, which presents a systematized monitor red-teaming workflow for detecting covert misbehavior in autonomous LLM agents; and Aegis: Taxonomy and Optimizations for Overcoming Agent-Environment Failures in LLM Agents, which proposes a taxonomy of agent-environment interaction failures and designs targeted environment optimizations to improve agent success rates.

Sources

Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms

PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration

Experiences with Model Context Protocol Servers for Science and High Performance Computing

Portable Silent Room: Exploring VR Design for Anxiety and Emotion Regulation for Neurodivergent Women and Non-Binary Individuals

Reliable Weak-to-Strong Monitoring of LLM Agents

Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents

Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills

Aegis: Taxonomy and Optimizations for Overcoming Agent-Environment Failures in LLM Agents

PersoNo: Personalised Notification Urgency Classifier in Mixed Reality

CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments

Adaptive Root Cause Localization for Microservice Systems with Multi-Agent Recursion-of-Thought

Collaborative Evolution of Intelligent Agents in Large-Scale Microservice Systems