Advancements in Human-LLM Interaction and Agent-Based Modeling

The field of human-LLM interaction and agent-based modeling is evolving rapidly, with a focus on improving the accuracy and adaptability of large language models (LLMs) across applications. Recent research has emphasized frameworks and benchmarks for evaluating and improving LLM performance in areas such as error detection in human-robot conversations, cyber threat investigation, and climate change adaptation. There is also growing interest in using LLMs to model complex socio-ecological systems and to simulate multiple human perspectives. In addition, researchers are exploring LLMs for designing and assessing economic policies, simulating human-chatbot dialogues, and enabling self-improving agents to learn at test time with human-in-the-loop guidance. Noteworthy papers include:

  • ExCyTIn-Bench, which introduces a benchmark for evaluating LLM agents on cyber threat investigation tasks and provides a comprehensive dataset for training and testing.
  • Neo, a configurable multi-agent framework that automates realistic evaluation of LLM-based systems and has been applied to a production-grade chatbot with promising results.
  • LLM Economist, which presents a novel framework for designing and assessing economic policies in strategic environments with hierarchical decision-making.

Sources

ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations

ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation

Towards an ABM on Proactive Community Adaptation for Climate Change

Configurable multi-agent framework for scalable and realistic testing of LLM-based agents

DialogueForge: LLM Simulation of Human-Chatbot Dialogue

LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra

Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance

Simulating multiple human perspectives in socio-ecological systems using large language models
