The field of human-LLM interaction and agent-based modeling is evolving rapidly, with a focus on improving the accuracy and adaptability of large language models (LLMs) across applications. Recent work has emphasized frameworks and benchmarks for evaluating and improving LLM performance, particularly in error detection, cyber threat investigation, and climate change adaptation. There is also growing interest in using LLMs to model complex socio-ecological systems and to simulate multiple human perspectives. Researchers are further exploring LLMs for designing and assessing economic policies, simulating human-chatbot dialogues, and building self-improving agents that learn at test time with human-in-the-loop guidance. Noteworthy papers include:
- ExCyTIn-Bench, which introduces a benchmark for evaluating LLM agents on cyber threat investigation tasks and provides a comprehensive dataset for training and testing.
- Neo, a configurable multi-agent framework that automates realistic evaluation of LLM-based systems and has been applied to a production-grade chatbot with promising results (a minimal sketch of this simulated-dialogue evaluation pattern appears after this list).
- LLM Economist, which presents a novel framework for designing and assessing economic policies in strategic environments with hierarchical decision-making.
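To make the evaluation-by-simulation idea concrete, here is a minimal Python sketch of the general pattern behind frameworks like Neo: one LLM agent plays a persona-conditioned simulated user, another plays the chatbot under test, and a judge scores the resulting transcript. All names here (`Agent`, `simulate_dialogue`, `judge`, `call_llm`) are hypothetical stand-ins for illustration, not Neo's actual API.

```python
# Illustrative only: a generic "simulated user vs. chatbot" evaluation loop.
# Every name in this sketch is a hypothetical stand-in, not Neo's actual API.
from dataclasses import dataclass

def call_llm(role_prompt: str, history: list[str]) -> str:
    """Stub for an LLM call; a real harness would query a model here."""
    return f"[{role_prompt.split(':')[0]} reply to: {history[-1]}]"

@dataclass
class Agent:
    role_prompt: str  # persona that conditions the agent's behavior

    def respond(self, history: list[str]) -> str:
        return call_llm(self.role_prompt, history)

def simulate_dialogue(user: Agent, chatbot: Agent,
                      opening: str, turns: int = 3) -> list[str]:
    """Alternate chatbot and simulated-user turns to produce a transcript."""
    history = [opening]
    for _ in range(turns):
        history.append(chatbot.respond(history))
        history.append(user.respond(history))
    return history

def judge(history: list[str]) -> dict:
    """Stub judge; real frameworks typically score transcripts with an LLM grader."""
    return {"num_turns": len(history),
            "resolved": "refund issued" in history[-1].lower()}

if __name__ == "__main__":
    user = Agent("Simulated user: frustrated customer asking about a refund")
    bot = Agent("Assistant: production-grade support chatbot")
    transcript = simulate_dialogue(user, bot, "I need a refund for order #123.")
    print(judge(transcript))
```

The design point is that the simulated user, the system under test, and the judge are separate, independently configurable agents, which is what lets such a harness generate and score many realistic dialogues without human annotators in the loop.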