Advancements in Human-AI Interaction and Large Language Models

The field of human-AI interaction and large language models is evolving rapidly, with particular attention to more effective evaluation methods and standards for diverse human-AI systems. Researchers are exploring new ways to assess the personality traits of large language models, moving beyond traditional self-report questionnaires toward more robust, context-sensitive evaluations. Adaptive human-agent teaming is another key research area, emphasizing dynamic, interactional approaches to team formation, task development, and team improvement. Researchers are also investigating how large language models' capabilities relate to human abilities, with studies examining mathematical reasoning, artistic critique generation, and theory of mind. Noteworthy papers in this area include:

SPHERE, an evaluation card for human-AI systems that provides a framework for more transparent documentation and discussion of evaluation design options.

Beyond Self-Reports, a multi-observer framework for large language model personality assessment in which several observer agents rate a target model's behavior; aggregating their ratings reduces non-systematic biases and improves reliability. A minimal sketch of this aggregation step appears below.

Assesing LLMs in Art Contexts, a study of large language models' performance in writing critiques of artworks and reasoning about mental states in art-related situations, which finds that carefully designed prompts can elicit expert-like output.
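To make the multi-observer idea concrete, here is a minimal sketch of the aggregation step: several observer agents each rate a target model on the Big Five traits, and the ratings are averaged per trait. The 1-9 scale, the function names, and the canned ratings are illustrative assumptions, not the paper's actual protocol, in which the ratings would come from LLM observer agents scoring the target model's dialogue transcripts.

```python
from statistics import mean, stdev

# Big Five traits, rated here on an assumed 1-9 scale.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

def aggregate_observer_ratings(ratings_per_observer):
    """Aggregate per-trait ratings from several observer agents.

    `ratings_per_observer` maps an observer id to a dict of
    trait -> numeric rating that the observer produced after
    reviewing the target model's transcript.
    """
    summary = {}
    for trait in TRAITS:
        scores = [r[trait] for r in ratings_per_observer.values()]
        summary[trait] = {
            "mean": mean(scores),
            # Spread across observers: a rough proxy for how much
            # non-systematic observer bias remains after aggregation.
            "stdev": stdev(scores) if len(scores) > 1 else 0.0,
            "n_observers": len(scores),
        }
    return summary

if __name__ == "__main__":
    # Hypothetical ratings from three observer agents.
    observers = {
        "observer_1": {"openness": 7, "conscientiousness": 6,
                       "extraversion": 4, "agreeableness": 8, "neuroticism": 3},
        "observer_2": {"openness": 6, "conscientiousness": 7,
                       "extraversion": 5, "agreeableness": 7, "neuroticism": 2},
        "observer_3": {"openness": 7, "conscientiousness": 6,
                       "extraversion": 4, "agreeableness": 8, "neuroticism": 3},
    }
    for trait, stats in aggregate_observer_ratings(observers).items():
        print(f"{trait:>17}: mean={stats['mean']:.2f} "
              f"sd={stats['stdev']:.2f} (n={stats['n_observers']})")
```

The per-trait standard deviation across observers serves here as a crude stand-in for the reliability analysis in the paper, where aggregating ratings from more observers reduces non-systematic bias.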

Sources

SPHERE: An Evaluation Card for Human-AI Systems

Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models

Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents

Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective

Adaptive Human-Agent Teaming: A Review of Empirical Studies from the Process Dynamics Perspective

Can the capability of Large Language Models be described by human ability? A Meta Study

Mathematical Capabilities of Large Language Models in Finnish Matriculation Examination

The Dual Personas of Social Media Bots

Assesing LLMs in Art Contexts: Critique Generation and Theory of Mind Evaluation
