The field of human-AI interaction and large language models is evolving rapidly, with growing attention to more effective evaluation methods and standards for diverse human-AI interaction systems. Researchers are exploring new approaches to assessing the personality traits of large language models, moving beyond traditional self-report questionnaires toward more robust, context-sensitive evaluations. Adaptive human-agent teaming is another active area, where work is shifting toward more dynamic, interactional approaches to team formation, task development, and team improvement. A further line of research examines large language models' capabilities relative to human abilities, with studies of mathematical reasoning, artistic critique generation, and theory of mind evaluation.

Noteworthy papers in this area include: SPHERE, an evaluation card for human-AI systems that provides a framework for more transparent documentation and discussion of evaluation design choices; Beyond Self-Reports, a multi-observer framework for large language model personality assessment that reduces non-systematic biases and improves rating reliability; and Assesing LLMs in Art Contexts, a study of large language models' performance in writing critiques of artworks and reasoning about mental states in art-related situations, which finds that carefully designed prompts can elicit expert-like output.
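To make the multi-observer idea concrete, the minimal sketch below shows one way such an assessment could be aggregated: several independent observer raters score the same transcript on Big Five traits, and per-trait scores are averaged to damp individual-rater bias. The observer prompts, trait list, and `rate_trait` helper are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
from statistics import mean

# Illustrative sketch of multi-observer personality scoring (not the
# paper's implementation): each "observer" is an independent rater
# persona that scores the same transcript on Big Five traits, and the
# scores are averaged per trait to reduce non-systematic rater bias.

BIG_FIVE = ["openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism"]

def rate_trait(observer_prompt: str, transcript: str, trait: str) -> float:
    """Hypothetical helper: ask one observer (e.g., an LLM given a rater
    persona) to score `trait` on a 1-5 scale for the transcript.
    Stubbed out here so the sketch stays self-contained."""
    return 3.0  # placeholder score

def multi_observer_profile(observer_prompts: list[str],
                           transcript: str) -> dict[str, float]:
    """Aggregate independent observer ratings into a single trait profile."""
    return {
        trait: mean(rate_trait(p, transcript, trait) for p in observer_prompts)
        for trait in BIG_FIVE
    }

if __name__ == "__main__":
    observers = [
        "You are a careful personality psychologist rating a speaker.",
        "You are an impartial annotator scoring personality traits.",
        "You are a behavioral scientist judging conversational style.",
    ]
    profile = multi_observer_profile(observers, "Target model's interview transcript...")
    print(profile)
```

Averaging across observers is only one plausible aggregation choice; the key point is that trait estimates come from external raters observing behavior rather than from the model's own self-reports.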