The field of Large Language Models (LLMs) is moving toward more comprehensive and nuanced evaluation, with a focus on assessing how accurately and reliably models provide information across many fields. Researchers are developing new frameworks and resources to support these evaluations, including ones that distill survey articles into queries and grading rubrics. Another line of work integrates emotional intelligence into LLM agents so that they can negotiate more effectively over multiple turns. There is also growing emphasis on auditing the reliability of deep research AI systems, particularly how they track and attribute evidence.

Notable papers in this area include ResearchQA, which introduces a resource for evaluating LLM systems across 75 fields; EvoEmo, which presents an evolutionary reinforcement learning framework for optimizing dynamic emotional expression in negotiations; and DeepTRACE, which introduces a sociotechnically grounded audit framework for tracking reliability across citations and evidence.
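To make the rubric-based evaluation idea concrete, the sketch below shows one minimal way such scoring could work: an answer is checked against weighted rubric criteria and receives the weighted fraction it satisfies. This is an illustrative assumption, not ResearchQA's actual schema or API; `RubricItem`, `score_answer`, and the toy keyword judge are hypothetical names, and a real system would typically use an LLM judge in place of the keyword check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricItem:
    """One criterion distilled from a survey article (hypothetical schema)."""
    criterion: str
    weight: float = 1.0


def score_answer(answer: str,
                 rubric: List[RubricItem],
                 judge: Callable[[str, str], bool]) -> float:
    """Return the weighted fraction of rubric criteria the answer satisfies.

    `judge(answer, criterion)` decides whether one criterion is met; in a
    deployed evaluator this would likely be an LLM judge, but any callable works.
    """
    total = sum(item.weight for item in rubric)
    earned = sum(item.weight for item in rubric if judge(answer, item.criterion))
    return earned / total if total else 0.0


if __name__ == "__main__":
    # Toy judge: keyword containment stands in for an LLM-based check.
    keyword_judge = lambda ans, crit: crit.lower() in ans.lower()
    rubric = [RubricItem("transformer"), RubricItem("attention", weight=2.0)]
    answer = "Modern LLMs are transformer models built on attention."
    print(f"rubric score: {score_answer(answer, rubric, keyword_judge):.2f}")
```

Keeping the judge as an injected callable separates the rubric data (queries and criteria) from the grading model, which is the property that makes field-spanning benchmarks of this kind reusable across different evaluator models.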