Advances in AI Evaluation and Analysis

AI research is increasingly focused on methods for evaluating and analyzing the performance of large language models. One line of work builds platforms and tools for comparing and assessing the quality of AI-generated text, including fine-grained human annotation frameworks for deep research agents and open-source implementations of factuality evaluation metrics; FActScore, for instance, decomposes a generation into atomic facts and scores the fraction supported by a knowledge source. A second line of work applies dynamic topic modeling to trace the evolution of climate policy discourse and to surface emerging themes in financial documents such as earnings calls. Noteworthy papers include OpenFActScore, an open-source implementation of the FActScore framework, and DTECT, an end-to-end system for dynamic topic exploration and context tracking.
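
Once fact extraction and verification are abstracted away, the metric reduces to a small computation. Below is a minimal sketch, not OpenFActScore's actual API: `decompose` and `is_supported` are hypothetical stand-ins for the LLM-based atomic-fact extraction and retrieval-backed verification steps.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AtomicFact:
    text: str

def factscore(generation: str,
              decompose: Callable[[str], List[AtomicFact]],
              is_supported: Callable[[AtomicFact], bool]) -> float:
    """Fraction of atomic facts in a generation that the knowledge source supports."""
    facts = decompose(generation)
    if not facts:
        return 0.0  # convention when no checkable claims are extracted
    return sum(is_supported(f) for f in facts) / len(facts)

# Toy usage with stub components standing in for the LLM-based steps.
facts_db = {"Paris is the capital of France."}
score = factscore(
    "Paris is the capital of France. It has 40 million residents.",
    decompose=lambda g: [AtomicFact(s.strip() + ".") for s in g.split(".") if s.strip()],
    is_supported=lambda f: f.text in facts_db,
)
print(score)  # 0.5: one of the two atomic claims is supported
```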

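The dynamic topic modeling idea can likewise be illustrated in a few lines. This sketch uses gensim's LdaSeqModel, a classic dynamic topic model rather than the embedded-topic-model variants these papers build on; the toy corpus, slice sizes, and topic count are purely illustrative.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaSeqModel

# Toy corpus: documents grouped into two consecutive time slices
# (e.g. older vs. newer policy texts).
docs = [
    ["emission", "target", "treaty"],       # slice 1
    ["emission", "carbon", "tax"],          # slice 1
    ["carbon", "target", "treaty"],         # slice 1
    ["adaptation", "finance", "risk"],      # slice 2
    ["finance", "disclosure", "risk"],      # slice 2
    ["adaptation", "disclosure", "risk"],   # slice 2
]
time_slice = [3, 3]  # number of documents in each slice, in order

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Fit a dynamic topic model: topics keep their identity across slices,
# but their word distributions are allowed to drift over time.
model = LdaSeqModel(corpus=corpus, id2word=dictionary,
                    time_slice=time_slice, num_topics=2)

# Inspect how topic 0's top words shift between the two time slices.
for t in range(len(time_slice)):
    print(f"time slice {t}:", model.print_topic(topic=0, time=t))
```
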
Sources

Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents

OpenFActScore: Open-Source Atomic Evaluation of Factuality in Text Generation

Temporal Analysis of Climate Policy Discourse: Insights from Dynamic Embedded Topic Modeling

Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights

FRaN-X: FRaming and Narratives-eXplorer

Agentic Retrieval of Topics and Insights from Earnings Calls

DTECT: Dynamic Topic Explorer & Context Tracker
