The field of Artificial Intelligence (AI) is evolving rapidly, with growing emphasis on rigorous evaluation methods and benchmarks for assessing the capabilities of AI agents. Recent research highlights the need for holistic, product-informed measures of real-world use cases such as scientific research, and the importance of controlling for confounding variables like model cost and tool access. Studies comparing human and agent workflows across diverse occupations find that while agents show promise, they tend to take a programmatic approach and produce work of lower quality than humans; at the same time, agents can deliver results far faster and at a fraction of the cost, pointing to opportunities for efficient human-agent collaboration. Noteworthy papers in this area include AstaBench, a comprehensive suite for benchmarking AI agents in scientific research, and the Iceberg Index, which measures workforce exposure to AI capabilities across the economy. These developments are likely to shape the future of work and to inform targeted regional AI development strategies and investments.
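The emphasis on controlling for confounders such as cost and tool access suggests a simple reporting pattern: alongside raw task scores, record each run's cost and tool tier, and compare agents only against runs they are not dominated by within the same tier. The sketch below is purely illustrative; the `AgentRun` record, the example numbers, and the tier names are hypothetical and not drawn from AstaBench or any of the cited papers.

```python
from dataclasses import dataclass

@dataclass
class AgentRun:
    """One benchmark run: raw task score plus the confounders to control for."""
    agent: str
    score: float       # task success rate in [0, 1]
    usd_cost: float    # total inference cost for the run, in US dollars
    tool_access: str   # e.g. "none", "search", "search+code" (hypothetical tiers)

def pareto_frontier(runs: list[AgentRun]) -> list[AgentRun]:
    """Keep runs not dominated by a cheaper-and-better run in the same tool tier."""
    frontier = []
    for r in runs:
        dominated = any(
            o.tool_access == r.tool_access
            and o.usd_cost <= r.usd_cost
            and o.score >= r.score
            and (o.usd_cost < r.usd_cost or o.score > r.score)
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.usd_cost)

# Hypothetical results: ranking by raw score alone would hide that agent_b
# buys its higher score with 10x the cost and an extra tool.
runs = [
    AgentRun("agent_a", score=0.62, usd_cost=1.50, tool_access="search"),
    AgentRun("agent_b", score=0.71, usd_cost=15.0, tool_access="search+code"),
    AgentRun("agent_c", score=0.58, usd_cost=0.40, tool_access="search"),
]
for r in pareto_frontier(runs):
    print(f"{r.agent}: score={r.score:.2f} at ${r.usd_cost:.2f} ({r.tool_access})")
```

Grouping by tool tier before computing the frontier reflects the idea that tool access is itself a confounder: an agent with code execution should not be compared head-to-head against one limited to search.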