Advancements in Deep Search Agents and Large Language Models

Research on deep search agents and large language models is moving quickly toward more sophisticated and autonomous systems. Recent work introduces multimodal deep search agents such as WebWatcher, which can comprehend visual information and execute multi-turn retrieval with dynamic planning, giving it enhanced visual-language reasoning for deep research. Benchmarks such as BrowseComp-Plus and DatasetResearch have been proposed to evaluate deep research agents and dataset discovery systems, respectively; they highlight the limitations of current systems and the need for more advanced architectures and training methods. Notably, K-Dense Analyst, a hierarchical multi-agent system, achieves state-of-the-art performance on the BixBench benchmark, demonstrating the potential for autonomous bioinformatics analysis. The open-source OpenCUA framework and the OdysseyBench benchmark are expected to accelerate research in this area.

Sources

A Survey of LLM-based Deep Search Agents: Paradigm, Optimization, Evaluation, and Challenges

WebWatcher: Breaking New Frontiers of Vision-Language Deep Research Agent

BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery

Can Smaller Large Language Models Evaluate Research Quality?

K-Dense Analyst: Towards Fully Automated Scientific Analysis

MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark

WideSearch: Benchmarking Agentic Broad Info-Seeking

HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

AgriGPT: a Large Language Model Ecosystem for Agriculture

OpenCUA: Open Foundations for Computer-Use Agents

OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows

BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair

webMCP: Efficient AI-Native Client-Side Interaction for Agent-Ready Web Design

Improving and Evaluating Open Deep Research Agents