The field of Large Language Model (LLM) agents is moving toward deeper integration with enterprise systems, enabling intelligent automation, personalized experiences, and efficient information retrieval. Researchers are building benchmarks and frameworks to evaluate and improve LLM agents in complex, real-world environments. A key challenge is retrieving and invoking the right tools at scale and at acceptable cost (a minimal sketch of this retrieval step follows the list below). Noteworthy papers in this area include:
- EnterpriseBench, which exposes the challenges of building LLM agents for enterprise environments and highlights opportunities for improvement.
- ScaleCall, which presents a comprehensive study of tool retrieval methods for enterprise environments and provides practical insights into the trade-offs between retrieval accuracy, computational efficiency, and operational requirements.
- TPS-Bench, which introduces a benchmark for evaluating the ability of LLM agents to solve compounding real-world problems that require tool planning and scheduling.
- Tool-to-Agent Retrieval, which presents a unified framework for bridging tools and agents in scalable LLM multi-agent systems.
- CostBench, which evaluates the economic reasoning and replanning abilities of LLM agents in dynamic environments.
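Concretely, the tool-retrieval step that benchmarks like these evaluate typically amounts to matching a user query against a catalog of tool descriptions and handing only the top matches to the model. The sketch below shows one common embedding-based variant; the tool catalog, model name, and scoring are illustrative assumptions, not drawn from any of the papers above.

```python
# Minimal, illustrative sketch of embedding-based tool retrieval.
# All tool names, descriptions, and the encoder choice are hypothetical
# placeholders, not taken from any of the papers listed above.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumes the library is installed

# Hypothetical enterprise tool catalog: name -> natural-language description.
TOOLS = {
    "crm_lookup": "Look up a customer record in the CRM by name or account ID.",
    "ticket_create": "Open a new support ticket with a title, priority, and description.",
    "expense_report": "Summarize expense reports for a given employee and date range.",
    "calendar_schedule": "Schedule a meeting and invite attendees on the shared calendar.",
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

# Embed tool descriptions once, offline; this is what keeps per-query cost low.
tool_names = list(TOOLS)
tool_vecs = encoder.encode([TOOLS[n] for n in tool_names], normalize_embeddings=True)

def retrieve_tools(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k tools whose descriptions are most similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = tool_vecs @ q  # cosine similarity, since vectors are normalized
    top = np.argsort(-scores)[:k]
    return [(tool_names[i], float(scores[i])) for i in top]

if __name__ == "__main__":
    # Only the retrieved tool schemas would then be passed to the LLM,
    # instead of the full catalog, keeping prompts small and costs bounded.
    print(retrieve_tools("open a high-priority ticket for the login outage"))
```

The trade-off the papers above probe is exactly the one visible here: richer retrieval (larger encoders, rerankers, agent-aware indexing) improves tool selection but raises latency and cost, which matters once the catalog grows to thousands of enterprise tools.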