Mathematical reasoning and multimodal information retrieval are advancing rapidly through the integration of large language models (LLMs) with external tools. Current work explores multi-tool aggregation frameworks, reinforcement-learning-based tool integration, and large-scale datasets for multimodal agent tuning, yielding improved performance on mathematical reasoning, information retrieval, and analysis tasks. Noteworthy papers in this area include:
- Multi-TAG, which proposes a finetuning-free, inference-only framework that scales math reasoning via multi-tool aggregation and reports substantial improvements over state-of-the-art baselines.
- AutoTIR, which introduces a reinforcement learning framework for autonomous tool integration in LLMs, demonstrating superior overall performance and generalization in tool-use behavior.
- MMAT-1M, which presents a large-scale multimodal agent tuning dataset supporting chain-of-thought reasoning and dynamic tool usage, leading to significant gains in multimodal reasoning and tool-based capabilities.
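
The multi-tool aggregation idea behind frameworks like Multi-TAG can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual method or API: the three "tools" are stand-in functions (in practice each would be an LLM invoking a code interpreter, symbolic solver, or chain-of-thought prompt), and the aggregation step shown is a simple majority vote over candidate answers.

```python
from collections import Counter

# Hypothetical stand-ins for tool-augmented solvers. Each takes a math
# question and returns a candidate answer string. Real systems would
# call an LLM plus an external tool here; these stubs only serve to
# make the aggregation logic runnable.
def tool_code_interpreter(question: str) -> str:
    return "42"

def tool_symbolic_solver(question: str) -> str:
    return "42"

def tool_chain_of_thought(question: str) -> str:
    return "41"  # a disagreeing candidate, to exercise the vote

def aggregate(question: str, tools) -> str:
    """Invoke every tool on the question and return the majority-vote answer."""
    candidates = [tool(question) for tool in tools]
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer

result = aggregate(
    "What is 6 * 7?",
    [tool_code_interpreter, tool_symbolic_solver, tool_chain_of_thought],
)
print(result)
```

The key property this sketch captures is that no finetuning is needed: aggregation happens purely at inference time, so stronger or additional tools can be swapped in without retraining the underlying model.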