Advances in Automated Reasoning and Large Language Models

The field of automated reasoning with large language models is advancing rapidly, with a focus on improving the efficiency and correctness of formal proof generation. A promising line of work combines the guarantees of formal verification systems such as Lean with the reasoning abilities of large language models, letting a verifier check each candidate proof and feed compiler errors back for repair; this has yielded significant gains in both efficiency and correctness, with some models reaching state-of-the-art accuracy on benchmark tasks. APOLLO exemplifies this approach with a modular pipeline for automated proof generation and repair (a minimal sketch of such a loop is given below). Other notable work includes Think in Safety, which unveils and mitigates safety alignment collapse in multimodal large reasoning models, and INTELLECT-2, which trains a reasoning model through globally decentralized reinforcement learning. Qwen3, meanwhile, unifies a thinking mode and a non-thinking mode in a single framework, enabling dynamic mode switching and improved performance (also sketched below).
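
The draft-check-repair loop at the heart of such systems is easy to state. The sketch below is in the spirit of APOLLO's LLM-Lean collaboration, not the paper's actual implementation: all names (`query_llm`, `check_with_lean`, `prove`) are hypothetical stand-ins, and it assumes a `lean` binary on PATH that can elaborate the candidate file stand-alone. APOLLO's real pipeline is considerably more elaborate.

```python
import subprocess
import tempfile
from pathlib import Path


def check_with_lean(proof: str) -> tuple[bool, str]:
    """Write a candidate proof to a .lean file and run the Lean checker.

    Returns (success, compiler output). Assumes a `lean` binary on PATH.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(proof)
        path = Path(f.name)
    result = subprocess.run(["lean", str(path)], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def prove(theorem: str, query_llm, max_rounds: int = 5) -> str | None:
    """Draft-check-repair loop; `query_llm(prompt) -> str` is any LLM call."""
    prompt = f"Write a complete Lean proof for:\n{theorem}\n"
    for _ in range(max_rounds):
        candidate = query_llm(prompt)
        ok, log = check_with_lean(candidate)
        if ok:
            return candidate  # the Lean kernel accepted the proof
        # Feed the compiler errors back so the next attempt repairs the
        # specific failing step instead of starting from scratch.
        prompt = (
            f"This Lean proof fails to compile:\n{candidate}\n"
            f"Compiler output:\n{log}\n"
            f"Fix it. The theorem is:\n{theorem}\n"
        )
    return None  # no verified proof within the round budget
```

The key design choice is that the verifier, not the model, is the final arbiter of correctness: anything `prove` returns has been machine-checked by the Lean kernel.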
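
Qwen3's unified framework exposes its mode switch at inference time. The snippet below follows the usage pattern described in the Qwen3 model cards on Hugging Face; the `enable_thinking` chat-template flag is taken from that documentation, so verify it against your installed checkpoint and transformers version before relying on it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # any Qwen3 chat checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")


def answer(question: str, thinking: bool) -> str:
    """Generate a reply with the thinking mode toggled per request."""
    messages = [{"role": "user", "content": question}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,  # the dynamic mode switch
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=1024)
    new_tokens = output[0][inputs.input_ids.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Thinking mode emits a <think>...</think> reasoning trace before the
# final answer; non-thinking mode responds directly at lower latency.
print(answer("Is 1009 prime?", thinking=True))
print(answer("Is 1009 prime?", thinking=False))
```

Dynamic switching lets one deployment serve both latency-sensitive chat and harder reasoning queries without swapping models.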

Sources

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model

INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

Learning from Peers in Reasoning Models

AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale

Qwen3 Technical Report

Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
