Advances in Large Language Model Reasoning

The field of large language model (LLM) reasoning is evolving rapidly, with a focus on improving the accuracy and reliability of model outputs. Recent work centers on methods such as multi-agent adaptive planning, reinforced rule-based reasoning, and self-aware weakness-driven problem synthesis, which report state-of-the-art results across a range of reasoning benchmarks. Across these efforts, the combination of reinforcement learning with dynamic sampling strategies has emerged as a key driver of progress.
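To make the dynamic-sampling idea concrete, the sketch below reweights training domains by their recent failure rate so that harder domains are sampled more often during RL-style training. This is a minimal illustration, not the procedure from any of the cited papers; the domain names, smoothing constant, and update rule are all assumptions.

```python
import random
from collections import defaultdict

# Illustrative sketch of domain-aware dynamic sampling: domains where the
# model currently does worse receive proportionally more sampling weight.
# All constants and the update rule are assumptions for demonstration only.

class DynamicDomainSampler:
    def __init__(self, domains, smoothing=0.1):
        self.domains = list(domains)
        self.smoothing = smoothing
        # Running accuracy estimate per domain (start at a neutral 0.5).
        self.accuracy = defaultdict(lambda: 0.5)

    def weights(self):
        # Weight each domain by its failure rate, with smoothing so no domain starves.
        return [1.0 - self.accuracy[d] + self.smoothing for d in self.domains]

    def sample_domain(self):
        return random.choices(self.domains, weights=self.weights(), k=1)[0]

    def update(self, domain, correct, lr=0.05):
        # Exponential moving average of per-domain accuracy.
        self.accuracy[domain] += lr * (float(correct) - self.accuracy[domain])


if __name__ == "__main__":
    sampler = DynamicDomainSampler(["logic", "arithmetic", "tables"])
    for _ in range(1000):
        domain = sampler.sample_domain()
        # Placeholder for rolling out the policy and checking the answer.
        correct = random.random() < {"logic": 0.4, "arithmetic": 0.8, "tables": 0.6}[domain]
        sampler.update(domain, correct)
    print({d: round(w, 2) for d, w in zip(sampler.domains, sampler.weights())})
```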

Two papers are particularly noteworthy. MAPLE proposes a framework for table-based question answering that coordinates multiple cognitive agents with long-term memory and reports significant gains over existing methods. Corrector Sampling introduces a sampling scheme that mitigates error accumulation in autoregressive language models, yielding relative improvements on reasoning and coding benchmarks.
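As a rough illustration of a corrector-style sampler, the toy below occasionally revisits and resamples a recently generated token instead of only appending new ones. This is a hedged sketch under the assumption that resampling previous tokens is the core mechanism; the mock model, window size, and resampling probability are invented for the example and do not reproduce the paper's method.

```python
import random

# Toy corrector-style decoding: after emitting each new token, occasionally
# revisit a recent position and resample it given the tokens before it.
# The mock distribution and all constants are illustrative assumptions.

def mock_token_distribution(context):
    # Stand-in for an LLM: a fixed distribution nudged by the last token.
    base = {"A": 0.4, "B": 0.3, "C": 0.2, "<eos>": 0.1}
    if context and context[-1] == "C":
        base = {"A": 0.2, "B": 0.2, "C": 0.1, "<eos>": 0.5}
    return base

def sample(dist):
    tokens, probs = zip(*dist.items())
    return random.choices(tokens, weights=probs, k=1)[0]

def generate(max_len=20, resample_prob=0.3, window=4):
    tokens = []
    while len(tokens) < max_len:
        # Standard autoregressive step.
        nxt = sample(mock_token_distribution(tokens))
        tokens.append(nxt)
        if nxt == "<eos>":
            break
        # Corrector step: with some probability, resample one recent token
        # conditioned on the tokens that precede it.
        if len(tokens) > 1 and random.random() < resample_prob:
            i = random.randrange(max(0, len(tokens) - window), len(tokens) - 1)
            tokens[i] = sample(mock_token_distribution(tokens[:i]))
    return tokens

if __name__ == "__main__":
    print(" ".join(generate()))
```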

Sources

MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning

Corrector Sampling in Language Models

Reinforce LLM Reasoning through Multi-Agent Reflection

RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

Learning to Reason Across Parallel Samples for LLM Reasoning

PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
