Research on large language models (LLMs) is increasingly focused on strengthening reasoning capabilities for complex problem solving. Recent studies have highlighted the limitations of existing methods such as Chain-of-Thought (CoT) reasoning and proposed alternatives, including explicit high-level plan generation and bi-level frameworks for structured reasoning. These methods improve accuracy and generalizability across domains including mathematical reasoning, code generation, and financial question answering. Notably, fine-tuning small models on multi-domain plan datasets such as CRISP lets them generate higher-quality plans than larger models prompted with few-shot examples. Cache steering methods, meanwhile, improve both the qualitative structure of model reasoning and quantitative task performance. Noteworthy papers include:
- CRISP, which introduces a multi-domain dataset for high-level plan generation and shows that fine-tuning on it improves plan quality (the first sketch after this list illustrates the general plan-then-solve pattern).
- From Language to Logic, which proposes a bi-level framework for structured reasoning, mapping language to a formal representation and then solving it (see the second sketch below), and achieves significant accuracy gains on realistic reasoning benchmarks.
- KV Cache Steering, which presents a lightweight method for steering language models implicitly through their key/value cache, improving both the qualitative structure of model reasoning and quantitative task performance (see the third sketch below).
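
To make the plan-generation idea concrete, here is a minimal plan-then-solve sketch. It is not CRISP's pipeline or prompt format: the function name, prompt wording, and two-stage split are illustrative, and `generate` stands in for any text-in/text-out LLM call (in the CRISP setting, a small fine-tuned planner).

```python
from typing import Callable

def plan_then_solve(problem: str, generate: Callable[[str], str]) -> str:
    """Two-stage reasoning: draft a high-level plan, then solve with it.

    `generate` is any text-in/text-out LLM call; a fine-tuned small model,
    an API client, or a local pipeline can be plugged in unchanged.
    """
    # Stage 1: ask for a high-level plan only, with no calculations.
    plan = generate(
        "Write a short, numbered high-level plan for solving this problem. "
        "Do not solve it.\n\nProblem: " + problem
    )
    # Stage 2: condition the final solution on the generated plan.
    return generate(
        f"Problem: {problem}\n\nPlan:\n{plan}\n\n"
        "Follow the plan step by step and state the final answer."
    )
```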
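The bi-level pattern can be illustrated with a toy problem: a high level that turns language into a formal program, and a low level that solves it symbolically. In the sketch below the high-level output is hard-coded (in the paper an LLM produces it), and the z3-solver package serves as the low-level solver; the word problem and variable names are invented for illustration.

```python
from z3 import Int, Solver, sat

# Problem: "Alice is 3 years older than Bob, and their ages sum to 27."
# High level (stand-in for LLM output): declare variables and constraints.
alice, bob = Int("alice"), Int("bob")
constraints = [alice == bob + 3, alice + bob == 27]

# Low level: hand the formal program to a symbolic solver and read the answer.
s = Solver()
s.add(*constraints)
if s.check() == sat:
    m = s.model()
    print(f"Alice = {m[alice]}, Bob = {m[bob]}")  # Alice = 15, Bob = 12
```

The appeal of this split is that once the translation is right, the solver's answer is exact, so errors are confined to the translation step.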
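Finally, a rough sketch of cache steering: rather than editing the prompt or the weights, the intervention edits the model's key/value cache. Here the steering direction is the cache difference between the same prompt with and without a reasoning cue, added once at the final cached position. This assumes a Hugging Face transformers model and the legacy tuple cache format (newer releases wrap it in a cache object, normalized below when possible); the contrastive prompts, `alpha`, and the single-position edit are illustrative choices, not the paper's exact recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; any causal LM with a tuple KV cache works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def prompt_cache(text):
    """Forward pass over `text`, returning the per-layer (key, value) cache."""
    pkv = model(tok(text, return_tensors="pt").input_ids, use_cache=True).past_key_values
    # Newer transformers return a cache object; normalize to legacy tuples.
    return pkv.to_legacy_cache() if hasattr(pkv, "to_legacy_cache") else pkv

question = "Q: If 3 pens cost $6, how much do 5 pens cost?\nA:"

# Contrastive caches: the same question with and without a reasoning cue.
pos = prompt_cache("Let's think step by step.\n" + question)
neg = prompt_cache(question)

# Cache the prompt up to its last token, so decoding can start from that token.
ids = tok(question, return_tensors="pt").input_ids
base = prompt_cache(tok.decode(ids[0, :-1]))

# One-shot steering: add the scaled key/value difference (taken at the last
# cached position of each contrastive run) to the prompt cache. `alpha` is an
# assumed strength knob, not a value from the paper.
alpha = 4.0
steered = []
for (bk, bv), (pk, pv), (nk, nv) in zip(base, pos, neg):
    bk, bv = bk.clone(), bv.clone()
    bk[..., -1:, :] += alpha * (pk[..., -1:, :] - nk[..., -1:, :])
    bv[..., -1:, :] += alpha * (pv[..., -1:, :] - nv[..., -1:, :])
    steered.append((bk, bv))

# Greedy decoding with a manual loop, which sidesteps version-specific
# behavior of generate() when handed an externally edited cache.
cache, next_id, out_ids = tuple(steered), ids[:, -1:], []
for _ in range(40):
    with torch.no_grad():
        step = model(next_id, past_key_values=cache, use_cache=True)
    cache = step.past_key_values
    next_id = step.logits[:, -1:].argmax(-1)
    out_ids.append(next_id)
print(tok.decode(torch.cat(out_ids, dim=1)[0]))
```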