Advances in Reasoning Under Uncertainty in Large Language Models

The field of large language models (LLMs) is moving toward better reasoning under uncertainty: handling incomplete information and expressing uncertain estimates as probabilistic priors rather than point answers. Recent work highlights the importance of evaluating LLMs on numerical estimation tasks that require synthesizing substantial background knowledge. In parallel, there is growing interest in theoretical frameworks for sampling-based test-time scaling, which improves reasoning performance by generating multiple reasoning paths at inference time and aggregating them.

Noteworthy papers in this area include OpenEstimate, which introduces an extensible, multi-domain benchmark for evaluating LLMs on numerical estimation tasks, and A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning, which provides a theoretical framework for analyzing sampling-based test-time scaling methods. Other notable papers include CarBoN, TrajSelector, and Annotation-Efficient Universal Honesty Alignment, which respectively propose methods for improving test-time reasoning, for efficient and effective Best-of-N selection, and for honesty alignment in LLMs.
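As background on the aggregation step these test-time scaling methods share, the sketch below illustrates the two simplest variants: self-consistency (majority vote over the final answers of independently sampled reasoning paths) and Best-of-N (picking the answer with the highest external score). This is a minimal illustration under assumed inputs; the function names, toy answers, and scores are invented here and do not come from any of the cited papers, which study more refined versions of these ideas.

```python
from collections import Counter

def self_consistency(answers):
    """Majority vote over final answers extracted from sampled reasoning paths."""
    counts = Counter(answers)
    answer, _ = counts.most_common(1)[0]
    return answer

def best_of_n(answers, scores):
    """Best-of-N: return the answer whose verifier/reward score is highest."""
    return max(zip(answers, scores), key=lambda pair: pair[1])[0]

# Toy usage with hypothetical sampled answers and verifier scores.
paths = ["42", "41", "42", "42", "40"]
print(self_consistency(paths))                           # -> 42 (most frequent)
print(best_of_n(paths, [0.2, 0.9, 0.5, 0.6, 0.1]))       # -> 41 (highest score)
```

Self-consistency needs no extra model but wastes the information in low-frequency paths; Best-of-N depends entirely on the quality of the scoring signal, which is what papers such as CarBoN and TrajSelector aim to improve.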

Sources

OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data

A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning

CarBoN: Calibrated Best-of-N Sampling Improves Test-time Reasoning

TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model

Mapping from Meaning: Addressing the Miscalibration of Prompt-Sensitive Language Models

I-RAVEN-X: Benchmarking Generalization and Robustness of Analogical and Mathematical Reasoning in Large Language and Reasoning Models

Annotation-Efficient Universal Honesty Alignment

Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality

Systematic Evaluation of Uncertainty Estimation Methods in Large Language Models
