The field of large language models (LLMs) is moving toward improved reasoning under uncertainty, with a focus on methods that handle incomplete information and express uncertain quantities as probabilistic priors. Recent work highlights the importance of evaluating LLMs on numerical estimation tasks that require synthesizing substantial background knowledge. There is also growing interest in theoretical frameworks for analyzing sampling-based test-time scaling methods, which improve reasoning performance by generating multiple reasoning paths at inference time and aggregating or selecting among them.

Noteworthy papers in this area include OpenEstimate, which introduces an extensible, multi-domain benchmark for evaluating LLMs on numerical estimation tasks, and A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning, which provides a theoretical framework for analyzing sampling-based test-time scaling methods. Other notable papers include CarBoN, which improves test-time reasoning; TrajSelector, which proposes efficient and effective Best-of-N selection; and Annotation-Efficient Universal Honesty Alignment, which addresses honesty alignment in LLMs.
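To make the two sampling-based strategies mentioned above concrete, the sketch below contrasts self-consistency (majority vote over the final answers of multiple sampled reasoning paths) with Best-of-N selection (returning the answer of the single highest-scoring path). This is a minimal illustration under assumed interfaces: `sample_paths` and `score_path` are hypothetical stand-ins for an LLM sampler and a verifier or reward model, not APIs from any of the papers discussed here.

```python
# Minimal sketch of sampling-based test-time scaling.
# Assumptions: `sample_paths` and `score_path` are hypothetical stubs standing
# in for an LLM sampled at temperature > 0 and a path-level scorer.
from collections import Counter
import random


def sample_paths(prompt: str, n: int) -> list[tuple[str, str]]:
    """Hypothetical sampler: returns n (reasoning_path, final_answer) pairs."""
    answers = ["42", "42", "41", "42", "40"]
    return [(f"reasoning #{i} for {prompt!r}", random.choice(answers)) for i in range(n)]


def score_path(path: str, answer: str) -> float:
    """Hypothetical verifier/reward-model score for a single reasoning path."""
    return random.random()


def self_consistency(prompt: str, n: int = 8) -> str:
    """Self-consistency: sample n paths and return the most frequent final answer."""
    samples = sample_paths(prompt, n)
    votes = Counter(answer for _, answer in samples)
    return votes.most_common(1)[0][0]


def best_of_n(prompt: str, n: int = 8) -> str:
    """Best-of-N: sample n paths and return the answer of the highest-scoring one."""
    samples = sample_paths(prompt, n)
    _, best_answer = max(samples, key=lambda s: score_path(*s))
    return best_answer


if __name__ == "__main__":
    print("self-consistency:", self_consistency("What is 6 * 7?"))
    print("best-of-n:       ", best_of_n("What is 6 * 7?"))
```

Both strategies spend extra inference-time compute on additional samples; they differ only in how the samples are reduced to a single answer, which is the design axis the selection- and calibration-oriented papers above explore.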