Advances in Explainability and Reward Modeling for Large Language Models

The field of large language models (LLMs) is moving toward greater explainability and transparency, with a focus on new methods for understanding and improving model decision-making. Recent work highlights the role of chain-of-thought (CoT) reasoning and reward modeling in improving model performance and aligning models with human preferences. Challenges remain, however, in ensuring the faithfulness of chain-of-thought explanations (whether the stated reasoning actually reflects the model's decision process) and in building reward modeling approaches that are both efficient and effective. Notable papers in this area include RM-R1, which introduces a class of generative reward models that formulate reward modeling as a reasoning task, and Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning, which proposes a unified multimodal CoT-based reward model for vision tasks. Both report gains in performance and interpretability and point to promising directions for future research.
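To make the "reward modeling as reasoning" idea concrete, here is a minimal sketch of the general generative-judge pattern these papers build on: the model first produces a chain-of-thought critique, then emits a structured verdict that is parsed into a preference label. This is an illustrative assumption, not the RM-R1 implementation; the prompt template, the `<think>`/`<answer>` tags, and the `generate` callable are all hypothetical stand-ins for whatever instruction-tuned LLM and format a given system uses.

```python
# Sketch of a generative ("LLM-as-a-judge") reward model: reason first,
# then output a parseable verdict. Hypothetical; not RM-R1's actual code.
import re
from typing import Callable

JUDGE_TEMPLATE = """You are a reward model. Compare two responses to a prompt.
First reason step by step inside <think>...</think>, then state your verdict
as <answer>A</answer> or <answer>B</answer>.

Prompt: {prompt}

Response A: {response_a}

Response B: {response_b}
"""


def judge_pair(
    generate: Callable[[str], str],  # hypothetical LLM completion function
    prompt: str,
    response_a: str,
    response_b: str,
) -> str:
    """Return 'A' or 'B', the response the judge model prefers."""
    completion = generate(
        JUDGE_TEMPLATE.format(
            prompt=prompt, response_a=response_a, response_b=response_b
        )
    )
    # The chain-of-thought critique stays inside <think>...</think>;
    # only the tagged verdict is extracted as the reward signal.
    match = re.search(r"<answer>\s*([AB])\s*</answer>", completion)
    if match is None:
        raise ValueError(f"No parseable verdict in: {completion!r}")
    return match.group(1)


if __name__ == "__main__":
    # Stub generator for demonstration; a real system would call an LLM here.
    stub = lambda _: "<think>A is accurate; B is vague.</think><answer>A</answer>"
    print(judge_pair(stub, "Explain CoT.", "Step-by-step...", "It thinks."))
```

The key design choice is that the preference label is derived from an explicit, inspectable reasoning trace rather than a scalar head, which is what gives these generative reward models their interpretability advantage; whether the trace is faithful to the model's actual computation remains an open question, as the papers below discuss.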

Sources

Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines

MADIL: An MDL-based Framework for Efficient Program Synthesis in the ARC Benchmark

A Generalised and Adaptable Reinforcement Learning Stopping Method

Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study

RM-R1: Reward Modeling as Reasoning

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Evaluating Contrastive Feedback for Effective User Simulations

Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Chain-of-Thought Tokens are Computer Program Variables

Reasoning Models Don't Always Say What They Think
