Advances in Large Language Models and Reasoning

The field of large language models (LLMs) and reasoning is evolving rapidly, with a focus on improving the reliability, safety, and usefulness of these models. Recent research highlights the importance of instruction following, transparency, and controllability, as well as the need to address vulnerabilities such as reasoning distraction and deadlock attacks that can trap a model in non-terminating reasoning. Novel frameworks and benchmarks, such as LawChain and PROBE, are enabling more comprehensive evaluations of LLMs' reasoning capabilities, and the integration of LLMs with symbolic NLU systems and probabilistic rule learning shows promise for improving accuracy and reliability. Overall, the field is moving toward more robust, transparent, and controllable LLMs that can be trusted with complex tasks.

Noteworthy papers include ReasonIF, which introduces a systematic benchmark for assessing instruction following during reasoning, and Prompt Decorators, which proposes a declarative and composable syntax for governing LLM behavior. In addition, the paper on Distractor Injection Attacks characterizes the vulnerability of large reasoning models to reasoning distraction and proposes a training-based defense to mitigate the risk.
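To make the idea of a declarative, composable prompt syntax concrete, here is a minimal sketch in Python. The decorator names, the `+++Name` prefix convention, and the composition rule are illustrative assumptions for this digest, not the syntax actually specified by the Prompt Decorators paper.

```python
# Illustrative sketch of a declarative, composable prompt-decorator layer.
# The decorator names and composition rules below are hypothetical examples,
# not the syntax defined in the Prompt Decorators paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class Decorator:
    """A named behavioral directive prepended to a prompt."""
    name: str
    instruction: str


# Hypothetical decorators governing reasoning, formatting, and control.
STEP_BY_STEP = Decorator("StepByStep", "Reason through the task step by step.")
CITE_SOURCES = Decorator("CiteSources", "Cite a source for every factual claim.")
JSON_OUTPUT = Decorator("JSONOutput", "Return the final answer as a JSON object.")


def decorate(prompt: str, *decorators: Decorator) -> str:
    """Compose decorators declaratively: each contributes one directive line."""
    header = "\n".join(f"+++{d.name}: {d.instruction}" for d in decorators)
    return f"{header}\n\n{prompt}"


if __name__ == "__main__":
    print(decorate("Summarize the LawChain benchmark.", STEP_BY_STEP, JSON_OUTPUT))
```

The design point is that each decorator is an independent, reusable directive, so behaviors can be mixed per request instead of rewriting prompts wholesale.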
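Similarly, pairing LLM-proposed rules with probabilistic weighting can be sketched as follows, in the spirit of RLIE: each rule acts as a binary feature over examples, and a logistic regression assigns it a weight that can drive pruning and iterative refinement. The rules, data, and threshold here are assumptions for illustration, not the paper's actual pipeline.

```python
# Hypothetical sketch: weight candidate rules with logistic regression.
# Rules, data, and the pruning threshold are illustrative assumptions,
# not the actual RLIE pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Candidate rules (e.g., proposed by an LLM), each a predicate over an example.
rules = {
    "mentions_negation": lambda text: "not" in text.split(),
    "is_question": lambda text: text.strip().endswith("?"),
    "is_short": lambda text: len(text.split()) < 5,
}

examples = ["this is not fine", "is this fine?", "fine", "not a question", "ok?"]
labels = np.array([1, 0, 0, 1, 0])  # toy labels for illustration

# Encode each example as a binary vector of rule firings.
X = np.array([[int(rule(text)) for rule in rules.values()] for text in examples])

# Logistic regression learns a probabilistic weight per rule.
clf = LogisticRegression().fit(X, labels)

# Iterative refinement (sketch): prune rules with near-zero weight,
# then ask the LLM for replacements and refit.
for name, weight in zip(rules, clf.coef_[0]):
    status = "keep" if abs(weight) > 0.1 else "prune"
    print(f"{name}: weight={weight:.2f} -> {status}")
```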

Sources

ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning

One Token Embedding Is Enough to Deadlock Your Large Reasoning Model

What Can String Probability Tell Us About Grammaticality?

Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense

LawChain: Modeling Legal Reasoning Chains for Chinese Tort Case Analysis

The Atomic Instruction Gap: Instruction-Tuned LLMs Struggle with Simple, Self-Contained Directives

When Models Can't Follow: Testing Instruction Adherence Across 256 LLMs

ChatGPT Unveils Its Limits: Principles of Law Deliver Checkmate

An Argumentative Explanation Framework for Generalized Reason Model with Inconsistent Precedents

BLiSS 1.0: Evaluating Bilingual Learner Competence in Second Language Small Language Models

RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models

Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents

Prompt Decorators: A Declarative and Composable Syntax for Reasoning, Formatting, and Control in LLMs

LLM-Augmented Symbolic NLU System for More Reliable Continuous Causal Statement Interpretation

A Fundamental Algorithm for Dependency Parsing (With Corrections)

Neural Reasoning for Robust Instance Retrieval in $\mathcal{SHOIQ}$
