Advances in Large Language Models

The field of Large Language Models (LLMs) is seeing rapid progress on two related fronts: mitigating positional biases and evaluating performance on reasoning tasks. To counter positional biases, researchers are leveraging the primacy effect to improve Multiple Choice Question Answering (MCQA) and introducing adaptive repetition strategies that reduce position bias in LLM-based ranking. At the same time, there is growing emphasis on measuring LLMs' genuine reasoning capabilities, with studies investigating how the way a question is asked affects accuracy and proposing new evaluation frameworks to quantify and mitigate selection bias. Noteworthy papers include Exploiting Primacy Effect To Improve Large Language Models, which strategically leverages the primacy effect to improve MCQA performance, and SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models, which introduces a framework for measuring and mitigating selection bias in LLM evaluations.
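
To make the position-bias problem concrete, below is a minimal sketch of one generic debiasing technique: presenting the same MCQA options in several shuffled orders and majority-voting over the answers. This is an illustration of the general idea rather than the specific method of any paper above; `query_model` and `debiased_mcqa` are hypothetical names, and the model call itself is assumed to be supplied by the caller.

```python
import random
from collections import Counter
from typing import Callable, List

def debiased_mcqa(
    query_model: Callable[[str, List[str]], int],  # hypothetical LLM call: returns index of chosen option
    question: str,
    options: List[str],
    n_permutations: int = 8,
    seed: int = 0,
) -> int:
    """Reduce position bias by shuffling option order and majority-voting.

    Each permutation shows the same options in a different order, so a model
    that favors early positions spreads that preference across all options;
    voting over the recovered original indices cancels much of the effect.
    """
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_permutations):
        order = list(range(len(options)))
        rng.shuffle(order)                          # order[j] = original index shown at position j
        shuffled = [options[i] for i in order]
        picked = query_model(question, shuffled)    # index into the shuffled list
        votes[order[picked]] += 1                   # map back to the original option index
    return votes.most_common(1)[0][0]
```

Note the cost trade-off this sketch implies: each permutation is a separate model query, so `n_permutations` multiplies inference cost, which is why adaptive strategies that spend repetitions only where rankings are unstable are attractive.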

Sources

Exploiting Primacy Effect To Improve Large Language Models

Inverse Scaling in Test-Time Compute

It's Not That Simple. An Analysis of Simple Test-Time Scaling

Reasoning Models are Test Exploiters: Rethinking Multiple-Choice

Metric assessment protocol in the context of answer fluctuation on MCQ tasks

Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?

Does More Inference-Time Compute Really Help Robustness?

Adaptive Repetition for Mitigating Position Bias in LLM-Based Ranking

GenSelect: A Generative Approach to Best-of-N

SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models
