Advances in Large Language Model Alignment

Research on large language model alignment is increasingly applying human preference optimization to improve the accuracy and naturalness of simultaneous speech translation and machine translation. Recent studies highlight the role of preference variance in identifying informative examples for efficient alignment, and direct preference optimization (DPO) together with multi-perspective preference optimization has shown promising results in producing more robust and faithful translations (a minimal sketch of the shared DPO objective follows the list below). Researchers are also exploring new methods for collecting and selecting high-quality preference data, including user annotations gathered in comparison mode and truncated influence functions. Notable papers in this area include:

  • DPO-Tuned Large Language Models for Segmentation in Simultaneous Speech Translation, which proposes a segmentation framework based on large language models trained with DPO.
  • Beyond Single-Reward: Multi-Pair, Multi-Perspective Preference Optimization for Machine Translation, which introduces a multi-perspective reward engine and a multi-pair construction strategy to create a more robust signal for preference optimization.
  • On the Role of Preference Variance in Preference Optimization, which investigates how preference variance affects the effectiveness of DPO training and derives a theoretical upper bound on the DPO gradient norm (a toy variance-based selection heuristic is sketched below).
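
As a rough illustration of the machinery these papers build on, the sketch below implements the standard DPO objective on sequence-level log-probabilities from a policy and a frozen reference model. It is a generic sketch under assumed tensor names, not the segmentation or multi-pair setup of the papers above.

```python
# Minimal sketch of the standard DPO loss (not the exact setup of the papers
# above). Inputs are sequence-level log-probabilities, already summed over
# tokens; variable names are illustrative assumptions.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO objective: -log sigmoid(beta * (policy margin - reference margin))."""
    # Implicit reward margins: how strongly each model prefers the chosen
    # response over the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    loss = -F.logsigmoid(logits).mean()
    # Implicit rewards are often logged to monitor training progress.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp).detach()
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp).detach()
    return loss, chosen_reward, rejected_reward
```

A multi-pair variant in the spirit of the multi-perspective paper would presumably average this loss over several chosen/rejected pairs per source sentence, with pairs ranked by different reward perspectives.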

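The preference-variance result also suggests a simple data-selection heuristic: under a Bradley-Terry model the variance of the binary preference label for a pair is p(1 - p), which is largest when the two responses are nearly tied. The sketch below ranks candidate pairs by this quantity; the reward-gap inputs and the selection ratio are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical selection of informative preference pairs by the variance of the
# Bernoulli preference label under a Bradley-Terry model: Var = p * (1 - p).
# Reward gaps would come from some scoring model; the values below are dummies.
import torch

def preference_variance(reward_gap):
    """reward_gap: r(chosen) - r(rejected); returns p(1 - p) per pair."""
    p = torch.sigmoid(reward_gap)   # P(chosen preferred over rejected)
    return p * (1.0 - p)            # highest when p is near 0.5

def select_high_variance_pairs(reward_gaps, keep_ratio=0.5):
    """Keep the fraction of pairs with the largest preference variance."""
    variance = preference_variance(reward_gaps)
    k = max(1, int(keep_ratio * reward_gaps.numel()))
    return torch.topk(variance, k).indices

# Example: pairs with near-zero reward gap (ambiguous preferences) are kept.
gaps = torch.tensor([4.0, 0.2, -0.1, 2.5, 0.05])
print(select_high_variance_pairs(gaps, keep_ratio=0.4))  # indices of ambiguous pairs
```
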
Sources

DPO-Tuned Large Language Models for Segmentation in Simultaneous Speech Translation

On the Role of Preference Variance in Preference Optimization

Towards Understanding Valuable Preference Data for Large Language Model Alignment

Beyond Single-Reward: Multi-Pair, Multi-Perspective Preference Optimization for Machine Translation

Users as Annotators: LLM Preference Learning from Comparison Mode
