Advances in Large Language Model Alignment

Research on large language model alignment is increasingly applying human preference optimization to improve the accuracy and naturalness of simultaneous speech translation and machine translation. Recent studies highlight the role of preference variance in identifying informative examples for efficient alignment, and direct preference optimization (DPO) together with multi-perspective preference optimization has shown promising results in producing more robust and faithful translations (a minimal sketch of the shared DPO objective follows the list below). Researchers are also exploring new methods for collecting and selecting high-quality preference data, including user annotations gathered in comparison mode and truncated influence functions. Notable papers in this area include:

  • DPO-Tuned Large Language Models for Segmentation in Simultaneous Speech Translation, which proposes a segmentation framework based on large language models trained with DPO.
  • Beyond Single-Reward: Multi-Pair, Multi-Perspective Preference Optimization for Machine Translation, which introduces a multi-perspective reward engine and a multi-pair construction strategy to create a more robust signal for preference optimization.
  • On the Role of Preference Variance in Preference Optimization, which investigates how preference variance affects the effectiveness of DPO training and derives a theoretical upper bound on the DPO gradient norm (a toy variance-based selection heuristic is sketched below).
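
As a rough illustration of the machinery these papers build on, the sketch below implements the standard DPO objective on sequence-level log-probabilities from a policy and a frozen reference model. It is a generic sketch under assumed tensor names, not the segmentation or multi-pair setup of the papers above.

```python
# Minimal sketch of the standard DPO loss (not the exact setup of the papers
# above). Inputs are sequence-level log-probabilities, already summed over
# tokens; variable names are illustrative assumptions.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO objective: -log sigmoid(beta * (policy margin - reference margin))."""
    # Implicit reward margins: how strongly each model prefers the chosen
    # response over the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    loss = -F.logsigmoid(logits).mean()
    # Implicit rewards are often logged to monitor training progress.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp).detach()
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp).detach()
    return loss, chosen_reward, rejected_reward
```

A multi-pair variant in the spirit of the multi-perspective paper would presumably average this loss over several chosen/rejected pairs per source sentence, with pairs ranked by different reward perspectives.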

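The preference-variance result also suggests a simple data-selection heuristic: under a Bradley-Terry model the variance of the binary preference label for a pair is p(1 - p), which is largest when the two responses are nearly tied. The sketch below ranks candidate pairs by this quantity; the reward-gap inputs and the selection ratio are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical selection of informative preference pairs by the variance of the
# Bernoulli preference label under a Bradley-Terry model: Var = p * (1 - p).
# Reward gaps would come from some scoring model; the values below are dummies.
import torch

def preference_variance(reward_gap):
    """reward_gap: r(chosen) - r(rejected); returns p(1 - p) per pair."""
    p = torch.sigmoid(reward_gap)   # P(chosen preferred over rejected)
    return p * (1.0 - p)            # highest when p is near 0.5

def select_high_variance_pairs(reward_gaps, keep_ratio=0.5):
    """Keep the fraction of pairs with the largest preference variance."""
    variance = preference_variance(reward_gaps)
    k = max(1, int(keep_ratio * reward_gaps.numel()))
    return torch.topk(variance, k).indices

# Example: pairs with near-zero reward gap (ambiguous preferences) are kept.
gaps = torch.tensor([4.0, 0.2, -0.1, 2.5, 0.05])
print(select_high_variance_pairs(gaps, keep_ratio=0.4))  # indices of ambiguous pairs
```
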
Sources

DPO-Tuned Large Language Models for Segmentation in Simultaneous Speech Translation

On the Role of Preference Variance in Preference Optimization

Towards Understanding Valuable Preference Data for Large Language Model Alignment

Beyond Single-Reward: Multi-Pair, Multi-Perspective Preference Optimization for Machine Translation

Users as Annotators: LLM Preference Learning from Comparison Mode
