Advances in Multi-Objective Reinforcement Learning for Large Language Models

The field of large language model post-training is moving toward more sophisticated reinforcement learning techniques, with a particular focus on multi-objective optimization. Researchers are developing methods that mitigate reward hacking, balance multiple human-preference and task objectives, and strengthen reward modeling across domains. Notable papers in this area include:

  • OrthAlign, which resolves gradient-level conflicts in multi-objective preference alignment through orthogonal subspace decomposition (see the first sketch after this list).
  • MO-GRPO, which proposes a simple normalization scheme that reweights reward functions so that each contributes evenly to the loss (see the second sketch after this list).
  • Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards, which presents a unified framework for aligning large language models across domains with both verifiable and non-verifiable reward signals.
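
To make the orthogonal-decomposition idea concrete, below is a minimal Python sketch of resolving a conflict between two objective gradients by projecting the conflicting component of one onto the orthogonal complement of the other. The function name `orthogonalize`, the conflict test, and the toy gradients are illustrative assumptions; the sketch shows the general principle rather than OrthAlign's actual algorithm.

```python
# Illustrative sketch: resolving a gradient-level conflict between two
# alignment objectives by projecting out the conflicting component.
# This mirrors the general idea of orthogonal subspace decomposition;
# the exact OrthAlign procedure may differ.
import numpy as np

def orthogonalize(g_primary: np.ndarray, g_secondary: np.ndarray) -> np.ndarray:
    """Remove from g_secondary the component that conflicts with g_primary.

    If the two objective gradients point in opposing directions (negative
    inner product), the conflicting component of g_secondary is projected
    out so the combined update does not undo progress on the primary
    objective.
    """
    dot = float(np.dot(g_secondary, g_primary))
    if dot < 0.0:  # gradients conflict
        g_secondary = g_secondary - dot / (np.dot(g_primary, g_primary) + 1e-12) * g_primary
    return g_secondary

# Toy usage: two 3-dimensional "gradients" with a conflicting component.
g_helpfulness = np.array([1.0, 0.0, 0.5])
g_harmlessness = np.array([-0.8, 1.0, 0.2])
combined = g_helpfulness + orthogonalize(g_helpfulness, g_harmlessness)
print(combined)  # the update no longer opposes the helpfulness direction
```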

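Similarly, the sketch below illustrates the kind of per-objective normalization that MO-GRPO's description suggests: each reward function is standardized within a sampling group before the objectives are combined, so no single reward scale dominates the group-relative advantage. The function `normalized_group_advantages` and the per-group z-scoring are assumptions for illustration, not necessarily the paper's exact formulation.

```python
# Illustrative sketch: normalizing each reward function within a GRPO
# sampling group so that no single objective dominates the loss.
import numpy as np

def normalized_group_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rewards: (group_size, num_objectives) raw rewards for one prompt's
    sampled completions. Returns a (group_size,) advantage per completion.

    Each objective is standardized across the group before the objectives
    are summed, so objectives with large raw scales cannot dominate.
    """
    mean = rewards.mean(axis=0, keepdims=True)
    std = rewards.std(axis=0, keepdims=True)
    normalized = (rewards - mean) / (std + eps)
    return normalized.sum(axis=1)

# Toy usage: 4 sampled completions scored by two objectives on very
# different scales (e.g., a 0-1 correctness check and a 0-100 style score).
rewards = np.array([
    [1.0, 80.0],
    [0.0, 95.0],
    [1.0, 10.0],
    [0.0, 40.0],
])
print(normalized_group_advantages(rewards))
```
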
Sources

Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training

Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective

MO-GRPO: Mitigating Reward Hacking of Group Relative Policy Optimization on Multi-Objective Problems

OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment

Hybrid Reward Normalization for Process-supervised Non-verifiable Agentic Tasks

Rethinking Reward Models for Multi-Domain Test-Time Scaling

Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
