Advances in Aligning Large Language Models with Human Preferences

Research on aligning large language models (LLMs) with human preferences is advancing rapidly. Recent work explores several routes to this goal, including multi-objective alignment, preference learning, and reward modeling. A key challenge is balancing trade-offs between competing objectives and preferences, and several studies propose new frameworks and methods to manage them. Some papers introduce architectures and algorithms for multi-objective alignment, while others investigate alternative supervision signals, such as physics-based feedback and cognitive signals, to improve alignment. Noteworthy papers include the Preference Orchestrator framework, which automatically infers prompt-specific preference weights, and GEM, a generative entropy-guided preference modeling approach for few-shot alignment of LLMs. Overall, the field is moving toward more sophisticated and effective alignment methods, with potential applications ranging from natural language processing to decision-making and optimization.
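
To make the multi-objective setting concrete, the sketch below shows one simple way prompt-specific preference weights could be combined with per-objective reward scores. The function names, weighting scheme, and toy objectives are illustrative assumptions, not the actual Preference Orchestrator or GEM implementations described in the papers listed under Sources.

```python
# Illustrative sketch (not from any cited paper): scoring a response as a
# weighted sum of objective-specific rewards, where the weights are inferred
# from the prompt -- the core idea behind prompt-aware multi-objective alignment.

from typing import Callable, Dict


def aggregate_reward(
    prompt: str,
    response: str,
    reward_fns: Dict[str, Callable[[str, str], float]],
    weight_fn: Callable[[str], Dict[str, float]],
) -> float:
    """Return a weighted sum of per-objective reward scores.

    `weight_fn` stands in for a hypothetical component that infers preference
    weights from the prompt (e.g., helpfulness vs. conciseness).
    """
    weights = weight_fn(prompt)
    return sum(
        weights.get(name, 0.0) * fn(prompt, response)
        for name, fn in reward_fns.items()
    )


if __name__ == "__main__":
    # Toy objectives standing in for learned reward models.
    reward_fns = {
        "helpfulness": lambda p, r: min(len(r.split()) / 50.0, 1.0),
        "conciseness": lambda p, r: max(1.0 - len(r.split()) / 100.0, 0.0),
    }
    # Toy weight inference: prompts asking for brevity favor conciseness.
    weight_fn = lambda p: (
        {"helpfulness": 0.3, "conciseness": 0.7}
        if "briefly" in p.lower()
        else {"helpfulness": 0.7, "conciseness": 0.3}
    )
    score = aggregate_reward(
        "Briefly explain preference alignment.",
        "Preference alignment tunes a model toward human-preferred outputs.",
        reward_fns,
        weight_fn,
    )
    print(f"aggregated reward: {score:.3f}")
```

In practice the per-objective scores would come from trained reward models and the weights from a learned or inferred preference module; the sketch only illustrates how prompt-dependent weights resolve the trade-off between objectives.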

Sources

Preference Orchestrator: Prompt-Aware Multi-Objective Alignment for Large Language Models

When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets

From Single to Societal: Analyzing Persona-Induced Bias in Multi-Agent Interactions

Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys

Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models

Bootstrapping LLMs via Preference-Based Policy Optimization

Maximizing the efficiency of human feedback in AI alignment: a comparative analysis

The Alignment Game: A Theory of Long-Horizon Alignment Through Recursive Curation

GEM: Generative Entropy-Guided Preference Modeling for Few-shot Alignment of LLMs

Learning Branching Policies for MILPs with Proximal Policy Optimization

Beyond Mimicry: Preference Coherence in LLMs

PROF: An LLM-based Reward Code Preference Optimization Framework for Offline Imitation Learning

Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language

Look-Ahead Reasoning on Learning Platforms

Two-Faced Social Agents: Context Collapse in Role-Conditioned Large Language Models

SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning
