Advances in Large Language Model Alignment and Fine-Tuning

The field of large language models is evolving rapidly, with a growing focus on alignment and fine-tuning techniques that improve both model performance and safety. Recent research highlights the importance of preference alignment and knowledge distillation for building robust, generalizable models. Notably, the traditional pipeline of knowledge distillation followed by alignment has been shown to be limiting, and reversing it, so that alignment precedes distillation, appears essential for effective alignment. In parallel, new fine-tuning methods such as anchored supervised fine-tuning and one-token rollout draw on techniques like reward-weighted regression and policy gradients to improve model performance. There is also growing recognition of the need to move beyond the traditional log-likelihood objective in supervised fine-tuning, with research exploring alternative probability-based objectives that can adapt to different model capabilities. Illustrative sketches of several of these techniques are given below.

Particularly noteworthy papers in this area include: Why Alignment Must Precede Distillation, which demonstrates why the conventional distillation-then-alignment pipeline should be reversed; Anchored Supervised Fine-Tuning, which augments dynamic fine-tuning with lightweight KL regularization to preserve tightness while ensuring stability; UniAPL, which reframes alignment as a unified preference learning problem and proposes a framework that dynamically aligns the policy's distribution with the expert's; and One-Token Rollout, which guides supervised fine-tuning with policy gradients by treating each token generation as a single-step reinforcement learning trajectory.
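
For reference, the reward-weighted regression objective mentioned above is shown below in its standard textbook form, with temperature beta; this is the generic formulation, not the specific objective of any paper listed here.

```latex
% Standard reward-weighted regression objective (generic form, not paper-specific):
\[
\max_{\theta}\;
\mathbb{E}_{(s,a)\sim\mathcal{D}}
\left[\exp\!\left(\tfrac{R(s,a)}{\beta}\right)\,\log \pi_{\theta}(a \mid s)\right]
\]
```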
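
To make the anchored supervised fine-tuning idea more concrete, the sketch below combines a standard SFT cross-entropy term with a lightweight KL penalty against a frozen anchor (reference) model, assuming a PyTorch-style setup. The function name, the kl_coef weight, and the exact form of the KL term are illustrative assumptions rather than the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def anchored_sft_loss(policy_logits, anchor_logits, target_ids, kl_coef=0.1):
    """Cross-entropy SFT loss plus a KL penalty that keeps the policy close to a
    frozen anchor model. Illustrative sketch, not the paper's exact formulation.

    policy_logits, anchor_logits: (batch, seq_len, vocab)
    target_ids: (batch, seq_len) ground-truth next tokens
    """
    vocab = policy_logits.size(-1)

    # Standard supervised fine-tuning term: negative log-likelihood of the targets.
    nll = F.cross_entropy(
        policy_logits.reshape(-1, vocab),
        target_ids.reshape(-1),
    )

    # Lightweight regularizer: KL(policy || anchor), averaged over all tokens.
    policy_logp = F.log_softmax(policy_logits, dim=-1).reshape(-1, vocab)
    anchor_logp = F.log_softmax(anchor_logits.detach(), dim=-1).reshape(-1, vocab)
    kl = F.kl_div(anchor_logp, policy_logp, log_target=True, reduction="batchmean")

    return nll + kl_coef * kl
```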
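
The one-token rollout idea, treating each token position as a single-step reinforcement-learning episode and guiding SFT with a policy gradient, could be read roughly as in the sketch below. The reward used here (agreement with the ground-truth token) and the plain REINFORCE estimator are assumptions made for illustration; the paper's actual reward and estimator may differ.

```python
import torch
import torch.nn.functional as F

def one_token_rollout_loss(policy_logits, target_ids):
    """REINFORCE-style loss where each position is its own one-step episode:
    sample a single token from the current policy, score it against the
    ground-truth token, and weight its log-probability by that reward.
    Illustrative sketch only.
    """
    logp = F.log_softmax(policy_logits, dim=-1)           # (batch, seq, vocab)
    dist = torch.distributions.Categorical(logits=policy_logits)
    sampled = dist.sample()                               # one-token "rollout" per position

    # Assumed reward: 1 if the sampled token matches the reference token, else 0.
    reward = (sampled == target_ids).float()

    # Policy gradient on the sampled token's log-probability (single-step trajectory).
    sampled_logp = logp.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
    return -(reward.detach() * sampled_logp).mean()
```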
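
Finally, to illustrate the contrast between the standard log-likelihood objective and a probability-based alternative, the sketch below places the two side by side. The specific 1 - p(target) form is a hypothetical example, not the objective studied in the paper.

```python
import torch
import torch.nn.functional as F

def sft_objective(logits, target_ids, use_log_likelihood=True):
    """Contrast the standard log-likelihood objective with a simple
    probability-based alternative. Illustrative only.
    """
    logp = F.log_softmax(logits, dim=-1)
    target_logp = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)

    if use_log_likelihood:
        # Standard SFT: minimize -log p(target), which pushes hardest on
        # tokens the model currently assigns very low probability.
        return (-target_logp).mean()

    # Probability-based variant: minimize 1 - p(target); its gradient
    # saturates for tokens the model finds very unlikely, so it effectively
    # down-weights them instead of emphasizing them.
    return (1.0 - target_logp.exp()).mean()
```

The practical difference lies in the gradients: the log-likelihood term emphasizes tokens the model finds improbable, while the probability-based variant de-emphasizes them, which is one reason such objectives can behave differently at different points of the model capability continuum.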

Sources

Why Alignment Must Precede Distillation: A Minimal Working Explanation

Anchored Supervised Fine-Tuning

Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning

UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following

Spectral Logit Sculpting: Adaptive Low-Rank Logit Transformation for Controlled Text Generation

RL-Guided Data Selection for Language Model Finetuning

One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient

Debunk the Myth of SFT Generalization

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
