Advancements in Visual Generation and Preference Alignment

The field of visual generation is seeing rapid progress in preference alignment and reinforcement learning. Recent work focuses on making reward models more expressive and reliable, so that they supply more accurate and informative learning signals for image and video generation. In particular, approaches such as Visual Preference Policy Optimization and Bayesian Prior-Guided Optimization address limitations of standard Group Relative Policy Optimization (GRPO), with the potential to improve the quality and realism of generated images and videos and to better align generative models with human preferences. Noteworthy papers in this area include Seeing What Matters: Visual Preference Policy Optimization for Visual Generation, a GRPO variant that lifts scalar reward feedback into structured, pixel-level advantages; Learning What to Trust: Bayesian Prior-Guided Optimization for Visual Generation, which explicitly models reward uncertainty through a semantic prior anchor; and MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models, which introduces two complementary methods for jointly optimizing multiple rewards without incurring an alignment tax.
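
To make the "lifting scalar feedback into pixel-level advantages" idea concrete, here is a minimal sketch of group-relative advantage computation. The first function is the standard GRPO-style scalar normalization; the second is a hypothetical pixel-wise variant in the spirit of Visual Preference Policy Optimization, assuming a reward model that can emit per-pixel reward maps. Function names, tensor shapes, and the per-pixel normalization are illustrative assumptions, not the papers' actual formulations.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Standard GRPO-style scalar advantages: normalize each sample's reward
    against the statistics of its group (e.g., samples drawn for the same prompt)."""
    # rewards: [group_size] scalar rewards for one prompt's group of samples
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def pixelwise_advantages(reward_maps: torch.Tensor) -> torch.Tensor:
    """Hypothetical 'lifted' variant: if the reward model produces per-pixel
    reward maps instead of a single scalar, the same group normalization can be
    applied at each spatial location, yielding a structured advantage signal
    that tells the policy *where* in the image it gained or lost reward."""
    # reward_maps: [group_size, H, W] per-pixel rewards for the group
    mean = reward_maps.mean(dim=0, keepdim=True)
    std = reward_maps.std(dim=0, keepdim=True)
    return (reward_maps - mean) / (std + 1e-8)
```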

Sources

Seeing What Matters: Visual Preference Policy Optimization for Visual Generation

Learning What to Trust: Bayesian Prior-Guided Optimization for Visual Generation

Beyond Reward Margin: Rethinking and Resolving Likelihood Displacement in Diffusion Models via Video Generation

Growing with the Generator: Self-paced GRPO for Video Generation

The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation

MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

Video Generation Models Are Good Latent Reward Models
