Advances in Reinforcement Learning for Generative Models

The field of reinforcement learning (RL) is increasingly focused on fine-tuning sequential generative models, with particular attention to KL-regularized methods. Recent work applies RL fine-tuning to generative flow networks, diffusion models, and vision-language-action (VLA) models. Notable developments include new algorithms and frameworks for adapting VLA models to downstream tasks, as well as analyses of how RL fine-tuning relates to supervised fine-tuning.
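
Much of this work optimizes a reward while keeping the fine-tuned model close to its pretrained reference. The sketch below is a minimal, illustrative rendering of that KL-regularized objective as a REINFORCE-style surrogate loss in PyTorch; the function name, `beta`, and the tensor layout are assumptions for illustration, not drawn from any of the papers listed here.

```python
# Minimal sketch of the KL-regularized fine-tuning objective,
#     maximize  E_{x ~ pi_theta}[ r(x) ]  -  beta * KL(pi_theta || pi_ref),
# written as a surrogate loss to minimize. Names are illustrative only.
import torch.nn.functional as F

def kl_regularized_loss(tuned_logits, base_logits, actions, rewards, beta=0.1):
    """Surrogate loss for KL-regularized policy-gradient fine-tuning.

    tuned_logits, base_logits: (batch, seq_len, vocab) logits from the policy
        being fine-tuned and from the frozen reference model.
    actions: (batch, seq_len) sampled token ids.
    rewards: (batch,) scalar sequence-level rewards.
    """
    tuned_logp = F.log_softmax(tuned_logits, dim=-1)
    base_logp = F.log_softmax(base_logits, dim=-1)

    # Log-probability of the sampled tokens under the tuned policy.
    action_logp = tuned_logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)

    # Analytic per-sequence KL(pi_theta || pi_ref), summed over tokens.
    kl = (tuned_logp.exp() * (tuned_logp - base_logp)).sum(-1).sum(-1)

    # REINFORCE term: push up log-probs of high-reward trajectories.
    policy_grad = -(rewards * action_logp.sum(-1)).mean()

    return policy_grad + beta * kl.mean()
```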

Noteworthy papers include:

Relative Trajectory Balance is equivalent to Trust-PCL establishes an equivalence between the relative trajectory balance objective used with generative flow networks and the KL-regularized Trust-PCL algorithm, clarifying the relationship between the two methods.

Align-Then-stEer proposes an adaptation framework that improves the performance of vision-language-action models on cross-embodiment and cross-task manipulation.

RL's Razor offers a principle explaining why on-policy RL preserves prior knowledge and capabilities better than supervised fine-tuning (see the sketch after this list).

Replicable Reinforcement Learning with Linear Function Approximation develops provably efficient, replicable RL algorithms for linear Markov decision processes.
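
As a rough illustration of the RL's Razor entry: if preserving prior capabilities is viewed as limiting drift away from the base policy, that drift can be estimated directly. The sketch below is a hypothetical diagnostic under that assumption; `policy_drift`, `tuned_model`, `base_model`, and `eval_prompts` are illustrative names, the Hugging-Face-style `.logits` interface is an assumption, and the cited paper's actual methodology may differ.

```python
# Hypothetical drift diagnostic: average per-token KL divergence between a
# fine-tuned policy and its frozen base model on held-out prompts. Purely
# illustrative; not the measurement protocol of any paper listed here.
import torch
import torch.nn.functional as F

@torch.no_grad()
def policy_drift(tuned_model, base_model, eval_prompts):
    """Mean KL(pi_tuned || pi_base) over next-token distributions.

    Assumes Hugging-Face-style causal LMs whose forward pass returns .logits;
    eval_prompts is an iterable of (1, seq_len) token-id tensors.
    """
    total_kl, total_tokens = 0.0, 0
    for input_ids in eval_prompts:
        tuned_logp = F.log_softmax(tuned_model(input_ids).logits, dim=-1)
        base_logp = F.log_softmax(base_model(input_ids).logits, dim=-1)
        kl = (tuned_logp.exp() * (tuned_logp - base_logp)).sum(-1)  # (1, seq_len)
        total_kl += kl.sum().item()
        total_tokens += kl.numel()
    return total_kl / max(total_tokens, 1)
```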

Sources

Relative Trajectory Balance is equivalent to Trust-PCL

Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance

Is RL fine-tuning harder than regression? A PDE learning approach for diffusion models

Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models

RL's Razor: Why Online Reinforcement Learning Forgets Less

Replicable Reinforcement Learning with Linear Function Approximation
