Research in reinforcement learning (RL) is increasingly centered on fine-tuning sequential generative models, with particular attention to KL-regularized methods. Recent work applies these ideas to generative flow networks and vision-language-action models. Notable developments include new algorithms and frameworks for adapting vision-language-action models to downstream tasks, as well as analyses of how RL relates to supervised fine-tuning.
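As context, the generic KL-regularized fine-tuning objective that much of this work builds on trains a policy $\pi$ to maximize reward while staying close to a pretrained reference model $\pi_{\mathrm{ref}}$ (a standard textbook form; the individual papers below use their own variants):

$$ J(\pi) \;=\; \mathbb{E}_{x \sim \pi}\!\left[ r(x) \right] \;-\; \beta \, D_{\mathrm{KL}}\!\left( \pi \,\|\, \pi_{\mathrm{ref}} \right), $$

where $r$ is the task reward and $\beta$ controls the strength of the KL penalty.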
Noteworthy papers include: Relative Trajectory Balance is equivalent to Trust-PCL, which establishes an equivalence between the Relative Trajectory Balance objective used in GFlowNet-style fine-tuning and the KL-regularized Trust-PCL algorithm, clarifying how the two methods relate. Align-Then-stEer introduces an adaptation framework that improves the performance of vision-language-action models on cross-embodiment and cross-task manipulation. RL's Razor proposes a principle explaining why on-policy RL preserves prior knowledge and capabilities better than supervised fine-tuning. Replicable Reinforcement Learning with Linear Function Approximation develops provably efficient, replicable RL algorithms for linear Markov decision processes.
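To make the on-policy, KL-regularized setup these papers study concrete, the following is a minimal toy sketch, not the implementation of any paper above: the toy policy, the hypothetical reward function, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch: REINFORCE with a KL penalty toward a frozen reference policy.
# Illustrative only; names (reward, beta, etc.) are assumptions, not any paper's API.
import torch

vocab, seq_len, beta, batch = 8, 5, 0.1, 64

logits = torch.zeros(vocab, requires_grad=True)   # trainable toy policy (shared per step)
ref_logits = torch.randn(vocab)                   # frozen "pretrained" reference
opt = torch.optim.Adam([logits], lr=1e-2)

def reward(tokens):
    # Hypothetical task reward: fraction of positions emitting token 0.
    return (tokens == 0).float().mean(dim=1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    ref = torch.distributions.Categorical(logits=ref_logits)

    tokens = dist.sample((batch, seq_len))        # on-policy rollouts
    logp = dist.log_prob(tokens).sum(dim=1)       # log pi(x) per sequence
    kl = torch.distributions.kl_divergence(dist, ref) * seq_len  # KL(pi || pi_ref)

    ret = reward(tokens)
    adv = ret - ret.mean()                        # simple mean baseline
    # Policy-gradient term plus explicit KL regularization toward the reference.
    loss = -(adv.detach() * logp).mean() + beta * kl
    opt.zero_grad(); loss.backward(); opt.step()
```

The sketch only illustrates the shared structure (sample on-policy, score with a reward, penalize divergence from the reference); the papers above differ in how they define the objective and estimate its gradient.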