Research at the intersection of diffusion models and reinforcement learning is evolving rapidly, with a focus on improving both the quality and the diversity of generated samples. Recent work targets well-known failure modes of existing methods, notably mode collapse and reward hacking. New approaches for aligning diffusion models with downstream objectives include reparameterized policy gradients and data-regularized reinforcement learning, and there is growing interest in pairing diffusion models with other components, such as visuomotor policies and open-loop routines, to improve performance on complex tasks. Overall, the field is moving toward more robust and scalable methods that balance competing objectives while preserving sample quality.
Some noteworthy papers in this regard include:

- Multi-GRPO, which proposes a multi-group advantage estimation framework to improve the alignment of text-to-image models (a sketch of the general group-relative idea follows this list).
- Soft Quality-Diversity Optimization, which presents an alternative framing of the quality-diversity problem that sidesteps the need for discretizations.
- DPAC, which introduces a distribution-preserving adversarial control method to improve the perceptual fidelity of diffusion sampling.
- Data-regularized Reinforcement Learning, which uses the forward KL divergence to anchor the policy to an off-policy data distribution and alleviate reward hacking (see the second sketch below).
- FALCON, which combines modular diffusion policies with a vision-language foundation model to improve loco-manipulation tasks.
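Multi-GRPO's exact estimator is not reproduced here; the following is a minimal sketch of the group-relative advantage idea that GRPO-style methods build on, assuming each group gathers samples for one prompt under one reward model. The function name, grouping scheme, and toy numbers are illustrative assumptions, not details from the paper.

```python
import numpy as np

def group_relative_advantages(rewards, group_ids, eps=1e-8):
    """Normalize each reward against the statistics of its own group.

    rewards: 1-D array of scalar rewards, one per generated sample.
    group_ids: 1-D array of the same length; samples sharing an id
        (e.g. the same prompt scored by the same reward model) form a group.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    group_ids = np.asarray(group_ids)
    advantages = np.zeros_like(rewards)
    for gid in np.unique(group_ids):
        mask = group_ids == gid
        group = rewards[mask]
        # Group-relative baseline: center and scale within the group only.
        advantages[mask] = (group - group.mean()) / (group.std() + eps)
    return advantages

# Hypothetical usage: 2 prompts x 2 reward models, 3 samples per group.
rewards = [0.2, 0.5, 0.9, 0.1, 0.1, 0.4, 0.7, 0.8, 0.6, 0.3, 0.2, 0.5]
group_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
adv = group_relative_advantages(rewards, group_ids)
# Per-group advantages can then be combined (e.g. averaged per sample)
# before the policy-gradient update.
```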
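The data-regularized objective can likewise be illustrated with a hedged sketch: the negative log-likelihood of reference-dataset samples under the current policy equals the forward KL from the data distribution to the policy up to a constant (the data entropy), so adding it to a reward-maximization loss anchors the policy to the data. The function signature, the beta coefficient, and the REINFORCE-style reward term below are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def data_regularized_loss(policy_logp_samples, rewards, data_nll, beta=0.1):
    """Reward maximization anchored to an off-policy data distribution.

    policy_logp_samples: log-probabilities of on-policy samples under the
        current policy (used in a REINFORCE-style term).
    rewards: scalar rewards for those samples (treated as constants).
    data_nll: negative log-likelihood (or a diffusion/ELBO surrogate) of
        reference-dataset samples under the current policy; its expectation
        is the forward KL up to the data entropy, so minimizing it pulls
        the policy toward the data and counteracts reward hacking.
    beta: strength of the data anchor (assumed hyperparameter).
    """
    pg_loss = -(rewards.detach() * policy_logp_samples).mean()
    return pg_loss + beta * data_nll.mean()

# Hypothetical usage with placeholder tensors standing in for model outputs.
logp = torch.randn(8, requires_grad=True)   # log p_theta of 8 policy samples
r = torch.rand(8)                           # their scalar rewards
nll = torch.rand(4, requires_grad=True)     # NLL of 4 dataset samples
loss = data_regularized_loss(logp, r, nll, beta=0.05)
loss.backward()
```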