Advances in Diffusion Models and Reinforcement Learning

Recent work at the intersection of diffusion models and reinforcement learning focuses on improving the quality and diversity of generated samples. Much of it targets known limitations of existing methods, such as mode collapse and reward hacking. Several approaches aim to better align diffusion models with downstream objectives, including reparameterized policy gradients and data-regularized reinforcement learning. There is also growing interest in combining diffusion models with other components, such as visuomotor policies and open-loop routines, to improve performance on complex tasks. Overall, the field is moving toward more robust and scalable methods that balance competing objectives.

Some noteworthy papers in this regard include:

Multi-GRPO proposes a multi-group advantage estimation framework to improve the alignment of text-to-image models.

Soft Quality-Diversity Optimization presents an alternative framing of the quality-diversity problem that sidesteps the need for discretization.

DPAC introduces a distribution-preserving adversarial control method to improve the perceptual fidelity of diffusion sampling.

Data-regularized Reinforcement Learning uses the forward KL divergence to anchor the policy to an off-policy data distribution and alleviate reward hacking.

FALCON combines modular diffusion policies with a vision-language foundation model to improve performance on loco-manipulation tasks.
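To give some intuition for why a forward KL anchor to a data distribution might discourage collapse, the toy sketch below compares the forward and reverse KL divergences for a policy that has concentrated almost all of its mass on one outcome. It illustrates a general property of KL divergence only and is not the method of any paper listed here. The distributions `data` and `policy` and the helper `kl_divergence` are hypothetical and chosen purely for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i is assumed positive wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy distributions over three outcomes (assumed, not from any paper).
data = [0.5, 0.4, 0.1]       # stands in for an off-policy data distribution
policy = [0.98, 0.01, 0.01]  # a policy that has nearly collapsed onto one outcome

# Forward KL: expectation under the data, so it penalizes the policy
# for assigning little mass to outcomes that the data contains.
forward_kl = kl_divergence(data, policy)

# Reverse KL: expectation under the policy, so outcomes the policy
# has dropped barely contribute to the penalty.
reverse_kl = kl_divergence(policy, data)

print(f"forward KL(data || policy) = {forward_kl:.3f}")
print(f"reverse KL(policy || data) = {reverse_kl:.3f}")
```

In this toy case the forward KL is the larger of the two, because it weights each outcome by its probability under the data rather than under the collapsed policy.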

Sources

Multi-GRPO: Multi-Group Advantage Estimation for Text-to-Image Generation with Tree-Based Trajectories and Multiple Rewards

Soft Quality-Diversity Optimization

Goal-Driven Reward by Video Diffusion Models for Reinforcement Learning

DPAC: Distribution-Preserving Adversarial Control for Diffusion Sampling

GRASP: Guided Residual Adapters with Sample-wise Partitioning

Exploring Definitions of Quality and Diversity in Sonic Measurement Spaces

Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling

Data-regularized Reinforcement Learning for Diffusion Models at Scale

FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination

Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function

Hybrid-Diffusion Models: Combining Open-loop Routines with Visuomotor Diffusion Policies
