Generative modeling research is increasingly centered on diffusion models, applied to tasks including image and audio generation, music synthesis, and material simulation. These models achieve high-quality results while offering interpretability and control over the generation process. To improve efficiency and sample quality, researchers are also exploring diffusion transformers, hierarchical architectures, and self-supervised pre-training; a sketch of the sampling loop these approaches share follows the paper list below. Some noteworthy papers in this area include:
- ProGress, which introduces a generative music framework combining Schenkerian analysis with diffusion modeling for structured music generation.
- Audio Palette, which presents a diffusion-transformer model for Foley synthesis with fine-grained acoustic control.
- Hierarchical Koopman Diffusion, which achieves both one-step sampling and interpretable generative trajectories for image generation.
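All of these systems build on the same iterative denoising backbone. For orientation, here is a minimal sketch of DDPM-style ancestral sampling (Ho et al., 2020), the baseline procedure that one-step and transformer-based variants aim to accelerate or condition; `eps_model` is a hypothetical stand-in for a trained noise-prediction network, and the schedule values are illustrative, not taken from any of the papers above.

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng=np.random.default_rng(0)):
    """Minimal DDPM ancestral sampler.

    eps_model: callable (x_t, t) -> predicted noise; a hypothetical
        placeholder for a trained network (e.g., a diffusion transformer).
    betas: per-step noise schedule, shape (T,).
    """
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t)  # estimate the noise added at step t
        # Posterior mean of x_{t-1} given x_t and the noise estimate.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # final step is deterministic
    return x

# Toy usage: an untrained "noise predictor" that returns zeros.
betas = np.linspace(1e-4, 0.02, 50)
sample = ddpm_sample(lambda x, t: np.zeros_like(x), shape=(4, 4), betas=betas)
```

The loop's cost scales with the number of steps T, which is exactly the bottleneck that one-step samplers such as Hierarchical Koopman Diffusion target.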