Advances in Diffusion Models for Image Editing and Generation

The field of diffusion models is rapidly advancing, with a focus on improving image editing and generation capabilities. Recent developments enable more flexible and precise editing, including iterative editing that combines multiple methods and better image-text alignment. Researchers are also exploring new training methods and architectures to improve the efficiency and quality of diffusion models. Notably, approaches such as treating the data space of pre-trained diffusion models as a Riemannian manifold and adopting sparse-to-sparse training have shown particular promise. Noteworthy papers include:

  • REED-VAE, which introduces a re-encode/decode (REED) training scheme for variational autoencoders so that image quality is preserved across repeated editing iterations (the encode/decode round trip it targets is sketched after this list).
  • Marmot, which proposes a multi-agent reasoning framework for multi-object self-correction, improving image-text alignment and producing more coherent multi-object image edits.
  • Image Interpolation with Score-based Riemannian Metrics of Diffusion Models, which presents a framework for interpolating between images using a Riemannian metric derived from the score function (a geodesic formulation is sketched after this list).
  • In-Context Edit, which enables instructional image editing with in-context generation in large-scale diffusion transformers.
  • Can We Achieve Efficient Diffusion without Self-Attention?, which distills self-attention modules into Pyramid Convolution Blocks to reduce computational cost (an illustrative block appears after this list).
  • Revisiting Diffusion Autoencoder Training for Image Reconstruction Quality, which proposes a new training method to improve the quality of reconstructed images.
  • Sparse-to-Sparse Training of Diffusion Models, which introduces the sparse-to-sparse training paradigm to improve both training and inference efficiency (a minimal masking sketch follows this list).
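
To make the REED-VAE bullet concrete, the sketch below shows the repeated VAE encode/decode round trip that accumulates degradation across editing iterations and that a re-encode/decode training scheme is meant to withstand. The checkpoint name and the placeholder editing step are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the encode/decode round trips accumulated over iterative
# editing; REED-VAE trains the VAE to tolerate exactly this kind of repetition.
# The checkpoint and the editing placeholder are assumptions for illustration.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")  # assumed checkpoint
vae.eval()

@torch.no_grad()
def iterative_round_trip(image: torch.Tensor, num_edits: int = 5) -> torch.Tensor:
    """Encode and decode an image once per (hypothetical) edit.

    `image` is a (B, 3, H, W) tensor scaled to [-1, 1]. Without a scheme such
    as REED-VAE, quality typically drifts a little with every pass.
    """
    for _ in range(num_edits):
        latents = vae.encode(image).latent_dist.sample()
        # ... an editing method would modify `latents` here ...
        image = vae.decode(latents).sample.clamp(-1, 1)
    return image
```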
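
The score-based interpolation bullet can be read as a geodesic problem: interpolants are paths of minimal length under a metric on image space. The formulation below is standard Riemannian geometry; how the metric G is actually built from the score is the paper's contribution and is only named abstractly here.

```latex
% Geodesic interpolation between images x_0 and x_1 under a Riemannian
% metric G(x); the construction of G from the score s_theta is left abstract.
\gamma^{*} = \arg\min_{\gamma(0)=x_0,\;\gamma(1)=x_1}
  \int_0^1 \dot{\gamma}(t)^{\top}\, G\!\bigl(\gamma(t)\bigr)\, \dot{\gamma}(t)\,\mathrm{d}t,
\qquad
s_\theta(x) = \nabla_x \log p_\theta(x).
```

Intermediate images are then points along the optimal path, rather than straight-line (Euclidean) blends of the endpoints.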
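
For the efficiency-oriented paper, the block below is only a guess at what a multi-scale convolutional stand-in for self-attention could look like: parallel depthwise convolutions at increasing dilation rates gather context at several scales for linear cost in the number of pixels. The actual Pyramid Convolution Block design, and the distillation procedure, may differ.

```python
# Illustrative multi-scale convolutional block standing in for self-attention.
# The paper's Pyramid Convolution Block may be designed differently; this only
# conveys why convolutions are cheaper than attention on large feature maps.
import torch
import torch.nn as nn

class PyramidConvBlock(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        # Depthwise 3x3 convolutions at several dilation rates (the "pyramid").
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=d, dilation=d, groups=channels)
            for d in dilations
        ])
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)  # channel mixing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Summing the branches and mixing channels costs O(HW),
        # versus O((HW)^2) for full self-attention over the same feature map.
        return self.proj(sum(branch(x) for branch in self.branches)) + x
```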
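
Finally, sparse-to-sparse training means the network is sparse from initialization and stays sparse throughout training, rather than being trained dense and pruned afterwards. The sketch below keeps fixed random binary masks on the weight tensors; the static masking is an illustrative assumption, since dynamic schemes (prune-and-regrow) also fall under the paradigm.

```python
# Minimal sketch of sparse-to-sparse training: the network starts sparse and
# every update is re-masked so it stays sparse. Static random masks are an
# illustrative assumption; the paper may use a dynamic sparsity scheme.
import torch
import torch.nn as nn

def make_masks(model: nn.Module, sparsity: float = 0.9) -> dict[str, torch.Tensor]:
    """Draw one fixed random binary mask per weight tensor at initialization."""
    return {
        name: (torch.rand_like(p) > sparsity).float()
        for name, p in model.named_parameters()
        if p.dim() > 1  # mask conv kernels / weight matrices, not biases
    }

@torch.no_grad()
def apply_masks(model: nn.Module, masks: dict[str, torch.Tensor]) -> None:
    """Zero out masked weights so the model remains sparse after an optimizer step."""
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])

# Usage inside a (hypothetical) diffusion training loop:
#   masks = make_masks(denoiser)
#   apply_masks(denoiser, masks)                  # start sparse
#   for batch in loader:
#       loss = denoising_loss(denoiser, batch)    # placeholder loss helper
#       loss.backward(); optimizer.step(); optimizer.zero_grad()
#       apply_masks(denoiser, masks)              # stay sparse after every update
```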

Sources

  • REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
  • Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
  • Image Interpolation with Score-based Riemannian Metrics of Diffusion Models
  • In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
  • Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions
  • Revisiting Diffusion Autoencoder Training for Image Reconstruction Quality
  • Sparse-to-Sparse Training of Diffusion Models
