Advances in Controllable Image Generation

Introduction

The field of image generation is advancing rapidly, with a focus on models that produce high-quality images under fine-grained control. Recent research has introduced new architectures and techniques that enable more precise control over the generation process.

Current Developments

The field is moving toward models that offer spatial control, allowing users to specify the layout and structure of the generated image. This is being pursued through techniques such as masking, token-based generation, and inference-time scaling; a minimal sketch of the latter follows this paragraph. In parallel, there is growing interest in stand-alone autoregressive models that are trained entirely from scratch, removing the dependence on pretrained components or external diffusion decoders.
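
As a concrete illustration, inference-time scaling in its simplest form draws several candidate images and keeps the one a scorer prefers. The sketch below is a generic best-of-N loop, not MADI's specific procedure; `generate` and `score` are hypothetical stand-ins for any sampler and any quality metric.

```python
def best_of_n(generate, score, prompt, n=8):
    """Inference-time scaling via best-of-N selection: spend extra compute
    at sampling time by drawing n candidates and keeping the best-scored one.

    `generate` and `score` are hypothetical stand-ins for any image sampler
    (e.g. a diffusion model) and any quality/consistency scorer.
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda image: score(prompt, image))


# Toy usage with stand-in callables (real code would plug in a model):
import random
random.seed(0)
best = best_of_n(generate=lambda p: random.random(),  # fake "image" = a float
                 score=lambda p, img: img,            # fake scorer = identity
                 prompt="a red cube on a table")
```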

Noteworthy Papers

  • MADI: Masking-Augmented Diffusion with Inference-Time Scaling for Visual Editing introduces a framework that improves the editability, compositionality, and controllability of diffusion models.
  • Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling presents a stand-alone autoregressive model, trained from scratch, that matches state-of-the-art diffusion models in generation quality while preserving flexibility and compositionality; a generic sampling sketch follows this list.
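
To make the autoregressive approach concrete, the sketch below shows the standard next-token sampling loop such models use: discrete image tokens are sampled one at a time and later decoded to pixels by a tokenizer's decoder. The `model` interface here is an assumption for illustration, not Lumina-mGPT 2.0's actual API.

```python
import torch

@torch.no_grad()
def sample_image_tokens(model, start_tokens, seq_len, temperature=1.0):
    """Generic next-token sampling loop for a decoder-only image model.

    Assumes `model` maps a (1, t) LongTensor of token ids to (1, t, vocab)
    logits; this is an illustrative interface, not a specific model's API.
    """
    tokens = start_tokens                              # (1, t0) prompt tokens
    while tokens.shape[1] < seq_len:
        logits = model(tokens)[:, -1] / temperature    # logits for next position
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, 1)       # sample one token id
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens  # a VQ-style decoder would map these ids back to pixels
```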

Sources

MADI: Masking-Augmented Diffusion with Inference-Time Scaling for Visual Editing

A Practical Investigation of Spatially-Controlled Image Generation with Transformers

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling

Towards Consistent Long-Term Pose Generation
