Advancements in Autoregressive Image Generation

The field of autoregressive image generation is moving toward more efficient and effective models. Recent work has focused on improving the structure of the prediction space, leveraging visual understanding priors, and unifying visual understanding and generation within a single model. Notable directions include hierarchical semantic structures, continuous tokenizers, and causal attention mechanisms, which together have yielded clear gains in generation quality and efficiency. Noteworthy papers include: MASC, which introduces a manifold-aligned semantic clustering framework to improve training efficiency and generation quality; REAR, which proposes a generator-tokenizer consistency regularization objective to address the mismatch between generator and tokenizer; VUGEN, which leverages visual understanding priors for efficient, high-quality image generation; Ming-UniVision, which introduces a unified continuous tokenizer for joint image understanding and generation; Heptapod, which employs causal attention and next 2D distribution prediction to capture holistic image semantics; and IAR2, which enables a hierarchical semantic-detail synthesis process for autoregressive visual generation.
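To make the shared recipe behind these works concrete, the following is a minimal, hypothetical sketch of autoregressive next-token prediction over a flattened grid of discrete image tokens with causal self-attention. It is not the implementation of any paper listed here; all class names, dimensions, and vocabulary sizes are illustrative assumptions.

```python
# Illustrative sketch only: autoregressive prediction over image tokens
# with a causal attention mask. Sizes and names are hypothetical.
import torch
import torch.nn as nn

class CausalImageTokenModel(nn.Module):
    def __init__(self, vocab_size=1024, seq_len=256, dim=256, heads=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)   # discrete tokens from an image tokenizer
        self.pos_emb = nn.Embedding(seq_len, dim)      # raster-scan positions of the token grid
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab_size)         # logits over the next visual token

    def forward(self, tokens):
        # tokens: (batch, seq) integer ids produced by a visual tokenizer
        b, t = tokens.shape
        pos = torch.arange(t, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: each position attends only to earlier tokens in the scan order.
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # (batch, seq, vocab_size)

# Usage: train with cross-entropy between logits[:, :-1] and tokens[:, 1:].
model = CausalImageTokenModel()
dummy_tokens = torch.randint(0, 1024, (2, 256))
logits = model(dummy_tokens)
print(logits.shape)  # torch.Size([2, 256, 1024])
```

The papers above depart from this baseline in different ways, for example by restructuring the token vocabulary (MASC, IAR2), regularizing the generator against the tokenizer (REAR), replacing discrete tokens with a continuous tokenizer (Ming-UniVision), or changing the prediction target itself (Heptapod's next 2D distribution prediction).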

Sources

MASC: Boosting Autoregressive Image Generation with a Manifold-Aligned Semantic Clustering

REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization

VUGEN: Visual Understanding priors for GENeration

Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer

Heptapod: Language Modeling on Visual Signals

IAR2: Improving Autoregressive Visual Generation with Semantic-Detail Associated Token Prediction
