Advances in Autoregressive Visual Generation

Autoregressive visual generation is moving toward larger models and higher image quality. Recent work targets both efficiency and effectiveness, for example by using semantic regularization and entropy loss to stabilize training. A second trend is personalization, where models are optimized for subject-specific image synthesis. Among the noteworthy papers, GigaTok scales visual tokenizers to 3 billion parameters and reports state-of-the-art results in reconstruction and downstream autoregressive generation. InstantCharacter, built on a scalable diffusion transformer framework, demonstrates open-domain personalization across diverse character appearances and styles. SimpleAR shows that, with careful optimization through pretraining, SFT, and RL, autoregressive models can generate high-fidelity images without complex architectural modifications, and Seedream 3.0 reports significant improvements in text rendering and image quality. Overall, the field is advancing rapidly on three fronts: quality, efficiency, and personalization.

Sources

GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation

Seedream 3.0 Technical Report

SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL

InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework

Privacy Protection Against Personalized Text-to-Image Synthesis via Cross-image Consistency Constraints

Personalized Text-to-Image Generation with Auto-Regressive Models
