Advancements in Text-to-Image Generation

The field of text-to-image generation is moving toward more precise and flexible control over generated images. Researchers are exploring methods that improve adherence to user-specified constraints, such as color palettes and spatial relationships. One key direction is the development of loss functions and optimization techniques that exploit a model's internal representations to improve the spatial accuracy of generated images. Another important line of work enhances text-to-image models through prompt rewriting and reward scaling, which can substantially improve image-text alignment and overall image quality. Notable papers in this area include:

Palette Aligned Image Diffusion: introduces a method for conditioning text-to-image diffusion models on a user-specified color palette (a generic palette-guidance loss is sketched below).

Data-Driven Loss Functions for Inference-Time Optimization in Text-to-Image Generation: proposes a framework for learning data-driven objectives for test-time optimization (see the latent-optimization sketch below).

PromptEnhancer: introduces a prompt rewriting framework that enhances any pretrained text-to-image model without requiring modifications to its weights.

RewardDance: introduces a scalable reward modeling framework that overcomes the limitations of existing approaches and unlocks scaling across two dimensions, model scaling and context scaling (a best-of-N selection sketch appears below).
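To make the palette-conditioning idea concrete, here is a minimal sketch of a differentiable palette objective that penalizes each pixel's distance to its nearest palette color. The `palette_loss` helper, tensor shapes, and example colors are illustrative assumptions, not the actual conditioning mechanism of Palette Aligned Image Diffusion:

```python
import torch

def palette_loss(image: torch.Tensor, palette: torch.Tensor) -> torch.Tensor:
    """Mean squared distance from each pixel to its nearest palette color.

    image:   (3, H, W) tensor with values in [0, 1]
    palette: (K, 3) tensor of target RGB colors
    """
    pixels = image.permute(1, 2, 0).reshape(-1, 3)  # (H*W, 3)
    dists = torch.cdist(pixels, palette)            # (H*W, K) pairwise distances
    return (dists.min(dim=1).values ** 2).mean()

image = torch.rand(3, 64, 64, requires_grad=True)           # stand-in decoded image
palette = torch.tensor([[0.9, 0.1, 0.1], [0.1, 0.2, 0.9]])  # two target colors
loss = palette_loss(image, palette)
loss.backward()  # such a gradient could steer guidance toward the palette
```

A loss of this shape is differentiable end to end, so it could in principle be used as a guidance term during sampling; actual palette conditioning methods may instead inject the palette through learned embeddings.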
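The test-time optimization idea can be illustrated with a short sketch: a learned objective scores a latent, and gradient descent refines the latent before decoding. The `loss_model` MLP below is a hypothetical stand-in for a learned, data-driven objective; the paper's actual objectives and optimization loop are not reproduced here:

```python
import torch

torch.manual_seed(0)

# Stand-in for a learned, data-driven objective over diffusion latents.
loss_model = torch.nn.Sequential(
    torch.nn.Linear(64, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)
for p in loss_model.parameters():
    p.requires_grad_(False)  # the objective stays fixed at inference time

# Latent refined at inference time; in a real pipeline this would be a
# diffusion latent at some timestep rather than a random vector.
latent = torch.randn(1, 64, requires_grad=True)
optimizer = torch.optim.Adam([latent], lr=1e-2)

for step in range(50):
    optimizer.zero_grad()
    loss = loss_model(latent).mean()  # learned score of constraint violation
    loss.backward()                   # gradient flows only into the latent
    optimizer.step()
```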
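Reward models are commonly applied at inference time to rank candidate generations; the following best-of-N sketch shows that general pattern only. The `reward_model` and the embedding shapes are hypothetical, and this does not reflect RewardDance's model- and context-scaling recipe:

```python
import torch

torch.manual_seed(0)

reward_model = torch.nn.Linear(64, 1)  # stand-in scalar reward head

candidates = torch.randn(8, 64)        # N=8 candidate image embeddings
with torch.no_grad():
    scores = reward_model(candidates).squeeze(-1)  # one reward per candidate
best = candidates[scores.argmax()]     # keep the highest-reward generation
```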

Sources

Palette Aligned Image Diffusion

Data-Driven Loss Functions for Inference-Time Optimization in Text-to-Image Generation

PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting

RewardDance: Reward Scaling in Visual Generation
