Text-to-Image Generation and Image Editing Research

The field of text-to-image generation and image editing is advancing rapidly, with particular attention to improving compositional accuracy and to evaluating models more rigorously. New benchmarks and evaluation frameworks are emerging: CompAlign proposes a complex benchmark emphasizing 3D-spatial relationships, paired with fine-grained feedback for improving compositional image generation, while CROC introduces a scalable framework of automated Contrastive Robustness Checks for evaluating and training text-to-image metrics. For image editing, GIE-Bench offers a grounded, diagnostic benchmark that enables more precise assessment of text-guided editing models, and KRIS-Bench introduces a knowledge-based reasoning benchmark for intelligent image editing systems. Complementing these benchmarks, studies of generative AI in everyday image editing tasks highlight remaining limitations, most notably in preserving the identity of people and animals.
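To make the notion of a contrastive robustness check concrete, the sketch below scores an image against its original prompt and against a minimally altered, contradicting prompt, and passes the check only if the metric prefers the original. This is a minimal illustration of the general idea rather than the CROC pipeline itself; the CLIP backbone, the example prompts, and the `contrastive_check` helper are placeholder assumptions, not components of any of the cited benchmarks.

```python
# Illustrative sketch of a contrastive robustness check for a text-image
# alignment metric. NOT the CROC framework itself; it only demonstrates the
# general idea that a robust metric should score a matching (image, prompt)
# pair higher than the same image paired with a contradicting prompt.
# CLIP is used purely as an example metric (assumption).
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

MODEL_ID = "openai/clip-vit-base-patch32"  # example metric backbone (assumption)
model = CLIPModel.from_pretrained(MODEL_ID)
processor = CLIPProcessor.from_pretrained(MODEL_ID)
model.eval()


def alignment_scores(image: Image.Image, captions: list[str]) -> torch.Tensor:
    """Return one image-text similarity score per caption (CLIPScore-style)."""
    inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (1, num_captions); drop the image dimension.
    return outputs.logits_per_image.squeeze(0)


def contrastive_check(image: Image.Image, prompt: str, contrast_prompt: str) -> bool:
    """Pass iff the metric prefers the original prompt over the contrastive one."""
    original, contrastive = alignment_scores(image, [prompt, contrast_prompt])
    return bool(original > contrastive)


if __name__ == "__main__":
    # Hypothetical example: the contrastive prompt flips attribute bindings.
    img = Image.open("generated_sample.png")  # placeholder path
    passed = contrastive_check(
        img,
        prompt="a red cube on top of a blue sphere",
        contrast_prompt="a blue cube on top of a red sphere",
    )
    print("robustness check passed:", passed)
```

In a CROC-style evaluation, judging from the paper title, such checks would presumably be generated at scale from pseudo-labeled contrasts or curated by humans and then aggregated into a per-metric robustness score; the single check above is only the smallest building block of that idea.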

Sources

CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback

CROC: Evaluating and Training T2I Metrics with Pseudo- and Human-Labeled Contrastive Robustness Checks

GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing

Understanding Generative AI Capabilities in Everyday Image Editing Tasks

NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

T2I-ConBench: Text-to-Image Benchmark for Continual Post-training
