Text-to-Image Models: Advancing Representation and Evaluation

Research on text-to-image (T2I) generation is increasingly focused on representation, diversity, and evaluation. Researchers are developing frameworks and metrics to measure and control representational harms in image generation, covering aspects such as intersectional group representation and default-mode diversity. New benchmarks and evaluation toolkits assess how well T2I models follow complex instructions and how high the quality of the resulting images is. Together, these advances aim to improve the overall performance and robustness of T2I models. Noteworthy papers in this area include:

  • A paper introducing a framework for measuring how intersectional groups are represented in images generated by T2I models; the measure can also guide models toward more balanced generation across demographic groups.
  • A paper presenting a comprehensive benchmark and agent framework for complex instruction-based image generation, enabling a thorough assessment of a model's ability to follow complex instructions.
  • A paper proposing a quantitative evaluation framework for default-mode diversity and generalization in T2I generative models, designed to be flexible and interpretable.
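
To make the idea of measuring intersectional group representation concrete, here is a minimal sketch. It is not the metric from any of the papers listed here. It assumes each generated image has already been assigned attribute labels (for example by a hypothetical attribute classifier) and reports how far the observed share of each intersectional group is from a target share, which defaults to uniform. The function name `intersectional_gap` and the example labels are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def intersectional_gap(labels, attribute_values, target=None):
    """Return (max_gap, shares) over all intersectional groups.

    labels: list of tuples, one per generated image, e.g. ("female", "older").
    attribute_values: list of lists with the allowed values of each attribute.
    target: optional dict mapping group tuple -> desired share (uniform if None).
    """
    if not labels:
        raise ValueError("labels must contain at least one image")

    # Every combination of attribute values is one intersectional group.
    groups = list(product(*attribute_values))
    if target is None:
        target = {g: 1.0 / len(groups) for g in groups}

    counts = Counter(labels)
    total = len(labels)
    shares = {g: counts.get(g, 0) / total for g in groups}

    # Largest absolute deviation between observed and target share.
    max_gap = max(abs(shares[g] - target[g]) for g in groups)
    return max_gap, shares


# Made-up labels for ten images, as if produced by an attribute classifier.
labels = (
    [("male", "younger")] * 6
    + [("female", "younger")] * 2
    + [("male", "older")] * 2
)
gap, shares = intersectional_gap(
    labels, [["male", "female"], ["younger", "older"]]
)
print(f"max gap from uniform target: {gap:.2f}")
for group, share in shares.items():
    print(group, f"{share:.2f}")
```

A group that never appears, such as ("female", "older") in this toy data, shows up with a share of zero, which is the kind of gap that looking at each attribute on its own can hide.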

Sources

Multi-Group Proportional Representation for Text-to-Image Models

Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation

TIIF-Bench: How Does Your T2I Model Follow Your Instructions?

DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
