Research on text-to-image (T2I) generation is increasingly focused on representation, diversity, and evaluation. Researchers are developing new frameworks and metrics to measure and control representational harms in generated images, such as how intersectional demographic groups are represented and how diverse a model's default-mode outputs are. New benchmarks and evaluation toolkits are also being introduced to assess how well T2I models follow complex instructions and whether they produce high-quality images. Together, these efforts aim to improve both the output quality and the robustness of T2I models. Noteworthy papers in this area include:
- A paper introducing a framework to measure the representation of intersectional groups in images generated by T2I models; the framework can also guide models toward more balanced generation across demographic groups.
- A paper presenting a comprehensive benchmark and an agent framework for complex instruction-based image generation, enabling a thorough assessment of how well a model follows complex instructions.
- A paper proposing a quantitative framework for evaluating default-mode diversity and generalization in T2I generative models, designed to be flexible and interpretable.
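As a rough illustration of what measuring intersectional representation can involve, here is a minimal sketch that scores how evenly a batch of generated images covers every intersectional group. This is *not* the method of the paper listed above. The attribute names, the assumption that per-image demographic labels are already available (for example from annotators or an attribute classifier), and the choice of normalized Shannon entropy as the balance score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def intersectional_counts(labels, attribute_values):
    """Count generated images per intersectional group.

    labels: one dict per generated image, mapping attribute name -> value
            (hypothetical labels, assumed to come from annotators or a classifier).
    attribute_values: attribute name -> list of all possible values.
    """
    attrs = sorted(attribute_values)
    # Enumerate every intersectional cell so that missing groups count as zero.
    cells = list(product(*(attribute_values[a] for a in attrs)))
    observed = Counter(tuple(lab[a] for a in attrs) for lab in labels)
    # Label combinations outside the declared values are ignored.
    return {cell: observed.get(cell, 0) for cell in cells}


def normalized_entropy(counts):
    """Shannon entropy of the group distribution divided by log(#groups).

    1.0 means perfectly balanced across all groups; lower means more skewed.
    Degenerate cases (no images, or fewer than two groups) return 0.0.
    """
    total = sum(counts.values())
    k = len(counts)
    if total == 0 or k < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)
    return h / math.log(k)


if __name__ == "__main__":
    # Hypothetical attributes and labels for four generated images.
    attribute_values = {"gender": ["female", "male"], "age": ["young", "old"]}
    labels = [
        {"gender": "female", "age": "young"},
        {"gender": "male", "age": "young"},
        {"gender": "male", "age": "young"},
        {"gender": "male", "age": "old"},
    ]
    counts = intersectional_counts(labels, attribute_values)
    print(counts)
    print(normalized_entropy(counts))
```

Enumerating all intersectional cells up front matters: a group that never appears (here, older women) still lowers the score, which is exactly the kind of gap an intersectional analysis is meant to surface. A framework like the one described above would typically go further, for example by comparing against a target distribution rather than a uniform one.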