Advances in Text-to-Image Generation and Evaluation

Text-to-image generation is advancing rapidly, with growing attention to how models are evaluated and how fairly they behave. Researchers are developing new evaluation methods, including multi-modal language models used as judges and benchmarks that probe world-knowledge grounding and implicit inferential capabilities. In parallel, concern about cultural biases in these models is driving efforts to build more inclusive and diverse datasets.

Noteworthy papers in this area include:

  • Multi-Modal Language Models as Text-to-Image Model Evaluators, which presents a novel evaluation framework that uses multi-modal language models to assess prompt-generation consistency and image aesthetics.
  • Deconstructing Bias: A Multifaceted Framework for Diagnosing Cultural and Compositional Inequities in Text-to-Image Generative Models, which benchmarks a metric designed to evaluate the fidelity of image generation across cultural contexts and provides insights into architectural and data-centric interventions for enhancing cultural inclusivity.
  • WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation, which introduces a benchmark designed to systematically evaluate text-to-image models' world knowledge grounding and implicit inferential capabilities.
  • Generative Sign-description Prompts with Multi-positive Contrastive Learning for Sign Language Recognition, which proposes a novel method that leverages retrieval-augmented generation and domain-specific large language models to produce precise, multi-part descriptions for sign language recognition.
  • CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts, which introduces a novel benchmark designed to evaluate the robustness of large language models on code generation from code-mixed prompts.
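The evaluator-as-judge idea behind the first paper above can be sketched in a few lines. The snippet below is an illustrative sketch only: a real evaluator would query a multi-modal language model to describe each generated image, whereas here the captioning step is assumed to have already happened, and consistency is approximated with simple keyword overlap between prompt and caption.

```python
# Illustrative sketch of a prompt-generation consistency judge.
# In practice a multi-modal LLM would produce the captions (or score the
# images directly); here captions are given so the scoring loop is runnable.

def consistency_score(prompt: str, caption: str) -> float:
    """Fraction of prompt words covered by the model-produced caption
    (a crude stand-in for an MLLM judgment)."""
    prompt_words = set(prompt.lower().split())
    caption_words = set(caption.lower().split())
    return len(prompt_words & caption_words) / len(prompt_words)

def evaluate(prompts, captions):
    """Average prompt-generation consistency over a batch of pairs."""
    scores = [consistency_score(p, c) for p, c in zip(prompts, captions)]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    prompts = ["a red apple on a table", "two dogs running"]
    captions = ["a red apple sitting on a wooden table", "a cat sleeping"]
    print(round(evaluate(prompts, captions), 2))  # prints 0.5
```

Swapping the keyword overlap for an actual multi-modal model call (e.g., asking the model to rate prompt-image agreement) turns this toy loop into the kind of evaluation framework the paper describes.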

Sources

Multi-Modal Language Models as Text-to-Image Model Evaluators

Deconstructing Bias: A Multifaceted Framework for Diagnosing Cultural and Compositional Inequities in Text-to-Image Generative Models

WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

Improving Physical Object State Representation in Text-to-Image Generative Systems

Generative Sign-description Prompts with Multi-positive Contrastive Learning for Sign Language Recognition

Text to Image Generation and Editing: A Survey

OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution

Multimodal Benchmarking and Recommendation of Text-to-Image Generation Models

CRAFT: Cultural Russian-Oriented Dataset Adaptation for Focused Text-to-Image Generation

CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts
