Addressing Biases in Vision-Language Models

Research on vision-language models is increasingly focused on the significant demographic biases these models exhibit, particularly with regard to age, gender, race, and skin tone. Recent work highlights the need for more diverse and representative datasets, along with new metrics and evaluation frameworks to measure and mitigate these biases. A key line of effort is the creation of benchmarks that surface specific failure modes, such as age bias in pediatric medicine or gaps in cultural competence. Noteworthy papers include PediatricsMQA, which introduces a comprehensive multi-modal pediatric question-answering benchmark to address age bias in medical informatics; Ask Me Again Differently: GRAS, a benchmark for uncovering demographic biases in vision-language models across gender, race, age, and skin tone; DemoBias, an empirical study tracing demographic biases in vision foundation models, including biometric face recognition tasks; and Toward Socially Aware Vision-Language Models, which evaluates cultural competence through multimodal story generation.
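Benchmarks like GRAS typically report performance broken down by demographic group and summarize the disparity as a gap between groups. As a minimal sketch of one such metric (the group names and evaluation records below are hypothetical, not taken from any of the cited benchmarks):

```python
from collections import defaultdict

def group_accuracy_gap(records):
    """Compute per-group accuracy and the max-min accuracy gap.

    `records` is a list of (group, correct) pairs, where `group` is a
    demographic label (e.g. a gender, race, age band, or skin-tone
    category) and `correct` indicates whether the model's answer was
    judged correct for that example.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    accuracy = {g: hits[g] / totals[g] for g in totals}
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap

# Hypothetical evaluation results for two demographic groups.
records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]
accuracy, gap = group_accuracy_gap(records)
print(accuracy, gap)
```

A gap near zero suggests parity across groups on this task; a large gap flags a disparity worth investigating, which is the kind of signal these benchmarks are designed to expose.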

Sources

PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark

Toward Socially Aware Vision-Language Models: Evaluating Cultural Competence Through Multimodal Story Generation

Ask Me Again Differently: GRAS for Measuring Bias in Vision Language Models on Gender, Race, Age, and Skin Tone

DemoBias: An Empirical Study to Trace Demographic Biases in Vision Foundation Models
