Advances in Culturally Aware Language Models and Gender Bias Evaluation

The field of natural language processing is moving toward more culturally aware and inclusive language models. Recent studies highlight the importance of accounting for cultural nuances and regional variation in machine translation systems, and multimodal approaches, such as incorporating images as cultural context, have shown promising results in improving translation quality. There is also a growing emphasis on evaluating and mitigating social biases in large language models, particularly gender stereotypes: researchers are developing new datasets and benchmarks to assess gender bias in multilingual language models and investigating the feasibility of using natural text to augment commonsense repositories with stereotypical gender expectations.

Noteworthy papers include CaMMT, which introduces a human-curated benchmark for evaluating culturally aware multimodal machine translation, and EuroGEST, a dataset designed to measure gender-stereotypical reasoning in multilingual language models. Additionally, EuroLLM-9B, which supports all 24 official European Union languages and 11 additional languages, is a significant step toward addressing the underrepresentation of European languages in existing open large language models.
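To make the bias-evaluation idea concrete, below is a minimal illustrative sketch of one common probing technique: comparing the log-likelihood a causal language model assigns to otherwise-identical sentences with masculine versus feminine referents. This is not the EuroGEST protocol; the model name, the sentence pair, and the scoring choice are assumptions made purely for demonstration.

```python
# Illustrative sketch: a log-likelihood probe for gender-stereotypical
# preference in a causal LM. Not the EuroGEST method; model name and
# sentence pair are hypothetical choices for demonstration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "utter-project/EuroLLM-9B"  # assumed checkpoint; any causal LM works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Return the total log-probability the model assigns to a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to input_ids, the returned loss is the mean
        # negative log-likelihood per token; multiply by the token count
        # to recover the total log-probability.
        out = model(**inputs, labels=inputs["input_ids"])
    num_tokens = inputs["input_ids"].shape[1]
    return -out.loss.item() * num_tokens

# A stereotype-laden template rendered with masculine vs. feminine referents.
masculine = "He is a brilliant engineer."
feminine = "She is a brilliant engineer."
gap = sentence_logprob(masculine) - sentence_logprob(feminine)
print(f"log-prob gap (masc - fem): {gap:.3f}")
```

A single pair proves nothing; in practice such probes aggregate the score gap over many templates, occupations, and languages, and a systematic skew in one direction is then read as evidence of stereotypical bias.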

Sources

ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural Heritage

CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation

HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America

Gender Inequality in English Textbooks Around the World: an NLP Approach

Stereotypical gender actions can be extracted from Web text

EuroGEST: Investigating gender stereotypes in multilingual language models

EuroLLM-9B: Technical Report
