Advances in Multilingual Large Language Models
The field of multilingual large language models (LLMs) is evolving rapidly, with a focus on improving performance in low-resource languages and on addressing challenges such as bias and language variation. Recent research has highlighted the need for benchmarks and evaluation frameworks that can assess LLM capabilities across multiple languages, and there is growing interest in applying LLMs to tasks such as moral reasoning, abusive language detection, and language identification. Noteworthy papers in this area include PolyMath, which introduces a multilingual mathematical reasoning benchmark, and Moral Reasoning Across Languages, which evaluates the moral reasoning abilities of LLMs across five typologically diverse languages. Also notable are Mind the Language Gap, which presents a framework for multilingual bias testing, and TF1-EN-3M, which introduces a large dataset of synthetic moral fables for training small, open language models.
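To make the evaluation theme concrete, the sketch below shows one simple way such cross-lingual checks can be run, in the spirit of Better To Ask in English?: the same factual question is posed in English and in a lower-resource language (Nepali here) and per-language accuracy is compared. The question/answer pairs, the exact-match scoring, and the query_model interface are illustrative assumptions, not any paper's actual protocol.

```python
# Minimal sketch of a cross-lingual factual-accuracy check.
# Everything here (questions, scoring, model interface) is a placeholder.

from typing import Callable, Dict, List

# Parallel question/answer pairs: the same factual question phrased in
# English and in a lower-resource language (Nepali in this toy example).
QA_PAIRS: List[Dict[str, str]] = [
    {
        "en_question": "What is the capital of Nepal?",
        "ne_question": "नेपालको राजधानी कुन हो?",
        "answer": "Kathmandu",
    },
    # ... more parallel items would go here
]

def exact_match(prediction: str, reference: str) -> bool:
    """Crude scoring: the reference string appears in the prediction (case-insensitive)."""
    return reference.lower() in prediction.lower()

def evaluate(query_model: Callable[[str], str]) -> Dict[str, float]:
    """Ask each question in both languages and report per-language accuracy.

    `query_model` is assumed to be any callable that takes a prompt string and
    returns the model's text response (e.g. a thin wrapper around an LLM API).
    """
    correct = {"en": 0, "ne": 0}
    for item in QA_PAIRS:
        if exact_match(query_model(item["en_question"]), item["answer"]):
            correct["en"] += 1
        if exact_match(query_model(item["ne_question"]), item["answer"]):
            correct["ne"] += 1
    total = len(QA_PAIRS)
    return {lang: n / total for lang, n in correct.items()}

if __name__ == "__main__":
    # Stand-in model that always gives the same answer; replace with a real LLM call.
    dummy_model = lambda prompt: "The capital is Kathmandu."
    print(evaluate(dummy_model))  # e.g. {'en': 1.0, 'ne': 1.0}
```

In practice one would supply a real model client for query_model, a much larger set of parallel question/answer pairs, and more robust answer matching than the simple substring check used here.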
Sources
Mind the Language Gap: Automated and Augmented Evaluation of Bias in LLMs for High- and Low-Resource Languages
Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages
Creating and Evaluating Code-Mixed Nepali-English and Telugu-English Datasets for Abusive Language Detection Using Traditional and Deep Learning Models