Advances in Multilingual Large Language Models

The field of natural language processing is increasingly focused on improving the performance of large language models (LLMs) in low-resource and morphologically rich languages. Recent studies evaluate the effectiveness of LLMs in these settings, highlighting the challenges models face in mastering linguistic nuance and cultural context. Noteworthy papers in this area include 'Evaluating Modern Large Language Models on Low-Resource and Morphologically Rich Languages' and 'Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders', which offer insight into the internal mechanisms of LLMs and their ability to form shared multilingual representations. Another significant contribution is the introduction of new benchmarks, such as LaoBench and AraLingBench, which enable more rigorous evaluation of LLMs in underrepresented languages.
Sources
Evaluating Modern Large Language Models on Low-Resource and Morphologically Rich Languages: A Cross-Lingual Benchmark Across Cantonese, Japanese, and Turkish
Exploring Parameter-Efficient Fine-Tuning and Backtranslation for the WMT 25 General Translation Task
Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning
ArbESC+: Arabic Enhanced Edit Selection System Combination for Grammatical Error Correction: Resolving Conflict and Improving System Combination in Arabic GEC
AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models