Advances in Multilingual Large Language Models

Natural language processing research is increasingly focused on improving the performance of large language models (LLMs) in low-resource and morphologically rich languages. Recent studies evaluate how effectively LLMs handle these languages, highlighting the challenges they face in mastering linguistic nuance and cultural context. Noteworthy papers in this area include 'Evaluating Modern Large Language Models on Low-Resource and Morphologically Rich Languages' and 'Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders', which offer insight into the internal mechanisms of LLMs and their capacity to form shared multilingual representations. Another significant contribution is the introduction of new benchmarks, such as LaoBench and AraLingBench, which enable more rigorous evaluation of LLMs in underrepresented languages.

Sources

Evaluating Modern Large Language Models on Low-Resource and Morphologically Rich Languages: A Cross-Lingual Benchmark Across Cantonese, Japanese, and Turkish

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders

DiscoX: Benchmarking Discourse-Level Translation Task in Expert Domains

LaoBench: A Large-Scale Multidimensional Lao Benchmark for Large Language Models

Exploring Parameter-Efficient Fine-Tuning and Backtranslation for the WMT 25 General Translation Task

From Phonemes to Meaning: Evaluating Large Language Models on Tamil

How Good is BLI as an Alignment Measure: A Study in Word Embedding Paradigm

uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data

Translation Entropy: A Statistical Framework for Evaluating Translation Systems

Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study

Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning

Can QE-informed (Re)Translation lead to Error Correction?

ArbESC+: Arabic Enhanced Edit Selection System Combination for Grammatical Error Correction: Resolving Conflict and Improving System Combination in Arabic GEC

AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models

Examining the Metrics for Document-Level Claim Extraction in Czech and Slovak

Subword Tokenization Strategies for Kurdish Word Embeddings

NeuCLIRBench: A Modern Evaluation Collection for Monolingual, Cross-Language, and Multilingual Information Retrieval

LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs

HinTel-AlignBench: A Framework and Benchmark for Hindi-Telugu with English-Aligned Samples

IndicGEC: Powerful Models, or a Measurement Mirage?

Building Robust and Scalable Multilingual ASR for Indian Languages

Incorporating Token Importance in Multi-Vector Retrieval

TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval
