Multilingual Advances in Large Language Models

The field of natural language processing is moving toward stronger multilingual capabilities, with a focus on the challenges of code-mixing, language diversity, and low-resource languages. Researchers are exploring new ways to improve the performance of large language models (LLMs) in these areas, including novel evaluation benchmarks, dictionary selection methods, and translation strategies. Notable papers in this area include:

- Evaluating Code-Mixing in LLMs Across 18 Languages, which proposes a comprehensive evaluation of LLM performance on code-mixed data and suggests improvements from larger training data, greater model scale, and few-shot learning.
- SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models, which presents a novel automatic dictionary selection method that surpasses strong baselines while reducing token usage.
- Mind the Language Gap in Digital Humanities: LLM-Aided Translation of SKOS Thesauri, which introduces an open-source pipeline for the automated translation of SKOS thesauri, improving the accessibility, reuse, and cross-lingual interoperability of knowledge resources.
- Multi-Hypothesis Distillation of Multilingual Neural Translation Models for Low-Resource Languages, which explores sequence-level knowledge distillation of multilingual pre-trained encoder-decoder translation models and presents a method that generates multiple translations for each source sentence (a sketch of this idea appears after the list).
- How and Where to Translate? The Impact of Translation Strategies in Cross-lingual LLM Prompting, which systematically evaluates how different prompt translation strategies affect classification tasks with RAG-enhanced LLMs in multilingual settings.
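
To make the multi-hypothesis idea concrete, here is a minimal sketch of how several translation hypotheses per source sentence can be produced with an off-the-shelf multilingual encoder-decoder model via beam search. The model name, language codes, and decoding settings are illustrative assumptions rather than the paper's actual setup; the point is simply that keeping all beam hypotheses yields multiple teacher translations per sentence that a smaller student model could be distilled on.

```python
# Hedged sketch: generate k translation hypotheses per source sentence with a
# multilingual encoder-decoder model, as one might when building data for
# sequence-level knowledge distillation. Model choice, language codes, and
# decoding settings are illustrative assumptions, not the paper's exact method.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/nllb-200-distilled-600M"  # assumed multilingual NMT teacher
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def n_best_translations(sentence: str, tgt_lang: str = "swh_Latn", k: int = 4):
    """Return k beam-search hypotheses for one source sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        num_beams=k,
        num_return_sequences=k,  # keep all k hypotheses, not just the top one
        max_new_tokens=128,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Each source sentence now yields several teacher translations that can serve
# as training targets for a smaller student model.
print(n_best_translations("The library opens at nine in the morning."))
```

Training the student on these teacher outputs, rather than only on the original references, is the standard sequence-level distillation recipe; the paper's contribution lies in how multiple hypotheses are generated and used, which this sketch does not attempt to reproduce.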

Sources

Evaluating Code-Mixing in LLMs Across 18 Languages

SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models

Mind the Language Gap in Digital Humanities: LLM-Aided Translation of SKOS Thesauri

The Polish Vocabulary Size Test: A Novel Adaptive Test for Receptive Vocabulary Assessment

Multi-Hypothesis Distillation of Multilingual Neural Translation Models for Low-Resource Languages

How and Where to Translate? The Impact of Translation Strategies in Cross-lingual LLM Prompting
