Advances in Large Language Models and Multilingualism

The field of natural language processing is seeing rapid progress in large language models (LLMs) and their application to multilingual settings. Recent studies highlight the importance of accounting for missingness and omission in LLM inputs, as well as the need for more inclusive and diverse training data. The use of LLMs in low-resource languages, and the evaluation of their performance in those languages, is also gaining attention. Noteworthy papers include the work on omission-aware graph inference, which presents a novel framework for detecting omission-based misinformation; BERnaT, which demonstrates the value of capturing linguistic diversity when building inclusive language models; and the CALAMITA initiative, which provides a comprehensive benchmark for evaluating LLMs in Italian and underscores the need for fine-grained, task-representative metrics.

Sources

Mind the data gap: Missingness Still Shapes Large Language Model Prognoses

Reasoning About the Unsaid: Misinformation Detection with Omission-Aware Graph Inference

Modeling Topics and Sociolinguistic Variation in Code-Switched Discourse: Insights from Spanish-English and Spanish-Guaraní

BERnaT: Basque Encoders for Representing Natural Textual Diversity

Challenging the Abilities of Large Language Models in Italian: a Community Initiative

Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case
