Advances in Multimodal and Multilingual Natural Language Processing

The field of Natural Language Processing (NLP) is moving toward tighter integration of multimodal and multilingual capabilities. Researchers are exploring the use of large language models (LLMs) for text diacritization, machine translation, and literary translation quality assessment. Multimodal benchmarks, such as those evaluating vision-language models on tasks that combine visual reasoning with subject-specific background knowledge, are being developed to stress-test AI systems. There is also a growing focus on building datasets and evaluation metrics that can accurately measure LLM performance in low-resource languages and specialized domains. Notable papers in this area include MAS-LitEval, which proposes a multi-agent system for literary translation quality assessment, and Rethinking Multilingual Vision-Language Translation, which presents a comprehensive study of vision-language translation from three key perspectives: data quality, model architecture, and evaluation metrics.

Sources

Smotrom tvoja pa ander drogoj verden! Resurrecting Dead Pidgin with Generative Models: Russenorsk Case Study

A Gamified Evaluation and Recruitment Platform for Low Resource Language Machine Translation Systems

Are LLMs Good Text Diacritizers? An Arabic and Yorùbá Case Study

VLM@school -- Evaluation of AI image understanding on German middle school knowledge

Rethinking Multilingual Vision-Language Translation: Dataset, Evaluation, and Adaptation

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

MAS-LitEval: Multi-Agent System for Literary Translation Quality Assessment

COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation
