Advances in Multimodal and Multilingual Natural Language Processing

The field of Natural Language Processing (NLP) is moving toward tighter integration of multimodal and multilingual capabilities. Researchers are applying large language models (LLMs) to text diacritization, machine translation, and literary translation quality assessment. Multimodal benchmarks, such as those that evaluate vision-language models on tasks combining visual reasoning with subject-specific background knowledge, are being developed to stress-test AI systems. There is also a growing focus on datasets and evaluation metrics that can reliably measure LLM performance in low-resource languages and domains. Notable papers in this area include MAS-LitEval, which proposes a multi-agent system for literary translation quality assessment, and Rethinking Multilingual Vision-Language Translation, which presents a comprehensive study of vision-language translation from three perspectives: data quality, model architecture, and evaluation metrics.
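The multi-agent evaluation pattern behind systems like MAS-LitEval can be pictured as several criterion-specific judges whose scores are aggregated into an overall verdict. The sketch below is a minimal, hypothetical illustration of that pattern, not MAS-LitEval's actual design: the criteria, the heuristic scoring stand-ins, and the simple averaging aggregator are all assumptions; in a real system each agent would wrap an LLM prompt and return a structured judgment.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Judgment:
    criterion: str
    score: float      # normalized to the range 0.0-1.0
    rationale: str


class CriterionAgent:
    """One agent per quality criterion; a real agent would query an LLM."""

    def __init__(self, criterion: str):
        self.criterion = criterion

    def evaluate(self, source: str, translation: str) -> Judgment:
        # Placeholder heuristic standing in for an LLM call: crude word
        # overlap between source and translation, clipped to [0, 1].
        overlap = len(set(source.split()) & set(translation.split()))
        score = min(1.0, overlap / max(len(source.split()), 1))
        return Judgment(self.criterion, score, f"heuristic overlap={overlap}")


def assess(source: str, translation: str) -> dict:
    """Fan out to criterion agents, then aggregate into an overall score."""
    agents = [CriterionAgent(c) for c in ("fluency", "style fidelity", "cultural nuance")]
    judgments = [a.evaluate(source, translation) for a in agents]
    return {
        "per_criterion": {j.criterion: j.score for j in judgments},
        "overall": mean(j.score for j in judgments),
    }


if __name__ == "__main__":
    print(assess("Il pleuvait des cordes ce soir-la.",
                 "It was raining cats and dogs that evening."))
```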
Sources
Smotrom tvoja pa ander drogoj verden! Resurrecting Dead Pidgin with Generative Models: Russenorsk Case Study
A Gamified Evaluation and Recruitment Platform for Low Resource Language Machine Translation Systems