The field of natural language processing is moving toward broader support for low-resource languages, with several papers presenting new datasets, models, and techniques for improving performance in these settings. A key trend is the use of multilingual models, which have been shown to outperform monolingual counterparts on a range of tasks. Another area of focus is the development of more efficient methods for adapting large language models to new languages and tasks, and a related study examines few-shot prompting for in-context learning in low-resource languages (illustrative sketches of both ideas follow the list below). Noteworthy papers include:
- VSMRC, which presents a new dataset for Vietnamese text segmentation and multiple-choice reading comprehension.
- NepaliGPT, which introduces a generative language model for the Nepali language.
- RELIC, which proposes a novel framework for enhancing reward model generalization for low-resource Indic languages.
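To make the adaptation trend concrete, below is a minimal sketch of parameter-efficient fine-tuning with LoRA, one widely used way to adapt a large model to a new language cheaply; it is not the specific method of any paper above. The base model (`bigscience/bloom-560m`), target modules, and hyperparameters are illustrative assumptions.

```python
# A hedged sketch of parameter-efficient language adaptation with LoRA.
# All model and hyperparameter choices here are assumptions for illustration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Freeze the base weights and train only small low-rank adapter matrices,
# a tiny fraction of the full parameter count.
config = LoraConfig(
    r=8,                                 # rank of the adapter matrices
    lora_alpha=16,                       # adapter scaling factor
    target_modules=["query_key_value"],  # attention projections in BLOOM
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% trainable

# From here, `model` can be fine-tuned on monolingual text in the target
# language with any standard training loop or the Hugging Face Trainer.
```

Because only the adapters are updated, the same frozen base model can serve many languages, each with its own small set of adapter weights.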
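The few-shot in-context learning setup can likewise be sketched briefly. The task (Vietnamese sentiment classification), the demonstrations, and the choice of model below are all illustrative assumptions, not details drawn from the papers above.

```python
# A minimal sketch of few-shot prompting for in-context learning in a
# low-resource language. Task, examples, and model are assumptions.
from transformers import pipeline

# A few labeled demonstrations are concatenated ahead of the query so the
# model can infer the task from context, without any gradient updates.
demonstrations = [
    ("Bộ phim này thật tuyệt vời.", "positive"),         # "This movie is wonderful."
    ("Dịch vụ quá tệ, tôi rất thất vọng.", "negative"),  # "Terrible service, very disappointed."
]

def build_prompt(query: str) -> str:
    """Format the demonstrations and the query as a single few-shot prompt."""
    lines = ["Classify the sentiment of each Vietnamese sentence."]
    for text, label in demonstrations:
        lines.append(f"Sentence: {text}\nSentiment: {label}")
    lines.append(f"Sentence: {query}\nSentiment:")
    return "\n\n".join(lines)

# Any multilingual causal LM can stand in here; BLOOM is one example
# with some Vietnamese coverage.
generator = pipeline("text-generation", model="bigscience/bloom-560m")
output = generator(build_prompt("Món ăn rất ngon."), max_new_tokens=3)
print(output[0]["generated_text"])
```

The appeal in low-resource settings is that this requires only a handful of labeled examples at inference time, though results depend heavily on how well the base model covers the target language.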