Advances in Low-Resource Machine Translation and Multilingual Language Modeling

The field of natural language processing is moving toward improving the performance of machine translation and language models in low-resource settings. Recent studies show that backtranslation, a widely used technique for generating synthetic training data, can saturate: beyond a certain point, adding more synthetic data yields no further gains in high-quality, low-resource settings such as English-Gujarati. Researchers are also adapting pre-trained language models to low-resource languages and developing more effective techniques for translating style and cultural nuances, as a stylometric study comparing LLM, NMT, and human translations of children's literature illustrates. Additionally, the Translation Barrier Hypothesis highlights the importance of addressing implicit translation failure in multilingual generation with large language models.
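To make the backtranslation setup concrete, below is a minimal sketch of the standard pipeline: monolingual target-side text is translated back into the source language, and the resulting pairs are used as synthetic parallel training data. The model ids and sentences are illustrative stand-ins (the widely available English-German Marian models), not the English-Gujarati setup studied in the paper.

```python
# Minimal back-translation sketch: create synthetic parallel data by
# translating target-language monolingual text back into the source language.
# Requires: pip install transformers torch sentencepiece
from transformers import MarianMTModel, MarianTokenizer

def load(model_name):
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    return tokenizer, model

def translate(sentences, tokenizer, model):
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch, max_new_tokens=128)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# Monolingual target-side corpus (German here; Gujarati in the paper's setting).
target_monolingual = [
    "Maschinelle Übersetzung ist schwierig für Sprachen mit wenigen Daten.",
    "Synthetische Daten können das Training verbessern.",
]

# Back-translate target -> source to obtain synthetic source sentences ...
bt_tok, bt_model = load("Helsinki-NLP/opus-mt-de-en")
synthetic_source = translate(target_monolingual, bt_tok, bt_model)

# ... and pair them with the original target sentences as synthetic training data.
synthetic_pairs = list(zip(synthetic_source, target_monolingual))
for src, tgt in synthetic_pairs:
    print(f"{src!r} -> {tgt!r}")
```

The saturation point the first source identifies concerns how much of this synthetic data helps: in their high-quality English-Gujarati setting, gains from adding more back-translated pairs eventually level off.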
Sources
The Saturation Point of Backtranslation in High Quality Low Resource English Gujarati Machine Translation
Can Peter Pan Survive MT? A Stylometric Study of LLMs, NMTs, and HTs in Children's Literature Translation
The Translation Barrier Hypothesis: Multilingual Generation with Large Language Models Suffers from Implicit Translation Failure