Cross-Lingual Named Entity Recognition Advancements

Research in Cross-Lingual Named Entity Recognition (CL-NER) is increasingly focused on transferring knowledge from high-resource to low-resource languages, with particular attention to non-Latin-script languages. Researchers are exploring ways to mitigate language differences and improve entity alignment. One notable direction uses large language models (LLMs) and meta-pretraining to strengthen zero-shot CL-NER performance. There is also growing interest in specialized models for code-mixed NER, which can outperform generalized models, and in using the internal representations of LLMs to embed entity mentions and user-provided type descriptions into a shared semantic space. Noteworthy papers include:

  • Zero-shot Cross-lingual NER via Mitigating Language Difference, which proposes an entity-aligned translation approach to address the challenges of non-Latin-script languages (the general translate-and-project idea is sketched after this list).
  • NER Retriever, a zero-shot retrieval framework that builds on internal representations of LLMs to embed entity mentions and user-provided type descriptions into a shared semantic space (see the second sketch below).
  • Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition, which demonstrates the effectiveness of meta-pretraining for small decoder LMs in low-resource Philippine languages.
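
To make the first direction concrete, here is a minimal sketch of the general translate-and-project idea behind translation-based zero-shot CL-NER: translate the low-resource sentence into English, tag it with an English NER model, and project the predicted spans back through a marker-preserving reverse translation. The paper's entity-aligned translation method is more elaborate; all three helper functions below are hypothetical, hard-coded stand-ins used only to show the pipeline shape.

```python
# Sketch of translate-and-project zero-shot cross-lingual NER.
# Every helper is a hypothetical stand-in, not the paper's method.
import re

def translate_to_english(text: str) -> str:
    # Stand-in MT; a real pipeline would call a translation model here.
    return {"Si Rizal ay ipinanganak sa Calamba.":
            "Rizal was born in Calamba."}[text]

def tag_english(text: str) -> str:
    # Stand-in English NER that wraps predicted entities in [E]...[/E] markers.
    return (text.replace("Rizal", "[E]Rizal[/E]")
                .replace("Calamba", "[E]Calamba[/E]"))

def translate_back(tagged: str) -> str:
    # Stand-in marker-preserving back-translation into the target language.
    return {"[E]Rizal[/E] was born in [E]Calamba[/E].":
            "Si [E]Rizal[/E] ay ipinanganak sa [E]Calamba[/E]."}[tagged]

sentence = "Si Rizal ay ipinanganak sa Calamba."
tagged_target = translate_back(tag_english(translate_to_english(sentence)))

# Read the projected entity spans off the marked-up target sentence.
print(re.findall(r"\[E\](.+?)\[/E\]", tagged_target))  # ['Rizal', 'Calamba']
```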
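The second sketch illustrates the shared-semantic-space retrieval mechanics behind type-aware zero-shot entity retrieval. NER Retriever derives its embeddings from internal LLM representations with a learned projection; as a stand-in, this sketch uses an off-the-shelf sentence encoder (the model name is an arbitrary choice) and assumes mention spans are already given, ranking them against a user-provided type description by cosine similarity.

```python
# Sketch of type-aware zero-shot entity retrieval in a shared embedding
# space. Uses a generic sentence encoder as a stand-in for the paper's
# internal-LLM-representation embeddings.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # arbitrary encoder choice

# Entity mentions with a little surrounding context (spans assumed given).
mentions = [
    "Barcelona in 'the match was held in Barcelona'",
    "Novartis in 'Novartis announced a new drug trial'",
    "Nile in 'they sailed down the Nile'",
]

# A user-provided type description, as in open-type retrieval.
type_description = "companies, corporations, and other commercial organizations"

# Embed mentions and the type description into the same space.
mention_vecs = encoder.encode(mentions, normalize_embeddings=True)
type_vec = encoder.encode([type_description], normalize_embeddings=True)[0]

# On normalized vectors, cosine similarity is a dot product;
# rank mentions by similarity to the type description.
scores = mention_vecs @ type_vec
for mention, score in sorted(zip(mentions, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {mention}")
```

With a suitable encoder, the organization mention ("Novartis") should score highest against this type description, which is the core retrieval behavior the framework relies on.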

Sources

  • Zero-shot Cross-lingual NER via Mitigating Language Difference: An Entity-aligned Translation Perspective
  • Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource Philippine Languages
  • Comparative Study of Pre-Trained BERT and Large Language Models for Code-Mixed Named Entity Recognition
  • NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings
