Advances in Natural Language Processing for Specialized Domains

The field of natural language processing is moving toward more specialized, domain-specific approaches, with a focus on improving performance in areas such as disaster management, coreference resolution, and document alignment. Researchers are developing models and techniques that address the distinct challenges of these domains, such as varied search intents and long reference chains. Noteworthy papers include DMRetriever, which achieves state-of-the-art performance in disaster management text retrieval, and Forging GEMs, which advances Greek NLP through quality-based corpus curation and specialized pre-training. Also notable are NeoDictaBERT, which pushes the frontier of BERT models for Hebrew, and Automating Iconclass, which presents a methodology for classifying early modern religious images using Large Language Models and Retrieval-Augmented Generation.

Sources

DMRetriever: A Family of Models for Improved Text Retrieval in Disaster Management

The Elephant in the Coreference Room: Resolving Coreference in Full-Length French Fiction Works

BiMax: Bidirectional MaxSim Score for Document-Level Alignment

Lingua Custodi's participation at the WMT 2025 Terminology shared task

Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark

Automating Iconclass: LLMs and RAG for Large-Scale Classification of Religious Woodcuts

Forging GEMs: Advancing Greek NLP through Quality-Based Corpus Curation and Specialized Pre-training

NeoDictaBERT: Pushing the Frontier of BERT models for Hebrew
