Subject Indexing in Digital Libraries

Subject indexing in digital libraries is moving toward large language models (LLMs) and newer machine learning techniques to improve accuracy and efficiency. Researchers are combining traditional natural language processing algorithms with LLM techniques to improve subject tagging in multilingual contexts, and are investigating ontology alignment tools and retrieval-augmented generation for the same task. Framing subject tagging as an information retrieval problem, solved with a two-stage retrieval system, is also proving effective.

Noteworthy papers include Annif at SemEval-2025 Task 5, which demonstrated the potential of combining traditional extreme multi-label text classification (XMTC) algorithms with LLM techniques; Homa at SemEval-2025 Task 5, which used OntoAligner for subject tagging and highlighted the potential of alignment techniques; TartuNLP at SemEval-2025 Task 5, which framed subject tagging as a two-stage information retrieval problem and showed significant improvements in recall; and DNB-AI-Project at SemEval-2025 Task 5, which used an LLM-ensemble approach and achieved the best result in the qualitative ranking conducted by subject indexing experts.
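To make the two-stage information retrieval framing concrete, here is a minimal sketch in Python. It is not the method of any of the papers above: the toy vocabulary `SUBJECTS`, the function names, and both scoring functions are assumptions chosen for illustration. Stage one cheaply narrows the vocabulary to a few candidates by counting shared tokens; stage two rescores only those candidates with a costlier function (cosine similarity over term counts here, standing in for whatever learned reranker a real system might use).

```python
import math
from collections import Counter

# Hypothetical toy vocabulary: subject label -> short description.
SUBJECTS = {
    "Machine learning": "algorithms that learn statistical models from data",
    "Library science": "cataloguing classification and indexing of library collections",
    "Marine biology": "study of organisms living in the ocean and sea",
    "Information retrieval": "search ranking and retrieval of documents from collections",
}


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]


def subject_tokens(label):
    """Tokens of a subject's label plus its description."""
    return tokenize(label + " " + SUBJECTS[label])


def retrieve(query_tokens, k):
    """Stage 1: recall-oriented retrieval by number of shared distinct tokens."""
    query = set(query_tokens)
    scored = [(len(query & set(subject_tokens(label))), label) for label in SUBJECTS]
    scored = [(score, label) for score, label in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in scored[:k]]


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(query_tokens, candidates):
    """Stage 2: rescore only the stage-1 candidates with the costlier scorer."""
    query = Counter(query_tokens)
    scored = [(cosine(query, Counter(subject_tokens(c))), c) for c in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


def tag(text, k=3, top_n=2):
    """Run both stages and return the top_n subject labels."""
    tokens = tokenize(text)
    return [label for _, label in rerank(tokens, retrieve(tokens, k))[:top_n]]


if __name__ == "__main__":
    print(tag("indexing and retrieval of documents in library collections"))
```

The split matters because the first stage can be tuned for recall over a very large vocabulary while the second, more expensive stage only ever sees `k` candidates per document.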
