Advances in Historical Language Modeling and Named Entity Recognition

Recent work in natural language processing is improving both the modeling of historical languages and named entity recognition (NER). On the historical side, research has focused on unified character lists and visualization approaches that support historical language understanding and typographic forensics. On the NER side, few-shot learning and zero-shot prompting strategies target low-resource domains. Together, these developments could deepen our understanding of ancient cultures and improve information extraction from historical texts. Noteworthy papers include InteChar, which introduces a unified oracle bone character list for ancient Chinese language modeling, and ReProCon, which proposes a few-shot NER framework for the biomedical domain.

Sources

InteChar: A Unified Oracle Bone Character List for Ancient Chinese Language Modeling

Kokatsuji: A Visualization Approach for Typographic Forensics of Early Japanese Movable Type

ReProCon: Scalable and Resource-Efficient Few-Shot Biomedical Named Entity Recognition

The Impact of Visual Segmentation on Lexical Word Recognition

Named Entity Recognition of Historical Text via Large Language Model

Supporting Intervention Design for Suicide Prevention with Language Model Assistants

Clustering-based Feature Representation Learning for Oracle Bone Inscriptions Detection

Leveraging Language Models and Machine Learning in Verbal Autopsy Analysis

Inference Gap in Domain Expertise and Machine Intelligence in Named Entity Recognition: Creation of and Insights from a Substance Use-related Dataset