Digital Humanities Research

The field of digital humanities is seeing significant advances in the digitization and analysis of historical texts and music. Recent work has focused on improving the accuracy and efficiency of optical character recognition (OCR) and optical music recognition (OMR), which supports both the preservation of cultural heritage materials and access to them. In particular, researchers are applying large language models (LLMs) and other machine learning methods to three persistent challenges: diverse text layouts, linguistic variation, and scarce training data. These innovations could extend digital humanities research to underrepresented languages and music traditions.

Two papers in this area are especially noteworthy. KuiSCIMA v2.0 advances OMR for historical Chinese musical notations, reporting a Character Error Rate (CER) of 0.9% on lülüpu notation. AI-Driven Generation of Old English introduces a scalable framework for generating high-quality Old English text with advanced LLMs and offers a practical blueprint for revitalizing other endangered languages.
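
For readers unfamiliar with the metric, CER is conventionally defined as the character-level edit distance (substitutions + deletions + insertions) between a recognized transcription and its reference, divided by the number of reference characters. The sketch below implements that standard definition. It is an illustration only: the function names are hypothetical, and it assumes each notation symbol maps to one Python character. The KuiSCIMA paper's exact tokenization and normalization are not described here and may differ.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions
    needed to turn `hyp` into `ref` (classic dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,          # deletion: ref char missing from hyp
                curr[j - 1] + 1,      # insertion: extra char in hyp
                prev[j - 1] + cost,   # substitution, or match when cost == 0
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: 3 edits against a 6-character reference.
    print(character_error_rate("kitten", "sitting"))  # 0.5
```

Under this definition, a CER of 0.9% corresponds to roughly 9 character errors per 1,000 reference characters.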

Sources

KuiSCIMA v2.0: Improved Baselines, Calibration, and Cross-Notation Generalization for Historical Chinese Music Notations in Jiang Kui's Baishidaoren Gequ

Comparing OCR Pipelines for Folkloristic Text Digitization

AI-Driven Generation of Old English: A Framework for Low-Resource Languages

Page image classification for content-specific data processing
