Sign Language Recognition and Translation

The field of sign language recognition and translation is advancing rapidly, with a focus on developing more accurate and efficient methods for continuous sign language recognition (CSLR) and automatic sign language translation (ASLT). Recent studies have highlighted the importance of incorporating non-manual facial features, such as the eyes, the mouth, and the full face, into sign language recognition systems, with the mouth identified as the most informative of these features. In addition, deep learning models, such as transformer-based architectures and convolutional neural networks (CNNs), have delivered significant improvements in recognition accuracy. The development of new datasets, such as the iLSU-T dataset for Uruguayan Sign Language, is also crucial for advancing the field. Noteworthy papers include AutoSign, which proposes a direct pose-to-text translation approach for CSLR, and Beyond Gloss, which introduces a gloss-free SLT framework that leverages the spatio-temporal reasoning capabilities of Video Large Language Models (VideoLLMs).
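
To make the pose-to-text idea concrete, the sketch below shows one way a direct pose-to-text model could be wired up: per-frame skeletal keypoints are projected into a transformer encoder-decoder that emits text tokens, with no intermediate gloss layer. This is a minimal illustrative assumption in PyTorch, not the actual AutoSign architecture; the keypoint count, vocabulary size, and model dimensions are placeholders.

```python
# Minimal pose-to-text sketch (assumed setup, not the AutoSign implementation).
import torch
import torch.nn as nn


class PoseToTextModel(nn.Module):
    def __init__(self, num_keypoints=75, keypoint_dim=2, vocab_size=2000,
                 d_model=256, nhead=4, num_layers=3):
        super().__init__()
        # Project flattened per-frame keypoints (x, y) into the model dimension.
        self.pose_proj = nn.Linear(num_keypoints * keypoint_dim, d_model)
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, poses, tokens):
        # poses:  (batch, frames, num_keypoints * keypoint_dim) pose sequence
        # tokens: (batch, target_len) target text token ids
        src = self.pose_proj(poses)
        tgt = self.token_emb(tokens)
        # Causal mask so each output token only attends to earlier tokens.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out_proj(hidden)  # (batch, target_len, vocab_size)


# Dummy usage: an 8-frame pose clip decoded against a 6-token target sentence.
model = PoseToTextModel()
poses = torch.randn(1, 8, 75 * 2)
tokens = torch.randint(0, 2000, (1, 6))
logits = model(poses, tokens)
print(logits.shape)  # torch.Size([1, 6, 2000])
```

In practice such a model would be trained with a cross-entropy (or CTC-style) objective over reference translations; the point here is only that the encoder consumes pose features directly, bypassing gloss annotation.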

Sources

AutoSign: Direct Pose-to-Text Translation for Continuous Sign Language Recognition

Color histogram equalization and fine-tuning to improve expression recognition of (partially occluded) faces on sign language datasets

Indian Sign Language Detection for Real-Time Translation using Machine Learning

The Importance of Facial Features in Vision-based Sign Language Recognition: Eyes, Mouth or Full Face?

iLSU-T: an Open Dataset for Uruguayan Sign Language Translation

Beyond Gloss: A Hand-Centric Framework for Gloss-Free Sign Language Translation
