Advancements in Speech Recognition and Processing

Speech recognition and processing research is moving toward more efficient and robust models, with particular attention to low-resource languages and edge devices. Recent studies use transfer learning, attention mechanisms, and grapheme-to-phoneme (G2P) conversion to improve recognition accuracy. Interest is also growing in lightweight models that run directly on edge devices and enable real-time speech recognition and processing. Noteworthy papers include a unified denoising and adaptation framework for self-supervised Bengali dialectal ASR, which reports state-of-the-art results, and ArabEmoNet, a lightweight hybrid 2D CNN-BiLSTM model with attention for robust Arabic speech emotion recognition. Tiny specialized ASR models, such as those in Flavors of Moonshine, have also shown promising results for underrepresented languages. Taken together, this work points toward more accurate, efficient, and accessible speech recognition and processing systems.

Sources

Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks

A Unified Denoising and Adaptation Framework for Self-Supervised Bengali Dialectal ASR

Speech Command Recognition Using LogNNet Reservoir Computing for Embedded Systems

ArabEmoNet: A Lightweight Hybrid 2D CNN-BiLSTM Model with Attention for Robust Arabic Speech Emotion Recognition

CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-Car Speech Separation with Distributed Heterogeneous Arrays

NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task

Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices

Isolated Bangla Handwritten Character Classification using Transfer Learning

High Cursive Complex Character Recognition using GAN External Classifier

LatPhon: Lightweight Multilingual G2P for Romance Languages and English

E-ARMOR: Edge case Assessment and Review of Multilingual Optical Character Recognition

Denoising GER: A Noise-Robust Generative Error Correction with LLM for Speech Recognition

From Silent Signals to Natural Language: A Dual-Stage Transformer-LLM Approach

Leveraging Transfer Learning and Mobile-enabled Convolutional Neural Networks for Improved Arabic Handwritten Character Recognition

LatinX: Aligning a Multilingual TTS Model with Direct Preference Optimization

TSPC: A Two-Stage Phoneme-Centric Architecture for Code-Switching Vietnamese-English Speech Recognition

Exploring Light-Weight Object Recognition for Real-Time Document Detection

A New Hybrid Model of Generative Adversarial Network and You Only Look Once Algorithm for Automatic License-Plate Recognition

VRAE: Vertical Residual Autoencoder for License Plate Denoising and Deblurring