Natural language processing research is moving toward more accurate, context-aware models for low-resource languages. Recent work has focused on clustering text data, detecting hate speech and cyberbullying, and improving sentiment analysis in these languages. Advanced techniques such as stacked autoencoders, transformer-based models, and large language models have shown promising results, notably improving the accuracy and relevance of search results and the detection of toxic and offensive content. Noteworthy papers include:
- A study applying stacked autoencoders over AraBERT embeddings to relational aggregated search, which demonstrated improved accuracy and relevance of search results.
- A submission to the PAN 2025 Multilingual Text Detoxification Challenge, which achieved a first-place ranking on high-resource and low-resource languages using a massively multilingual model and parameter-efficient fine-tuning techniques.
- A research paper on LLM-based sentiment classification of Bangladeshi e-commerce reviews, which showed that fine-tuned LLMs can outperform other models in both accuracy and precision.
- A study on LLM-based multi-task Bangla hate speech detection, which introduced a new dataset and established a stronger benchmark for developing culturally aligned moderation tools in low-resource contexts.
- A paper on enhanced Arabic-language cyberbullying detection, which demonstrated high accuracy using deep embedding and transformer approaches.
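The stacked-autoencoder-over-pretrained-embeddings idea from the first study can be sketched as follows. This is a minimal illustration, not the paper's implementation: it trains two single-layer autoencoders greedily in plain numpy on random vectors standing in for AraBERT sentence embeddings (the hidden sizes and training hyperparameters are arbitrary choices for the sketch).

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, hidden, epochs=200, lr=0.1):
    """Train one autoencoder layer (tanh encoder, linear decoder)
    by full-batch gradient descent on reconstruction error."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)      # encode
        Xhat = H @ W2 + b2            # decode
        err = Xhat - X                # reconstruction error
        # backpropagate through the decoder, then the encoder
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1 - H ** 2)   # tanh derivative
        gW1 = X.T @ dH / n
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1

# Stand-in for pretrained sentence embeddings (hypothetical data;
# real AraBERT embeddings would be 768-dimensional).
X = rng.normal(size=(100, 32))

# Greedy layer-wise stacking: 32 -> 16 -> 8 dimensions.
W1, b1 = train_autoencoder(X, 16)
H1 = np.tanh(X @ W1 + b1)
W2, b2 = train_autoencoder(H1, 8)
codes = np.tanh(H1 @ W2 + b2)
print(codes.shape)  # compact codes, one per input embedding
```

The compact codes can then feed a downstream clustering or ranking step, which is where the reported gains in search relevance would come from; the sketch stops at the representation stage.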