Advances in Hate Speech Detection and AI Explainability

The field of natural language processing is moving towards more nuanced and explainable models, particularly in hate speech detection and content moderation. Recent research highlights the importance of modeling context and implicit associations rather than relying on explicit keywords alone. This has led to new approaches such as Topic Association Analysis for probing moderation over-sensitivity and Masked Hard Instance Mining for explainable depression detection, both aimed at improving the explainability and accuracy of content moderation models. There is also growing attention to compositional generalization, which trains models to recognize and generalize hate speech patterns in a more structured, context-aware way. Notable papers include 'Probing Association Biases in LLM Moderation Over-Sensitivity', which introduces Topic Association Analysis to quantify how strongly LLMs associate particular topics with toxicity, and 'Explainable AI: XAI-Guided Context-Aware Data Augmentation', which proposes a data augmentation framework that uses XAI attribution to modify less critical features while preserving task-relevant ones. Rough sketches of these two ideas appear below.
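
The exact protocol of the association-bias paper is not reproduced here; as a minimal sketch of the underlying idea, one can probe a moderation model with neutral template sentences instantiated with different topics and compare per-topic flag rates. The is_flagged stand-in, the templates, and the topic list below are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict

def is_flagged(text: str) -> bool:
    # Hypothetical stand-in for the LLM moderation call being probed; a real
    # study would query the model under test and return its toxicity verdict.
    return "violence" in text.lower()

# Neutral template sentences instantiated with different topic terms.
TEMPLATES = [
    "My neighbour often talks about {}.",
    "The documentary we watched last night was about {}.",
    "She is writing her thesis on {}.",
]
TOPICS = ["gardening", "violence prevention", "religion", "immigration"]

def topic_association_rates(topics, templates):
    """Flag rate per topic on otherwise neutral sentences: a simple proxy for
    how strongly the moderator associates the topic itself with toxicity."""
    rates = defaultdict(float)
    for topic in topics:
        flags = [is_flagged(t.format(topic)) for t in templates]
        rates[topic] = sum(flags) / len(flags)
    return dict(rates)

if __name__ == "__main__":
    for topic, rate in topic_association_rates(TOPICS, TEMPLATES).items():
        print(f"{topic:20s} flag rate = {rate:.2f}")
```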
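
Similarly, the XAI-guided augmentation framework itself is not shown here; the sketch below only illustrates the general recipe the summary describes: score token importance with a simple occlusion-style attribution, keep the high-attribution (task-relevant) tokens, and perturb the rest. The toy KEYWORDS scorer, the filler replacements, and keep_ratio are assumptions made for illustration.

```python
import random

# Toy keyword scorer standing in for the downstream task model.
KEYWORDS = {"hate": 1.0, "awful": 0.8, "love": 1.0, "great": 0.8}

def score(tokens):
    return sum(KEYWORDS.get(t.lower(), 0.0) for t in tokens)

def leave_one_out_attribution(tokens):
    """Token importance = drop in the task score when that token is removed
    (a simple occlusion-style XAI attribution)."""
    base = score(tokens)
    return [base - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

def augment(sentence, keep_ratio=0.5, fillers=("really", "quite", "simply")):
    """Replace the least important tokens while preserving task-relevant ones."""
    tokens = sentence.split()
    attributions = leave_one_out_attribution(tokens)
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep = set(sorted(range(len(tokens)),
                      key=lambda i: attributions[i], reverse=True)[:n_keep])
    return " ".join(tok if i in keep else random.choice(fillers)
                    for i, tok in enumerate(tokens))

if __name__ == "__main__":
    print(augment("I absolutely hate waiting in long queues"))
```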

Sources

Probing Association Biases in LLM Moderation Over-Sensitivity

Explainable Depression Detection using Masked Hard Instance Mining

Conditioning Large Language Models on Legal Systems? Detecting Punishable Hate Speech

Explainable AI: XAI-Guided Context-Aware Data Augmentation

Compositional Generalisation for Explainable Hate Speech Detection

Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification

Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation