Advances in Hate Speech Detection and AI Explainability

The field of natural language processing is moving towards more nuanced and explainable models, particularly in hate speech detection and content moderation. Recent research highlights the importance of modeling context and implicit associations rather than relying on explicit keywords alone. This has led to new approaches such as Topic Association Analysis for probing moderation over-sensitivity and Masked Hard Instance Mining for explainable depression detection, both aimed at improving the explainability and accuracy of content moderation models. There is also growing attention to compositional generalization, which trains models to recognize and generalize hate speech patterns in a more structured, context-aware way. Notable papers include 'Probing Association Biases in LLM Moderation Over-Sensitivity', which introduces Topic Association Analysis to quantify how strongly LLMs associate particular topics with toxicity, and 'Explainable AI: XAI-Guided Context-Aware Data Augmentation', which proposes a data augmentation framework that uses XAI attribution to modify less critical features while preserving task-relevant ones. Rough sketches of these two ideas appear below.
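
The exact protocol of the association-bias paper is not reproduced here; as a minimal sketch of the underlying idea, one can probe a moderation model with neutral template sentences instantiated with different topics and compare per-topic flag rates. The is_flagged stand-in, the templates, and the topic list below are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict

def is_flagged(text: str) -> bool:
    # Hypothetical stand-in for the LLM moderation call being probed; a real
    # study would query the model under test and return its toxicity verdict.
    return "violence" in text.lower()

# Neutral template sentences instantiated with different topic terms.
TEMPLATES = [
    "My neighbour often talks about {}.",
    "The documentary we watched last night was about {}.",
    "She is writing her thesis on {}.",
]
TOPICS = ["gardening", "violence prevention", "religion", "immigration"]

def topic_association_rates(topics, templates):
    """Flag rate per topic on otherwise neutral sentences: a simple proxy for
    how strongly the moderator associates the topic itself with toxicity."""
    rates = defaultdict(float)
    for topic in topics:
        flags = [is_flagged(t.format(topic)) for t in templates]
        rates[topic] = sum(flags) / len(flags)
    return dict(rates)

if __name__ == "__main__":
    for topic, rate in topic_association_rates(TOPICS, TEMPLATES).items():
        print(f"{topic:20s} flag rate = {rate:.2f}")
```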
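
Similarly, the XAI-guided augmentation framework itself is not shown here; the sketch below only illustrates the general recipe the summary describes: score token importance with a simple occlusion-style attribution, keep the high-attribution (task-relevant) tokens, and perturb the rest. The toy KEYWORDS scorer, the filler replacements, and keep_ratio are assumptions made for illustration.

```python
import random

# Toy keyword scorer standing in for the downstream task model.
KEYWORDS = {"hate": 1.0, "awful": 0.8, "love": 1.0, "great": 0.8}

def score(tokens):
    return sum(KEYWORDS.get(t.lower(), 0.0) for t in tokens)

def leave_one_out_attribution(tokens):
    """Token importance = drop in the task score when that token is removed
    (a simple occlusion-style XAI attribution)."""
    base = score(tokens)
    return [base - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]

def augment(sentence, keep_ratio=0.5, fillers=("really", "quite", "simply")):
    """Replace the least important tokens while preserving task-relevant ones."""
    tokens = sentence.split()
    attributions = leave_one_out_attribution(tokens)
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep = set(sorted(range(len(tokens)),
                      key=lambda i: attributions[i], reverse=True)[:n_keep])
    return " ".join(tok if i in keep else random.choice(fillers)
                    for i, tok in enumerate(tokens))

if __name__ == "__main__":
    print(augment("I absolutely hate waiting in long queues"))
```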

Sources

Probing Association Biases in LLM Moderation Over-Sensitivity

Explainable Depression Detection using Masked Hard Instance Mining

Conditioning Large Language Models on Legal Systems? Detecting Punishable Hate Speech

Explainable AI: XAI-Guided Context-Aware Data Augmentation

Compositional Generalisation for Explainable Hate Speech Detection

Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification

Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation