Advances in Multilingual Toxic Content Detection

Introduction

The field of toxic content detection is rapidly evolving, with a growing focus on multilingual and culturally grounded approaches. Recent studies have highlighted the importance of considering the nuances of language and cultural context in detecting toxic content, particularly in non-English languages.

General Direction

The field is moving toward more sophisticated and culturally sensitive detection methods, including multimodal approaches, active learning, and multi-task learning. There is also growing recognition that large language models (LLMs) need culturally specific analyses and evaluations to ensure their safety and ethical alignment.
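Of the techniques listed above, active learning is the easiest to make concrete. The sketch below shows one round of pool-based uncertainty sampling, a common active-learning query strategy: the classifier scores unlabeled posts, and the posts it is least sure about are sent to annotators. It is an illustrative assumption, not the method of any paper under Sources; the function name `select_most_uncertain`, the post IDs, and the scores are hypothetical.

```python
from typing import Dict, List


def select_most_uncertain(pool_probs: Dict[str, float], k: int) -> List[str]:
    """Return the k example IDs whose predicted P(offensive) is closest to 0.5.

    Hypothetical helper for illustration. In binary classification, a
    probability near 0.5 means the model is least confident, so these
    examples are the most informative ones to send for labeling.
    """
    # Uncertainty score: 0.0 when the model is certain, 0.5 when maximally unsure.
    def uncertainty(p: float) -> float:
        return 0.5 - abs(p - 0.5)

    # Rank pool items from most to least uncertain and keep the top k.
    ranked = sorted(pool_probs, key=lambda ex_id: uncertainty(pool_probs[ex_id]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical classifier scores for unlabeled posts in the pool.
    pool = {"post_a": 0.97, "post_b": 0.52, "post_c": 0.08, "post_d": 0.41, "post_e": 0.66}
    print(select_most_uncertain(pool, k=2))  # ['post_b', 'post_d']
```

In a full loop, the selected posts would be labeled, added to the training set, and the classifier retrained before the next round of selection.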

Noteworthy Papers

  • The paper on IndoSafety presents a high-quality, human-verified safety evaluation dataset tailored for the Indonesian context, covering five language varieties, and finds that existing Indonesian-centric LLMs often generate unsafe outputs.
  • The paper on AnswerCarefully introduces a dataset for promoting the safety and appropriateness of Japanese LLM outputs, and shows that using this dataset to instruction-tune a Japanese LLM improved output safety without compromising the utility of general responses.
  • The paper on Evaluating Prompt-Driven Chinese Large Language Models reveals significant gender biases in refusal rates, shows that certain negative personas can amplify toxicity toward Chinese social groups, and proposes a multi-model feedback strategy to mitigate this toxicity.
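A refusal-rate disparity of the kind reported for Chinese LLMs can be measured by grouping model responses and comparing the fraction refused in each group. The following is a minimal sketch that assumes each response has already been tagged with a group label and a refused/not-refused flag; the function names `refusal_rates` and `max_gap` and the toy records are hypothetical and do not reproduce the paper's data or its exact metric.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def refusal_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Hypothetical helper: fraction of prompts refused, per group label."""
    totals: Dict[str, int] = defaultdict(int)
    refusals: Dict[str, int] = defaultdict(int)
    for group, refused in records:
        totals[group] += 1
        if refused:
            refusals[group] += 1
    return {g: refusals[g] / totals[g] for g in totals}


def max_gap(rates: Dict[str, float]) -> float:
    """Hypothetical helper: largest refusal-rate difference between any two groups."""
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy records for illustration only: (group label, was the prompt refused?).
    records = [
        ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
        ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
    ]
    rates = refusal_rates(records)
    print(rates)           # {'group_a': 0.75, 'group_b': 0.25}
    print(max_gap(rates))  # 0.5
```

A large gap alone does not establish bias; in practice one would also check that the groups received comparable prompts and test whether the difference is statistically significant.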

Sources

Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings

Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections

Predicting the Past: Estimating Historical Appraisals with OCR and Machine Learning

AnswerCarefully: A Dataset for Improving the Safety of Japanese LLM Output

IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages

Multi-task Learning with Active Learning for Arabic Offensive Speech Detection

Culture Matters in Toxic Language Detection in Persian

Evaluating Prompt-Driven Chinese Large Language Models: The Influence of Persona Assignment on Stereotypes and Safeguards
