Introduction

Toxic content detection is evolving rapidly, with growing attention to multilingual and culturally grounded approaches. Recent studies highlight that detecting toxic content, particularly in non-English languages, requires accounting for linguistic nuance and cultural context.

General Direction

The field is moving toward more sophisticated and culturally sensitive methods for detecting toxic content, including multimodal approaches, active learning, and multi-task learning. There is also growing recognition that large language models (LLMs) need culturally specific analysis and evaluation to ensure their safety and ethical alignment.

Noteworthy Papers

The paper on IndoSafety presents a high-quality, human-verified safety evaluation dataset tailored for the Indonesian context, covering five language varieties, and finds that existing Indonesian-centric LLMs often generate unsafe outputs.
The paper on AnswerCarefully introduces a dataset for promoting the safety and appropriateness of Japanese LLM outputs, and shows that instruction fine-tuning a Japanese LLM on this dataset improved output safety without compromising the utility of general responses.
The paper on Evaluating Prompt-Driven Chinese Large Language Models reveals significant gender biases in refusal rates, demonstrates that certain negative personas can amplify toxicity toward Chinese social groups, and proposes a multi-model feedback strategy to mitigate this toxicity.

Advances in Multilingual Toxic Content Detection
