Advances in Suicide Prevention and LLM Safety

Research at the intersection of natural language processing and social media analysis is moving toward a more nuanced understanding of suicidal behavior and the development of safer large language models (LLMs). Recent studies have focused on identifying digital markers of suicidality, such as changes in YouTube engagement associated with mental health struggles, and have explored the use of LLMs for detecting life-threatening texts. In parallel, researchers have been improving LLM safety by stopping harmful outputs early through streaming content monitoring, detecting high-stakes interactions, and monitoring decomposition attacks (minimal sketches of the first two techniques follow the list below). These advances have the potential to improve mental health outcomes and prevent harm. Noteworthy papers include:

  • A study bridging online behavior and clinical insight to identify novel digital markers of suicidality;
  • A paper on detecting high-stakes interactions with activation probes, which offers an efficient and effective method for monitoring LLMs (sketched directly below).
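
To make the probe idea concrete, here is a minimal sketch of a linear activation probe, assuming a HuggingFace-style causal LM. The model name, probed layer, and toy training examples are illustrative assumptions for the sketch, not details taken from the paper.

```python
# Minimal sketch of a linear activation probe for flagging high-stakes inputs.
# Assumes a HuggingFace causal LM; the model name, layer index, and toy labels
# are illustrative placeholders, not the paper's actual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "gpt2"  # placeholder; the paper's target model may differ
LAYER = 6            # which hidden layer to probe (a tunable assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def get_features(texts):
    """Mean-pool the chosen layer's hidden states into one vector per text."""
    feats = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**inputs).hidden_states[LAYER]  # (1, seq_len, dim)
        feats.append(hidden.mean(dim=1).squeeze(0).numpy())
    return feats

# Toy labeled data: 1 = high-stakes, 0 = benign (illustrative only).
texts = ["I need help with a crisis right now", "What is the capital of France?"]
labels = [1, 0]

probe = LogisticRegression().fit(get_features(texts), labels)
print(probe.predict_proba(get_features(["Is this message urgent?"]))[:, 1])
```

Because the probe is just a linear classifier over cached activations, it adds negligible cost on top of a forward pass the model is already running, which is what makes this style of monitor efficient.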
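Streaming content monitoring can be sketched in a similarly minimal way: score the partial output after each streamed token and halt generation once a harm threshold is crossed. The scorer below is a toy stand-in; the paper's monitor is a trained model, and all names here are hypothetical.

```python
# Minimal sketch of streaming content monitoring: score the partial output as
# tokens arrive and stop generation once a harm threshold is crossed. The
# score_harm function is a toy stand-in for a trained streaming classifier.
from typing import Callable, Iterator

def score_harm(partial_text: str) -> float:
    """Placeholder harm scorer; a real monitor would be a trained model."""
    blocklist = ("synthesize", "weapon")
    hits = sum(word in partial_text.lower() for word in blocklist)
    return min(1.0, hits / 2)

def monitored_stream(token_stream: Iterator[str],
                     scorer: Callable[[str], float],
                     threshold: float = 0.5) -> str:
    """Accumulate streamed tokens, cutting generation off early if the
    harm score of the partial output exceeds the threshold."""
    output = ""
    for token in token_stream:
        output += token
        if scorer(output) >= threshold:
            return output + " [generation stopped by monitor]"
    return output

# Usage with a toy token stream standing in for an LLM's streamed output.
tokens = iter(["How ", "to ", "synthesize ", "a ", "weapon"])
print(monitored_stream(tokens, score_harm))
```

The key design point is that the monitor judges prefixes rather than complete responses, so harmful content can be interrupted mid-generation instead of being filtered only after the fact.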

Sources

Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers

From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring

Large Language Models for Detection of Life-Threatening Texts

Detecting High-Stakes Interactions with Activation Probes

Video-Mediated Emotion Disclosure: A Study of Mental Health Vlogging by People with Schizophrenia on YouTube

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors
