Multimodal Analysis for Mental Health and Hate Speech Detection

The field of multimodal analysis is advancing rapidly, with a focus on detecting mental health conditions such as depression, as well as hate speech, in videos and social media. Researchers are proposing novel frameworks and datasets to improve the accuracy of detection models, and contrastive learning, transformer networks, and multimodal fusion techniques are becoming increasingly popular. These approaches enable the effective extraction and fusion of features from multiple modalities, leading to better performance in depression detection and hate speech analysis. Notable papers in this area include ImpliHateVid, which introduces a large-scale dataset and a two-stage contrastive learning framework for implicit hate speech detection in videos; MMFformer, a multimodal depression detection network reported to surpass existing state-of-the-art approaches; eMotions, which provides a large-scale dataset and an audio-visual fusion network for emotion analysis in short-form videos; and MDD-Net, which uses mutual transformers to extract and fuse multimodal features for depression detection.
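To make the fusion and contrastive-learning ideas concrete, the sketch below combines a small cross-attention fusion module with an InfoNCE-style contrastive loss. It is a minimal illustration assuming PyTorch; the module name (CrossModalFusion), the helper (info_nce_loss), and all dimensions and hyperparameters are hypothetical and are not drawn from any of the cited papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalFusion(nn.Module):
    """Illustrative transformer-based fusion of two modality streams.

    Each modality is projected to a shared dimension, then cross-attention
    lets each stream attend to the other before the fused features are
    pooled for classification. Sizes are placeholders, not values from
    the papers above.
    """

    def __init__(self, audio_dim=128, visual_dim=512, d_model=256,
                 n_heads=4, n_classes=2):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        # Audio queries attend over visual keys/values, and vice versa.
        self.a2v_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.v2a_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, audio, visual):
        # audio: (B, Ta, audio_dim), visual: (B, Tv, visual_dim)
        a = self.audio_proj(audio)
        v = self.visual_proj(visual)
        a_fused, _ = self.a2v_attn(query=a, key=v, value=v)
        v_fused, _ = self.v2a_attn(query=v, key=a, value=a)
        # Mean-pool over time and concatenate the two fused streams.
        z_a = a_fused.mean(dim=1)
        z_v = v_fused.mean(dim=1)
        logits = self.classifier(torch.cat([z_a, z_v], dim=-1))
        return logits, z_a, z_v


def info_nce_loss(z_a, z_v, temperature=0.07):
    """InfoNCE-style contrastive loss: audio/visual embeddings of the same
    clip are positives; all other clips in the batch act as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_v = F.normalize(z_v, dim=-1)
    logits = z_a @ z_v.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    model = CrossModalFusion()
    audio = torch.randn(8, 50, 128)    # batch of 8 clips, 50 audio frames each
    visual = torch.randn(8, 30, 512)   # 30 visual frames each
    logits, z_a, z_v = model(audio, visual)
    labels = torch.randint(0, 2, (8,))
    loss = F.cross_entropy(logits, labels) + info_nce_loss(z_a, z_v)
    loss.backward()
```

In this kind of setup the classification loss drives the downstream prediction (e.g., depressed vs. not, hateful vs. not), while the contrastive term encourages the two modality embeddings of the same clip to agree, which is the general role contrastive objectives play in the works listed below.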

Sources

ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos

MMFformer: Multimodal Fusion Transformer Network for Depression Detection

eMotions: A Large-Scale Dataset and Audio-Visual Fusion Network for Emotion Analysis in Short-form Videos

MDD-Net: Multimodal Depression Detection through Mutual Transformer
