Advances in Multimodal Analysis and Detection

The field of multimodal analysis and detection is evolving rapidly, with new methods for analyzing and interpreting complex data from diverse sources. Recent studies have examined how large language models (LLMs) and vision-language models (VLMs) perform at detecting deception, image splicing, and deepfakes, as well as at analyzing emotion and sentiment in text and images. Multimodal learning analytics and personalized feedback frameworks have also shown promise for improving student learning outcomes and supporting academic emotions. Notably, LLMs have demonstrated competitive zero-shot performance on image forensics tasks, while VLMs have shown moderate performance in recognizing academic facial expressions. Despite these advances, however, LLMs are not yet reliable enough for standalone deepfake detection and require further development.
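To make the zero-shot image forensics setting concrete, the sketch below shows one way a vision-language model can be prompted to judge whether an image is spliced, with no task-specific training. This is a minimal illustration assuming the OpenAI Python client; the model name, prompt wording, and output format are assumptions for illustration and are not taken from the cited papers.

```python
# Minimal sketch of zero-shot image splicing detection with a vision-language
# model. Assumes the OpenAI Python client and an OPENAI_API_KEY in the
# environment; the prompt and model choice are illustrative, not the paper's.
import base64
from openai import OpenAI

client = OpenAI()

def classify_splicing(image_path: str) -> str:
    """Ask a vision-capable model whether the image looks spliced (zero-shot)."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice; any vision-capable model could be used
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Does this image show signs of splicing or copy-paste "
                         "forgery? Answer 'authentic' or 'spliced' and briefly explain."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(classify_splicing("suspect.jpg"))
```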

Noteworthy papers include the study on using LLMs for image splicing detection, which achieved competitive detection performance in zero-shot settings, and the work on detecting voice phishing with fine-tuned small language models, which yielded the best performance among small LMs, comparable to that of a GPT-4-based voice phishing detector.
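For the voice phishing result, the general recipe is supervised fine-tuning of a small pretrained language model as a binary classifier over call transcripts. The sketch below uses Hugging Face Transformers; the checkpoint, toy data, and hyperparameters are assumptions for illustration and do not reproduce the cited study's setup.

```python
# Hedged sketch: fine-tune a small language model as a binary voice-phishing
# classifier on call transcripts. Checkpoint, data, and hyperparameters are
# illustrative assumptions, not the cited paper's configuration.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # hypothetical small LM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy transcripts: label 1 = voice phishing, 0 = benign call.
data = Dataset.from_dict({
    "text": ["Your account is frozen, confirm your PIN immediately.",
             "Hi, just calling to reschedule our meeting tomorrow."],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="vp-detector",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()
```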

Sources

Can ChatGPT Perform Image Splicing Detection? A Preliminary Study

Sentiment Analysis in Learning Management Systems: Understanding Student Feedback at Scale

Detecting Voice Phishing with Precision: Fine-Tuning Small Language Models

Towards Provenance-Aware Earth Observation Workflows: the openEO Case Study

MOSAIC-F: A Framework for Enhancing Students' Oral Presentation Skills through Personalized Feedback

Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal Settings

Where Journalism Silenced Voices: Exploring Discrimination in the Representation of Indigenous Communities in Bangladesh

Dataset of News Articles with Provenance Metadata for Media Relevance Assessment

Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME

Using Vision Language Models to Detect Students' Academic Emotion through Facial Expressions

LLMs Are Not Yet Ready for Deepfake Image Detection
