Advances in Multimodal Emotion Recognition and Document Analysis

Introduction

Multimodal emotion recognition and document analysis have advanced considerably in recent work. Researchers are developing models that cope with missing modalities, preserve the distinctive characteristics of each modality, and improve recognition performance.

General Direction

The field is moving toward models that learn from several data sources at once, such as speech, text, and visual signals. These models aim to exploit both the heterogeneity and the complementary information in multimodal data, making emotion recognition and document analysis more accurate. Key approaches under exploration include attention-based diffusion models, autoregressive models, and discrete diffusion models.
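None of the cited architectures is reproduced here. As a generic illustration of fusing several modalities when some of them are missing, the sketch below applies scaled dot-product attention over whichever modality vectors are present and simply skips absent ones. The function name `masked_attention_fusion`, the single query vector, and the toy feature values are hypothetical. A real system would typically use learned projections, and ADMC, as its title states, completes missing-modality features with a diffusion model rather than ignoring them.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def masked_attention_fusion(
    query: Vector,
    modality_features: Sequence[Optional[Vector]],
) -> Vector:
    """Hypothetical sketch: fuse per-modality feature vectors with scaled
    dot-product attention, skipping modalities that are missing (None)."""
    dim = len(query)
    # Keep only the modalities that are actually available.
    present = [f for f in modality_features if f is not None]
    if not present:
        raise ValueError("at least one modality must be present")
    if any(len(f) != dim for f in present):
        raise ValueError("all feature vectors must match the query dimension")

    # Scaled dot-product score between the query and each present modality.
    scores = [sum(q * k for q, k in zip(query, f)) / math.sqrt(dim) for f in present]

    # Numerically stable softmax over the present modalities only,
    # so missing modalities receive no attention weight at all.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the available feature vectors.
    return [sum(w * f[i] for w, f in zip(weights, present)) for i in range(dim)]


# Toy example: speech and visual features present, text missing.
speech, text, visual = [1.0, 0.0], None, [0.0, 1.0]
fused = masked_attention_fusion([1.0, 1.0], [speech, text, visual])
print(fused)
```

Because the query scores speech and visual equally here, both receive weight 0.5; dropping missing modalities before the softmax is one simple way to keep the remaining weights summing to one.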

Noteworthy Papers

  • ADMC presents an attention-based diffusion model for completing missing-modality features and reports state-of-the-art results on the IEMOCAP and MIntRec benchmarks.
  • DREAM introduces an end-to-end autoregressive model for document reconstruction and is reported to outperform existing approaches to that task.
  • HeLo proposes a multimodal emotion distribution learning framework that fuses heterogeneous modalities and models label correlations, exploiting both the heterogeneity and the complementary information in multimodal emotional data.
  • The Bayesian Discrete Diffusion model reports lower (better) perplexity than autoregressive models, with a test perplexity of 8.8 on WikiText-2.
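Two quantities in the list above can be made concrete. Perplexity, the metric behind the WikiText-2 figure, is the exponential of the average per-token negative log-likelihood. Emotion distribution learning, as in HeLo, predicts a probability distribution over emotion labels, and such predictions are commonly compared with annotated distributions using KL divergence. The sketch below computes both from plain lists. The function names and the choice of KL divergence are assumptions for illustration; this is not claimed to be the evaluation code of either paper, and for diffusion language models the log-likelihood may only be estimated through a bound.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def kl_divergence(
    target: Sequence[float], predicted: Sequence[float], eps: float = 1e-12
) -> float:
    """KL(target || predicted) between two emotion distributions over the
    same ordered label set. Labels with zero target mass contribute nothing;
    eps guards against log(0) when the prediction assigns zero mass."""
    if len(target) != len(predicted):
        raise ValueError("distributions must cover the same labels")
    return sum(p * math.log(p / max(q, eps)) for p, q in zip(target, predicted) if p > 0)


# A model that assigns probability 1/8 to every observed token has perplexity 8.
print(perplexity([math.log(0.125)] * 4))

# Hypothetical annotated vs. predicted distributions over (happy, sad).
print(kl_divergence([0.5, 0.5], [0.25, 0.75]))
```

Lower is better for both: a perplexity of 8.8 means the model is, on average, about as uncertain as a uniform choice among roughly 8.8 tokens, and a KL divergence of zero means the predicted emotion distribution matches the annotation exactly.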

Sources

ADMC: Attention-based Diffusion Model for Missing Modalities Feature Completion

DREAM: Document Reconstruction via End-to-end Autoregressive Model

HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning

Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices

Discrete Diffusion Models for Language Generation

Bayesian Discrete Diffusion Beats Autoregressive Perplexity
