Multimodal Learning Advances

The field of multimodal learning is advancing rapidly, with a focus on more effective and efficient methods for integrating and processing multiple forms of data. Recent research emphasizes capturing complex relationships between modalities such as images, text, and audio to improve performance on tasks like classification, retrieval, and generation. Notable advances include new architectures and training strategies that handle multiple modalities and tasks simultaneously, as well as contrastive learning and related techniques for better aligning and representing different modalities. Two papers stand out for their innovative approaches and significant contributions: OmniVec2 proposes a novel multimodal multitask network and achieves state-of-the-art performance on multiple datasets, while U-MARVEL presents a comprehensive study of the key factors behind effective embedding learning for universal multimodal retrieval and introduces a unified framework that outperforms state-of-the-art competitors.
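To make the contrastive-alignment idea concrete, here is a minimal sketch of a symmetric InfoNCE loss (the objective popularized by CLIP-style models) over a batch of paired image/text embeddings. This is an illustrative NumPy implementation, not code from any of the papers listed below; the function and variable names are our own.

```python
import numpy as np

def info_nce_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for paired image/text embeddings.

    Row i of each matrix is a matched pair (the positive); every other
    pairing in the batch serves as a negative.
    """
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature  # (B, B) similarity matrix
    labels = np.arange(len(logits))                # positives on the diagonal

    def cross_entropy(lg, lb):
        # numerically stable log-softmax cross-entropy
        shifted = lg - lg.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

# Toy batch: two pairs of 4-dim embeddings, texts nearly aligned with images
rng = np.random.default_rng(0)
img = rng.normal(size=(2, 4))
txt = img + 0.01 * rng.normal(size=(2, 4))
print(info_nce_loss(img, txt))
```

Minimizing this loss pulls each matched image/text pair together while pushing mismatched pairs apart, which is the alignment mechanism several of the retrieval papers below build on.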

Sources

OmniVec2 -- A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction

Optimizing Legal Document Retrieval in Vietnamese with Semi-Hard Negative Mining

Benchmarking Foundation Models with Multimodal Public Electronic Health Records

U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs

Semantic-Aware Representation Learning for Multi-label Image Classification

Open-set Cross Modal Generalization via Multimodal Unified Representation

Constraint-aware Learning of Probabilistic Sequential Models for Multi-Label Classification

CLAMP: Contrastive Learning with Adaptive Multi-loss and Progressive Fusion for Multimodal Aspect-Based Sentiment Analysis

Principled Multimodal Representation Learning

QuMAB: Query-based Multi-annotator Behavior Pattern Learning

LLM-based Embedders for Prior Case Retrieval
