Multimodal Learning and E-commerce Search Advancements

The field of e-commerce search and recommendation is advancing rapidly through the integration of multimodal learning, with researchers leveraging textual and visual data together to improve search and recommendation quality. One key direction is end-to-end multimodal recommendation: jointly optimizing the multimodal encoders and the recommendation components enables real-time parameter updates and tighter alignment with downstream objectives. Another focus is improving dense retrieval, which is critical for e-commerce search engines. Novel frameworks such as multi-objective reinforcement learning and dynamic modality-balanced representation learning are being proposed to address challenges like modality imbalance and noise in multimodal data.

Noteworthy papers in this area include LEMUR, which proposes a large-scale end-to-end multimodal recommender system, and MOON2.0, which introduces a dynamic modality-balanced multimodal representation learning framework. Additionally, TaoSearchEmb and CroPS make significant contributions to more effective and efficient dense retrieval for search.
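To make the two recurring ideas above concrete, here is a minimal sketch combining them: a product embedding fused from text and image embeddings with per-item modality weights (so a noisy modality contributes less), scored against a query by dot product as in dense retrieval. The function names, softmax-based weighting, and scores are illustrative assumptions, not the actual LEMUR or MOON2.0 methods.

```python
import numpy as np

def fuse_modalities(text_emb, image_emb, text_score, image_score):
    """Weight each modality's embedding by a softmax over per-item quality
    scores, so a lower-quality (e.g. noisy) modality contributes less.
    Hypothetical illustration of modality balancing, not MOON2.0's method."""
    scores = np.array([text_score, image_score])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    fused = weights[0] * text_emb + weights[1] * image_emb
    # Unit-normalize so dot-product retrieval behaves like cosine similarity.
    return fused / (np.linalg.norm(fused) + 1e-9)

def retrieve(query_emb, product_embs, k=2):
    """Dense retrieval: rank products by dot product with the query
    and return the indices of the top-k matches."""
    sims = product_embs @ query_emb
    return np.argsort(-sims)[:k]

# Example: two toy products, the first mostly text-driven.
products = np.stack([
    fuse_modalities(np.array([1.0, 0.0]), np.array([0.0, 1.0]), 2.0, -2.0),
    fuse_modalities(np.array([0.0, 1.0]), np.array([1.0, 0.0]), 2.0, -2.0),
])
top = retrieve(np.array([1.0, 0.0]), products, k=1)
```

In a real system the quality scores would themselves be learned jointly with the retrieval objective; this sketch only shows how balanced fusion and dot-product scoring fit together.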

Sources

LEMUR: Large scale End-to-end MUltimodal Recommendation

MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising

A Deep Learning Model to Predicting Changes in Consumer Attributes for New Line-extended Products

MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding

TaoSearchEmb: A Multi-Objective Reinforcement Learning Framework for Dense Retrieval in Taobao Search

Hint-Augmented Re-ranking: Efficient Product Search using LLM-Based Query Decomposition

Infer As You Train: A Symmetric Paradigm of Masked Generative for Click-Through Rate Prediction

Image-Seeking Intent Prediction for Cross-Device Product Search

CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search

UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment
