Advances in Composed Image Retrieval and Text-to-Image Generation

The field of composed image retrieval and text-to-image generation is evolving rapidly, with a focus on making models more flexible and accurate. Researchers are addressing the limitations of current approaches by, for example, using large language models to parse user instructions and decide which task to execute. Other notable trends are scalable pipelines for automatically generating training triplets and the construction of large-scale fashion datasets. These advances stand to improve model performance in real-world settings such as e-commerce. Noteworthy papers include OFFSET, which proposes a focus-mapping-based feature extractor to reduce the impact of noise interference, and TalkFashion, which introduces an intelligent virtual try-on assistant built on multimodal large language models. Additionally, Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval and FACap, a large-scale fashion dataset, make significant contributions to the field.
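To make the task concrete: in composed image retrieval, a query consists of a reference image plus a modification text, and the system ranks candidate images by how well they match the combined intent. The sketch below is a minimal, hypothetical illustration using toy embedding vectors and simple additive fusion; real systems use learned vision-language encoders and trained fusion modules, and all names here are invented for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length (avoid division by zero)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def fuse(image_vec, text_vec):
    # Element-wise addition is a simple fusion baseline; learned
    # fusion networks would replace this in a real system.
    return normalize([i + t for i, t in zip(image_vec, text_vec)])

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def rank_targets(reference_img, modification_text, candidates):
    """Rank (name, embedding) candidates against the fused query."""
    query = fuse(reference_img, modification_text)
    return sorted(candidates, key=lambda c: cosine(query, c[1]), reverse=True)

# Toy embeddings; in practice these come from a vision-language encoder.
reference = [1.0, 0.0, 0.0]       # e.g. a red dress
modification = [0.0, 1.0, 0.0]    # e.g. "make it blue"
candidates = [
    ("blue_dress", [0.7, 0.7, 0.0]),
    ("red_shirt",  [0.9, 0.0, 0.3]),
    ("green_hat",  [0.0, 0.1, 0.9]),
]
ranking = rank_targets(reference, modification, candidates)
print(ranking[0][0])  # prints "blue_dress"
```

The triplet structure this code queries over (reference image, modification text, target image) is exactly what the automatic triplet-synthesis pipelines mentioned above aim to generate at scale.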

Sources

OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval

TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model

Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval

MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval

Evaluating Attribute Confusion in Fashion Text-to-Image Generation

FACap: A Large-scale Fashion Dataset for Fine-grained Composed Image Retrieval
