Research on Composed Image Retrieval (CIR) is advancing rapidly, with a focus on improving retrieval accuracy and efficiency. Recent work addresses two main challenges: limited training data and the difficulty of capturing fine-grained modification semantics. Approaches under exploration include generative models, prediction-based mapping networks, and fine-grained textual inversion networks. There is also a growing emphasis on building robust data annotation pipelines and on using large language models to generate high-quality training data. Together, these advances could meaningfully improve the precision and recall of CIR systems. Noteworthy papers include:
- Generative Compositor, which proposes a novel generative model for few-shot visual information extraction, achieving highly competitive results in full-sample training and outperforming baselines in few-shot settings.
- FineCIR, which introduces a robust fine-grained CIR data annotation pipeline and a framework that explicitly parses modification text, consistently outperforming state-of-the-art CIR baselines on fine-grained and traditional CIR benchmark datasets.
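
For readers new to the task, the following is a minimal sketch of how a composed query might be scored and how Recall@K, a common retrieval metric, can be computed. It is an illustration under stated assumptions, not the method of any paper listed above: the additive fusion in `compose_query` is a hypothetical baseline, the function names are invented for this example, and the toy 2-D vectors stand in for real image and text embeddings.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def compose_query(image_emb, text_emb):
    """Hypothetical fusion: sum the normalized reference-image and
    modification-text embeddings. Real CIR models learn this step."""
    a, b = normalize(image_emb), normalize(text_emb)
    return normalize([x + y for x, y in zip(a, b)])


def rank_gallery(query, gallery):
    """Return gallery indices sorted by cosine similarity to the query."""
    q = normalize(query)
    scores = [sum(x * y for x, y in zip(q, normalize(g))) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(rankings, targets, k):
    """Fraction of queries whose target index appears in the top-k results."""
    hits = sum(1 for r, t in zip(rankings, targets) if t in r[:k])
    return hits / len(targets)


# Toy data (assumed for illustration): three gallery items in a 2-D space.
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = compose_query([1.0, 0.0], [0.0, 1.0])
ranking = rank_gallery(query, gallery)
print(ranking[0], recall_at_k([ranking], [2], k=1))
```

In this toy setup the gallery item that combines both directions ranks first, which is the behavior a CIR system aims for: the retrieved image should reflect the reference image and the requested modification together.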