Advancements in Retrieval-Augmented Generation and GUI Retrieval

Research on retrieval-augmented generation (RAG) and GUI retrieval is moving toward methods that make these systems both easier to build and easier to evaluate. Recent work develops frameworks and algorithms that combine large language models (LLMs) with multimodal inputs to improve retrieval quality and generalization. In particular, multimodal LLMs (MLLMs) and DOM downsampling, which shrinks a web page's Document Object Model into a smaller representation suitable as LLM input, show promise for strengthening web agents and GUI retrieval systems. Unified evaluation platforms, meanwhile, allow retrieval, reranking, and RAG pipelines to be assessed more comprehensively and with greater attention to users, including through human and LLM feedback. Overall, the field is advancing toward more robust and scalable solutions for complex document understanding and GUI retrieval.

Noteworthy papers include GUI-ReRank, a GUI retrieval framework that pairs fast, embedding-based constrained retrieval models with MLLM-based reranking, and Double-Bench, a large-scale evaluation system for document RAG that provides fine-grained assessment and supports dynamic updates to mitigate potential data contamination.
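The pairing of cheap embedding retrieval with expensive reranking follows a common two-stage pattern: score the whole corpus quickly, keep a short list, then spend the costly model only on that list. The sketch below illustrates that generic pattern, not GUI-ReRank's actual implementation, whose models and scoring are not described here. It assumes precomputed embedding vectors and uses a plain function as a stand-in for an MLLM reranker; all names (`retrieve_then_rerank`, `rerank_score`, the toy corpus and scores) are hypothetical.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    corpus_vecs: dict[str, Sequence[float]],
    rerank_score: Callable[[str], float],
    k: int = 3,
) -> list[str]:
    """Hypothetical two-stage search: embedding recall, then reranking of the top-k only."""
    # Stage 1: rank every item by embedding similarity and keep the k best.
    candidates = sorted(
        corpus_vecs,
        key=lambda item: cosine(query_vec, corpus_vecs[item]),
        reverse=True,
    )[:k]
    # Stage 2: re-order only the shortlisted items with the expensive scorer.
    # In a GUI-retrieval setting this might be an MLLM judging screenshots; here it is a stub.
    return sorted(candidates, key=rerank_score, reverse=True)


# Toy embeddings for four hypothetical GUI screens.
corpus = {
    "login_screen": [0.9, 0.1, 0.0],
    "signup_screen": [0.8, 0.3, 0.0],
    "settings_screen": [0.1, 0.9, 0.2],
    "map_screen": [0.0, 0.2, 0.9],
}
# Made-up reranker scores standing in for an MLLM's relevance judgments.
mllm_scores = {
    "login_screen": 0.4,
    "signup_screen": 0.95,
    "settings_screen": 0.99,
    "map_screen": 0.0,
}

# settings_screen has the highest rerank score but never reaches stage 2,
# because embedding recall keeps only the top k=2 screens.
print(retrieve_then_rerank([1.0, 0.2, 0.0], corpus, lambda item: mllm_scores[item], k=2))
# prints ['signup_screen', 'login_screen']
```

The choice of `k` trades recall for cost: a larger shortlist gives the reranker more chances to recover items the embedding stage ranked poorly, at the price of more expensive scoring calls.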

Sources

GUI-ReRank: Enhancing GUI Retrieval with Multi-Modal LLM-based Reranking

Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?

Beyond Pixels: Exploring DOM Downsampling for LLM-Based Web Agents

RankArena: A Unified Platform for Evaluating Retrieval, Reranking and RAG with Human and LLM Feedback
