Advances in Retrieval-Augmented Generation and Question Answering

The field of natural language processing is seeing rapid progress in retrieval-augmented generation (RAG) and question answering (QA). Researchers are developing more efficient methods for evaluating and improving RAG systems, particularly in multi-modal settings, and the integration of metadata and other external information is becoming increasingly important for extending the capabilities of large language models. Noteworthy efforts include modular benchmarks and datasets that incorporate metadata to support more precise, contextualized queries. These advances promise to improve the accuracy and reliability of QA systems, especially in domains that require rapid analysis of large volumes of data. Notable papers include AMAQA, which introduces a QA dataset that integrates metadata and supports precise, contextualized queries, demonstrating a significant boost in accuracy, and Q²Forge, which presents an approach for generating competency questions and SPARQL queries for question answering over knowledge graphs, supporting the creation of reference query sets for any target knowledge graph.
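
To make the metadata idea concrete, the sketch below shows one way metadata can sharpen retrieval in a RAG-style QA pipeline: candidate passages are first filtered by metadata constraints and only then ranked for relevance. This is a minimal illustration under assumed names; the `Doc` class, the `retrieve` and `build_prompt` helpers, the keyword-overlap ranker, and the toy corpus are all hypothetical and are not taken from any of the papers listed below.

```python
# Minimal sketch of metadata-filtered retrieval for a RAG-style QA pipeline.
# Hypothetical names throughout: Doc, retrieve, build_prompt, and the toy corpus
# only illustrate how metadata constraints can narrow the candidate set before
# relevance scoring and answer generation.
from dataclasses import dataclass, field


@dataclass
class Doc:
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. {"channel": "...", "year": "..."}


def retrieve(query: str, docs: list[Doc], filters: dict, k: int = 3) -> list[Doc]:
    """Keep documents whose metadata matches every filter, then rank by naive keyword overlap."""
    candidates = [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in filters.items())
    ]
    query_terms = set(query.lower().split())
    scored = sorted(
        candidates,
        key=lambda d: len(query_terms & set(d.text.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query: str, context: list[Doc]) -> str:
    """Assemble a grounded prompt for the generator from the retrieved passages."""
    passages = "\n".join(f"- {d.text}" for d in context)
    return f"Answer using only the passages below.\n{passages}\n\nQuestion: {query}"


if __name__ == "__main__":
    corpus = [
        Doc("Channel A discussed the new tax rules in March.",
            {"channel": "A", "year": "2024"}),
        Doc("Channel B shared unrelated news.", {"channel": "B", "year": "2024"}),
    ]
    hits = retrieve("tax rules", corpus, filters={"channel": "A"})
    print(build_prompt("What did channel A say about tax rules?", hits))
```

In practice the keyword ranker would be replaced by a dense or hybrid retriever, but the metadata filter plays the same role: it restricts generation to passages that satisfy the query's contextual constraints.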

Sources

mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs

AMAQA: A Metadata-based QA Dataset for RAG Systems

Q²Forge: Minting Competency Questions and SPARQL Queries for Question-Answering Over Knowledge Graphs

Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks

BR-TaxQA-R: A Dataset for Question Answering with References for Brazilian Personal Income Tax Law, including case law
