Advances in Document Representation and Retrieval

The field of document representation and retrieval is moving toward more fine-grained and semantic approaches. Recent work improves the accuracy and efficiency of document retrieval through techniques such as multi-aspect-aware query optimization and semantic contrastive sentence embeddings, aiming to capture the complex relationships between documents. Several studies also explore large language models for generating summaries, both as a training signal for embeddings and as a way to probe the limits of current evaluation metrics. Overall, the field is shifting toward more sophisticated, context-aware representation and retrieval techniques.

Noteworthy papers include PRISM, which introduces a document-to-document retrieval method that improves performance by an average of 4.3% over existing baselines; SemCSE, which achieves state-of-the-art performance among models of its size on the SciRepEval benchmark for scientific text embeddings; and Learning Robust Negation Text Representations, which proposes a training strategy that substantially improves the negation understanding of text encoders.
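To make the contrastive-embedding idea concrete, the sketch below trains a sentence encoder so that a paper abstract and an LLM-generated summary of the same paper land close together in embedding space, with other items in the batch acting as negatives. This is only a minimal illustration in the spirit of the summarized work, not the papers' actual pipelines: the base encoder, example pairs, and hyperparameters are assumptions chosen for the example.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "allenai/scibert_scivocab_uncased"  # assumed base encoder, not from the papers
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def embed(texts):
    """Mean-pool token embeddings into one vector per input text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch).last_hidden_state              # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (out * mask).sum(1) / mask.sum(1)              # (B, H)

def info_nce_loss(abstract_emb, summary_emb, temperature=0.05):
    """In-batch contrastive loss: each abstract should match its own summary."""
    a = F.normalize(abstract_emb, dim=-1)
    s = F.normalize(summary_emb, dim=-1)
    logits = a @ s.T / temperature          # (B, B) cosine-similarity matrix
    targets = torch.arange(logits.size(0))  # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

# Toy batch of (abstract, LLM-generated summary) pairs -- placeholder text.
abstracts = ["We study dense retrieval for scientific papers ...",
             "This paper analyzes negation handling in text encoders ..."]
summaries = ["A study of dense retrieval methods for scientific literature.",
             "An analysis of how text encoders handle negation."]

optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)
loss = info_nce_loss(embed(abstracts), embed(summaries))
loss.backward()
optimizer.step()
print(f"contrastive loss: {loss.item():.4f}")

Using in-batch negatives keeps the example simple; in practice, larger batches or hard negatives (for instance, negated or otherwise contrastive rewrites of a text) would be needed to obtain embeddings robust enough for retrieval.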

Sources

PRISM: Fine-Grained Paper-to-Paper Retrieval with Multi-Aspect-Aware Query Optimization

Extracting Document Relations from Search Corpus by Marginalizing over User Queries

Real-World Summarization: When Evaluation Reaches Its Limits

Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey data Modeling and Analysis

Learning Robust Negation Text Representations

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts
