Advances in Unstructured Data Analysis and Tabular Reasoning

Research on unstructured data analysis and tabular reasoning is advancing rapidly, with much of the recent work aimed at extracting insights from complex, heterogeneous data. A recurring theme is the use of large language models (LLMs) to extract attributes from unstructured data and to perform semantic query processing. Noteworthy papers include Unstructured Data Analysis using LLMs: A Comprehensive Benchmark, which presents a benchmark for evaluating unstructured data analysis systems, and DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries, which proposes an end-to-end paradigm for answering natural-language analytic queries over large-scale open-domain data. Also notable are REaR: Retrieve, Expand and Refine for Effective Multitable Retrieval, which introduces a three-stage framework for efficient, high-fidelity multi-table retrieval, and TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data, which proposes a framework for complex numerical reasoning over tabular data.

Sources

Unstructured Data Analysis using LLMs: A Comprehensive Benchmark

DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries

Iterative Foundation Model Fine-Tuning on Multiple Rewards

REaR: Retrieve, Expand and Refine for Effective Multitable Retrieval

S2Doc - Spatial-Semantic Document Format

UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data

SemBench: A Benchmark for Semantic Query Processing Engines

Automated Reward Design for Gran Turismo

TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

EasyTUS: A Comprehensive Framework for Fast and Accurate Table Union Search across Data Lakes

Relational Deep Dive: Error-Aware Queries Over Unstructured Data

RUST-BENCH: Benchmarking LLM Reasoning on Unstructured Text within Structured Tables

Are We Asking the Right Questions? On Ambiguity in Natural Language Queries for Tabular Data Analysis
