Advancements in Data Analysis and Scholarly Document Processing

Data analysis and scholarly document processing are evolving rapidly, with current work aimed at making research more accessible, interpretable, and reproducible. Recent developments center on large language models (LLMs) and agent-based techniques for data understanding, natural language interfaces, and semantic analysis. Integrating LLMs with data visualization tools has made data analysis more intuitive and accessible to non-technical users. In addition, modular, component-based architectures for AI agents have enabled transparent, evaluable, and accessible data agents that bridge the gap between natural language interfaces and complex enterprise data warehouses.

Noteworthy papers in this area include:

A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs, which introduces a systematic methodology and a foundational dataset for citation intent classification in Turkish.

VizGen: Data Exploration and Visualization from Natural Language via a Multi-Agent AI Architecture, which presents an AI-assisted graph generation system that lets users create meaningful visualizations using natural language.

Experiversum: an Ecosystem for Curating and Enhancing Data-Driven Experimental Science, which introduces a lakehouse-based ecosystem that supports the curation, documentation, and reproducibility of exploratory experiments.

Sources

A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs

NFDI4DS Shared Tasks for Scholarly Document Processing

VizGen: Data Exploration and Visualization from Natural Language via a Multi-Agent AI Architecture

LLM/Agent-as-Data-Analyst: A Survey

Transparent, Evaluable, and Accessible Data Agents: A Proof-of-Concept Framework

Overview of SCIDOCA 2025 Shared Task on Citation Prediction, Discovery, and Placement

Experiversum: an Ecosystem for Curating and Enhancing Data-Driven Experimental Science

The Grammar of FAIR: A Granular Architecture of Semantic Units for FAIR Semantics, Inspired by Biology and Linguistics

IoDResearch: Deep Research on Private Heterogeneous Data via the Internet of Data
