Advances in Table Understanding and Generation

The field of table understanding and generation is advancing rapidly, with a focus on more accurate and efficient methods for extracting and representing tabular data. Recent research has explored large language models (LLMs) and neurosymbolic approaches to improve table extraction, generation, and reasoning, with strong results on tasks such as table retrieval, question answering, and data annotation. Notably, LLMs have enabled zero-shot and few-shot learning frameworks that adapt to new tasks and domains with minimal training data. Overall, the field is moving toward more robust and generalizable methods for table understanding and generation, with potential applications in areas including finance, healthcare, and scientific research.

Noteworthy papers include:

Fine-Tuning Vision-Language Models for Markdown Conversion of Financial Tables, which fine-tunes a vision-language model to convert financial tables into Markdown format, achieving high accuracy and outperforming larger models.

TEN: Table Explicitization, Neurosymbolically, which presents a neurosymbolic approach for extracting tabular data from semistructured input text, significantly outperforming purely neural baselines and achieving high exact-match accuracy.
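To make the Markdown target format of these table-conversion systems concrete, here is a minimal sketch of rendering an extracted table as GitHub-flavored Markdown. The function name, column layout, and sample figures are illustrative assumptions, not taken from the papers.

```python
def to_markdown_table(header, rows):
    """Render a header and data rows as a GitHub-flavored Markdown table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",  # separator row
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)

# Hypothetical financial table of the kind such models emit.
md = to_markdown_table(
    ["Item", "2023 (RM)", "2024 (RM)"],
    [["Revenue", "1,200,000", "1,350,000"],
     ["Net profit", "150,000", "180,000"]],
)
print(md)
```

A vision-language or neurosymbolic extractor would supply the header and rows from a table image or semistructured text; the rendering step itself is this simple, which is why exact-match accuracy on the Markdown output is a usable evaluation metric.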

Sources

Fine-Tuning Vision-Language Models for Markdown Conversion of Financial Tables in Malaysian Audited Financial Reports

Improving Table Retrieval with Question Generation from Partial Tables

PanelTR: Zero-Shot Table Reasoning Framework Through Multi-Agent Scientific Discussion

Multi-Dimensional Summarization Agents with Context-Aware Reasoning over Enterprise Tables

Evaluating Large Language Models as Expert Annotators

TurQUaz at CheckThat! 2025: Debating Large Language Models for Scientific Web Discourse Detection

LLM driven Text-to-Table Generation through Sub-Tasks Guidance and Iterative Refinement

LyS at SemEval 2025 Task 8: Zero-Shot Code Generation for Tabular QA

LLM Empowered Prototype Learning for Zero and Few-Shot Tasks on Tabular Data

TEN: Table Explicitization, Neurosymbolically

Columbo: Expanding Abbreviated Column Names for Tabular Data Using Large Language Models

From Surface to Semantics: Semantic Structure Parsing for Table-Centric Document Analysis
