Advances in Tabular Data Analysis and Electronic Health Records

Research on tabular data and electronic health records (EHRs) is moving toward more robust and generalizable models. Researchers are exploring new architectures and techniques to improve accuracy and reliability in these domains. A key direction is foundation models that handle a wide range of tasks, such as classification, regression, and data generation, without task-specific training. Another is the use of large language models for schema inference and data analysis. Noteworthy papers include:

  • LimiX, which presents a large structured-data model that can handle a wide range of tabular tasks through query-based conditional prediction.
  • CEHR-GPT, which demonstrates strong performance across feature representation, zero-shot prediction, and synthetic data generation tasks for EHR data.

Sources

Robust Detection of Synthetic Tabular Data under Schema Variability

A Multi-target Bayesian Transformer Framework for Predicting Cardiovascular Disease Biomarkers during Pandemics

LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

CEHR-GPT: A Scalable Multi-Task Foundation Model for Electronic Health Records

ASCENDgpt: A Phenotype-Aware Transformer Model for Cardiovascular Risk Prediction from Electronic Health Records

Schema Inference for Tabular Data Repositories Using Large Language Models
