Research on tabular data and electronic health records (EHRs) is moving toward more robust and generalizable models, with new architectures and techniques aimed at improving accuracy and reliability in these domains. A key direction is the development of foundation models that handle a wide range of tasks, such as classification, regression, and data generation, without task-specific training. Another active area is the use of large language models for schema inference and data analysis. Noteworthy papers include:
- LimiX, which presents a large structured-data model that addresses many tabular tasks through query-based conditional prediction.
- CEHR-GPT, which demonstrates strong performance across feature representation, zero-shot prediction, and synthetic data generation tasks for EHR data.