Advances in Synthetic Data Generation for Healthcare

Synthetic data generation for healthcare is advancing rapidly, driven by the need to compensate for scarce and biased data. Recent research uses neural networks, latent diffusion models, and data-centric frameworks to generate synthetic data for healthcare applications, including medical coding, disease diagnosis, and clinical note generation. These approaches report promising results: more accurate and equitable medical coding, more controllable and interpretable generated data, and better coverage of rare diseases for which real data is scarce. Notable papers in this area include:

  • Beyond One-Size-Fits-All: Neural Networks for Differentially Private Tabular Data Synthesis, which proposes a novel neural network-based approach for differentially private tabular data synthesis.
  • H-LDM: Hierarchical Latent Diffusion Models for Controllable and Interpretable PCG Synthesis from Clinical Metadata, which introduces a hierarchical latent diffusion model for generating clinically accurate, controllable phonocardiogram (PCG) signals from structured metadata.

Sources

Beyond One-Size-Fits-All: Neural Networks for Differentially Private Tabular Data Synthesis

Syn-STARTS: Synthesized START Triage Scenario Generation Framework for Scalable LLM Evaluation

Synthetic Clinical Notes for Rare ICD Codes: A Data-Centric Framework for Long-Tail Medical Coding

H-LDM: Hierarchical Latent Diffusion Models for Controllable and Interpretable PCG Synthesis from Clinical Metadata

How to Train Private Clinical Language Models: A Comparative Study of Privacy-Preserving Pipelines for ICD-9 Coding
