Privacy and Security in AI-Driven Genomics and Education

Research on AI-driven genomics and education is placing greater emphasis on privacy and security. Recent work has shown that cognitive diagnosis models, generative machine learning models, and large language models are vulnerable to several kinds of attack, including membership inference and data poisoning. To address these risks, researchers are exploring techniques such as data augmentation, differential privacy, and fine-tuning. Notably, full fine-tuning has been shown to strengthen resistance to reconstruction attacks on large language model embeddings, while embedding-space data augmentation can mitigate membership inference attacks in clinical time series forecasting.
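
For readers unfamiliar with membership inference, the basic idea can be sketched in a few lines. The example below is a generic loss-threshold attack, not the method of any paper listed here: it assumes the attacker can observe a per-example loss, and it guesses "member" whenever that loss falls below a chosen threshold, since models tend to fit their training data more closely. The function names, the threshold of 0.4, and the synthetic loss distributions are all illustrative assumptions.

```python
import random


def loss_threshold_mia(losses, threshold):
    """Guess 'member' (True) for every example whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Fraction of correct guesses over known members and known non-members."""
    member_guesses = loss_threshold_mia(member_losses, threshold)
    nonmember_guesses = loss_threshold_mia(nonmember_losses, threshold)
    correct = sum(member_guesses) + sum(not g for g in nonmember_guesses)
    return correct / (len(member_losses) + len(nonmember_losses))


# Synthetic losses (an assumption for illustration only): training members
# tend to have lower loss than unseen examples.
rng = random.Random(0)
member_losses = [abs(rng.gauss(0.2, 0.1)) for _ in range(1000)]
nonmember_losses = [abs(rng.gauss(0.8, 0.3)) for _ in range(1000)]

acc = attack_accuracy(member_losses, nonmember_losses, threshold=0.4)
print(f"attack accuracy: {acc:.2f}")
```

An accuracy near 0.5 would mean the attacker does no better than guessing on a balanced set. Defenses such as data augmentation or differential privacy aim to narrow the gap between the two loss distributions, which pushes the attack toward that baseline.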

Some noteworthy papers in this area include:

P-MIA, which introduces a grey-box threat model that exploits the explainability features of cognitive diagnosis models to mount a membership inference attack.

Associative Poisoning to Generative Machine Learning, which introduces a data poisoning technique that compromises fine-grained features of generated data without requiring control of the training process.

Comparing Reconstruction Attacks on Pretrained Versus Full Fine-tuned Large Language Model Embeddings on Homo Sapiens Splice Sites Genomic Data, which demonstrates that fine-tuning strengthens resistance to reconstruction attacks across multiple architectures.

Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models, which introduces a hybrid attack that combines a traditional black-box membership inference attack (MIA) with contextual genomics metrics to increase attack power.

Sources

P-MIA: A Profiled-Based Membership Inference Attack on Cognitive Diagnosis Models

Associative Poisoning to Generative Machine Learning

Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting

Comparing Reconstruction Attacks on Pretrained Versus Full Fine-tuned Large Language Model Embeddings on Homo Sapiens Splice Sites Genomic Data

Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models
