Research on missing data handling is advancing quickly, with growing attention to methods that impute missing values and generate high-quality data. Current work tackles complex missingness patterns with techniques such as multi-task learning, masked autoencoding, and synthetic data generation. These advances could improve machine learning models in applications such as healthcare, marketing, and biomedical research. Noteworthy papers in this area include:
- CACTI, which combines copy masking with contextual information to improve tabular data imputation and reports state-of-the-art results.
- An agentic framework for missing modality prediction, which dynamically formulates modality-aware mining strategies and uses a self-refinement mechanism to improve the generated modalities.
- LSM-2 with Adaptive and Inherited Masking, a novel self-supervised learning approach that learns robust representations directly from incomplete wearable sensor data.
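
To make the masking idea behind masked-autoencoding imputers more concrete, here is a minimal NumPy sketch of copy-style masking. It assumes the technique works by borrowing missingness patterns already seen in the data, rather than drawing masks uniformly at random. It illustrates that general idea only and is not CACTI's actual implementation; the function name `copy_mask` and the toy matrix are hypothetical.

```python
import numpy as np

def copy_mask(observed, rng):
    """Build a training mask by copying missingness patterns from other rows.

    observed: boolean array (n_rows, n_cols), True where a value is present.
    Returns a boolean array, True where an observed value is hidden for training.
    """
    n_rows = observed.shape[0]
    # Pick one donor row per row (hypothetical sampling scheme: uniform at random).
    donors = rng.integers(0, n_rows, size=n_rows)
    donor_missing = ~observed[donors]
    # Hide only cells that are actually observed, so a reconstruction target exists.
    return observed & donor_missing

rng = np.random.default_rng(0)
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0],
              [np.nan, 2.0, 6.0]])
observed = ~np.isnan(X)

hidden = copy_mask(observed, rng)
X_input = np.where(hidden, np.nan, X)  # what a model would see during training
```

A model trained this way would be asked to reconstruct `X[hidden]` from `X_input`. Because the hidden cells follow patterns that occur in the real data, the training task may resemble the imputation task more closely than it would under uniformly random masking.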