The field of multimodal learning is growing rapidly, with increasing attention to the balance and sufficiency of learning across modalities. Recent studies highlight the importance of addressing modality imbalance, including unequal modality missing rates and heterogeneous modality contributions, to achieve reliable performance in real-world clinical scenarios. To this end, frameworks such as the Dynamic Modality-Aware Fusion Network (DMAF-Net) combine dynamic modality-aware fusion modules with synergistic relation distillation and prototype distillation to enforce global-local feature alignment and maintain semantic consistency. In addition, methods such as Data Remixing address modality laziness and modality clash when jointly training multimodal models, improving both accuracy and robustness.

In medical image segmentation, techniques such as Cross-Modal Clustering-Guided Negative Sampling and Occlusion-aware Bilayer Modeling have shown promise in improving the effectiveness and robustness of segmentation models. Noteworthy papers include ContextLoss, which proposes a loss function that improves topological correctness in image segmentation, and SynPo, which boosts training-free few-shot medical segmentation via high-quality negative prompts. Together, these studies point toward substantial advances in multimodal learning and medical image segmentation, with applications in digital diagnosis and clinical decision-making.
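To make the idea of dynamic modality-aware fusion more concrete, the sketch below shows a generic gated-fusion module in PyTorch. It is not the DMAF-Net implementation from the summarized work; it is a minimal illustration, under the assumption that each modality produces a fixed-size embedding, that a learned gate scores each modality per sample, and that missing modalities are masked out before the softmax so the fused representation relies only on the observed inputs. The class and function names are hypothetical.

```python
import torch
import torch.nn as nn


class GatedModalityFusion(nn.Module):
    """Illustrative gated fusion over a variable set of modalities.

    NOT the DMAF-Net module; a generic sketch of dynamic modality-aware
    fusion: each available modality embedding gets a learned score,
    missing modalities are masked out, and the softmax-weighted sum of
    the embeddings forms the fused representation.
    """

    def __init__(self, modalities, dim):
        super().__init__()
        self.modalities = list(modalities)
        # One scalar gate per modality (hypothetical design choice).
        self.gates = nn.ModuleDict({m: nn.Linear(dim, 1) for m in modalities})

    def forward(self, feats, present):
        # feats:   dict modality -> (B, dim) embeddings
        # present: dict modality -> (B,) bool mask, True if modality observed
        scores, stacked = [], []
        for m in self.modalities:
            s = self.gates[m](feats[m]).squeeze(-1)           # (B,)
            s = s.masked_fill(~present[m], float("-inf"))     # drop missing modalities
            scores.append(s)
            stacked.append(feats[m])
        weights = torch.softmax(torch.stack(scores, dim=1), dim=1)           # (B, M)
        fused = (weights.unsqueeze(-1) * torch.stack(stacked, dim=1)).sum(1)  # (B, dim)
        return fused, weights


if __name__ == "__main__":
    B, D = 4, 32
    mods = ["t1", "t2", "flair"]
    fusion = GatedModalityFusion(mods, D)
    feats = {m: torch.randn(B, D) for m in mods}
    present = {m: torch.ones(B, dtype=torch.bool) for m in mods}
    present["flair"][0] = False  # simulate a missing modality for one sample
    fused, w = fusion(feats, present)
    print(fused.shape, w[0])     # for sample 0, the flair weight is zero
```

In this sketch the per-sample weights adapt to which modalities are present, which is the basic mechanism behind dynamic fusion under imbalanced missing rates; the distillation components described above would be additional training objectives layered on top of such a module.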