Recent work in multimodal learning centers on explainability and fairness. One line of research extracts robust low-dimensional representations from high-dimensional multi-source data while mitigating systematic biases across groups. Another develops frameworks for benchmarking explainability methods, helping stakeholders choose explanations suited to their specific use cases. A third addresses fairness in multimodal models, particularly in federated learning environments, with techniques such as fair prompt tuning and demographic subspace orthogonal projection, which removes group-identifying directions from learned representations.

Noteworthy papers include StablePCA, which proposes group distributionally robust learning of latent representations, and EvalxNLP, a framework for benchmarking state-of-the-art feature attribution methods for transformer-based NLP models. In addition, Mitigating Group-Level Fairness Disparities in Federated Visual Language Models and Robust Fairness Vision-Language Learning for Medical Image Analysis report improved fairness and robustness for multimodal models.
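To make the idea behind demographic subspace orthogonal projection concrete, the sketch below shows one common formulation: estimate a low-rank subspace spanned by directions that separate demographic group means, then project embeddings onto its orthogonal complement. This is only an illustrative sketch under those assumptions, not the implementation used in the cited papers; the function names and the mean-difference construction of the subspace are hypothetical choices for the example.

```python
import numpy as np

def demographic_subspace(embeddings, group_labels, k=1):
    # Hypothetical helper: estimate a rank-k "demographic" subspace from the
    # directions that separate per-group mean embeddings.
    groups = np.unique(group_labels)
    means = np.stack([embeddings[group_labels == g].mean(axis=0) for g in groups])
    centered = means - means.mean(axis=0)
    # Top-k right singular vectors span the directions of group-mean variation.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]  # shape (k, d), rows are orthonormal

def project_out(embeddings, basis):
    # Orthogonal projection onto the complement of the demographic subspace:
    # x_fair = x - B^T B x, where B has orthonormal rows.
    return embeddings - embeddings @ basis.T @ basis

# Toy usage: 100 samples, 16-dim embeddings, 2 demographic groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
labels = rng.integers(0, 2, size=100)
B = demographic_subspace(X, labels, k=1)
X_fair = project_out(X, B)
```

After projection, the component of each embedding lying in the estimated demographic subspace is removed, so a linear probe for group membership along those directions carries little signal; richer variants (e.g., learned or iteratively refined subspaces) follow the same projection principle.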