The field of large language models is moving towards more efficient and scalable architectures, with Mixture-of-Experts (MoE) models at the center of this shift. MoE layers activate only a small subset of parameters for each token, so model capacity can grow without a proportional increase in per-token compute. Recent work has introduced novel MoE architectures, including adjugate experts, hierarchical token deduplication, and expert swap techniques, that accelerate training and improve performance. Researchers are also extending MoE models to multimodal tasks and vision-language models, showing that the approach remains both effective and efficient in those settings. Noteworthy papers include EC2MoE, which proposes an adaptive framework for scalable MoE inference, and MoIIE, which introduces a mixture of intra- and inter-modality experts for large vision-language models. Together, these advances point toward continued progress on efficient, high-capacity large language models.
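
To make the idea of sparse parameter activation concrete, the sketch below shows a generic top-k routed MoE layer: a router scores every expert per token, only the top-k experts are evaluated, and their outputs are combined with renormalized routing weights. This is a minimal illustration under common assumptions (top-2 routing, feed-forward experts), not the architecture of EC2MoE, MoIIE, or any other specific paper mentioned above; all names and sizes are illustrative.

```python
# Minimal sketch of sparse top-k expert routing in an MoE layer.
# Illustrative only; not the method of any specific paper cited above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # routing logits per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Each token is routed to only k experts,
        # so most expert parameters are never touched for a given token.
        logits = self.router(x)                              # (tokens, num_experts)
        topk_logits, topk_idx = logits.topk(self.k, dim=-1)  # (tokens, k)
        weights = F.softmax(topk_logits, dim=-1)             # renormalize over selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Find which tokens selected expert e, and in which top-k slot.
            token_idx, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

# Example: 16 tokens routed through 8 experts, 2 active per token.
tokens = torch.randn(16, 64)
layer = TopKMoELayer(d_model=64, d_hidden=256, num_experts=8, k=2)
print(layer(tokens).shape)  # torch.Size([16, 64])
```

With k=2 of 8 experts active, each token touches roughly a quarter of the expert parameters, which is the source of the compute savings that the trends above build on.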