Advancements in Mixture of Experts Architectures

Research on Mixture of Experts (MoE) architectures is advancing rapidly, with a focus on improving scalability, efficiency, and performance. Recent work introduces novel routing mechanisms, expert merging strategies, and model-system co-designs that make MoE models more effective and adaptive, yielding notable gains in large-scale recommendation, language modeling, and vision-language tasks. Integrating MoE with other techniques, such as graph-structured expert interactions and Nash bargaining for expert merging, has further improved robustness and efficiency.
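
To make the routing idea concrete, the following is a minimal sketch of a standard top-k gated MoE layer in PyTorch. It illustrates the generic mechanism only; the class name, dimensions, and the simple per-expert dispatch loop are illustrative choices rather than the implementation of any paper listed below.

```python
# Minimal sketch of a generic top-k MoE routing layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # produces routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                                     # (tokens, experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)   # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)                        # renormalise over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                              # simple dispatch loop for clarity
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = TopKMoE(d_model=64, d_hidden=128, num_experts=8, top_k=2)
    tokens = torch.randn(16, 64)
    print(layer(tokens).shape)  # torch.Size([16, 64])
```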

Several papers stand out. MTmixAtt proposes a unified MoE architecture with Multi-Mix Attention for large-scale recommendation, reporting superior performance and real-world impact. ReXMoE introduces an architecture that lets routers reuse experts across adjacent layers, enabling richer expert combinations with minimal overhead; a hedged sketch of that idea follows below. MoE-Prism transforms rigid MoE models into elastic services through model-system co-design, exposing more than four times as many distinct operating points and allowing throughput and latency to be tuned dynamically.
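
The sketch below illustrates the cross-layer expert-reuse idea described for ReXMoE: adjacent layers draw on one shared expert pool through their own routers, so richer expert combinations become available without adding expert parameters. The structure and names here are assumptions for illustration, not the paper's actual design, and a dense softmax mixture is used instead of the sparse top-k dispatch shown earlier to keep the example short.

```python
# Hypothetical sketch of cross-layer expert reuse: two adjacent layers share one
# expert pool but keep separate routers. Not the ReXMoE implementation.
import torch
import torch.nn as nn


class SharedPoolMoELayer(nn.Module):
    def __init__(self, d_model: int, shared_experts: nn.ModuleList):
        super().__init__()
        self.experts = shared_experts                    # shared with neighbouring layers, not owned
        self.router = nn.Linear(d_model, len(shared_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(x), dim=-1)                    # (tokens, num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (tokens, experts, d_model)
        return x + torch.einsum("te,ted->td", gate, expert_out)         # residual weighted mixture


if __name__ == "__main__":
    d_model, d_hidden, num_experts = 64, 128, 8
    pool = nn.ModuleList(
        nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
        for _ in range(num_experts)
    )
    layer_a = SharedPoolMoELayer(d_model, pool)   # two adjacent layers...
    layer_b = SharedPoolMoELayer(d_model, pool)   # ...routing over the same 8 experts
    h = torch.randn(16, d_model)
    print(layer_b(layer_a(h)).shape)              # torch.Size([16, 64])
```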

Sources

MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation

Reviving, reproducing, and revisiting Axelrod's second tournament

Mixture of Experts Approaches in Dense Retrieval Tasks

Expert Merging in Sparse Mixture of Experts with Nash Bargaining

Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures

Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts

ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts

3D Optimization for AI Inference Scaling: Balancing Accuracy, Cost, and Latency

MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
