Advances in Mixture-of-Experts and Interpretable Machine Learning

The field of machine learning is seeing rapid progress at the intersection of Mixture-of-Experts (MoE) architectures and interpretable machine learning. Recent studies examine the expressive power of MoEs on structured complex tasks, showing that they can efficiently approximate functions supported on low-dimensional manifolds as well as piecewise functions with compositional sparsity. In parallel, work on the interpretability of MoE models has produced cross-level attribution algorithms for analyzing sparse MoE architectures. Sparse autoencoders (SAEs) also continue to advance, with HierarchicalTopK training objectives and matching pursuit-based approaches that extract correlated features while improving interpretability. Other notable developments include nonlinear interpretable models such as NIMO, which combines the flexibility of neural networks with the interpretability of linear models, and the application of machine learning-based methods to QPI kernel extraction.

Noteworthy papers include On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks, a systematic study of MoE expressivity; Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy, which introduces a training objective that lets a single SAE serve several sparsity budgets; and Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis, which proposes a cross-level attribution algorithm for MoE models.
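To make the notion of a sparse MoE architecture concrete, the sketch below shows a minimal top-k routed layer in plain PyTorch. The class name, expert shapes, and the k=2 routing are illustrative assumptions for this digest, not the construction analyzed in the papers listed under Sources.

```python
import torch
import torch.nn as nn


class SparseMoELayer(nn.Module):
    """Minimal sparse Mixture-of-Experts layer: a gating network scores the
    experts and only the top-k experts are evaluated for each input."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.gate(x)                         # (batch, n_experts) routing scores
        topk = torch.topk(scores, self.k, dim=-1)     # keep only the k best experts per input
        weights = torch.softmax(topk.values, dim=-1)  # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # combine the k selected experts
            idx = topk.indices[:, slot]
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


# Toy usage on random inputs.
moe = SparseMoELayer(d_model=32, n_experts=8, k=2)
y = moe(torch.randn(16, 32))
```

Only the k selected experts are evaluated for each input, which is what keeps the layer's computation sparse even as the number of experts grows.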
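Similarly, the TopK-style sparsity constraint that the SAE papers build on can be illustrated with a minimal sketch: encode, keep only the k largest dictionary activations, zero the rest, and reconstruct. This is plain PyTorch with hypothetical dimensions, not the HierarchicalTopK objective itself (which the summary above associates with training across multiple sparsity budgets).

```python
import torch
import torch.nn as nn


class TopKSparseAutoencoder(nn.Module):
    """Minimal TopK sparse autoencoder: encode, keep the k largest
    pre-activations, zero the rest, then reconstruct the input."""

    def __init__(self, d_model: int, d_dict: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))           # dictionary pre-activations
        topk = torch.topk(z, self.k, dim=-1)      # keep only the k largest codes
        codes = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        recon = self.decoder(codes)               # reconstruct the activation
        return recon, codes


# Toy usage: fit the SAE to reconstruct random "activations".
sae = TopKSparseAutoencoder(d_model=64, d_dict=512, k=8)
x = torch.randn(32, 64)
recon, codes = sae(x)
loss = torch.nn.functional.mse_loss(recon, x)
loss.backward()
```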

Sources

On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks

Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning

Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy

Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis

Disentangling Granularity: An Implicit Inductive Bias in Factorized VAEs

Surrogate Interpretable Graph for Random Decision Forests

ToothForge: Automatic Dental Shape Generation using Synchronized Spectral Embeddings

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Multi-Exit Kolmogorov-Arnold Networks: enhancing accuracy and parsimony

Iterative Neural Rollback Chase-Pyndiah Decoding

Sparse Autoencoders, Again?

NIMO: a Nonlinear Interpretable MOdel

Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit

Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment
