Advances in Mixture-of-Experts and Interpretable Machine Learning

The field of machine learning is seeing rapid progress at the intersection of Mixture-of-Experts (MoE) architectures and interpretable machine learning. Recent studies examine the expressive power of MoEs on structured complex tasks, showing that they can efficiently approximate functions supported on low-dimensional manifolds as well as piecewise functions with compositional sparsity. In parallel, work on the interpretability of MoE models has produced cross-level attribution algorithms for analyzing sparse MoE architectures. Sparse autoencoders (SAEs) also continue to advance, with HierarchicalTopK training objectives and matching pursuit-based approaches that extract correlated features while improving interpretability. Other notable developments include nonlinear interpretable models such as NIMO, which combines the flexibility of neural networks with the interpretability of linear models, and the application of machine learning-based methods to QPI kernel extraction.

Noteworthy papers include On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks, a systematic study of MoE expressivity; Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy, which introduces a training objective that lets a single SAE serve several sparsity budgets; and Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis, which proposes a cross-level attribution algorithm for MoE models.
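To make the notion of a sparse MoE architecture concrete, the sketch below shows a minimal top-k routed layer in plain PyTorch. The class name, expert shapes, and the k=2 routing are illustrative assumptions for this digest, not the construction analyzed in the papers listed under Sources.

```python
import torch
import torch.nn as nn


class SparseMoELayer(nn.Module):
    """Minimal sparse Mixture-of-Experts layer: a gating network scores the
    experts and only the top-k experts are evaluated for each input."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.gate(x)                         # (batch, n_experts) routing scores
        topk = torch.topk(scores, self.k, dim=-1)     # keep only the k best experts per input
        weights = torch.softmax(topk.values, dim=-1)  # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # combine the k selected experts
            idx = topk.indices[:, slot]
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


# Toy usage on random inputs.
moe = SparseMoELayer(d_model=32, n_experts=8, k=2)
y = moe(torch.randn(16, 32))
```

Only the k selected experts are evaluated for each input, which is what keeps the layer's computation sparse even as the number of experts grows.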
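Similarly, the TopK-style sparsity constraint that the SAE papers build on can be illustrated with a minimal sketch: encode, keep only the k largest dictionary activations, zero the rest, and reconstruct. This is plain PyTorch with hypothetical dimensions, not the HierarchicalTopK objective itself (which the summary above associates with training across multiple sparsity budgets).

```python
import torch
import torch.nn as nn


class TopKSparseAutoencoder(nn.Module):
    """Minimal TopK sparse autoencoder: encode, keep the k largest
    pre-activations, zero the rest, then reconstruct the input."""

    def __init__(self, d_model: int, d_dict: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))           # dictionary pre-activations
        topk = torch.topk(z, self.k, dim=-1)      # keep only the k largest codes
        codes = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        recon = self.decoder(codes)               # reconstruct the activation
        return recon, codes


# Toy usage: fit the SAE to reconstruct random "activations".
sae = TopKSparseAutoencoder(d_model=64, d_dict=512, k=8)
x = torch.randn(32, 64)
recon, codes = sae(x)
loss = torch.nn.functional.mse_loss(recon, x)
loss.backward()
```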

Sources

On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks

Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning

Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy

Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis

Disentangling Granularity: An Implicit Inductive Bias in Factorized VAEs

Surrogate Interpretable Graph for Random Decision Forests

ToothForge: Automatic Dental Shape Generation using Synchronized Spectral Embeddings

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Multi-Exit Kolmogorov-Arnold Networks: enhancing accuracy and parsimony

Iterative Neural Rollback Chase-Pyndiah Decoding

Sparse Autoencoders, Again?

NIMO: a Nonlinear Interpretable MOdel

Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit

Seeing the Invisible: Machine learning-Based QPI Kernel Extraction via Latent Alignment
