The field of machine learning is moving toward greater explainability and interpretability, with a focus on methods that expose the decision-making processes of complex models. Recent research has explored tensor networks, Shapley values, and counterfactual explanations as routes to more transparent models, with promising results in applications ranging from computer vision to conservation monitoring. Notably, the development of tractable Shapley values and interactions via tensor networks has enabled efficient computation of explanations for large models. Furthermore, integrating explainability methods with existing models, such as caption-driven explainability for CLIP, has demonstrated potential for improving model robustness and trustworthiness.
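
For context, the Shapley value is the quantity these methods compute or approximate. For a feature set $N$ and a set function $v$ that scores feature coalitions, the attribution to feature $i$ is given by the standard definition below (not specific to any paper above); evaluated naively it sums over exponentially many subsets, which is the cost that tractable tensor-network formulations aim to avoid.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
\frac{|S|!\,\bigl(|N|-|S|-1\bigr)!}{|N|!}\,
\bigl(v(S \cup \{i\}) - v(S)\bigr)
```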
Noteworthy papers include "Additive Models Explained: A Computational Complexity Approach", which challenges the assumption that meaningful explanations for Generalized Additive Models can be computed efficiently; "SHAP Meets Tensor Networks: Provably Tractable Explanations with Parallelism", which introduces a general framework for computing provably exact SHAP explanations for tensor networks; and "FaCT: Faithful Concept Traces for Explaining Neural Network Decisions", which proposes a model with built-in mechanistic concept explanations that are faithful to the model.
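
As a point of reference (and not the method of any of the papers above), the sketch below computes exact Shapley values by brute-force coalition enumeration for a toy value function; the function and variable names (`shapley_values`, `value_fn`) are illustrative. The exponential cost in the number of features is exactly what provably tractable formulations over structured models seek to sidestep.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions (exponential cost).

    value_fn(subset) -> model output when only `subset` of features is present.
    """
    n = len(features)
    phi = {f: 0.0 for f in features}
    for f in features:
        rest = [g for g in features if g != f]
        for k in range(n):
            for S in combinations(rest, k):
                # Standard Shapley weight for a coalition of size k.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[f] += weight * (value_fn(set(S) | {f}) - value_fn(set(S)))
    return phi

# Toy additive value function: the score of a coalition is the sum of its weights.
weights = {"x1": 2.0, "x2": -1.0, "x3": 0.5}
v = lambda S: sum(weights[f] for f in S)
print(shapley_values(v, list(weights)))  # for an additive game, recovers each weight
```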