Advances in Explainable AI and Interpretable Machine Learning

The field of artificial intelligence is moving toward greater transparency and accountability, with a growing focus on explainable AI and interpretable machine learning. Recent research has made significant progress on techniques that provide insight into the decision-making processes of complex models, including large language models and deep learning systems. One notable direction is the use of symbolic regression and genetic programming to discover compact, interpretable formulas that describe given data. Another is the development of methods for analyzing and explaining the internal representations of large language models, including vector symbolic architectures and contrastive explanations. There is also growing interest in alternatives to traditional next-token prediction for text generation, such as plan-then-generate and latent reasoning methods. Noteworthy papers in this area include 'From Embeddings to Equations: Genetic-Programming Surrogates for Interpretable Transformer Classification', which proposes an approach to obtaining compact, auditable classifiers with calibrated probabilities, and 'Query Circuits: Explaining How Language Models Answer User Prompts', which introduces a method for tracing the flow of information inside a model to explain its output. Overall, the field is placing greater emphasis on transparency, accountability, and interpretability, with significant implications for the development of trustworthy and reliable AI systems.
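To make the symbolic-regression idea concrete, the sketch below is a minimal, mutation-only genetic-programming loop in pure Python (not the method of any paper listed here): it evolves small expression trees over {+, -, *} to find a compact formula fitting a toy dataset. All names (`random_tree`, `mutate`, the terminal set, population and generation counts) are illustrative assumptions, and real systems add crossover, parsimony pressure, and constant optimization.

```python
import random
import operator

# Expression trees over {+, -, *}, the variable "x", and small constants.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0, 3.0]

def random_tree(depth=3):
    """Grow a random expression tree of bounded depth."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Recursively evaluate a tree at input x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, data):
    """Sum of squared errors; lower is better."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)

def mutate(tree, depth=2):
    """Replace a random subtree with a fresh random tree."""
    if random.random() < 0.3 or not isinstance(tree, tuple):
        return random_tree(depth)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left, depth), right)
    return (op, left, mutate(right, depth))

def to_str(tree):
    """Render the tree as a human-readable formula."""
    if isinstance(tree, tuple):
        return f"({to_str(tree[1])} {tree[0]} {to_str(tree[2])})"
    return str(tree)

def symbolic_regression(data, pop_size=200, generations=40):
    """Truncation selection + mutation; returns the best formula found."""
    random.seed(0)
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, data))
        survivors = pop[: pop_size // 4]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda t: fitness(t, data))

# Toy "black box": y = x^2 + 2, sampled on integer inputs.
data = [(x, x * x + 2.0) for x in range(-5, 6)]
best = symbolic_regression(data)
print(to_str(best), fitness(best, data))
```

The surrogate-modeling variant described in the digest replaces the toy dataset with (input, model-prediction) pairs from a trained transformer, so the recovered formula approximates and explains the black-box classifier rather than the raw data.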

Sources

From Embeddings to Equations: Genetic-Programming Surrogates for Interpretable Transformer Classification

Domain-Informed Genetic Superposition Programming: A Case Study on SFRC Beams

Towards Transparent AI: A Survey on Explainable Language Models

Beyond Formula Complexity: Effective Information Criterion Improves Performance and Interpretability for Symbolic Regression

Concept-SAE: Active Causal Probing of Visual Model Behavior

Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?

Learning Encoding-Decoding Direction Pairs to Unveil Concepts of Influence in Deep Vision Networks

Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement

Alternatives To Next Token Prediction In Text Generation - A Survey

Query Circuits: Explaining How Language Models Answer User Prompts

Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures

Language Model Planning from an Information Theoretic Perspective

Causal Autoencoder-like Generation of Feedback Fuzzy Cognitive Maps with an LLM Agent

Explaining novel senses using definition generation with open language models

The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models

o-MEGA: Optimized Methods for Explanation Generation and Analysis

Analyzing Latent Concepts in Code Language Models

Mechanistic Interpretability as Statistical Estimation: A Variance Analysis of EAP-IG

Interpreting Language Models Through Concept Descriptions: A Survey

Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention
