Explainability and Transparency in AI Models

Research in artificial intelligence is placing greater emphasis on the explainability and transparency of AI models. New methods aim to expose how AI systems reach their decisions, so that people can judge when a model's output should be trusted. One key area of focus is counterfactual explanations, which describe how an input would have to change for the model to reach a different decision. Another is the integration of language models with other techniques, such as graph-based methods, to improve explainability. Noteworthy papers in this area include Graph Style Transfer for Counterfactual Explainability, which introduces a framework for generating counterfactuals for graph data, and Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour, which proposes a method for explaining the behavior of multi-agent systems using counterfactual simulations and language models.
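
To make the idea of a counterfactual explanation concrete, here is a minimal Python sketch. It is not taken from any of the papers listed below. The toy model `predict`, its weights and threshold, and the helper `single_feature_counterfactual` are all hypothetical and chosen only for illustration. The sketch searches for the smallest change to a single input feature that flips the model's decision.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def predict(x: Sequence[float]) -> int:
    """Hypothetical toy model: approve (1) when a weighted score reaches 10."""
    income, debt = x
    return 1 if 0.5 * income - 1.0 * debt >= 10.0 else 0


def single_feature_counterfactual(
    model: Callable[[Sequence[float]], int],
    x: Sequence[float],
    step: float = 1.0,
    max_steps: int = 100,
) -> Optional[Tuple[int, float, float]]:
    """Return (feature index, new value, size of change) for the smallest
    single-feature change that flips the model's decision, or None."""
    original = model(x)
    best: Optional[Tuple[int, float, float]] = None
    for i in range(len(x)):
        for direction in (1, -1):
            for k in range(1, max_steps + 1):
                candidate: List[float] = list(x)
                candidate[i] = x[i] + direction * k * step
                if model(candidate) != original:
                    change = k * step
                    # Keep this flip only if it needs a smaller change.
                    if best is None or change < best[2]:
                        best = (i, candidate[i], change)
                    break  # first flip in this direction is the smallest
    return best


applicant = [30.0, 8.0]  # hypothetical input: [income, debt]
print(predict(applicant))                                 # decision for the input
print(single_feature_counterfactual(predict, applicant))  # nearest flip found
```

The result can be read as an explanation of the form "the decision would have been different if feature i had taken this value instead". Methods in the literature typically search over several features at once and add constraints such as plausibility or diversity, which this brute-force sketch does not attempt.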

Sources

Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness

Graph Style Transfer for Counterfactual Explainability

Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour

Emerging categories in scientific explanations

Explaining Sources of Uncertainty in Automated Fact-Checking

Structured Thinking Matters: Improving LLMs Generalization in Causal Inference Tasks

Counterfactual Simulatability of LLM Explanations for Generation Tasks

Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents

LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference

Generalizability vs. Counterfactual Explainability Trade-Off

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation

Towards Explainable Sequential Learning

DiCoFlex: Model-agnostic diverse counterfactuals with flexible control