Advances in Robustness and Interpretability of Deep Learning Models

The field of deep learning is moving toward more robust and interpretable models. Researchers are working to improve the reliability and performance of machine learning models through new methods for data attribution, model safety evaluation, and explanation stability. One key direction is the use of influence functions to efficiently approximate the effect of removing specific training trajectories on both the learned system dynamics and downstream control performance. Another is the development of frameworks that combine formal robustness verification, attribution entropy, and explanation stability to reveal critical mismatches between model accuracy and interpretability. There is also growing interest in strategic games as a natural evaluation environment for probing the internal processes of large language models, such as planning, revision, and decision making under resource constraints. Noteworthy papers include: Model Discovery and Graph Simulation, which proposes a lightweight alternative to chaos engineering for ensuring resilience in microservice applications; Influence Functions for Data Attribution in Linear System Identification and LQR Control, which introduces an influence-function framework for efficiently approximating the effect of removing individual training trajectories; and Pixel-level Certified Explanations via Randomized Smoothing, which uses randomized smoothing to guarantee pixel-level robustness for any black-box attribution method.
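As a rough illustration of the influence-function idea mentioned above, the sketch below applies the standard first-order leave-one-group-out approximation to a toy least-squares system-identification problem and compares it against an exact refit. The system matrices, trajectory counts, and helper names are illustrative assumptions, not taken from the cited paper, and the downstream LQR evaluation is omitted.

```python
# Sketch: influence-function approximation of leave-one-trajectory-out effects
# in least-squares linear system identification. Toy setup with assumed
# dynamics x_{t+1} = A x_t + B u_t + noise; not the cited paper's code.
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth system (assumed for illustration).
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
n_traj, T = 8, 30

def simulate(rng):
    """One rollout under random inputs; returns regressors z_t = [x_t; u_t] and targets x_{t+1}."""
    x = rng.normal(size=2)
    Z, Y = [], []
    for _ in range(T):
        u = rng.normal(size=1)
        x_next = A_true @ x + B_true @ u + 0.05 * rng.normal(size=2)
        Z.append(np.concatenate([x, u]))
        Y.append(x_next)
        x = x_next
    return np.array(Z), np.array(Y)

trajs = [simulate(rng) for _ in range(n_traj)]
Z = np.vstack([z for z, _ in trajs])   # stacked regressors, shape (n_traj*T, 3)
Y = np.vstack([y for _, y in trajs])   # stacked next states, shape (n_traj*T, 2)

def fit(Z, Y):
    """Least-squares estimate Theta = argmin ||Y - Z Theta||_F^2, with Theta ~ [A B]^T."""
    return np.linalg.lstsq(Z, Y, rcond=None)[0]

Theta = fit(Z, Y)
H = Z.T @ Z   # Hessian of the summed squared loss (same for every output column)

for k, (Zk, Yk) in enumerate(trajs):
    # First-order influence approximation: Theta_{-k} ~ Theta + H^{-1} * (sum of
    # per-sample gradients of trajectory k, evaluated at the full-data estimate).
    grad_k = Zk.T @ (Zk @ Theta - Yk)
    d_theta_if = np.linalg.solve(H, grad_k)

    # Exact leave-one-trajectory-out refit, for comparison.
    keep = [j for j in range(n_traj) if j != k]
    Theta_loo = fit(np.vstack([trajs[j][0] for j in keep]),
                    np.vstack([trajs[j][1] for j in keep]))
    d_theta_exact = Theta_loo - Theta

    print(f"traj {k}: influence |dTheta|={np.linalg.norm(d_theta_if):.4f}  "
          f"exact |dTheta|={np.linalg.norm(d_theta_exact):.4f}")
```

For plain least squares this approximation differs from the exact refit only by leverage terms, so the predicted and exact parameter shifts track each other closely; the appeal is that it avoids refitting the model once per removed trajectory.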
Sources
Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making
Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks