Advances in Robustness and Interpretability of Deep Learning Models

The field of deep learning is moving towards more robust and interpretable models. Researchers are improving the reliability and performance of machine learning systems through new methods for data attribution, model safety evaluation, and explanation stability. One key direction is the use of influence functions to efficiently approximate how removing specific training trajectories affects both the learned system dynamics and downstream control performance. Another is the development of frameworks that combine formal robustness verification, attribution entropy, and explanation stability to reveal critical mismatches between model accuracy and interpretability. There is also growing interest in using strategic games as a natural evaluation environment for probing the internal processes of large language models, such as planning, revision, and decision making under resource constraints.

Noteworthy papers include: Model Discovery and Graph Simulation, which proposes a lightweight alternative to chaos engineering for ensuring resilience in microservice applications; Influence Functions for Data Attribution in Linear System Identification and LQR Control, which introduces a framework that uses influence functions to efficiently approximate the impact of removing specific training trajectories; and Pixel-level Certified Explanations via Randomized Smoothing, which guarantees pixel-level robustness for any black-box attribution method using randomized smoothing.
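
To make the influence-function idea concrete, the sketch below shows a minimal, hypothetical version for least-squares system identification: it fits dynamics parameters from stacked trajectory data and approximates the effect of removing one trajectory with the standard first-order leave-out update, instead of refitting from scratch. The function names, shapes, and synthetic data are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def fit_dynamics(Phi, Y):
    """Least-squares estimate Theta = [A B]^T from regressors Phi and targets Y.
    Phi: (N, n+m) stacked [x_t, u_t]; Y: (N, n) stacked x_{t+1}."""
    Theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return Theta  # shape (n+m, n), so x_{t+1} ≈ Phi_t @ Theta

def influence_of_trajectory(Phi, Y, idx):
    """Approximate the parameters obtained after removing the rows in `idx`
    without refitting, via the first-order (influence-function-style) update
    Theta_{-S} ≈ Theta + (Phi^T Phi)^{-1} Phi_S^T (Phi_S Theta - Y_S)."""
    Theta = fit_dynamics(Phi, Y)
    H = Phi.T @ Phi                     # Hessian of the least-squares loss
    Phi_S, Y_S = Phi[idx], Y[idx]
    resid_S = Phi_S @ Theta - Y_S       # residuals of the removed trajectory
    delta = np.linalg.solve(H, Phi_S.T @ resid_S)
    return Theta + delta                # estimated leave-out parameters

if __name__ == "__main__":
    # Synthetic check: the approximation should track an exact refit closely.
    rng = np.random.default_rng(0)
    n, m, N = 3, 2, 200
    Phi = rng.normal(size=(N, n + m))
    true_Theta = rng.normal(size=(n + m, n))
    Y = Phi @ true_Theta + 0.01 * rng.normal(size=(N, n))
    removed = np.arange(20)
    approx = influence_of_trajectory(Phi, Y, removed)
    exact = fit_dynamics(np.delete(Phi, removed, 0), np.delete(Y, removed, 0))
    print(np.linalg.norm(approx - exact))  # small gap between approximation and refit
```

Similarly, the certified-explanation direction can be illustrated with a small randomized-smoothing sketch: attribution maps are computed on Gaussian-perturbed copies of the input, each pixel's top-k membership is aggregated by voting, and the vote rate is converted into a per-pixel certified radius with the usual Gaussian-smoothing bound. The `attribute` callable, parameter choices, and the top-k voting scheme are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def smoothed_topk_attribution(attribute, x, sigma=0.25, k=100, n_samples=200, seed=0):
    """Monte-Carlo sketch of a smoothed pixel-level explanation.
    `attribute` is any black-box function mapping an image to a saliency map;
    we vote on whether each pixel lands in the top-k under Gaussian input noise
    and turn the vote rate into a certified L2 radius per pixel."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(x.size)
    for _ in range(n_samples):
        noisy = x + sigma * rng.normal(size=x.shape)
        sal = attribute(noisy).ravel()
        topk = np.argpartition(-sal, k)[:k]   # indices of the k largest attributions
        votes[topk] += 1
    p = np.clip(votes / n_samples, 1e-6, 1 - 1e-6)  # top-k membership frequency
    radius = sigma * norm.ppf(p)                    # certified only where p > 0.5
    return p.reshape(x.shape), radius.reshape(x.shape)
```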

Sources

Model Discovery and Graph Simulation: A Lightweight Alternative to Chaos Engineering

Influence Functions for Data Attribution in Linear System Identification and LQR Control

Deception Against Data-Driven Linear-Quadratic Control

A Comparative Analysis of Influence Signals for Data Debugging

Tracing LLM Reasoning Processes with Strategic Games: A Framework for Planning, Revision, and Resource-Constrained Decision Making

ReinDSplit: Reinforced Dynamic Split Learning for Pest Recognition in Precision Agriculture

TriGuard: Testing Model Safety with Attribution Entropy, Verification, and Drift

Object-Centric Neuro-Argumentative Learning

Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks

Golden Partition Zone: Rethinking Neural Network Partitioning Under Inversion Threats in Collaborative Inference

Pixel-level Certified Explanations via Randomized Smoothing
