Advances in Model Attribution and Explainability

Research on model attribution and explainability is advancing quickly, with methods aimed at verifying the origin of model outputs, quantifying the influence of individual training samples, and producing faithful explanations for deep neural networks. Recent work draws on fingerprinting techniques, influence functions, and spectral analysis of gradient-based explanations. New frameworks also make data attribution cheaper to estimate and, in some cases, formally verifiable, addressing trust concerns in machine learning.
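For concreteness, the classic influence-function estimate attributes a test prediction to a training point via I(z_train, z_test) ≈ -∇L(z_test)ᵀ H⁻¹ ∇L(z_train), where H is the Hessian of the training loss. The sketch below is a minimal illustration of that textbook formulation only, not of any specific method from the papers listed here; the model, loss, batch, and LiSSA-style hyperparameters are placeholder assumptions.

```python
# Illustrative sketch of the classic influence-function estimate
# (Koh & Liang, 2017) -- not the method of any paper listed below.
# `model`, `loss_fn`, and all hyperparameters are placeholder assumptions.
import torch


def flat_grad(scalar, params, create_graph=False):
    """Gradient of `scalar` w.r.t. `params`, flattened into one vector."""
    grads = torch.autograd.grad(scalar, params,
                                create_graph=create_graph, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])


def influence_score(model, loss_fn, train_batch, z_train, z_test,
                    damping=0.01, scale=25.0, steps=50):
    """I(z_train, z_test) ~= -g_test^T (H + damping*I)^{-1} g_train,
    with the inverse-Hessian-vector product estimated by a LiSSA-style recursion.
    Positive values mean upweighting z_train would increase the test loss."""
    params = [p for p in model.parameters() if p.requires_grad]

    x_tr, y_tr = z_train
    x_te, y_te = z_test
    g_train = flat_grad(loss_fn(model(x_tr), y_tr), params).detach()
    g_test = flat_grad(loss_fn(model(x_te), y_te), params).detach()

    # Batch-loss gradient kept differentiable so Hessian-vector products are available.
    x_b, y_b = train_batch
    g_batch = flat_grad(loss_fn(model(x_b), y_b), params, create_graph=True)

    # LiSSA recursion: h_j = v + h_{j-1} - (H h_{j-1} + damping * h_{j-1}) / scale
    v, h = g_test, g_test.clone()
    for _ in range(steps):
        hvp = flat_grad((g_batch * h).sum(), params).detach()  # H @ h
        h = v + h - (hvp + damping * h) / scale
    ihvp = h / scale  # ~= (H + damping*I)^{-1} g_test

    return -torch.dot(ihvp, g_train).item()
```

The damping and scale terms are the usual stabilizers for the iterative inverse-Hessian estimate; scalable methods in this area typically replace or avoid this Hessian computation entirely.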

Noteworthy papers in this area include AuthPrint, which fingerprints generative models to guard against malicious model providers and achieves near-zero false positive rates; Efficient Forward-Only Data Valuation, which proposes a scalable data-valuation framework for large language models and vision-language models that matches or outperforms gradient-based baselines; and Efficiently Verifiable Proofs of Data Attribution, which presents an interactive verification paradigm for data attribution with formal completeness, soundness, and efficiency guarantees.
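Several of the sources below analyze gradient-based explanation methods; as a generic point of reference only, the snippet sketches the simplest member of that family, a plain input-gradient (saliency) attribution. It is an assumed, minimal example, not the analysis performed in those papers, and the model and input shapes are placeholders.

```python
# Minimal input-gradient (saliency) attribution -- the simplest gradient-based
# explanation, shown only as a generic reference point for the papers below.
import torch


def saliency(model, x, target_class):
    """Absolute gradient of the target-class score w.r.t. the input `x`."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]   # assumes a single-example batch of class logits
    score.backward()
    return x.grad.abs()                 # larger values = larger local sensitivity
```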

Sources

AuthPrint: Fingerprinting Generative Models Against Malicious Model Providers

Revisiting Data Attribution for Influence Functions

Attribution Explanations for Deep Neural Networks: A Theoretical Perspective

Efficient Forward-Only Data Valuation for Pretrained LLMs and VLMs

On the Complexity-Faithfulness Trade-off of Gradient-Based Explanations

On Spectral Properties of Gradient-based Explanation Methods

Efficiently Verifiable Proofs of Data Attribution
