Advances in Model Attribution and Explainability

Research on model attribution and explainability is advancing quickly, with methods aimed at verifying the origin of model outputs, quantifying the influence of individual training samples, and producing faithful explanations for deep neural networks. Recent work draws on fingerprinting techniques, influence functions, and spectral analysis of gradient-based explanations. New frameworks also make data attribution cheaper to estimate and, in some cases, formally verifiable, addressing trust concerns in machine learning.
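For concreteness, the classic influence-function estimate attributes a test prediction to a training point via I(z_train, z_test) ≈ -∇L(z_test)ᵀ H⁻¹ ∇L(z_train), where H is the Hessian of the training loss. The sketch below is a minimal illustration of that textbook formulation only, not of any specific method from the papers listed here; the model, loss, batch, and LiSSA-style hyperparameters are placeholder assumptions.

```python
# Illustrative sketch of the classic influence-function estimate
# (Koh & Liang, 2017) -- not the method of any paper listed below.
# `model`, `loss_fn`, and all hyperparameters are placeholder assumptions.
import torch


def flat_grad(scalar, params, create_graph=False):
    """Gradient of `scalar` w.r.t. `params`, flattened into one vector."""
    grads = torch.autograd.grad(scalar, params,
                                create_graph=create_graph, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])


def influence_score(model, loss_fn, train_batch, z_train, z_test,
                    damping=0.01, scale=25.0, steps=50):
    """I(z_train, z_test) ~= -g_test^T (H + damping*I)^{-1} g_train,
    with the inverse-Hessian-vector product estimated by a LiSSA-style recursion.
    Positive values mean upweighting z_train would increase the test loss."""
    params = [p for p in model.parameters() if p.requires_grad]

    x_tr, y_tr = z_train
    x_te, y_te = z_test
    g_train = flat_grad(loss_fn(model(x_tr), y_tr), params).detach()
    g_test = flat_grad(loss_fn(model(x_te), y_te), params).detach()

    # Batch-loss gradient kept differentiable so Hessian-vector products are available.
    x_b, y_b = train_batch
    g_batch = flat_grad(loss_fn(model(x_b), y_b), params, create_graph=True)

    # LiSSA recursion: h_j = v + h_{j-1} - (H h_{j-1} + damping * h_{j-1}) / scale
    v, h = g_test, g_test.clone()
    for _ in range(steps):
        hvp = flat_grad((g_batch * h).sum(), params).detach()  # H @ h
        h = v + h - (hvp + damping * h) / scale
    ihvp = h / scale  # ~= (H + damping*I)^{-1} g_test

    return -torch.dot(ihvp, g_train).item()
```

The damping and scale terms are the usual stabilizers for the iterative inverse-Hessian estimate; scalable methods in this area typically replace or avoid this Hessian computation entirely.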

Noteworthy papers in this area include AuthPrint, which fingerprints generative models to guard against malicious model providers and achieves near-zero false positive rates; Efficient Forward-Only Data Valuation, which proposes a scalable data-valuation framework for large language models and vision-language models that matches or outperforms gradient-based baselines; and Efficiently Verifiable Proofs of Data Attribution, which presents an interactive verification paradigm for data attribution with formal completeness, soundness, and efficiency guarantees.
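Several of the sources below analyze gradient-based explanation methods; as a generic point of reference only, the snippet sketches the simplest member of that family, a plain input-gradient (saliency) attribution. It is an assumed, minimal example, not the analysis performed in those papers, and the model and input shapes are placeholders.

```python
# Minimal input-gradient (saliency) attribution -- the simplest gradient-based
# explanation, shown only as a generic reference point for the papers below.
import torch


def saliency(model, x, target_class):
    """Absolute gradient of the target-class score w.r.t. the input `x`."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]   # assumes a single-example batch of class logits
    score.backward()
    return x.grad.abs()                 # larger values = larger local sensitivity
```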

Sources

AuthPrint: Fingerprinting Generative Models Against Malicious Model Providers

Revisiting Data Attribution for Influence Functions

Attribution Explanations for Deep Neural Networks: A Theoretical Perspective

Efficient Forward-Only Data Valuation for Pretrained LLMs and VLMs

On the Complexity-Faithfulness Trade-off of Gradient-Based Explanations

On Spectral Properties of Gradient-based Explanation Methods

Efficiently Verifiable Proofs of Data Attribution
