Adversarial Robustness in Machine Learning

The field of machine learning is moving towards a deeper understanding of adversarial robustness, with a focus on the transferability of attacks between models. Recent research has highlighted the importance of considering the operational domain of an attack, whether in the input data-space or in the model's representation space, when building more robust models. The distinction between these two domains has been shown to be critical in determining whether adversarial examples transfer. Furthermore, the development of new attack methods, such as zero-query black-box attacks, has underscored the vulnerabilities of deploying deep neural networks in real-world settings. Noteworthy papers in this area include:

  • Merge Now, Regret Later, which challenges the prevailing notion that model merging confers adversarial robustness for free and distills key insights for machine-learning practitioners.
  • ZQBA, which proposes a zero-query black-box adversarial attack that exploits the representations of one deep neural network to fool other networks without querying them.
  • Understanding Adversarial Transfer, which provides theoretical and empirical evidence for the distinction between attacks operating in the input data-space and those operating in the model's representation space (see the sketch after this list).
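
To make the transferability setting concrete, the sketch below crafts a data-space perturbation on a surrogate model and then measures, without ever querying the target during the attack, how often the target's predictions flip. This is a minimal illustration of transfer-based (zero-query) evaluation, not the method of any paper above; the specific torchvision models, the FGSM step, and the epsilon value are illustrative assumptions.

```python
# Minimal sketch of a transfer-based (zero-query) black-box evaluation.
# Assumptions: torchvision's pretrained ResNet-18 as surrogate and ResNet-50
# as target, a single FGSM step in the input data-space, eps = 4/255.
import torch
import torchvision.models as models

surrogate = models.resnet18(weights="IMAGENET1K_V1").eval()
target = models.resnet50(weights="IMAGENET1K_V1").eval()

def fgsm_data_space(x, y, model, eps=4 / 255):
    """Craft an input (data-space) perturbation using only the surrogate."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Single gradient-sign step, clipped back to the valid image range.
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

# Dummy batch standing in for preprocessed images (hypothetical data).
x = torch.rand(8, 3, 224, 224)
y = surrogate(x).argmax(dim=1)  # surrogate predictions serve as labels

x_adv = fgsm_data_space(x, y, surrogate)

# Transferability check: the target model was never queried while crafting
# x_adv; we only measure afterwards how many of its predictions flipped.
with torch.no_grad():
    flipped = (target(x_adv).argmax(dim=1) != target(x).argmax(dim=1)).float().mean()
print(f"Fraction of target predictions flipped by transferred attack: {flipped:.2%}")
```

A representation-space variant would instead perturb the input so that an intermediate feature map of the surrogate moves toward that of another class; the cited work argues that whether the attack operates on inputs or on representations strongly affects how well it transfers.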

Sources

Merge Now, Regret Later: The Hidden Cost of Model Merging is Adversarial Transferability

ZQBA: Zero Query Black-box Adversarial Attack

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
