Research in reinforcement learning is increasingly addressing the vulnerability of deep reinforcement learning models to adversarial attacks in multi-agent systems. Recent work designs attack strategies that manipulate victim agents without direct interaction or full control of the environment, instead exploiting the shared environment to influence them indirectly. These approaches demonstrate that general and effective adversarial attacks are feasible even in complex multi-agent scenarios. Noteworthy papers in this area include:
- Neutral Agent-based Adversarial Policy Learning against Deep Reinforcement Learning in Multi-party Open Systems, which launches adversarial attacks through a neutral agent that shares the environment with the victim rather than interacting with it directly (see the adversarial-policy sketch after this list).
- SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning, which combines state and action perturbations into a single joint attack on multi-agent deep reinforcement learning models (a joint-perturbation sketch appears below).
- Provably Invincible Adversarial Attacks on Reinforcement Learning Systems: A Rate-Distortion Information-Theoretic Approach, which uses a rate-distortion, information-theoretic analysis to construct attacks that randomly alter agents' observations and are provably "invincible" in the paper's sense (a minimal random-perturbation sketch closes this list).
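The neutral-agent idea can be sketched in a few lines: an attacker that never touches the victim directly is trained so that its reward is the negative of the victim's, with all influence flowing through the shared environment. Everything below (the toy environment, the frozen victim policy, the REINFORCE update, and all hyperparameters) is an illustrative assumption, not the paper's actual setup.

```python
import numpy as np

class SharedEnv:
    """Toy two-agent environment: both agents act on one shared scalar state;
    neither agent touches the other directly (a hypothetical stand-in for a
    multi-party open system)."""
    def __init__(self):
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, victim_action, attacker_action):
        self.state += 0.1 * victim_action + 0.1 * attacker_action
        victim_reward = -abs(self.state - 1.0)  # victim wants the state near 1.0
        return self.state, victim_reward

def victim_policy(state):
    """Frozen, pretrained victim (hypothetical): step toward its target."""
    return np.sign(1.0 - state)

rng = np.random.default_rng(0)
theta = np.zeros(2)  # attacker's logistic policy over actions {-1, +1}

def attacker_action(state):
    p = 1.0 / (1.0 + np.exp(-(theta[0] * state + theta[1])))
    return (1.0 if rng.random() < p else -1.0), p

alpha = 0.05
for episode in range(500):
    env = SharedEnv()
    state = env.reset()
    grads, rewards = [], []
    for _ in range(20):
        a, p = attacker_action(state)
        # gradient of log pi(a|s) for a logistic policy over {-1, +1}
        grads.append(((1.0 if a > 0 else 0.0) - p) * np.array([state, 1.0]))
        state, victim_reward = env.step(victim_policy(state), a)
        rewards.append(-victim_reward)  # attacker is rewarded for the victim's failure
    episode_return = sum(rewards)
    for g in grads:  # plain REINFORCE update using the whole-episode return
        theta = theta + alpha * episode_return * g
```

The key design point is that the attacker's action enters only through `SharedEnv.step`, so the attack works in any setting where agents merely coexist in one environment.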
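The state-action joint attack can likewise be illustrated with a deliberately simplified white-box victim. Everything here (the linear policy `a = W @ s`, the budgets `eps_s` and `eps_a`, and the "push actions away from zero" objective) is an assumption chosen so the optimal state perturbation has a closed form; SAJA itself targets deep multi-agent policies.

```python
import numpy as np

def saja_style_joint_attack(W, s, eps_s=0.1, eps_a=0.05):
    """State-action joint perturbation in the spirit of SAJA (sketch only).

    State attack: choose ds with ||ds||_inf <= eps_s to push the induced
    action outward. Since direction . (W @ ds) = (W.T @ direction) . ds,
    the maximizer is ds = eps_s * sign(W.T @ direction).
    Action attack: add a direct perturbation da with ||da||_inf <= eps_a
    to whatever action the fooled policy emits."""
    a_clean = W @ s
    direction = np.sign(a_clean + 1e-12)     # push each action dim further from 0
    ds = eps_s * np.sign(W.T @ direction)    # optimal state perturbation (linear case)
    a_fooled = W @ (s + ds)                  # action taken on the perturbed state
    da = eps_a * direction                   # direct action-space perturbation
    return s + ds, a_fooled + da

# Usage on a random linear policy:
rng = np.random.default_rng(1)
W = rng.normal(size=(2, 4))
s = rng.normal(size=4)
s_adv, a_adv = saja_style_joint_attack(W, s)
print("clean action:", W @ s, "attacked action:", a_adv)
```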
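Finally, the rate-distortion attack's basic ingredient, randomly changing observations under a distortion budget, can be mimicked with Gaussian noise. This is only a minimal stand-in: the paper designs the random perturbation information-theoretically to be provably undefendable, which plain isotropic noise does not achieve.

```python
import numpy as np

def random_observation_attack(obs, distortion_budget, rng):
    """Randomly corrupt an observation subject to an expected squared-error
    budget D, i.e. E[||obs' - obs||^2] = D (illustrative assumption; the
    paper's construction is rate-distortion-based, not plain Gaussian noise)."""
    sigma = np.sqrt(distortion_budget / obs.size)  # per-coordinate noise scale
    return obs + rng.normal(0.0, sigma, size=obs.shape)

rng = np.random.default_rng(2)
obs = np.ones(8)
print(random_observation_attack(obs, distortion_budget=0.5, rng=rng))
```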