Explainability and Security in AI-Assisted Decision Making

The field of AI-assisted decision making is moving toward greater transparency and trustworthiness, with a focus on explainability and security. Recent work has highlighted the importance of evaluating and improving the robustness of Class Activation Maps (CAMs) and other explainability methods against noise and adversarial attacks. At the same time, the rise of Large Language Models (LLMs) has introduced new security threats, such as hidden prompt injection attacks, which can manipulate model outputs without user awareness or any system compromise. Researchers are developing principled approaches to detect and mitigate these threats, including robustness metrics and safe machine learning techniques. Noteworthy papers in this area include PhantomLint, which presents a principled approach to detecting hidden LLM prompts in structured documents; Attacking LLMs and AI Agents, which introduces Advertisement Embedding Attacks as a new class of LLM security threats; and Safer Skin Lesion Classification with Global Class Activation Probability Map Evaluation and SafeML, which proposes a method for evaluating and improving the reliability of skin lesion classification models.
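To make the idea of CAM noise robustness concrete, the sketch below shows one simple way such a metric could be computed: perturb the input with Gaussian noise and measure how much the saliency map changes. This is a minimal illustration, not the specific evaluation protocol used in the cited papers; the `saliency_fn` callable, the noise level `sigma`, and the Pearson-correlation score are all assumptions chosen for clarity.

```python
import numpy as np

def cam_noise_robustness(saliency_fn, image, sigma=0.05, n_trials=10, seed=0):
    """Estimate how stable a CAM/saliency map is under Gaussian input noise.

    saliency_fn: hypothetical callable mapping an image array to a 2-D
                 saliency map (e.g. a wrapper around any CAM method).
    Returns the mean Pearson correlation between the map for the clean
    input and the maps for noise-perturbed copies (1.0 = perfectly stable).
    """
    rng = np.random.default_rng(seed)
    clean_map = saliency_fn(image).ravel()
    scores = []
    for _ in range(n_trials):
        # Perturb the input with zero-mean Gaussian noise
        noisy = image + rng.normal(0.0, sigma, size=image.shape)
        noisy_map = saliency_fn(noisy).ravel()
        # Correlation between the flattened clean and noisy saliency maps
        scores.append(np.corrcoef(clean_map, noisy_map)[0, 1])
    return float(np.mean(scores))
```

A score close to 1.0 suggests the explanation is stable under small input perturbations, while lower values flag explanations that may not be trustworthy for decision support.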

Sources

Benchmarking Class Activation Map Methods for Explainable Brain Hemorrhage Classification on Hemorica Dataset

PhantomLint: Principled Detection of Hidden LLM Prompts in Structured Documents

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

Assessing the Noise Robustness of Class Activation Maps: A Framework for Reliable Model Interpretability

Prompt-in-Content Attacks: Exploiting Uploaded Inputs to Hijack LLM Behavior

Safer Skin Lesion Classification with Global Class Activation Probability Map Evaluation and SafeML

Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review
