The field of multimodal models, particularly Large Audio-Language Models (LALMs), is increasingly focused on safety and reliability. Recent work aims to mitigate harmful responses, ensure robustness under emotional variation, and develop frameworks for editing auditory attribute knowledge. One key direction applies human psychological principles, such as Dialectical Behavior Therapy, to regulate model responses; another develops inference-time defense frameworks that safeguard LALMs against harmful inputs. Noteworthy papers include:
- Mitigating Harmful Erraticism in LLMs Through Dialectical Behavior Therapy Based De-Escalation Strategies, which applies DBT-inspired de-escalation strategies to regulate chatbot responses and reduce harmful, erratic outputs.
- SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models, which introduces a benchmark for evaluating how well knowledge-editing methods update auditory attribute knowledge in LALMs.
- SARSteer: Safeguarding Large Audio Language Models via Safe-Ablated Refusal Steering, which presents an inference-time defense framework for LALMs built on text-derived refusal steering and decomposed safe-space ablation (a general version of this steering idea is sketched below).
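
To make the steering idea concrete, the sketch below illustrates the general activation-steering recipe the SARSteer summary alludes to, not the paper's actual method: a refusal direction is estimated from text activations, its projection onto a "safe" subspace of benign activations is ablated, and the result is added to hidden states at inference time. All names, dimensions, and the synthetic activations are illustrative assumptions.

```python
"""Minimal sketch of refusal steering with safe-space ablation.

NOT the SARSteer implementation; toy activations and all names/shapes
are assumptions made for illustration only.
"""
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden-state dimension (toy size)

# Toy hidden states standing in for text-derived activations:
# one cluster from prompts that should be refused, one from benign prompts.
harmful_acts = rng.normal(0.0, 1.0, size=(32, d)) + 2.0  # shifted cluster
benign_acts = rng.normal(0.0, 1.0, size=(32, d))

# 1) Text-derived refusal direction: difference of mean activations.
refusal_dir = harmful_acts.mean(axis=0) - benign_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

# 2) "Safe space": top principal directions of benign activations.
#    Removing the steering vector's projection onto this subspace is a
#    rough stand-in for limiting side effects on benign behaviour.
centered = benign_acts - benign_acts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
safe_basis = vt[:4]  # top-4 benign directions (assumed rank)

proj = safe_basis.T @ (safe_basis @ refusal_dir)
ablated_dir = refusal_dir - proj  # steer only outside the safe subspace
ablated_dir /= np.linalg.norm(ablated_dir)


def steer(hidden_state: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Add the safe-ablated refusal direction to a hidden state at inference."""
    return hidden_state + alpha * ablated_dir


# Usage: steering a toy "harmful" hidden state increases its alignment
# with the refusal direction.
h = harmful_acts[0]
h_steered = steer(h)
print("refusal alignment before:", float(h @ refusal_dir))
print("refusal alignment after: ", float(h_steered @ refusal_dir))
```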