Advances in Safe and Reliable Large Language Models

The field of large language models is shifting toward safer, more reliable systems. Recent work highlights the need to handle cross-modal contexts, mitigate modal imbalance, and harden models against adversarial attacks. Studies show that current models are easily jailbroken or over-refuse harmless inputs, and that they often prioritize some modalities over others. To address these limitations, researchers are exploring approaches such as certifiably safe reinforcement learning, survival analysis of conversational robustness, and consequence-aware reasoning. These methods aim to help models reason about the link between actions and outcomes and to produce more trustworthy outputs. Noteworthy papers include: Mitigating Modal Imbalance in Multimodal Reasoning, which demonstrates the importance of addressing cross-modal attention imbalance; Time-To-Inconsistency, which presents a comprehensive survival analysis of conversational AI robustness; and SaFeR-VLM, which proposes a safety-aligned reinforcement learning framework for multimodal models.
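
To make the survival-analysis framing concrete, here is a minimal sketch that applies a standard Kaplan-Meier estimator to hypothetical multi-turn dialogues, treating the first inconsistent turn as the event and dialogues that end while still consistent as right-censored. This is a generic illustration of the technique, not the method of the Time-To-Inconsistency paper; the data, function name, and variable names are invented for the example.

```python
# Sketch: multi-turn robustness as a survival problem. The "event" is the
# first turn at which a model's answer becomes inconsistent under adversarial
# pressure; dialogues that end without an inconsistency are censored.
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(turn, survival_prob)] via the Kaplan-Meier estimator.

    durations -- turn of first inconsistency (or last turn if censored)
    observed  -- 1 if an inconsistency was observed, 0 if censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    at_risk = len(durations)
    surv, curve = 1.0, []
    for t in sorted(set(durations)):
        d = events.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk          # step down at each event time
        curve.append((t, surv))
        # subjects whose duration equals t leave the risk set after t
        at_risk -= sum(1 for u in durations if u == t)
    return curve

# Hypothetical data: 8 adversarial dialogues, durations measured in turns.
durations = [3, 5, 5, 7, 10, 10, 12, 12]
observed  = [1, 1, 0, 1, 1,  0,  1,  0]   # 0 = ended while still consistent
for turn, s in kaplan_meier(durations, observed):
    print(f"P(consistent beyond turn {turn}) ~ {s:.2f}")
```

The same framing supports comparing attack strategies or model variants by their survival curves rather than by a single attack-success rate.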

Sources

Mitigating Modal Imbalance in Multimodal Reasoning

Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks

A Granular Study of Safety Pretraining under Model Abliteration

Certifiable Safe RLHF: Fixed-Penalty Constraint Optimization for Safer Language Models

Quantifying Risks in Multi-turn Conversation with Large Language Models

Read the Scene, Not the Script: Outcome-Aware Safety for LLMs

COSMO-RL: Towards Trustworthy LMRMs via Joint Safety and Stability

The Algebra of Meaning: Why Machines Need Montague More Than Moore's Law

SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models
