The field of large language models is moving toward stronger safety and control, with new methods for identifying and mitigating risks and vulnerabilities. Recent research has explored approaches including multimodal prompt decoupling attacks, adaptive subspace steering, and backdoor attribution. These efforts aim to improve the reliability and trustworthiness of large language models and to support their safe deployment in real-world applications. Noteworthy papers in this area include Multimodal Prompt Decoupling Attack, which proposes an attack that bypasses safety filters, and Backdoor Attribution, which introduces a framework for elucidating and controlling backdoor mechanisms in language models. SafeSteer and ASGuard likewise contribute efficient defense mechanisms against jailbreak attacks.
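To make the steering idea concrete, the following is a minimal, generic sketch of activation steering as an inference-time safety intervention; it is not the SafeSteer or ASGuard method. The model name, intervention layer, steering strength, and the randomly initialized safety direction are illustrative assumptions (in practice the direction would be estimated, e.g., from contrasting activations on safe and unsafe prompts).

```python
# Generic activation-steering sketch (illustrative only, not a specific paper's method).
# Assumptions: a Hugging Face causal LM, one intervention layer, and a
# placeholder "safety direction"; all constants below are hypothetical.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # placeholder model for illustration
LAYER_IDX = 6         # hypothetical intervention layer
ALPHA = 4.0           # hypothetical steering strength

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# A real system would estimate this direction from data; here it is random
# purely to keep the sketch self-contained.
hidden_size = model.config.hidden_size
safety_direction = torch.randn(hidden_size)
safety_direction = safety_direction / safety_direction.norm()

def steer_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden = output[0]
    # Shift every token's activation along the safety direction.
    hidden = hidden + ALPHA * safety_direction.to(hidden.dtype)
    return (hidden,) + output[1:]

# Attach the steering intervention to one transformer block.
handle = model.transformer.h[LAYER_IDX].register_forward_hook(steer_hook)

prompt = "Explain how to stay safe online."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # detach the hook after generation
```

One appeal of this style of defense is that it leaves the model weights untouched: the intervention is applied (and removed) at inference time via a forward hook.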