The field of large language models is increasingly focused on improving robustness and security. Researchers are exploring methods to mitigate issues such as in-context reward hacking, memorization, and adversarial attacks. Noteworthy papers in this area include Specification Self-Correction, which introduces a framework for identifying and correcting flaws in guiding specifications, and Strategic Deflection, which defends against logit manipulation attacks by producing semantically adjacent responses that neutralize harmful intent. Other notable works include SDD, which encourages models to produce high-quality but irrelevant responses to harmful prompts, and Adversarial Defence without Adversarial Defence, which enhances robustness via instance-level principal component removal (a generic sketch of that idea follows).
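To make the last idea concrete, the sketch below shows one generic way instance-level principal component removal can be implemented: for a single input's embedding matrix, compute the top principal direction(s) of that instance alone and project them out. The function name `remove_top_components`, the use of a plain NumPy SVD, and the toy dimensions are illustrative assumptions; this is not the specific procedure from the paper.

```python
import numpy as np


def remove_top_components(embeddings: np.ndarray, k: int = 1) -> np.ndarray:
    """Project out the top-k principal directions of a single instance's
    embedding matrix (shape: [num_tokens, dim]).

    Instance-level sketch: the principal directions are estimated from
    this one instance, not from a corpus-wide statistic.
    """
    # Center the token embeddings for this instance.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)

    # SVD of the centered matrix; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_directions = vt[:k]  # shape: [k, dim]

    # Subtract the projection onto each of the top-k directions.
    projection = centered @ top_directions.T @ top_directions
    return centered - projection


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    token_embeddings = rng.normal(size=(12, 64))  # toy instance: 12 tokens, 64-dim
    cleaned = remove_top_components(token_embeddings, k=1)
    print(cleaned.shape)  # (12, 64)
```

The intuition is that dominant principal directions of an instance's representation often carry spurious, easily attacked structure, so removing them can blunt adversarial perturbations without any adversarial training.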