The field of generative AI is moving toward safer and more accountable models, with recent research addressing harmful content generation, knowledge leakage, and model unlearning. One key direction is developing methods that remove designated concepts from pre-trained models, with emphasis on continual unlearning and regularization techniques. Another is semantic steganography with large language models, which hides semantically rich information within images. Researchers are also exploring how aggregating multiple generative models through consensus sampling can amplify safety guarantees. Noteworthy papers in this area include:
- Leak@k: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding, which introduces a meta-evaluation metric quantifying how much supposedly unlearned knowledge resurfaces once decoding is probabilistic rather than greedy (a simplified sketch follows the list).
- S^2LM: Towards Semantic Steganography via Large Language Models, which presents a novel approach to hiding sentence-level messages within images using large language models.
- Consensus Sampling for Safer Generative AI, which proposes a consensus sampling algorithm that aggregates multiple generative models to enhance safety (also sketched below).
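
The Leak@k idea lends itself to a simple illustration. The sketch below is not the paper's implementation; it assumes a pass@k-style definition in which a prompt counts as leaked if any of k sampled completions reproduces the supposedly forgotten answer, and it uses the Hugging Face `transformers` generation API. The checkpoint path, prompts, targets, and decoding parameters are placeholders.

```python
# Hypothetical sketch of a Leak@k-style check: sample k completions per prompt
# under probabilistic decoding and count a prompt as "leaked" if any completion
# reproduces the supposedly unlearned answer. Not the paper's implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/unlearned-model"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def leak_at_k(prompts, targets, k=20, temperature=1.0, top_p=0.95, max_new_tokens=64):
    """Fraction of prompts whose forgotten target appears in at least one of k samples."""
    leaked = 0
    for prompt, target in zip(prompts, targets):
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                do_sample=True,            # probabilistic decoding, not greedy
                temperature=temperature,
                top_p=top_p,
                num_return_sequences=k,
                max_new_tokens=max_new_tokens,
            )
        texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        if any(target.lower() in text.lower() for text in texts):
            leaked += 1
    return leaked / len(prompts)
```

Per the paper's title, the gap this kind of metric is meant to expose is between greedy decoding, which can suggest successful forgetting, and sampled decoding, under which the target may still surface.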
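
For the consensus sampling direction, the following is a minimal sketch of one way agreement across models can gate what gets emitted: a candidate drawn from one model is accepted only if every aggregated model assigns it at least a fraction tau of the proposer's probability, and the procedure abstains when no candidate survives. The acceptance rule, threshold, and abstention budget are assumptions for illustration, not the algorithm from the paper.

```python
# Hypothetical sketch of consensus-gated sampling over several generative models.
# Each model is assumed to expose sample() and log_prob(); the agreement rule and
# abstention budget below are illustrative, not the published algorithm.
import math
import random
from typing import List, Optional

class GenModel:
    """Minimal interface an aggregated model is assumed to expose."""
    def sample(self, prompt: str) -> str: ...
    def log_prob(self, prompt: str, output: str) -> float: ...

def consensus_sample(
    models: List[GenModel],
    prompt: str,
    tau: float = 0.5,        # min probability ratio vs. the proposer to count as agreement
    max_attempts: int = 16,  # abstain if no candidate reaches consensus
) -> Optional[str]:
    for _ in range(max_attempts):
        proposer = random.choice(models)
        candidate = proposer.sample(prompt)
        lp_proposer = proposer.log_prob(prompt, candidate)
        # Accept only if every model assigns the candidate at least tau times the
        # proposer's probability, i.e. the candidate lies in a region of rough agreement.
        if all(
            m.log_prob(prompt, candidate) >= lp_proposer + math.log(tau)
            for m in models
        ):
            return candidate
    return None  # abstain rather than emit an output the models disagree on
```

The intuition, consistent with the summary above, is that an unsafe output favored by only one of the aggregated models is unlikely to survive the agreement check, and abstention limits how much a single unsafe model can steer the aggregate.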