Advances in Language Model Optimization and Safety

The field of language models is moving toward more efficient and transparent optimization techniques, with a focus on scalability and interpretability. Recent work introduces frameworks that combine the strengths of white-box and black-box approaches, enabling more accurate and adaptable instruction optimization. There is also a growing emphasis on the safety and reliability of language models, including research on verifiable AI safety benchmarks, coverage-guided testing, and provable privacy and generalization bounds.

Noteworthy papers include Instruction Learning Paradigms, which presents a dual white-box/black-box framework for optimizing instructions for large language models, and Attestable Audits, which proposes running verifiable AI safety benchmarks inside trusted execution environments. Other notable works include Probing Evaluation Awareness of Language Models, which studies whether language models can distinguish testing from deployment, and VeFIA, a framework for auditing the execution correctness of inference software in vertical federated learning.

Sources

Instruction Learning Paradigms: A Dual Perspective on White-box and Black-box LLMs

HF-DGF: Hybrid Feedback Guided Directed Grey-box Fuzzing

Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments

Coverage-Guided Testing for Deep Learning Models: A Comprehensive Survey

Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training

Probing Evaluation Awareness of Language Models

VeFIA: An Efficient Inference Auditing Framework for Vertical Federated Collaborative Software
