Advances in Controllable Language Model Generation

The field of large language models (LLMs) is moving toward more controllable and reliable generation. Recent work focuses on improving attribute alignment, enabling precise control over attribute intensity, and developing adaptive intervention methods, which together allow more fine-grained control over LLM outputs and make them better suited to real-world applications. Noteworthy papers include PIXEL, which proposes a position-wise activation steering framework for reliable behavior control, and Precise Attribute Intensity Control, which edits targeted representations to set attribute intensities at fine granularity. In-Distribution Steering and Mechanistic Error Reduction with Abstention also show promising results, balancing control with coherence and mitigating errors through selective intervention. Language steering in latent space is a further line of work that shows potential for mitigating unintended code-switching in multilingual LLMs.
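The common mechanic behind these papers is activation steering: adding a direction vector to a model's hidden states to push generation toward a target attribute. The sketch below is a minimal toy illustration of that idea, not the exact method of any paper listed; the function names, the unit-norm steering vector, and the intensity coefficient `alpha` are illustrative assumptions, and a position-wise variant hints at the kind of selective intervention PIXEL's title describes.

```python
import numpy as np

def steer(hidden_states: np.ndarray, steering_vector: np.ndarray,
          alpha: float) -> np.ndarray:
    """Add a scaled steering vector to every position's hidden state.

    alpha plays the role of an attribute-intensity knob (assumption):
    larger alpha pushes activations further along the attribute direction.
    """
    return hidden_states + alpha * steering_vector

def steer_positions(hidden_states: np.ndarray, steering_vector: np.ndarray,
                    alpha: float, positions: list[int]) -> np.ndarray:
    """Position-wise variant: intervene only at selected token positions."""
    out = hidden_states.copy()
    out[positions] += alpha * steering_vector
    return out

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))    # toy activations: 5 token positions, hidden size 8
v = rng.normal(size=8)
v /= np.linalg.norm(v)         # unit-norm direction standing in for an attribute

steered_all = steer(h, v, alpha=2.0)                         # global intervention
steered_last = steer_positions(h, v, alpha=2.0, positions=[4])  # only final token
```

In practice such an intervention would be applied inside a transformer layer (e.g. via a forward hook), and the more recent methods summarized above differ precisely in how they choose where to intervene, how strongly, and when to abstain.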

Sources

PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration

Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing

In-Distribution Steering: Balancing Control and Coherence in Language Model Generation

To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models

Language steering in latent space to mitigate unintended code-switching
