Advances in Interpretable and Controllable Language Models

The field of natural language processing is shifting toward more interpretable and controllable language models. Recent research focuses on designs that expose a model's decision-making process and allow finer control over its outputs, driven by the need for transparent and trustworthy AI systems, particularly where generated content carries significant social and cultural impact. Noteworthy papers in this area include iBERT, which introduces a sense-decomposition approach for interpretable embeddings, and BILLY, which proposes a training-free framework for steering large language models by merging persona vectors. Additionally, FlexAC presents a lightweight, training-free framework for modulating associative behavior in multimodal large language models, enabling flexible control over associative reasoning. Together, these advances point toward more capable and responsible language models.
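To make "steering via persona vectors" concrete, the minimal sketch below shows the general activation-steering idea: blend several persona directions into one vector and add it to a layer's hidden state at inference time. This is an illustrative toy, not BILLY's actual implementation; the persona vectors, dimensions, and scaling factor here are hypothetical placeholders.

```python
import numpy as np

# Hypothetical persona directions (in practice, e.g. extracted from hidden-state
# differences between persona-conditioned and neutral prompts). Values are illustrative.
hidden_dim = 8
rng = np.random.default_rng(0)
persona_a = rng.normal(size=hidden_dim)   # e.g. a "concise" persona direction
persona_b = rng.normal(size=hidden_dim)   # e.g. a "formal" persona direction

def merge_personas(vectors, weights):
    """Blend several persona directions into a single steering vector."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * v for w, v in zip(weights, vectors))

def steer(hidden_state, steering_vector, alpha=4.0):
    """Add the merged direction to a layer's hidden state at inference time."""
    return hidden_state + alpha * steering_vector

steering = merge_personas([persona_a, persona_b], weights=[0.7, 0.3])
h = rng.normal(size=hidden_dim)           # stand-in for one token's activation
h_steered = steer(h, steering)
print(np.round(h_steered - h, 3))         # the offset applied by steering
```

Because the intervention is a simple additive edit to activations, no fine-tuning is required, which is what makes this family of methods training-free.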
Sources
The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution
Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models