Advances in Large Language Model Transparency and Accountability

The field of large language models (LLMs) is evolving rapidly, with growing attention to transparency and accountability. Recent research highlights the need for new solutions to problems such as token count inflation in commercial APIs, authorship privacy, and model ownership verification.

One key direction is verification frameworks that audit both the quantity and the semantic validity of hidden tokens in commercial LLM APIs. CoIn, for example, proposes such a framework to detect token count inflation, where a provider bills for more hidden reasoning tokens than it actually generated; a commitment-style sketch of the general auditing idea appears below. A second thread is watermarking to ensure the provenance and accountability of AI-generated text. Invisible Entropy introduces a lightweight feature extractor and an entropy tagger that predicts whether the entropy of the next token is high or low, so that watermarking can be restricted to positions where it is safe and effective; a sketch of entropy-gated watermarking follows the first example. A third thread analyzes the interwoven roles LLMs play in authorship privacy, spanning authorship obfuscation, mimicking, and verification. Papers such as Unraveling Interwoven Roles of Large Language Models in Authorship Privacy and Can Large Language Models Really Recognize Your Name? underscore the limitations and potential risks of relying on LLMs for privacy-related tasks. Finally, fingerprinting frameworks such as DuFFin and CoTSRF target model ownership verification by extracting stable behavioral signatures from a model's outputs; a minimal black-box sketch closes the examples below.
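To make the auditing idea concrete, here is a minimal sketch of commitment-based token-count verification: the provider commits to its hidden tokens with a Merkle tree and publishes the root plus a claimed count, and an auditor spot-checks random positions via inclusion proofs. This illustrates the general commit-then-audit pattern only; it is not CoIn's actual protocol, and the token contents, tree construction, and sample size are assumptions for illustration.

```python
import hashlib
import random

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Build a Merkle tree over the leaves; returns every level, leaves first."""
    levels = [[_h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        if len(lvl) % 2:            # duplicate the last node on odd-sized levels
            lvl = lvl + [lvl[-1]]
        levels.append([_h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def prove(levels: list[list[bytes]], idx: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, node_is_right_child) pairs for leaf `idx`."""
    proof = []
    for lvl in levels[:-1]:
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]
        proof.append((lvl[idx ^ 1], bool(idx % 2)))
        idx //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from `leaf` up to the root using the sibling hashes."""
    node = _h(leaf)
    for sibling, is_right in proof:
        node = _h(sibling + node) if is_right else _h(node + sibling)
    return node == root

# Provider side: commit to the hidden reasoning tokens, publish root and count.
hidden = [f"token-{i}".encode() for i in range(13)]   # hypothetical hidden tokens
tree = build_tree(hidden)
root, claimed_count = tree[-1][0], len(hidden)

# Auditor side: spot-check random positions. A provider that inflated the
# count cannot produce valid inclusion proofs for indices it never generated.
for i in random.sample(range(claimed_count), k=4):
    assert verify(hidden[i], prove(tree, i), root)
```

The commitment binds the provider before the audit, so count and content checks can be deferred to dispute time without revealing every hidden token up front.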
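The watermarking thread can likewise be sketched. The snippet below applies a standard green-list logit bias (in the style of Kirchenbauer et al.) only at steps whose next-token entropy exceeds a threshold, which is the general low-entropy-aware idea that Invisible Entropy builds on; the paper's specific contribution, a learned tagger that predicts high/low entropy without access to the original distribution, is not reproduced here, and GAMMA, DELTA, and TAU are illustrative assumptions.

```python
import hashlib
import math
import random

GAMMA = 0.5   # fraction of the vocabulary placed on the green list (assumed)
DELTA = 2.0   # logit bias added to green-list tokens (assumed)
TAU = 1.5     # entropy threshold in nats; below it, leave logits untouched

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs: list[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)

def green_list(prev_token_id: int, vocab_size: int) -> set[int]:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token."""
    seed = hashlib.sha256(str(prev_token_id).encode()).hexdigest()
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(GAMMA * vocab_size)])

def watermark_logits(logits: list[float], prev_token_id: int) -> list[float]:
    """Bias green-list tokens, but only when the step has enough entropy."""
    if entropy(softmax(logits)) < TAU:
        return logits  # near-deterministic step: watermarking here hurts quality
    greens = green_list(prev_token_id, len(logits))
    return [x + DELTA if i in greens else x for i, x in enumerate(logits)]

# A flat (high-entropy) distribution gets biased; a peaked one passes through.
print(watermark_logits([0.1] * 8, prev_token_id=42))
print(watermark_logits([10.0, 0.0, 0.0, 0.0], prev_token_id=42))
```

Skipping low-entropy steps matters because at near-deterministic positions the green-list bias either has no effect or forces a visibly wrong token, degrading text quality without strengthening the watermark signal.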
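Finally, a minimal sketch of black-box behavioral fingerprinting: probe a suspect model with a fixed prompt set and compare its answers to a reference fingerprint. This shows only the generic query-response paradigm shared by fingerprinting work; it is not the dual-level construction of DuFFin or the chain-of-thought fingerprint of CoTSRF, and the probe prompts, similarity measure, and threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable

# Fixed probe prompts; real frameworks select or optimize these carefully.
PROBES = [
    "Complete the sequence: 2, 3, 5, 7, 11,",
    "Translate 'cat' into French.",
    "Name the chemical symbol for gold.",
]

def fingerprint(model: Callable[[str], str], probes=PROBES) -> list[str]:
    """Record a model's responses to the fixed probe set."""
    return [model(p) for p in probes]

def similarity(fp_a: list[str], fp_b: list[str]) -> float:
    """Mean string similarity between two fingerprints, in [0, 1]."""
    return sum(SequenceMatcher(None, a, b).ratio()
               for a, b in zip(fp_a, fp_b)) / len(fp_a)

def same_lineage(reference_fp: list[str],
                 suspect: Callable[[str], str],
                 threshold: float = 0.8) -> bool:
    # High similarity on the probes suggests the suspect was derived from
    # the protected model; the threshold here is an illustrative assumption.
    return similarity(reference_fp, fingerprint(suspect)) >= threshold
```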

Sources

CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs

Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking

Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification

Can Large Language Models Really Recognize Your Name?

Trends and Challenges in Authorship Analysis: A Review of ML, DL, and LLM Approaches

Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering

DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection

Robust LLM Fingerprinting via Domain-Specific Watermarks

CoTSRF: Utilize Chain of Thought as Stealthy and Robust Fingerprint of Large Language Models

PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues

In-Context Watermarks for Large Language Models
