The field of large language models (LLMs) is placing increasing emphasis on watermarking and robustness, with researchers exploring methods to detect and prevent unauthorized use of LLM-generated content. Novel approaches to watermarking, such as constructions based on mixtures and statistical-to-computational gaps, are being proposed to address open questions around undetectable and elementary watermarking schemes. In parallel, end-to-end models for logits-based watermarking are being developed to strike a better balance between text quality and robustness. At the same time, concerns about dataset copyright evasion attacks and memorization in LLMs are being addressed through new detection methods and attacks on existing watermarking mechanisms. Noteworthy papers in this area include:
- An End-to-End Model For Logits Based Large Language Models Watermarking, which introduces a novel end-to-end logits perturbation method for watermarking LLM-generated text, achieving superior robustness while maintaining text quality (a minimal logit-biasing sketch follows this list).
- Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks, which exposes a widespread vulnerability in current watermarking algorithms and proposes a generic, efficient paraphrasing attack that exploits it (see the self-information sketch below).
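
To make "logits-based watermarking" concrete, the sketch below implements a classic green-list logit-biasing scheme in the spirit of Kirchenbauer et al. It is background illustration only, not the end-to-end learned perturbation proposed in the paper above; the `gamma` and `delta` values and the hashing scheme are illustrative assumptions.

```python
import hashlib
import torch

def greenlist_mask(prev_token_id: int, vocab_size: int, gamma: float = 0.5) -> torch.Tensor:
    """Pseudo-randomly mark a gamma-fraction of the vocabulary as 'green',
    seeded by the previous token so a detector can recompute the same split."""
    seed = int(hashlib.sha256(str(prev_token_id).encode()).hexdigest(), 16) % (2**31)
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(vocab_size, generator=gen)
    mask = torch.zeros(vocab_size, dtype=torch.bool)
    mask[perm[: int(gamma * vocab_size)]] = True
    return mask

def watermark_logits(logits: torch.Tensor, prev_token_id: int, delta: float = 2.0) -> torch.Tensor:
    """Bias the next-token logits toward green tokens before sampling.
    Detection later tests whether the generated text over-uses green tokens."""
    mask = greenlist_mask(prev_token_id, logits.shape[-1])
    return logits + delta * mask.to(logits.dtype)

# Example: perturb a dummy logit vector for a 50k-token vocabulary.
logits = torch.randn(50_000)
biased = watermark_logits(logits, prev_token_id=42)
```

An end-to-end approach, roughly speaking, replaces the fixed bias with a learned perturbation trained jointly with a detector, which is how such methods aim to improve the quality/robustness trade-off.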
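On the attack side, one plausible way to use per-token self-information is to score tokens with an open language model and flag high-surprisal positions, where a watermark bias has the most room to act, as candidates for paraphrasing. The sketch below is a hypothetical illustration of that idea using Hugging Face `transformers` with GPT-2 as an arbitrary scoring model; it is not the paper's actual attack pipeline, and the surprisal threshold is an assumption.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Any small causal LM works for scoring; gpt2 is an arbitrary choice here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_self_information(text: str):
    """Return (token, surprisal-in-nats) pairs for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = ids[:, 1:]
    tok_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(targets[0].tolist())
    return list(zip(tokens, (-tok_logp).tolist()))

# Tokens above a surprisal threshold would be handed to a paraphraser;
# the threshold and the rewriting step itself are left out of this sketch.
scores = token_self_information("The watermarked model quietly favors certain tokens.")
candidates = [tok for tok, s in scores if s > 6.0]
```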