Steganography and Watermarking in Text Generation

Research on steganography and watermarking in text generation is moving toward more robust and secure methods. Researchers are exploring new techniques to embed hidden information in text, such as Unicode steganography, and to track the provenance of AI-generated text.

A key challenge is tokenization inconsistency: the token sequence a receiver obtains by re-tokenizing the generated text can differ from the sequence the sender's model actually produced, which undermines the robustness of steganography and watermarking methods. To overcome this, researchers are proposing tailored solutions, such as stepwise verification methods and post-hoc rollback methods.

Another focus is watermarking that resists real-world tampering, such as meaning-preserving attacks that alter the wording of a text while keeping its content. Noteworthy papers in this area include:

  • A paper that proposes a hybrid framework combining semantic alignment strength with probabilistic watermarking, improving watermark recovery by an average of 11.1% in F1 score.
  • A paper that presents the winning solution to the NeurIPS 2024 Invisible Watermark Removal Challenge, achieving near-perfect watermark removal with negligible impact on image quality.
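
To make the tokenization inconsistency problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of any paper listed here: the six-entry vocabulary, the greedy longest-match tokenizer, and the helper names (tokenize, is_consistent, consistent_candidates) are all hypothetical, and real LLM tokenizers (for example BPE) use different segmentation rules. The last helper shows one simple way a step-by-step consistency check could work, by rejecting any next token that would make the text re-tokenize differently.

```python
# Toy vocabulary (hypothetical); real tokenizers have tens of thousands of entries.
VOCAB = ["a", "b", "c", "ab", "bc", "abc"]


def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenizer over VOCAB (a stand-in for a real tokenizer)."""
    tokens = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at offset i.
        match = max(
            (t for t in VOCAB if text.startswith(t, i)), key=len, default=None
        )
        if match is None:
            raise ValueError(f"cannot tokenize {text!r} at offset {i}")
        tokens.append(match)
        i += len(match)
    return tokens


def is_consistent(tokens: list[str]) -> bool:
    """True if detokenizing and re-tokenizing reproduces the same token sequence."""
    return tokenize("".join(tokens)) == tokens


def consistent_candidates(prefix: list[str], candidates: list[str]) -> list[str]:
    """Keep only the next tokens that leave prefix + [token] re-tokenizable as-is."""
    return [t for t in candidates if is_consistent(prefix + [t])]


# The sender's model emits three single-character tokens...
sent = ["a", "b", "c"]
# ...but the receiver only sees the text "abc" and re-tokenizes it.
received = tokenize("".join(sent))

print(sent, "->", received)          # the two sequences differ
print(is_consistent(sent))           # so per-token hidden bits cannot be recovered
print(consistent_candidates(["ab"], ["a", "b", "c"]))  # "c" is filtered out
```

In this toy setting the receiver recovers the single token "abc" instead of the three tokens that were sent, so any information encoded in the individual token choices is lost. Filtering candidates before each step avoids the mismatch at the cost of narrowing the set of tokens the sender may choose from.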

Sources

Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution

Robustness Assessment and Enhancement of Text Watermarking for Google's SynthID

Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models

First-Place Solution to NeurIPS 2024 Invisible Watermark Removal Challenge
