LLM Output Watermarking for AI Content Attribution

Author: Venkata Sudhakar

ShopMax India publishes AI-generated product descriptions, promotional content, and review summaries at scale. As AI-generated text becomes widespread, being able to prove that a specific piece of content was generated by ShopMax India systems, rather than copied or fabricated by a third party, is valuable for brand protection and legal defensibility. LLM output watermarking embeds a hidden statistical signature into generated text that can be verified later, without adding any visible marker to the content.

Soft watermarking biases the token sampling process using a secret key. At each position, the vocabulary is pseudo-randomly partitioned into a green list and a red list, seeded by the key and the preceding context, and the model is nudged to prefer green-list tokens. The resulting text reads naturally to humans but contains a statistically detectable excess of green tokens. Verification reruns the same partitioning with the same key and checks whether the green-token ratio exceeds the baseline expected by chance. A large excess is a strong signal that the text was watermarked.
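The green/red idea can be sketched without a real model. The toy below is an illustration under stated assumptions, not a production implementation: the 200-token vocabulary, the random "logits", the demo key, and the GAMMA and DELTA values are all hypothetical stand-ins. It seeds the partition from the key and the previous token, adds a bonus to green-token logits during sampling, and detects the watermark with a one-proportion z-score on the green-token count.

```python
import hashlib
import hmac
import math
import random

VOCAB = [f"tok{i}" for i in range(200)]  # toy vocabulary (assumption)
SECRET_KEY = b"demo-watermark-key"       # hypothetical secret, demo only
GAMMA = 0.5    # fraction of the vocabulary placed on the green list
DELTA = 2.0    # logit bonus added to green tokens
START = "<s>"  # context used before the first token


def green_list(prev_token):
    """Derive this position's green list from the key and the previous token."""
    digest = hmac.new(SECRET_KEY, prev_token.encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(GAMMA * len(VOCAB))])


def generate(n_tokens, watermark, rng):
    """Sample tokens from random stand-in logits, optionally biased toward green."""
    tokens = []
    prev = START
    for _ in range(n_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in VOCAB]  # stand-in for model logits
        if watermark:
            green = green_list(prev)
            logits = [x + DELTA if tok in green else x for tok, x in zip(VOCAB, logits)]
        weights = [math.exp(x) for x in logits]  # unnormalised softmax weights
        prev = rng.choices(VOCAB, weights=weights, k=1)[0]
        tokens.append(prev)
    return tokens


def z_score(tokens):
    """Standardised excess of green tokens over the GAMMA baseline."""
    prev, hits = START, 0
    for tok in tokens:
        hits += tok in green_list(prev)
        prev = tok
    n = len(tokens)
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


rng = random.Random(0)
marked = generate(200, watermark=True, rng=rng)
plain = generate(200, watermark=False, rng=rng)
print(f"z (watermarked): {z_score(marked):.1f}")
print(f"z (plain):       {z_score(plain):.1f}")
```

A detector would flag text whose z-score exceeds a chosen threshold (4 is a common illustrative choice). Unwatermarked text should score near zero, because about half of its tokens land on the green list by chance.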

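The example below demonstrates a simpler proof-of-concept applied to ShopMax India product descriptions. Instead of biasing token sampling, it appends an invisible signature of zero-width characters to the generated text, checks for that signature on verification, and computes a keyed fingerprint of the content.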

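import hashlib
import hmac
import os

import openai

# The API key is read from the OPENAI_API_KEY environment variable.
client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY", ""))

# Hardcoded for the demo only; load the key from a secret store in production.
SECRET_KEY = "shopmax-india-watermark-key-2025"
WATERMARK_CHAR = "\u200b"  # zero-width space (invisible)
WATERMARK_COUNT = 7

def generate_watermarked(prompt):
    """Generate a product description and append the invisible signature."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a product description writer for ShopMax India."},
            {"role": "user", "content": prompt}
        ]
    )
    # content can be None, so fall back to an empty string
    text = resp.choices[0].message.content or ""
    # Embed invisible signature at end of text
    signature = WATERMARK_CHAR * WATERMARK_COUNT
    return text + signature

def verify_watermark(text):
    """Return (is_marked, count), where count is the number of marker characters found."""
    count = text.count(WATERMARK_CHAR)
    return count == WATERMARK_COUNT, count

def get_fingerprint(text):
    """Return a short keyed fingerprint (HMAC-SHA256) of the first 50 visible characters."""
    clean = text.replace(WATERMARK_CHAR, "")
    digest = hmac.new(SECRET_KEY.encode(), clean[:50].encode(), hashlib.sha256)
    return digest.hexdigest()[:12]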

text = generate_watermarked(
    "Write a 60-word description for Sony WH-1000XM5 headphones priced at Rs 24999 for ShopMax India."
)

is_marked, count = verify_watermark(text)
fingerprint = get_fingerprint(text)

print("Generated text:")
print(text.replace(WATERMARK_CHAR, "").strip())
print(f"\nWatermark detected: {is_marked} (signature count: {count})")
print(f"Content fingerprint: {fingerprint}")

It gives output similar to the following (the generated text and fingerprint vary from run to run),

Generated text:
Experience premium sound with the Sony WH-1000XM5 at ShopMax India for Rs 24,999.
Industry-leading noise cancellation and 30-hour battery life make these ideal
for commuters in Mumbai and Bangalore. Lightweight design with multipoint
connection for seamless device switching.

Watermark detected: True (signature count: 7)
Content fingerprint: 3a8f21c04d91
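
The appended signature is fragile in both directions. As a rough illustration (the sample strings are made up, and verify_watermark is repeated here only to keep the snippet self-contained), removing zero-width characters strips the mark, and because the marker does not depend on the secret key, anyone can append it to text that ShopMax India never produced:

```python
WATERMARK_CHAR = "\u200b"  # zero-width space, as in the example above
WATERMARK_COUNT = 7


def verify_watermark(text):
    """Same check as in the example: exactly WATERMARK_COUNT marker characters."""
    count = text.count(WATERMARK_CHAR)
    return count == WATERMARK_COUNT, count


signature = WATERMARK_CHAR * WATERMARK_COUNT

# Genuine output with the signature appended.
marked = "Experience premium sound with the Sony WH-1000XM5." + signature

# A sanitizer or a deliberate replace() removes every marker character.
stripped = marked.replace(WATERMARK_CHAR, "")

# A third party can forge the mark, since it does not depend on any secret.
forged = "Text that ShopMax India never generated." + signature

print("marked:  ", verify_watermark(marked))
print("stripped:", verify_watermark(stripped))
print("forged:  ", verify_watermark(forged))
```

Only the keyed fingerprint ties the content to the secret, and it has to be stored separately from the text to be useful.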

For production watermarking, use the Kirchenbauer et al. green/red token scheme, which operates at the logit level rather than appending invisible characters and is significantly harder to strip. It requires access to the model's logits at each decoding step, so it has to be applied inside the sampling loop rather than to finished text. Store the watermark secret in Google Cloud Secret Manager and rotate it quarterly. Maintain a log of which secret version was active for each batch of generated content so that older outputs can still be verified after a rotation. Run watermark verification before accepting any externally submitted content that is claimed to originate from ShopMax India systems.
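
One way to keep older outputs verifiable after a rotation is to record the secret version alongside each batch. The sketch below is a minimal illustration, not the article's implementation: SECRET_VERSIONS and BATCH_LOG are hypothetical in-memory stand-ins for a secret manager and a persistent log, and the batch IDs and key values are made up.

```python
import hashlib
import hmac

# Hypothetical stand-ins: a real system would fetch keys from a secret manager
# and persist the batch log in a database.
SECRET_VERSIONS = {"v1": b"demo-key-2025-q1", "v2": b"demo-key-2025-q2"}
BATCH_LOG = {}  # batch_id -> secret version active when the batch was generated


def fingerprint(text, key):
    """Short keyed fingerprint of the content (HMAC-SHA256, truncated)."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def sign_batch(batch_id, text, version):
    """Record which secret version was used, then fingerprint the content."""
    BATCH_LOG[batch_id] = version
    return fingerprint(text, SECRET_VERSIONS[version])


def verify_batch(batch_id, text, claimed):
    """Verify against the key version logged for this batch, not the current key."""
    version = BATCH_LOG.get(batch_id)
    if version is None:
        return False  # unknown batch: nothing to verify against
    expected = fingerprint(text, SECRET_VERSIONS[version])
    return hmac.compare_digest(expected, claimed)


old_fp = sign_batch("batch-001", "Sony WH-1000XM5 description", "v1")
new_fp = sign_batch("batch-002", "Promotional banner copy", "v2")  # after rotation

print(verify_batch("batch-001", "Sony WH-1000XM5 description", old_fp))
print(verify_batch("batch-001", "Edited description", old_fp))
```

Looking up the version per batch means a rotation does not invalidate earlier content, provided retired key versions remain readable for verification.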
