AI Audit Logging - Building a Compliance Trail for LLM Decisions
Author: Venkata Sudhakar
Every decision made by an AI system in a production environment should be logged in a way that allows auditors to reconstruct exactly what happened: what the user asked, what context was retrieved, what prompt was sent to the model, what response was generated, and how that response was used. For ShopMax India, this audit trail is essential for regulatory compliance, customer dispute resolution, and internal accountability. When a customer in Delhi disputes a refund decision made by an AI agent, the audit log is the only way to prove whether the system behaved correctly.
AI audit logs differ from standard application logs in that they must capture the full LLM interaction: the system prompt (which may contain business rules), the user message, any retrieved RAG context, the model response, token counts, latency, model version, and the downstream action taken. Logs must be tamper-evident - stored with a hash of each entry so any modification is detectable. They should be retained for a compliance-defined period (typically 2-7 years for financial decisions) and queryable by session ID, user ID, model version, or time range.
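A single audit record covering the fields above might look like the following sketch. All field names, IDs, and the model version string are illustrative assumptions, not ShopMax India's actual schema; the point is that hashing a canonical serialization of the record makes any later modification detectable.

```python
import hashlib
import json

# Illustrative audit record; field names and values are assumptions.
entry = {
    "session_id": "sess-0042",
    "user_id": "USR-1021",
    "system_prompt": "You are ShopMax India's refund assistant. Apply refund policy v3.",
    "user_message": "Why was my refund denied?",
    "rag_context": ["Refund policy v3, section 2: opened items are non-refundable after 7 days."],
    "model_version": "gpt-x-2026-01",  # hypothetical version string
    "response": "Your refund was denied because the item was reported opened after 7 days.",
    "token_count": 87,
    "latency_ms": 412,
    "action_taken": "refund_denied",
}

# Hash a canonical (sorted-key) JSON serialization. Re-hashing a record
# that was modified after the fact will not reproduce the stored digest,
# so tampering is detectable on audit.
entry_hash = hashlib.sha256(
    json.dumps(entry, sort_keys=True).encode("utf-8")
).hexdigest()
print(entry_hash[:8])
```

Sorting the keys matters: two semantically identical records must serialize to identical bytes, or honest re-verification would produce spurious hash mismatches.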
The example below builds a structured AI audit logger for ShopMax India. Each LLM call is logged with a cryptographic hash for tamper detection, and the log entries are queryable by user and time range for compliance reviews.
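A minimal sketch of such a logger in Python follows. Class and field names (`AIAuditLogger`, `AuditEntry`), the model version string, and the sample prompts are assumptions for illustration; each entry's hash covers its content plus the previous entry's hash, forming a chain. Note that the truncated hash prefixes in the sample output depend on the exact entry content.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

GENESIS_HASH = "0" * 64  # anchors the first entry in the hash chain


@dataclass
class AuditEntry:
    timestamp: str        # ISO-8601, e.g. "2026-04-14T05:33:01"
    user_id: str
    model_version: str
    prompt: str           # full prompt sent to the model
    response: str         # model output
    token_count: int
    latency_ms: int
    prev_hash: str        # hash of the previous entry (the chain link)
    entry_hash: str = ""  # SHA-256 over this entry's fields + prev_hash


class AIAuditLogger:
    def __init__(self) -> None:
        self.entries: list[AuditEntry] = []

    def log(self, timestamp: str, user_id: str, model_version: str,
            prompt: str, response: str, token_count: int,
            latency_ms: int) -> AuditEntry:
        prev = self.entries[-1].entry_hash if self.entries else GENESIS_HASH
        entry = AuditEntry(timestamp, user_id, model_version,
                           prompt, response, token_count, latency_ms, prev)
        entry.entry_hash = self._hash(entry)
        self.entries.append(entry)
        return entry

    @staticmethod
    def _hash(e: AuditEntry) -> str:
        # Canonical JSON over every field except entry_hash itself.
        payload = json.dumps({
            "timestamp": e.timestamp, "user_id": e.user_id,
            "model_version": e.model_version, "prompt": e.prompt,
            "response": e.response, "token_count": e.token_count,
            "latency_ms": e.latency_ms, "prev_hash": e.prev_hash,
        }, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def verify_chain(self) -> bool:
        # Recompute every hash; an edited or reordered entry breaks the chain.
        prev = GENESIS_HASH
        for e in self.entries:
            if e.prev_hash != prev or self._hash(e) != e.entry_hash:
                return False
            prev = e.entry_hash
        return True

    def query(self, user_id: Optional[str] = None,
              start: str = "", end: str = "\uffff") -> list[AuditEntry]:
        # ISO-8601 timestamps sort lexicographically, so string
        # comparison gives correct time-range filtering.
        return [e for e in self.entries
                if (user_id is None or e.user_id == user_id)
                and start <= e.timestamp <= end]


if __name__ == "__main__":
    logger = AIAuditLogger()
    logger.log("2026-04-14T05:33:01", "USR-1021", "gpt-x-2026-01",
               "Why was refund RF-881 denied?", "Denied per policy 4.2.", 87, 420)
    logger.log("2026-04-14T05:33:02", "USR-2034", "gpt-x-2026-01",
               "What is the status of order ORD-17?", "Shipped on 12 April.", 112, 515)
    logger.log("2026-04-14T05:33:03", "USR-1021", "gpt-x-2026-01",
               "Escalate refund RF-881.", "Escalated to a human agent.", 94, 388)
    print(f"Audit log entries: {len(logger.entries)}")
    for e in logger.entries:
        print(f"{e.timestamp} | {e.user_id} | {e.entry_hash[:8]} | {e.token_count} tokens")
```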
Running the example produces output of the following shape (the truncated hash prefixes depend on the exact entry content):
Audit log entries: 3
2026-04-14T05:33:01 | USR-1021 | a3f8c21d | 87 tokens
2026-04-14T05:33:02 | USR-2034 | b7e1d94a | 112 tokens
2026-04-14T05:33:03 | USR-1021 | c2a9f531 | 94 tokens
In production, write audit logs to an append-only store - AWS CloudTrail, Google Cloud Audit Logs, or a PostgreSQL table with row-level security and no DELETE permission for the application role. Chain hashes across entries as shown so any tampering breaks the chain and is detectable on audit. For ShopMax India's highest-stakes AI decisions (large refunds, account suspensions, credit decisions), log the full input and output text, not just hashes - storage costs are negligible compared to the liability of being unable to reconstruct a disputed decision. Automate weekly compliance reports that summarize AI decision volumes by model version, average latency, and any anomalies in hash chain integrity.
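The weekly compliance report mentioned above can be sketched as a small summarizer over logged entries. This is a hedged illustration: `weekly_report` and its input shape are assumptions, and in production the aggregation would run as a query against the append-only store rather than in memory.

```python
from collections import defaultdict
from statistics import mean


def weekly_report(entries: list[dict], chain_ok: bool) -> list[str]:
    """Summarize AI decision volume and latency per model version.

    `entries` is a list of dicts carrying at least "model_version" and
    "latency_ms"; `chain_ok` is the result of a hash-chain integrity check.
    """
    latencies_by_model: dict[str, list[int]] = defaultdict(list)
    for e in entries:
        latencies_by_model[e["model_version"]].append(e["latency_ms"])

    lines = [
        f"{model}: {len(ls)} decisions, avg latency {mean(ls):.0f} ms"
        for model, ls in sorted(latencies_by_model.items())
    ]
    # Surface hash-chain anomalies prominently so a broken chain
    # triggers escalation rather than being buried in metrics.
    lines.append("hash chain integrity: OK" if chain_ok
                 else "hash chain integrity: BROKEN - escalate to compliance")
    return lines


# Illustrative sample data, not real ShopMax India decisions.
sample = [
    {"model_version": "gpt-x-2026-01", "latency_ms": 400},
    {"model_version": "gpt-x-2026-01", "latency_ms": 500},
    {"model_version": "gpt-x-2026-02", "latency_ms": 300},
]
for line in weekly_report(sample, chain_ok=True):
    print(line)
```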