Snapshot Testing ADK Agent Conversation Flows
Author: Venkata Sudhakar
ShopMax India's customer support agents handle multi-turn conversations - a customer asks about an order, follows up on delivery, and requests a change. When the agent prompt or model changes, unit tests on individual responses may still pass while the overall conversation flow subtly shifts. Snapshot testing records the full sequence of agent outputs for a known conversation and fails the test if any turn changes, catching conversational regressions that single-response tests miss.
Snapshot testing works by running a full conversation through the agent with a mocked LLM, serialising each turn's output to a JSON file, and storing it as the golden snapshot. On subsequent test runs, the current output is compared against the stored snapshot. Any change - different wording, missing field, altered order - fails the test and requires explicit snapshot update approval. This is particularly valuable after prompt engineering changes where the happy-path unit tests pass but the conversation tone or flow changes.
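The record-then-compare core can be sketched in a few lines. The helper name, return values, and JSON layout below are illustrative choices, not ADK APIs:

```python
import json
from pathlib import Path


def check_snapshot(path: Path, current_turns: list) -> str:
    """First run records the golden snapshot; later runs compare strictly."""
    if not path.exists():
        # No golden file yet: serialise the turns and store them.
        path.write_text(json.dumps(current_turns, indent=2, sort_keys=True))
        return "created"
    golden = json.loads(path.read_text())
    if current_turns != golden:
        # Any drift - wording, missing field, reordered turns - is a failure
        # that requires an explicit, deliberate snapshot update.
        raise AssertionError(
            f"Snapshot {path.name} changed; update it explicitly "
            "if the new behaviour is intentional."
        )
    return "matched"
```

Sorting keys when serialising keeps the stored JSON stable, so a diff in version control shows only genuine behavioural changes.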
The example shows ShopMax India recording a 3-turn order tracking conversation as a snapshot and verifying it on replay. The first run creates the snapshot file; subsequent runs compare against it.
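A self-contained sketch of such a test is shown below: the LLM is replaced with canned replies so the conversation is deterministic, each turn is serialised to JSON, and the snapshot is created on the first run and verified on the second. All names here (mock_agent_reply, verify_snapshot, the snapshots/ directory, the sample questions) are illustrative stand-ins, not ADK APIs:

```python
import json
from pathlib import Path

SNAPSHOT_DIR = Path("snapshots")

# Mocked LLM: deterministic canned replies stand in for the real agent
# so the snapshot is stable across runs.
CANNED_REPLIES = {
    "Where is my order #1042?": "Order #1042 shipped yesterday and is in transit.",
    "When will it arrive?": "Estimated delivery is Friday.",
    "Can I change the delivery address?": "Yes, until the parcel is out for delivery.",
}


def mock_agent_reply(user_message: str) -> str:
    return CANNED_REPLIES[user_message]


def run_conversation(turns: list) -> list:
    """Drive the full conversation and serialise each turn's output."""
    return [{"user": t, "agent": mock_agent_reply(t)} for t in turns]


def verify_snapshot(name: str, turns: list, update: bool = False) -> None:
    SNAPSHOT_DIR.mkdir(exist_ok=True)
    path = SNAPSHOT_DIR / f"{name}.json"
    current = run_conversation(turns)
    if update or not path.exists():
        # First run (or explicit update): record the golden snapshot.
        path.write_text(json.dumps(current, indent=2))
        print(f"Snapshot created: {name}")
        return
    golden = json.loads(path.read_text())
    # Any changed turn fails the test.
    assert current == golden, f"Snapshot mismatch in {name}"
    print(f"Snapshot matched: {len(current)} turns verified")


if __name__ == "__main__":
    turns = list(CANNED_REPLIES)
    verify_snapshot("order_tracking_flow", turns)  # first run: creates
    verify_snapshot("order_tracking_flow", turns)  # replay: verifies
```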
Running the test twice gives the following output:
Snapshot created: order_tracking_flow
Snapshot matched: 3 turns verified
Store snapshot files in version control so changes are reviewed in pull requests - a snapshot diff is a clear signal that agent behavior changed. Add a --update-snapshots flag to your test runner for intentional updates: when a prompt change is deliberate, update the snapshots and commit them together with the prompt change. Create separate snapshots per scenario (order tracking, returns, stock queries) rather than one large file. Run snapshot tests in CI after integration tests - they catch regressions that only appear across a full conversation arc.
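If the test runner is a plain script, the update flag can be wired up with argparse; pytest users would register an equivalent option via the pytest_addoption hook in conftest.py. The flag and option names below are illustrative:

```python
import argparse


def parse_snapshot_args(argv=None):
    """Hypothetical CLI for a snapshot test runner script."""
    parser = argparse.ArgumentParser(description="Agent snapshot tests")
    parser.add_argument(
        "--update-snapshots",
        action="store_true",
        help="Rewrite golden snapshot files instead of comparing against them",
    )
    parser.add_argument(
        "--scenario",
        default="all",
        help="Run a single scenario snapshot (e.g. order_tracking_flow)",
    )
    return parser.parse_args(argv)
```

The parsed update_snapshots value is then passed into the snapshot helper, so updates only happen on an explicit flag and the refreshed files can be committed alongside the prompt change.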