Snapshot Testing ADK Agent Conversation Flows
Author: Venkata Sudhakar
ShopMax India's customer support agents handle multi-turn conversations - a customer asks about an order, follows up on delivery, and requests a change. When the agent prompt or model changes, unit tests on individual responses may still pass while the overall conversation flow subtly shifts. Snapshot testing records the full sequence of agent outputs for a known conversation and fails the test if any turn changes, catching conversational regressions that single-response tests miss.
Snapshot testing works by running a full conversation through the agent with a mocked LLM, serialising each turn's output to a JSON file, and storing it as the golden snapshot. On subsequent test runs, the current output is compared against the stored snapshot. Any change - different wording, missing field, altered order - fails the test and requires explicit snapshot update approval. This is particularly valuable after prompt engineering changes where the happy-path unit tests pass but the conversation tone or flow changes.
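The record-then-compare core can be sketched in a few lines. The helper name, return values, and JSON layout below are illustrative choices, not ADK APIs:

```python
import json
from pathlib import Path


def check_snapshot(path: Path, current_turns: list) -> str:
    """First run records the golden snapshot; later runs compare strictly."""
    if not path.exists():
        # No golden file yet: serialise the turns and store them.
        path.write_text(json.dumps(current_turns, indent=2, sort_keys=True))
        return "created"
    golden = json.loads(path.read_text())
    if current_turns != golden:
        # Any drift - wording, missing field, reordered turns - is a failure
        # that requires an explicit, deliberate snapshot update.
        raise AssertionError(
            f"Snapshot {path.name} changed; update it explicitly "
            "if the new behaviour is intentional."
        )
    return "matched"
```

Sorting keys when serialising keeps the stored JSON stable, so a diff in version control shows only genuine behavioural changes.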
The example shows ShopMax India recording a 3-turn order tracking conversation as a snapshot and verifying it on replay. The first run creates the snapshot file; subsequent runs compare against it.
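A self-contained sketch of such a test is shown below: the LLM is replaced with canned replies so the conversation is deterministic, each turn is serialised to JSON, and the snapshot is created on the first run and verified on the second. All names here (mock_agent_reply, verify_snapshot, the snapshots/ directory, the sample questions) are illustrative stand-ins, not ADK APIs:

```python
import json
from pathlib import Path

SNAPSHOT_DIR = Path("snapshots")

# Mocked LLM: deterministic canned replies stand in for the real agent
# so the snapshot is stable across runs.
CANNED_REPLIES = {
    "Where is my order #1042?": "Order #1042 shipped yesterday and is in transit.",
    "When will it arrive?": "Estimated delivery is Friday.",
    "Can I change the delivery address?": "Yes, until the parcel is out for delivery.",
}


def mock_agent_reply(user_message: str) -> str:
    return CANNED_REPLIES[user_message]


def run_conversation(turns: list) -> list:
    """Drive the full conversation and serialise each turn's output."""
    return [{"user": t, "agent": mock_agent_reply(t)} for t in turns]


def verify_snapshot(name: str, turns: list, update: bool = False) -> None:
    SNAPSHOT_DIR.mkdir(exist_ok=True)
    path = SNAPSHOT_DIR / f"{name}.json"
    current = run_conversation(turns)
    if update or not path.exists():
        # First run (or explicit update): record the golden snapshot.
        path.write_text(json.dumps(current, indent=2))
        print(f"Snapshot created: {name}")
        return
    golden = json.loads(path.read_text())
    # Any changed turn fails the test.
    assert current == golden, f"Snapshot mismatch in {name}"
    print(f"Snapshot matched: {len(current)} turns verified")


if __name__ == "__main__":
    turns = list(CANNED_REPLIES)
    verify_snapshot("order_tracking_flow", turns)  # first run: creates
    verify_snapshot("order_tracking_flow", turns)  # replay: verifies
```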
Running the test twice gives the following output:
Snapshot created: order_tracking_flow
Snapshot matched: 3 turns verified
Store snapshot files in version control so changes are reviewed in pull requests - a snapshot diff is a clear signal that agent behavior changed. Add a --update-snapshots flag to your test runner for intentional updates: when a prompt change is deliberate, update the snapshots and commit them together with the prompt change. Create separate snapshots per scenario (order tracking, returns, stock queries) rather than one large file. Run snapshot tests in CI after integration tests - they catch regressions that only appear across a full conversation arc.
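If the test runner is a plain script, the update flag can be wired up with argparse; pytest users would register an equivalent option via the pytest_addoption hook in conftest.py. The flag and option names below are illustrative:

```python
import argparse


def parse_snapshot_args(argv=None):
    """Hypothetical CLI for a snapshot test runner script."""
    parser = argparse.ArgumentParser(description="Agent snapshot tests")
    parser.add_argument(
        "--update-snapshots",
        action="store_true",
        help="Rewrite golden snapshot files instead of comparing against them",
    )
    parser.add_argument(
        "--scenario",
        default="all",
        help="Run a single scenario snapshot (e.g. order_tracking_flow)",
    )
    return parser.parse_args(argv)
```

The parsed update_snapshots value is then passed into the snapshot helper, so updates only happen on an explicit flag and the refreshed files can be committed alongside the prompt change.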