Shadow Mode Testing for ADK Agents in Production

Author: Venkata Sudhakar

Shadow mode testing runs a new ADK agent version in parallel with the production version on live traffic, capturing both responses without serving the new version to customers. ShopMax India uses shadow mode before every major agent upgrade to compare how the candidate agent responds to real order and search queries from Mumbai and Delhi customers. This catches quality regressions on production traffic patterns that synthetic test suites do not cover.

A shadow router duplicates each incoming request, sends it to both the production agent and the shadow agent, records both responses, and computes a diff score. The production response is served to the customer; the shadow response is logged and evaluated asynchronously. Shadow test assertions run against the accumulated log at the end of a testing window, checking that the shadow agent's response quality is within acceptable bounds compared to production.
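
The asynchronous path described above can be sketched with a thread pool: the production answer is returned straight away, while the shadow call, scoring and logging run on a worker thread, so shadow latency or a shadow crash never reaches the customer. This is a minimal sketch under stated assumptions: AsyncShadowRouter, prod_fn, shadow_fn and score_fn are hypothetical names, and plain Python callables stand in for real ADK agent invocations and a real quality metric. Unlike the synchronous test harness that follows, it keeps the shadow call off the request path.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class AsyncShadowRouter:
    """Hypothetical sketch: serve production now, evaluate the shadow later.

    prod_fn, shadow_fn and score_fn are plain callables standing in for real
    ADK agent invocations and a real quality metric (assumptions, not ADK APIs).
    """

    def __init__(self, prod_fn: Callable[[str], str], shadow_fn: Callable[[str], str],
                 score_fn: Callable[[str], float], max_workers: int = 4):
        self.prod_fn = prod_fn
        self.shadow_fn = shadow_fn
        self.score_fn = score_fn
        self.log: List[Dict] = []
        self._lock = threading.Lock()  # worker threads append to the log concurrently
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def _evaluate_shadow(self, query: str, prod_resp: str) -> None:
        try:
            shadow_resp = self.shadow_fn(query)
            shadow_score, error = self.score_fn(shadow_resp), None
        except Exception as exc:  # a crashing shadow agent is logged as a zero score
            shadow_score, error = 0.0, repr(exc)
        record = {
            "query": query,
            "prod_score": self.score_fn(prod_resp),
            "shadow_score": shadow_score,
            "shadow_error": error,
        }
        with self._lock:
            self.log.append(record)

    def route(self, query: str) -> str:
        prod_resp = self.prod_fn(query)
        # The shadow call is queued on the pool; the customer gets prod_resp without waiting.
        self._pool.submit(self._evaluate_shadow, query, prod_resp)
        return prod_resp

    def close(self) -> List[Dict]:
        """End the testing window: wait for queued shadow work, return a copy of the log."""
        self._pool.shutdown(wait=True)
        with self._lock:
            return list(self.log)


# Usage with stand-in agents (hypothetical lambdas, not ADK calls).
router = AsyncShadowRouter(
    prod_fn=lambda q: f"Production answer for: {q}",
    shadow_fn=lambda q: f"Candidate answer for: {q}",
    score_fn=lambda r: 1.0 if "answer" in r else 0.0,
)
served = router.route("TV delivery Delhi")
log = router.close()
```

Calling close() marks the end of the testing window: it blocks until every queued shadow evaluation has finished, so assertions over the returned log see a complete picture.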

The example below keeps everything synchronous so it can run as a plain pytest test. It defines a ShadowRouter that calls both agents for each request, records the responses and scores side by side, and asserts that the shadow agent's mean quality score is no worse than the production agent's mean score minus a tolerance.

import pytest
import difflib
from typing import List, Dict

SHADOW_QUALITY_TOLERANCE = 0.05

def prod_agent(query: str) -> str:
    return f"Production: Samsung TV Rs 62000 available in Mumbai for query: {query}"

def shadow_agent(query: str) -> str:
    return f"Candidate: Samsung 4K TV Rs 62000 in stock Mumbai for: {query}"

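# Each quality signal is a tuple of acceptable phrasings. A response earns the
# signal if it contains any of them, so "in stock" counts the same as "available"
# and a candidate is not marked down just for rewording.
QUALITY_SIGNALS = [("Samsung",), ("Rs",), ("Mumbai",), ("available", "in stock")]

def quality_score(response: str) -> float:
    hits = sum(1 for phrasings in QUALITY_SIGNALS if any(p in response for p in phrasings))
    return hits / len(QUALITY_SIGNALS)

class ShadowRouter:
    def __init__(self):
        self.results: List[Dict] = []

    def route(self, query: str) -> str:
        # Duplicate the request: the same query goes to both agents.
        prod_resp = prod_agent(query)
        shadow_resp = shadow_agent(query)
        # Log both responses, their quality scores and a text-similarity diff score.
        self.results.append({
            "query": query,
            "prod_resp": prod_resp,
            "shadow_resp": shadow_resp,
            "prod_score": quality_score(prod_resp),
            "shadow_score": quality_score(shadow_resp),
            "diff_ratio": difflib.SequenceMatcher(None, prod_resp, shadow_resp).ratio(),
        })
        # Only the production response is served to the customer.
        return prod_resp

SHADOW_QUERIES = [
    "Samsung TV price",
    "LG TV available Mumbai",
    "4K TV under Rs 70000",
    "best TV for living room",
    "TV delivery Delhi",
]

def test_shadow_agent_quality_within_tolerance():
    router = ShadowRouter()
    for query in SHADOW_QUERIES:
        router.route(query)
    mean_prod = sum(r["prod_score"] for r in router.results) / len(router.results)
    mean_shadow = sum(r["shadow_score"] for r in router.results) / len(router.results)
    print(f"Prod mean quality: {mean_prod:.3f}, Shadow mean quality: {mean_shadow:.3f}")
    assert mean_shadow >= mean_prod - SHADOW_QUALITY_TOLERANCE, (
        f"Shadow quality {mean_shadow:.3f} worse than prod {mean_prod:.3f} by more than {SHADOW_QUALITY_TOLERANCE}"
    )

Running it with pytest -s (so pytest does not capture the print output) gives the following output:

Prod mean quality: 1.000, Shadow mean quality: 1.000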
1 passed in 0.04s

Run shadow mode for at least 24 hours so the log covers a full daily cycle, including off-peak hours, and keep it running through a sales event if you need burst-period traffic in the sample. Set the shadow router to mirror 10-20% of traffic rather than all of it, to avoid doubling infrastructure costs. Passing the test above only shows that the candidate is no worse than production. When the shadow agent consistently outperforms production (mean quality above the production mean plus the tolerance), use that as the promotion signal to switch live traffic to the candidate and promote it to production.
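
The sampling and promotion rules in the paragraph above can be sketched as two small functions. This is a hedged sketch, not part of ADK: should_shadow, promotion_decision, SHADOW_SAMPLE_RATE (0.15, an arbitrary pick inside the 10-20% band) and the three-way promote/hold/reject rule are all hypothetical. Sampling hashes a session id instead of drawing a random number, so a given session is either always or never mirrored, and a rerun over the same ids selects the same sample.

```python
import hashlib
from statistics import mean
from typing import Dict, List

SHADOW_SAMPLE_RATE = 0.15  # hypothetical choice inside the 10-20% band
SHADOW_QUALITY_TOLERANCE = 0.05


def should_shadow(session_id: str, rate: float = SHADOW_SAMPLE_RATE) -> bool:
    """Mirror roughly `rate` of sessions to the shadow agent.

    Hashing the session id (instead of calling random.random()) means a given
    session is always or never mirrored, and re-running over the same ids
    selects the same sample.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # spread evenly over [0, 2**64)
    return bucket < rate * 2**64


def promotion_decision(results: List[Dict], tol: float = SHADOW_QUALITY_TOLERANCE) -> str:
    """Hypothetical three-way rule applied at the end of a shadow window.

    "promote": shadow mean beats production mean by more than tol
    "reject":  shadow mean trails production mean by more than tol
    "hold":    within tol either way; keep shadowing or decide on other evidence
    """
    if not results:
        raise ValueError("empty shadow window: nothing to compare")
    mean_prod = mean(r["prod_score"] for r in results)
    mean_shadow = mean(r["shadow_score"] for r in results)
    if mean_shadow > mean_prod + tol:
        return "promote"
    if mean_shadow < mean_prod - tol:
        return "reject"
    return "hold"


# Usage: sample incoming sessions, then judge a finished window.
sampled = [s for s in (f"session-{i}" for i in range(1000)) if should_shadow(s)]
window = [
    {"prod_score": 0.75, "shadow_score": 1.0},
    {"prod_score": 0.75, "shadow_score": 0.75},
]
decision = promotion_decision(window)  # shadow 0.875 vs prod 0.75 + 0.05 -> "promote"
```

The "hold" band is deliberate: a candidate that merely matches production passes the non-inferiority check in the test above but is not, on that evidence alone, better than production.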

Send your comments, suggestions or queries regarding this site to [email protected].