
Blue/Green Deployment Testing for ADK Agents

Author: Venkata Sudhakar

ShopMax India uses blue/green deployments to release new ADK agent versions with zero downtime: the green environment runs the new version while blue continues serving live traffic. Before cutting over, blue/green tests run the same query set against both environments and compare response quality, error rates, and latency. Traffic shifts only if green matches or beats blue on every metric, within the agreed tolerances, which protects customers in Mumbai, Delhi, and Bangalore from a degraded experience.

A blue/green test for ADK agents runs a fixed benchmark query set against both the current (blue) and new (green) agent endpoints, collects metrics for each, and computes a promotion decision. Key metrics are error rate (must not increase), p95 latency (must not increase by more than 10 percent), and quality score (must not drop). Log the per-query comparison so engineers can investigate any individual regression before promoting.
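The p95 latency metric and the per-query comparison log described above could be sketched as follows. This is a minimal illustration, not part of ADK: the helper names p95 and compare_per_query are hypothetical, and the "quality" field is an assumed per-response score, since the article does not say how quality is measured.

```python
import statistics

def p95(latencies):
    """Return the 95th percentile of a list of latencies in seconds."""
    if len(latencies) < 2:
        # statistics.quantiles needs at least two data points
        return latencies[0] if latencies else 0.0
    # n=20 yields 19 cut points; the last one is the 95th percentile
    return statistics.quantiles(latencies, n=20, method="inclusive")[-1]

def compare_per_query(queries, blue_results, green_results, latency_tolerance=0.10):
    """Build one comparison row per query and flag individual regressions.

    Each result is assumed to be a dict with "error", "latency" and a
    hypothetical "quality" score (higher is better).
    """
    rows = []
    for query, blue, green in zip(queries, blue_results, green_results):
        new_error = green["error"] and not blue["error"]
        slower = green["latency"] > blue["latency"] * (1 + latency_tolerance)
        quality_drop = green["quality"] < blue["quality"]
        rows.append({
            "query": query,
            "new_error": new_error,
            "slower": slower,
            "quality_drop": quality_drop,
            "regressed": new_error or slower or quality_drop,
        })
    return rows
```

Printing or storing the rows where "regressed" is true gives engineers a short list of queries to inspect before promoting.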

The example below simulates blue and green agent endpoints for ShopMax India with slightly different response characteristics. To stay short, it compares average latency rather than p95 and does not compute a quality score. The tests verify that green's error rate does not exceed blue's, that any latency regression stays within tolerance, and that the promotion decision blocks a degraded green deployment.

import pytest
import time

BENCHMARK_QUERIES = ["Track ORD-4001", "Return ORD-4002", "Price of iPhone 15"]

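# Simulated agents. The reported latency is hardcoded to match the sleep
# so that the benchmark metrics are deterministic.

def blue_agent(query):
    # Current production version: about 20 ms per query, no errors.
    time.sleep(0.02)
    return {"response": "Blue: " + query[:20], "error": False, "latency": 0.02}

def green_agent_good(query):
    # Healthy candidate: slightly faster than blue, no errors.
    time.sleep(0.018)
    return {"response": "Green: " + query[:20], "error": False, "latency": 0.018}

def green_agent_degraded(query):
    # Broken candidate: twice as slow as blue and every query fails.
    time.sleep(0.04)
    return {"response": "", "error": True, "latency": 0.04}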

def run_benchmark(agent_fn):
    # Run every benchmark query through the agent and aggregate the metrics.
    results = [agent_fn(q) for q in BENCHMARK_QUERIES]
    error_rate = sum(1 for r in results if r["error"]) / len(results)
    avg_latency = sum(r["latency"] for r in results) / len(results)
    return {"error_rate": error_rate, "avg_latency": avg_latency, "count": len(results)}

def should_promote(blue_metrics, green_metrics, latency_tolerance=0.10):
    # Returns (promote, reason). Any increase in error rate blocks promotion.
    if green_metrics["error_rate"] > blue_metrics["error_rate"]:
        return False, "Green error rate higher than blue"
    # Relative latency change; the 0.001 floor avoids division by zero.
    blue_latency = max(blue_metrics["avg_latency"], 0.001)
    latency_increase = (green_metrics["avg_latency"] - blue_metrics["avg_latency"]) / blue_latency
    if latency_increase > latency_tolerance:
        return False, "Green latency regression exceeds tolerance"
    return True, "Green meets promotion criteria"

def test_good_green_promotes():
    blue = run_benchmark(blue_agent)
    green = run_benchmark(green_agent_good)
    promote, reason = should_promote(blue, green)
    print("Blue: " + str(blue) + " Green: " + str(green) + " Decision: " + reason)
    assert promote is True

def test_degraded_green_blocked():
    blue = run_benchmark(blue_agent)
    green = run_benchmark(green_agent_degraded)
    promote, reason = should_promote(blue, green)
    print("Blocked: " + reason)
    assert promote is False

Running the tests with pytest -s (so that the print output is shown) gives output similar to the following:

Blue: {'error_rate': 0.0, 'avg_latency': 0.02, 'count': 3} Green: {'error_rate': 0.0, 'avg_latency': 0.018, 'count': 3} Decision: Green meets promotion criteria
Blocked: Green error rate higher than blue
.. (2 passed in 0.18s)
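The degraded green agent in the example is blocked by the error-rate check, so the latency branch of should_promote is never the one that rejects a deployment. A possible extra pair of tests for that branch is sketched below. The metric values are made up for illustration, and should_promote is repeated from the example above only so that the sketch runs on its own.

```python
def should_promote(blue_metrics, green_metrics, latency_tolerance=0.10):
    # Same decision logic as the example above, repeated for self-containment.
    if green_metrics["error_rate"] > blue_metrics["error_rate"]:
        return False, "Green error rate higher than blue"
    blue_latency = max(blue_metrics["avg_latency"], 0.001)
    latency_increase = (green_metrics["avg_latency"] - blue_metrics["avg_latency"]) / blue_latency
    if latency_increase > latency_tolerance:
        return False, "Green latency regression exceeds tolerance"
    return True, "Green meets promotion criteria"

def test_slow_green_blocked_on_latency():
    # Green has no errors but is 25 percent slower than blue (illustrative values).
    blue = {"error_rate": 0.0, "avg_latency": 0.020}
    green = {"error_rate": 0.0, "avg_latency": 0.025}
    promote, reason = should_promote(blue, green)
    assert promote is False
    assert reason == "Green latency regression exceeds tolerance"

def test_small_latency_increase_allowed():
    # A 5 percent slowdown stays inside the default 10 percent tolerance.
    blue = {"error_rate": 0.0, "avg_latency": 0.020}
    green = {"error_rate": 0.0, "avg_latency": 0.021}
    promote, _ = should_promote(blue, green)
    assert promote is True
```

Passing metric dictionaries directly, instead of running simulated agents, keeps these tests fast and isolates the decision logic from the benchmark runner.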

In production, ShopMax India should automate the blue/green promotion decision in the CI/CD pipeline and require a human approval gate for any deployment during peak sale periods. Store blue metrics from the last 7 days so the comparison baseline is a rolling average rather than a single benchmark run, reducing sensitivity to natural traffic fluctuations. Roll back automatically if error rate on green exceeds 1 percent within the first 5 minutes after traffic shift, without waiting for a human to intervene.
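The rolling baseline and the automatic rollback rule could look roughly like the sketch below. It assumes that blue metrics are stored as one dictionary per day, oldest first, and that request and error counts for green are available from monitoring. The function names, the storage layout, and the parameters are all hypothetical; only the 7-day window, the 1 percent threshold, and the 5-minute watch period come from the text above.

```python
from statistics import mean

def rolling_baseline(daily_metrics, window_days=7):
    """Average the most recent window_days of blue metrics.

    daily_metrics is assumed to be a list of dicts, oldest first, each with
    "error_rate" and "avg_latency" keys.
    """
    recent = daily_metrics[-window_days:]
    if not recent:
        raise ValueError("no baseline metrics available")
    return {
        "error_rate": mean(m["error_rate"] for m in recent),
        "avg_latency": mean(m["avg_latency"] for m in recent),
    }

def should_roll_back(requests, errors, elapsed_seconds,
                     max_error_rate=0.01, watch_window_seconds=300):
    """Return True if green should be rolled back automatically.

    Only applies during the watch window after the traffic shift; after
    that, normal alerting is assumed to take over.
    """
    if elapsed_seconds > watch_window_seconds or requests == 0:
        return False
    return errors / requests > max_error_rate
```

The dictionary returned by rolling_baseline has the same keys that should_promote reads, so it can stand in for the single blue benchmark run used in the example.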
