Canary Deployment Testing for ADK Agents
Author: Venkata Sudhakar
When ShopMax India deploys a new ADK agent version, rolling it out to all customers at once exposes every user to any regression the new version introduces, including subtle quality issues that may only surface in production. Canary deployment routes a small percentage of traffic to the new version while the rest goes to the stable version, collecting quality metrics from both. Automated canary testing compares the two versions and triggers an automatic rollback if the new version's metrics fall below the stable baseline.
The canary test runs a shared query set against both agent versions, collecting quality scores, error rates, and token counts. It then applies a rollback decision function that compares the canary metrics against the baseline. If the canary quality score drops by more than an acceptable threshold or the error rate exceeds the baseline, the test fails and signals that the deployment should be halted.
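To make that decision concrete, the rollback check can be written as a small pure function. The sketch below is a minimal version under assumed conventions: metrics are plain dicts with quality (mean score in [0, 1]) and errors (failed-query count) keys, and the CANARY_THRESHOLD_DROP default of 0.1 is illustrative, not a value mandated by ADK.

CANARY_THRESHOLD_DROP = 0.1  # assumed default: max tolerated quality drop

def should_rollback(baseline: dict, canary: dict,
                    max_quality_drop: float = CANARY_THRESHOLD_DROP) -> bool:
    """Return True when the canary should be rolled back.

    Expects metrics dicts with "quality" (mean score in [0, 1]) and
    "errors" (count of failed queries).
    """
    quality_drop = baseline["quality"] - canary["quality"]
    error_delta = canary["errors"] - baseline["errors"]
    # Halt the rollout if quality degrades past the threshold or errors rise.
    return quality_drop > max_quality_drop or error_delta > 0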
The example below simulates a ShopMax India canary deployment with a stable v1 and a new v2 agent, runs both against 5 queries, and asserts that v2 meets the quality and error rate thresholds required to proceed with the full rollout.
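A minimal, self-contained version of that test might look like the following. The StubAgent class, the query set, and the hard-coded per-query scores are illustrative stand-ins for real ADK runner calls; should_rollback is the helper sketched above.

import statistics

QUERIES = [
    "Where is my order?",
    "How do I return a defective phone?",
    "What is the refund timeline for UPI payments?",
    "Do you ship to Jaipur?",
    "How do I cancel my subscription?",
]

class StubAgent:
    """Stand-in for an ADK runner: maps each query to (quality, error, tokens)."""
    def __init__(self, name, results):
        self.name = name
        self.results = results  # one (quality, is_error, tokens) tuple per query

    def run(self, query):
        return self.results[QUERIES.index(query)]

def collect_metrics(agent):
    """Run the shared query set and aggregate quality, errors, and token usage."""
    qualities, errors, tokens = [], 0, 0
    for query in QUERIES:
        quality, is_error, used_tokens = agent.run(query)
        qualities.append(quality)
        errors += int(is_error)
        tokens += used_tokens
    return {"quality": statistics.mean(qualities), "errors": errors, "tokens": tokens}

def test_canary_meets_baseline():
    # Both versions score 0.8 with no errors on this fixture, so the gate passes.
    stable_v1 = StubAgent("stable_v1", [(0.8, False, 120)] * len(QUERIES))
    canary_v2 = StubAgent("canary_v2", [(0.8, False, 110)] * len(QUERIES))

    baseline = collect_metrics(stable_v1)
    canary = collect_metrics(canary_v2)

    print(f"Stable quality: {baseline['quality']}, errors: {baseline['errors']}")
    print(f"Canary quality: {canary['quality']}, errors: {canary['errors']}")
    quality_drop = baseline["quality"] - canary["quality"]
    error_delta = canary["errors"] - baseline["errors"]
    print(f"Quality drop: {quality_drop}, Error delta: {error_delta}")

    assert not should_rollback(baseline, canary), "Canary regressed; halt rollout"

Keeping metric collection separate from the rollback decision means the same should_rollback function can later be fed live production metrics instead of fixture scores.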
Run with pytest -s (so the print statements are visible alongside the test result), it gives the following output:
Stable quality: 0.8, errors: 0
Canary quality: 0.8, errors: 0
Quality drop: 0.0, Error delta: 0
. (1 passed in 0.01s)
In production, replace stable_v1 and canary_v2 with real ADK runner instances pointed at different model versions or prompt configurations. Route 5-10% of live ShopMax India traffic to the canary and collect metrics for at least 30 minutes before comparing. Tighten CANARY_THRESHOLD_DROP to 0.05 for quality-sensitive flows like refund processing, and wire the assert failure into a deployment pipeline gate that rolls the canary back automatically, as in the sketch below.
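One way to wire that gate is sketched here. The rollback_canary hook is hypothetical (substitute your deployment system's real traffic-shift or rollback call), and canary_test.py is an assumed filename for the test above.

import subprocess
import sys

def rollback_canary() -> None:
    """Hypothetical hook: shift 100% of traffic back to stable_v1 here."""
    print("Rolling back canary_v2 to stable_v1 ...")

def canary_gate() -> None:
    """Run the canary test; roll back and abort the rollout on failure."""
    # pytest exits nonzero when the canary assertion fails.
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-s", "canary_test.py"]
    )
    if result.returncode != 0:
        rollback_canary()
        raise SystemExit("Canary failed the quality gate; rollback triggered.")
    print("Canary passed; proceeding with full rollout.")

if __name__ == "__main__":
    canary_gate()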