Testing ADK Agent Graceful Degradation When Tools Are Unavailable
Author: Venkata Sudhakar
Graceful degradation is the ability of a system to continue operating in a reduced capacity when one or more of its components fail. For ShopMax India's ADK agents, this means the agent should still serve customers even when tools like inventory lookup, payment gateway, or shipping API become unavailable. Testing graceful degradation ensures the agent handles tool failures without crashing or returning unhelpful errors.
ADK agents implement graceful degradation through fallback strategies: returning cached data, using alternative tools, providing partial responses, or gracefully informing users of limited service. The key is that the agent should degrade predictably and transparently. Testing focuses on verifying that each degraded mode activates correctly and that users receive meaningful responses rather than raw exceptions.
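The fallback strategies above can be sketched in plain Python. This is a minimal illustration and not ADK's own API: `ToolUnavailableError`, `CACHED_CATALOG`, and `lookup_with_fallback` are hypothetical names, and the cache contents are assumed for the example.

```python
from typing import Callable


class ToolUnavailableError(Exception):
    """Hypothetical error raised by a tool wrapper when its backend is unreachable."""


# Hypothetical cached catalog snapshot; a real deployment would refresh it periodically.
CACHED_CATALOG = {
    "PHI-WM-7": {"name": "Philips 7kg Washing Machine", "base_price": 22000},
}


def lookup_with_fallback(sku: str, live_lookup: Callable[[str], dict]) -> dict:
    """Try the live tool first, then cached data, then a transparent notice."""
    try:
        result = live_lookup(sku)
        return {"status": "ok", "source": "live", **result}
    except ToolUnavailableError:
        cached = CACHED_CATALOG.get(sku)
        if cached is not None:
            # Partial response: catalog data only, clearly marked as degraded.
            return {"status": "degraded", "source": "cache", **cached}
        # Nothing usable: tell the user plainly instead of raising.
        return {
            "status": "unavailable",
            "source": "none",
            "message": "Live product data is temporarily unavailable.",
        }
```

Returning an explicit `status` and `source` keeps the degradation predictable and transparent: the caller, and any test, can tell which mode was active without parsing the message text.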
The following example tests ShopMax India's product inquiry agent when the inventory tool and pricing tool are unavailable. It verifies the agent falls back to cached catalog data and informs users about limited availability:
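A minimal sketch of such tests, written as pytest-style functions. It assumes a hypothetical `product_inquiry` function standing in for the real ADK agent, with the tools injected as plain callables so they can be made to fail. The test names match the output reported below; the helper names, cache contents, and reply wording are illustrative assumptions.

```python
class ToolUnavailableError(Exception):
    """Hypothetical error raised when a tool backend cannot be reached."""


# Assumed cached catalog entry used as the fallback data source.
CATALOG_CACHE = {
    "PHI-WM-7": {"name": "Philips 7kg Washing Machine", "base_price": 22000},
}


def failing_tool(sku):
    # Simulates an outage; the message contains detail that must never reach users.
    raise ToolUnavailableError("connection refused: inventory-svc:8443")


def working_pricing_tool(sku):
    return {"price": 21500}


def product_inquiry(sku, inventory_tool, pricing_tool, cache=CATALOG_CACHE):
    """Stand-in for the agent: build a user-facing reply, never leak exceptions."""
    item = cache.get(sku, {})
    name = item.get("name", sku)
    parts, degraded = [], False
    try:
        parts.append(f"Current price: Rs {pricing_tool(sku)['price']:,}.")
    except ToolUnavailableError:
        degraded = True
        if "base_price" in item:
            parts.append(f"Based on our catalog, the base price is Rs {item['base_price']:,}.")
    try:
        parts.append(f"In stock: {inventory_tool(sku)['stock']} units.")
    except ToolUnavailableError:
        degraded = True
        parts.append("For current stock availability, please call our helpline.")
    prefix = (
        f"I was unable to retrieve all live information for the {name}. "
        if degraded
        else f"{name}: "
    )
    return prefix + " ".join(parts)


def test_graceful_degradation_inventory_unavailable():
    reply = product_inquiry("PHI-WM-7", failing_tool, working_pricing_tool)
    assert "Rs 21,500" in reply  # live price still served
    assert "helpline" in reply   # stock question redirected


def test_graceful_degradation_all_tools_unavailable():
    reply = product_inquiry("PHI-WM-7", failing_tool, failing_tool)
    assert "Rs 22,000" in reply  # cached base price used
    assert "unable to retrieve" in reply


def test_degraded_response_contains_no_technical_details():
    reply = product_inquiry("PHI-WM-7", failing_tool, failing_tool)
    for leaked in ("Traceback", "ToolUnavailableError", "connection refused"):
        assert leaked not in reply
```

Injecting the tools as arguments is a design choice for testability: each test swaps in a failing callable instead of patching network code.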
Running the tests gives the following output:
test_graceful_degradation_inventory_unavailable PASSED
test_graceful_degradation_all_tools_unavailable PASSED
test_degraded_response_contains_no_technical_details PASSED
3 passed in 6.41s
Degraded response sample: I was unable to retrieve live stock and pricing information for
the Philips 7kg Washing Machine (PHI-WM-7) as our systems are currently under maintenance.
Based on our catalog, the base price is Rs 22,000. For current stock availability,
please call 1800-shopmax or visit your nearest ShopMax India store in Mumbai or Bangalore.
Production guidance for ShopMax India: define a fallback priority chain that tries live inventory first, then the cached catalog, then a generic apology message with the helpline number. Monitor the degraded-mode activation rate in Grafana; a spike indicates a downstream outage rather than a test failure. Always test the fallback chain with realistic tool timeout values, not just immediate errors, since network timeouts are the most common real-world failure mode.
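The priority chain and the timeout case can be sketched as follows. This is an assumption-laden illustration using only the Python standard library: `call_with_timeout`, `resolve_product`, `slow_inventory_tool`, and the apology wording are hypothetical, and the 0.5-second sleep is a stand-in for a hung network call.

```python
import concurrent.futures
import time

# Assumed wording; only the helpline number comes from the sample response above.
HELPLINE_MESSAGE = "Sorry, we cannot check this product right now. Please call 1800-shopmax."


def call_with_timeout(fn, arg, timeout_s):
    """Run fn(arg) in a worker thread; raise a timeout error if it takes too long."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, arg)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Do not block the caller on a hung worker thread.
        pool.shutdown(wait=False)


def resolve_product(sku, live_tool, cache, timeout_s=2.0):
    """Walk the chain: live inventory, then cached catalog, then generic apology."""
    try:
        return "live", call_with_timeout(live_tool, sku, timeout_s)
    except (concurrent.futures.TimeoutError, ConnectionError):
        pass  # fall through to the next link in the chain
    if sku in cache:
        return "cache", cache[sku]
    return "apology", HELPLINE_MESSAGE


def slow_inventory_tool(sku):
    time.sleep(0.5)  # simulates a slow backend rather than an immediate error
    return {"stock": 3}


def test_timeout_falls_back_to_cache():
    cache = {"PHI-WM-7": {"base_price": 22000}}
    mode, _ = resolve_product("PHI-WM-7", slow_inventory_tool, cache, timeout_s=0.05)
    assert mode == "cache"
```

The short timeout here only keeps the example fast; in a real test suite the value would mirror the production tool timeout. The returned mode string is also a natural place to increment a degraded-mode counter for dashboards.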