Testing ADK Agent Rate Limit and Quota Exhaustion Handling
Author: Venkata Sudhakar
Rate limiting and quota exhaustion are common production scenarios where an ADK agent's underlying LLM or tool APIs reject requests due to excessive call volume. For ShopMax India, this can happen during flash sales in Mumbai and Bangalore when thousands of customers simultaneously query the order tracking agent, overwhelming the Gemini API quota. Testing rate limit handling ensures the agent retries intelligently and communicates delays to customers without failing outright.
ADK agents must handle three distinct rate limit scenarios: API quota exhaustion (HTTP 429 from Gemini), tool-level rate limits (third-party shipping or payment APIs), and burst throttling (too many requests per second). Each requires a different response strategy: exponential backoff for quota exhaustion, circuit breaking for sustained tool rate limits, and request queuing for burst throttling. Tests simulate these conditions with mock responses and verify both the agent's retry behavior and the messages it shows to users.
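The backoff strategy for quota exhaustion can be sketched in plain Python, without relying on any ADK-specific API. This is a minimal illustration only: `RateLimitError`, `call_with_backoff`, and `FlakyTool` are hypothetical names introduced here, and a real agent would catch whatever exception its client library raises for HTTP 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 from the model or a tool API."""


def call_with_backoff(fn, max_attempts=4, base_delay=0.5, max_delay=8.0):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter.

    Returns (result, attempts). Re-raises the last error once attempts run out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(), attempt
        except RateLimitError:
            if attempt == max_attempts:
                raise
            # Upper bound doubles each attempt: base, 2*base, 4*base, ... capped.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter spreads out retries from many concurrent customers.
            time.sleep(random.uniform(0, delay))


class FlakyTool:
    """Hypothetical tool that rejects the first `failures` calls with a 429."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RateLimitError("429 Too Many Requests")
        return {"order_id": "ORD-1001", "status": "shipped"}  # illustrative data


if __name__ == "__main__":
    tool = FlakyTool(failures=2)
    # Two rejections followed by a success, so the call succeeds on attempt 3.
    result, attempts = call_with_backoff(tool, base_delay=0.01)
    print(result, attempts)
```

Circuit breaking and request queuing would wrap the same call site differently: a breaker stops calling the tool after repeated failures, while a queue delays the call before it is made.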
The following example tests ShopMax India's order tracking agent under rate limit conditions. It verifies the agent retries with backoff on 429 responses and informs customers about delays during high-traffic periods:
It gives the following output:
test_agent_retries_on_rate_limit PASSED
test_agent_graceful_message_on_quota_exhaustion PASSED
test_response_time_acceptable_under_retry PASSED
3 passed in 9.87s
Response time with retries: 8.34s
Tool call attempts: 3
For ShopMax India's production deployments, set up rate limit monitoring dashboards that track 429 error rates per agent per hour. Before planned events like Independence Day sales, request higher Gemini API limits in advance and implement request queuing at the orchestration layer. In tests, always assert both the functional outcome (correct response content) and the non-functional outcome (retry count within limits, response time within SLA). Avoid mocking sleep in retry tests: assertions on actual elapsed time catch real-world timeout issues that mocked sleeps hide.
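The testing advice above can be illustrated with a small sketch that asserts on content, retry count, and real elapsed time together. Everything here is assumed for illustration: `FakeTrackingTool` and `track_order` are hypothetical stand-ins for the agent and its tool, the delays are shortened so the tests run quickly, and the 2-second budget is an invented SLA rather than a ShopMax figure.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


class FakeTrackingTool:
    """Hypothetical order-tracking tool that rejects the first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.attempts = 0

    def __call__(self, order_id):
        self.attempts += 1
        if self.attempts <= self.failures:
            raise RateLimitError("429 Too Many Requests")
        return f"Order {order_id} is out for delivery"


def track_order(tool, order_id, max_attempts=3, base_delay=0.05):
    """Hypothetical agent-side wrapper: retry with real sleeps, then degrade."""
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(order_id)
        except RateLimitError:
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))
    # All attempts were rejected: tell the customer instead of failing outright.
    return "We are seeing high traffic right now. Please try again in a few minutes."


def test_retry_outcomes():
    tool = FakeTrackingTool(failures=2)
    start = time.perf_counter()
    reply = track_order(tool, "SM-1001")
    elapsed = time.perf_counter() - start

    # Functional outcome: the customer still gets the tracking answer.
    assert "out for delivery" in reply
    # Non-functional outcomes: bounded retries and bounded latency.
    assert tool.attempts == 3
    # About 0.15 s of real sleeping (0.05 + 0.10), with slack for timer resolution.
    assert elapsed >= 0.14
    assert elapsed < 2.0  # SLA budget assumed for this sketch


def test_graceful_message_on_exhaustion():
    tool = FakeTrackingTool(failures=10)
    reply = track_order(tool, "SM-1001")
    assert "high traffic" in reply
    assert tool.attempts == 3
```

Because the sleeps are real, the lower bound on elapsed time confirms that backoff actually happened, and the upper bound would fail if a retry loop ever stalled beyond the budget.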