Load Testing ADK Agents - Simulating Concurrent User Requests
Author: Venkata Sudhakar
ShopMax India runs thousands of customer support sessions daily through ADK agents. A single agent response may perform well in isolation, but load testing shows how the system behaves under concurrent user traffic - exposing latency spikes, resource contention, and failure rates that only appear at scale. Load testing ADK agents before a product launch prevents customer-facing outages on high-traffic days like sale events.
Load testing ADK agents uses asyncio to fire many sessions in parallel and measure aggregate throughput, latency, and failure rate. The key metrics are: requests per second (throughput), mean and p95 latency, and error rate under load. In unit and integration tests, the LLM is mocked to isolate the agent logic from network variability. In pre-production load tests, the real LLM endpoint is hit to measure end-to-end performance.
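The metrics above can be computed directly from a list of per-session latencies collected during a run. A minimal sketch, using a hypothetical set of measured latencies (the values here are illustrative, not from a real run):

```python
import statistics

# Hypothetical per-session latencies (seconds) gathered during a load test.
latencies = [0.41, 0.38, 0.52, 0.47, 0.95, 0.44, 0.50, 0.39, 1.20, 0.46]
errors = 1                          # sessions that raised or timed out
total = len(latencies) + errors     # every attempted session

mean = statistics.mean(latencies)
# p95: the latency below which 95% of successful requests complete.
p95 = statistics.quantiles(latencies, n=100)[94]
error_rate = errors / total

print(f"mean={mean:.2f}s  p95={p95:.2f}s  error_rate={error_rate:.1%}")
```

With only a handful of samples the p95 estimate is coarse; in practice you would compute it over hundreds of sessions.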
The example shows ShopMax India running a load test with 50 concurrent order tracking sessions. The agent call is mocked with a short simulated latency, and the test asserts that all sessions complete successfully within an acceptable time budget.
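A minimal sketch of such a test is below. The session and agent names are illustrative; in a real ADK test the `asyncio.sleep` call would be replaced by a mocked `runner.run_async(...)` invocation:

```python
import asyncio
import time

CONCURRENT_SESSIONS = 50
SIMULATED_LATENCY_S = 0.05  # stand-in for the mocked agent call's latency

async def run_session(session_id: int) -> bool:
    """Run one simulated order-tracking session; returns True on success."""
    try:
        # In a real test, call the agent runner here with the LLM mocked out.
        await asyncio.sleep(SIMULATED_LATENCY_S)
        return True
    except Exception:
        return False

async def load_test() -> None:
    start = time.perf_counter()
    # Fire all sessions concurrently and wait for every one to finish.
    results = await asyncio.gather(
        *(run_session(i) for i in range(CONCURRENT_SESSIONS))
    )
    elapsed = time.perf_counter() - start
    successes = sum(results)

    print(f"Total sessions: {CONCURRENT_SESSIONS}")
    print(f"Successes: {successes}")
    print(f"Failures: {CONCURRENT_SESSIONS - successes}")
    print(f"Elapsed: {elapsed:.3f} seconds")
    print(f"Throughput: {CONCURRENT_SESSIONS / elapsed:.1f} req/sec")

    # The time budget: all sessions must complete within one second.
    assert successes == CONCURRENT_SESSIONS
    assert elapsed < 1.0

asyncio.run(load_test())
```

Because the sessions run concurrently, the total elapsed time stays close to a single session's latency rather than 50 times it.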
Running the load test produces the following output:
Total sessions: 50
Successes: 50
Failures: 0
Elapsed: 0.052 seconds
Throughput: 961.5 req/sec
Run mocked load tests in CI to catch regressions in agent logic under concurrency - race conditions and shared state bugs only appear with parallel execution. For pre-production load tests, use a staging environment and target 2x expected peak traffic. Set a p95 latency budget (e.g., 3 seconds for order queries) and fail the test if it is exceeded. Use asyncio.Semaphore to cap concurrency during ramp-up tests to simulate gradual traffic increases rather than an instant spike.
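The ramp-up pattern can be sketched with asyncio.Semaphore as follows; the concurrency cap and session count are illustrative values, and the mocked session stands in for a real agent call:

```python
import asyncio

MAX_CONCURRENCY = 10  # illustrative cap for the ramp-up phase

async def run_session(session_id: int) -> bool:
    # Stand-in for a mocked agent call.
    await asyncio.sleep(0.05)
    return True

async def ramped_session(session_id: int, sem: asyncio.Semaphore) -> bool:
    # The semaphore admits at most MAX_CONCURRENCY sessions at a time,
    # so traffic builds in waves instead of an instant spike.
    async with sem:
        return await run_session(session_id)

async def ramp_up_test(total_sessions: int = 50) -> int:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    results = await asyncio.gather(
        *(ramped_session(i, sem) for i in range(total_sessions))
    )
    return sum(results)

successes = asyncio.run(ramp_up_test())
print(f"Successes: {successes}")
```

Raising MAX_CONCURRENCY between runs gives a simple staircase ramp; a fuller harness might also record per-wave latencies to spot the point where p95 starts to degrade.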