Performance Benchmarking ADK Agents with SLA Reporting
Author: Venkata Sudhakar
Performance benchmarking with SLA reporting ensures that ADK agents meet latency budgets under realistic load. ShopMax India defines SLAs for its search and order agents: p95 tool latency must stay below 300ms and p99 below 500ms during Diwali sale peaks in Mumbai and Delhi. It runs benchmark tests in CI to catch regressions before they reach production.
The pytest-benchmark plugin provides a benchmark fixture that measures call latency and throughput across multiple rounds and reports statistics such as min, max, mean, and stddev. It does not expose arbitrary percentiles directly, so tail latencies such as p95 and p99 are computed from the raw per-round timings the fixture records, and SLA assertions are then applied to those values. For ADK agents, benchmark the tool function directly with LLM calls mocked out, isolating business-logic latency from network variance.
The example below benchmarks a product search tool across 50 rounds, computes p95 and p99 from the collected timings using the statistics module, and asserts both values stay within the ShopMax India SLA thresholds.
Running the test produces output like the following:
p95=3.21ms p99=3.45ms (SLA: p95<300ms p99<500ms)
1 passed in 0.31s
Store SLA thresholds in a central config file so they can be tightened as the system matures. Run benchmarks in a dedicated CI job with a fixed machine type to avoid flaky results from shared runners. Export timing data as JSON and feed it into a Grafana dashboard so that SLA trends are visible across releases and load patterns in production can be correlated with CI benchmark history.
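One way to wire the config and export steps together is a small checker run as a CI step after pytest --benchmark-json=report.json. The sketch below assumes a hypothetical sla_config.json keyed by benchmark name and the top-level "benchmarks" list that pytest-benchmark's JSON export uses; verify the exact layout against your plugin version.

```python
# check_sla.py -- illustrative CI gate; file names and the export's JSON
# layout are assumptions, not a documented pytest-benchmark contract.
import json

def load_thresholds(path: str) -> dict:
    """Central SLA config, e.g. {"search_tool": {"p95_ms": 300, "p99_ms": 500}}."""
    with open(path) as f:
        return json.load(f)

def check_benchmarks(report: dict, thresholds: dict) -> list[str]:
    """Compare each benchmark against its SLA; return a list of violations."""
    violations = []
    for bench in report.get("benchmarks", []):
        sla = thresholds.get(bench["name"])
        if sla is None:
            continue  # no SLA defined for this benchmark
        # pytest-benchmark reports timings in seconds; compare the worst case
        # against the p99 budget as a conservative gate.
        max_ms = bench["stats"]["max"] * 1000
        if max_ms > sla["p99_ms"]:
            violations.append(
                f"{bench['name']}: max {max_ms:.1f}ms exceeds p99 SLA "
                f"{sla['p99_ms']}ms"
            )
    return violations
```

Failing the CI job when check_benchmarks returns a non-empty list keeps the SLA gate in one place, while the same report.json can be shipped to Grafana for trend dashboards.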