Performance Benchmarking ADK Agents with SLA Reporting
Author: Venkata Sudhakar
Performance benchmarking with SLA reporting ensures that ADK agents meet latency budgets under realistic load. ShopMax India defines SLAs for its search and order agents: p95 tool latency must stay below 300ms and p99 below 500ms during Diwali sale peaks in Mumbai and Delhi. It runs benchmark tests in CI to catch regressions before they reach production.
The pytest-benchmark plugin provides a benchmark fixture that measures call latency and throughput across multiple rounds and reports statistics such as min, max, mean, and stddev. It does not expose arbitrary percentiles directly, so tail latencies such as p95 and p99 are computed from the raw per-round timings the fixture records, and SLA assertions are then applied to those values. For ADK agents, benchmark the tool function directly with LLM calls mocked out, isolating business-logic latency from network variance.
The example below benchmarks a product search tool across 50 rounds, computes p95 and p99 from the collected timings using the statistics module, and asserts both values stay within the ShopMax India SLA thresholds.
Running the test produces output like the following:
p95=3.21ms p99=3.45ms (SLA: p95<300ms p99<500ms)
1 passed in 0.31s
Store SLA thresholds in a central config file so they can be tightened as the system matures. Run benchmarks in a dedicated CI job with a fixed machine type to avoid flaky results from shared runners. Export timing data as JSON and feed it into a Grafana dashboard so that SLA trends are visible across releases and load patterns in production can be correlated with CI benchmark history.
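One way to wire the config and export steps together is a small checker run as a CI step after pytest --benchmark-json=report.json. The sketch below assumes a hypothetical sla_config.json keyed by benchmark name and the top-level "benchmarks" list that pytest-benchmark's JSON export uses; verify the exact layout against your plugin version.

```python
# check_sla.py -- illustrative CI gate; file names and the export's JSON
# layout are assumptions, not a documented pytest-benchmark contract.
import json

def load_thresholds(path: str) -> dict:
    """Central SLA config, e.g. {"search_tool": {"p95_ms": 300, "p99_ms": 500}}."""
    with open(path) as f:
        return json.load(f)

def check_benchmarks(report: dict, thresholds: dict) -> list[str]:
    """Compare each benchmark against its SLA; return a list of violations."""
    violations = []
    for bench in report.get("benchmarks", []):
        sla = thresholds.get(bench["name"])
        if sla is None:
            continue  # no SLA defined for this benchmark
        # pytest-benchmark reports timings in seconds; compare the worst case
        # against the p99 budget as a conservative gate.
        max_ms = bench["stats"]["max"] * 1000
        if max_ms > sla["p99_ms"]:
            violations.append(
                f"{bench['name']}: max {max_ms:.1f}ms exceeds p99 SLA "
                f"{sla['p99_ms']}ms"
            )
    return violations
```

Failing the CI job when check_benchmarks returns a non-empty list keeps the SLA gate in one place, while the same report.json can be shipped to Grafana for trend dashboards.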