Property-Based Testing for ADK Agents with Hypothesis

Author: Venkata Sudhakar

Traditional unit tests cover only the examples you thought of in advance. Property-based testing with Hypothesis generates hundreds of random inputs and checks that invariants hold regardless of the specific input. For ShopMax India's ADK agents, this means discovering edge cases automatically: what happens with very long order IDs, unusual city names, or boundary amounts? A property such as "the response always contains the order ID" must hold for any valid input, not just the inputs in your test cases.

Hypothesis provides the @given decorator, which takes a strategy for each test argument. On every test run it generates diverse inputs from those strategies and, when a case fails, shrinks it to a minimal reproducer. For ADK agents, define properties that hold for every valid input: the response is non-empty, it contains no error stack traces, it mentions the queried entity, and its length stays within acceptable bounds. Mock the LLM to return template responses so that the agent's output is deterministic and any failure Hypothesis reports can be reproduced.
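As a minimal sketch of the mocking step described above, the following test replaces the model with a unittest.mock object that fills a fixed template. The names TEMPLATE, make_fake_llm, track_order, and the complete() method are hypothetical stand-ins chosen for illustration; they are not part of ADK or Hypothesis, and a real agent would be wired differently.

```python
from unittest.mock import MagicMock

from hypothesis import given, settings
from hypothesis import strategies as st

# Hypothetical template returned by the mocked LLM (an assumption, not an ADK API).
TEMPLATE = "Your order {order_id} is out for delivery."


def make_fake_llm():
    """Build a mock whose complete() fills the template, so no real model is called."""
    llm = MagicMock()
    llm.complete.side_effect = lambda order_id: TEMPLATE.format(order_id=order_id)
    return llm


def track_order(llm, order_id):
    """Hypothetical agent entry point that delegates to the (mocked) LLM."""
    return llm.complete(order_id=order_id)


@given(order_id=st.from_regex(r"ORD-[0-9]{4,6}", fullmatch=True))
@settings(max_examples=50)
def test_reply_mentions_order_id(order_id):
    llm = make_fake_llm()
    reply = track_order(llm, order_id)
    assert reply                                  # response is non-empty
    assert order_id in reply                      # response mentions the queried entity
    assert "traceback" not in reply.lower()       # no stack trace leaked
    llm.complete.assert_called_once_with(order_id=order_id)
```

Because the mock is rebuilt inside the test body, every generated example starts from a fresh call history, which keeps the call-count assertion valid across examples.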

The example shows ShopMax India defining two properties for its order tracking agent and running Hypothesis against them. The response builder is deterministic, so Hypothesis can run hundreds of iterations quickly without LLM calls.

from hypothesis import given, settings
from hypothesis import strategies as st

# Input strategies: each one describes the valid values for one test argument.
ORDER_ID_STRATEGY = st.from_regex(r"ORD-[0-9]{4,6}", fullmatch=True)
CITY_STRATEGY = st.sampled_from(["Mumbai", "Bangalore", "Delhi", "Hyderabad", "Chennai"])
AMOUNT_STRATEGY = st.integers(min_value=100, max_value=100000)

def build_order_response(order_id, city, amount):
    # Deterministic stand-in for the agent's templated reply; no LLM call is made.
    return (
        f"Your order {order_id} has been dispatched from our {city} "
        f"warehouse. The total amount of Rs {amount} has been charged."
    )

def validate_order_response(response, order_id):
    assert len(response) > 10, "Response too short"
    assert len(response) < 2000, "Response too long"
    assert order_id in response, "Order ID missing from response"
    assert "error" not in response.lower(), "Response contains error text"
    assert "traceback" not in response.lower(), "Response contains stack trace"

@given(
    order_id=ORDER_ID_STRATEGY,
    city=CITY_STRATEGY,
    amount=AMOUNT_STRATEGY
)
@settings(max_examples=200)
def test_order_response_properties(order_id, city, amount):
    response = build_order_response(order_id, city, amount)
    validate_order_response(response, order_id)

@given(order_id=ORDER_ID_STRATEGY)
@settings(max_examples=100)
def test_order_id_always_in_response(order_id):
    response = build_order_response(order_id, "Mumbai", 4999)
    assert order_id in response

Run with pytest, both tests pass. In summary form, the result is:

test_order_response_properties: 200 examples passed
test_order_id_always_in_response: 100 examples passed
Hypothesis found no counterexamples.
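The run above finds nothing because the builder is correct. To see what shrinking produces when a property does fail, the sketch below uses a deliberately broken builder that truncates its output. The function buggy_order_response, the MAX_LEN limit, and the wider order ID pattern are all assumptions made for this illustration. It uses hypothesis.find, which searches a strategy for a minimal example that satisfies a condition.

```python
from hypothesis import find
from hypothesis import strategies as st

MAX_LEN = 20  # hypothetical display limit that introduces the bug


def buggy_order_response(order_id):
    """Deliberately broken builder: truncation can cut the order ID off."""
    return ("Your order " + order_id)[:MAX_LEN]


# Wider than the article's strategy so that long IDs are actually generated.
LONG_ORDER_IDS = st.from_regex(r"ORD-[0-9]{4,12}", fullmatch=True)

# find() returns a shrunk example for which the condition is true,
# here an order ID that no longer appears in the response.
smallest_failure = find(
    LONG_ORDER_IDS,
    lambda order_id: order_id not in buggy_order_response(order_id),
)

print(smallest_failure, "->", buggy_order_response(smallest_failure))
```

With the original 4 to 6 digit pattern the same bug would surface only at the 6 digit upper bound, which is one reason to let strategies range somewhat beyond the inputs you expect in production.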

Use @settings(max_examples=200) during development and raise it to 1000 in nightly CI runs for deeper coverage. When Hypothesis finds a failing case, it prints the minimal shrunk example, which is often an edge case you would not have written by hand. Use DirectoryBasedExampleDatabase to persist past failures and replay them across CI runs. Do not use property-based tests to check LLM creativity or variation in phrasing; reserve them for structural invariants. Combine property-based tests with golden-set tests: properties catch unknown unknowns, golden sets catch known regressions.
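One way to switch between the development and nightly budgets is Hypothesis settings profiles, typically registered in conftest.py. This is a sketch under assumptions: the profile names, the HYPOTHESIS_PROFILE environment variable, and the database directory are illustrative choices, not Hypothesis defaults. Note that a max_examples value passed directly to @settings on a test takes precedence over the profile, so tests that should follow the profile must not hard-code it.

```python
import os

from hypothesis import settings
from hypothesis.database import DirectoryBasedExampleDatabase

# Hypothetical location; use a directory that your CI caches between runs.
EXAMPLE_DB = DirectoryBasedExampleDatabase(".hypothesis/examples")

# Profile names are arbitrary labels chosen for this sketch.
settings.register_profile("dev", max_examples=200, database=EXAMPLE_DB)
settings.register_profile("nightly", max_examples=1000, database=EXAMPLE_DB)

# HYPOTHESIS_PROFILE is an assumed variable name read by this snippet,
# not something Hypothesis reads on its own.
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```

A nightly job would then set HYPOTHESIS_PROFILE=nightly before invoking pytest, while local runs fall back to the dev profile.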
