
Property-Based Testing for ADK Agents with Hypothesis

Author: Venkata Sudhakar

Traditional unit tests cover only the examples you thought of in advance. Property-based testing with Hypothesis generates hundreds of random inputs and verifies that invariants hold regardless of the specific input. For ShopMax India's ADK agents, this means discovering edge cases automatically: what happens with very long order IDs, unusual city names, or boundary amounts? A property such as "the response always contains the order ID" must hold for any valid input, not just the ones in your test cases.

With Hypothesis, you decorate a test function with @given and pass it strategies that describe the inputs to generate. On each test run, Hypothesis generates diverse inputs from those strategies and, when a case fails, shrinks it to a minimal reproducer. For ADK agents, define properties that must always be true: the response is non-empty, it contains no error stack traces, it mentions the queried entity, and its length is within acceptable bounds. Mock the LLM to return template responses so that the agent's output is a deterministic function of the inputs Hypothesis generates.
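To make the generate-and-shrink cycle concrete, the following sketch runs one property against a deliberately buggy template formatter. The names buggy_format, MAX_ID_LEN, check_order_id_in_response and the order ID alphabet are illustrative assumptions, not part of ShopMax India's code or of the ADK API.

```python
import string

from hypothesis import given, settings, strategies as st

MAX_ID_LEN = 12  # hypothetical field width that causes the bug


def buggy_format(order_id: str) -> str:
    """Hypothetical mocked-LLM template that silently truncates long IDs."""
    return f"Order {order_id[:MAX_ID_LEN]} is out for delivery."


# Strategy: order IDs of 1 to 40 uppercase letters and digits (assumed format).
order_ids = st.text(
    alphabet=string.ascii_uppercase + string.digits, min_size=1, max_size=40
)


# Named check_* rather than test_* so pytest does not collect a test
# that is meant to fail. database=None keeps the demo stateless.
@settings(max_examples=200, database=None)
@given(order_id=order_ids)
def check_order_id_in_response(order_id):
    # Property: the response mentions the queried order ID.
    assert order_id in buggy_format(order_id)


def property_fails() -> bool:
    """Run the property and report whether Hypothesis found a counterexample."""
    try:
        check_order_id_in_response()
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    print("counterexample found:", property_fails())
```

Any ID longer than 12 characters violates the property, so Hypothesis should report a failure and shrink it toward a short, simple ID just over that limit. The exact string it reports may differ between Hypothesis versions.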

The example defines two properties for ShopMax India's order tracking agent and runs Hypothesis against them. The agent builder is deterministic, so Hypothesis can run hundreds of iterations quickly without LLM calls.
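A minimal sketch of such a test module might look as follows. It assumes a hypothetical track_order function standing in for the agent with its LLM mocked by a template; the input formats, the 500-character bound and the run_all helper are likewise assumptions made for illustration, and no ADK API is called.

```python
import string

from hypothesis import given, settings, strategies as st

MAX_RESPONSE_CHARS = 500  # assumed upper bound on a tracking reply
RESPONSE_EXAMPLES = 200   # max_examples budget for the first property
ORDER_ID_EXAMPLES = 100   # max_examples budget for the second property


def track_order(order_id: str, city: str, amount: float) -> str:
    """Hypothetical deterministic agent builder: a template replaces the LLM."""
    return f"Order {order_id} for Rs {amount:.2f} is on its way to {city}."


# Input strategies (assumed formats): long IDs, unusual city names, boundary amounts.
order_ids = st.text(
    alphabet=string.ascii_uppercase + string.digits + "-", min_size=1, max_size=64
)
city_chars = st.sampled_from(string.ascii_letters + " -'") | st.characters(
    min_codepoint=0x0900, max_codepoint=0x097F  # Devanagari block
)
cities = st.text(alphabet=city_chars, min_size=1, max_size=80)
amounts = st.floats(min_value=0.01, max_value=1_000_000.0)


@settings(max_examples=RESPONSE_EXAMPLES, database=None)
@given(order_id=order_ids, city=cities, amount=amounts)
def test_order_response_properties(order_id, city, amount):
    response = track_order(order_id, city, amount)
    assert response.strip()                                    # non-empty
    assert "Traceback (most recent call last)" not in response  # no stack trace
    assert city in response                                    # mentions the entity
    assert len(response) <= MAX_RESPONSE_CHARS                 # bounded length


@settings(max_examples=ORDER_ID_EXAMPLES, database=None)
@given(order_id=order_ids)
def test_order_id_always_in_response(order_id):
    assert order_id in track_order(order_id, "Hyderabad", 1499.0)


def run_all() -> None:
    """Run both properties outside pytest and print a short summary.

    Each call raises on the first failing (shrunk) example, so reaching the
    print means the property held for the configured example budget.
    """
    test_order_response_properties()
    print(f"test_order_response_properties: {RESPONSE_EXAMPLES} examples passed")
    test_order_id_always_in_response()
    print(f"test_order_id_always_in_response: {ORDER_ID_EXAMPLES} examples passed")
    print("Hypothesis found no counterexamples.")


if __name__ == "__main__":
    run_all()
```

Under pytest, the two test_ functions are collected as ordinary tests; run_all is only a convenience for running the module as a script.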


It gives the following output:

test_order_response_properties: 200 examples passed
test_order_id_always_in_response: 100 examples passed
Hypothesis found no counterexamples.

Use @settings(max_examples=200) during development and raise it to 1000 in nightly CI runs for deeper coverage. When Hypothesis finds a failing case, it prints the minimal shrunk example, which is often a surprising edge case you would never have written by hand. Use DirectoryBasedExampleDatabase to persist past failures and replay them across CI runs. Avoid testing LLM creativity or variation in response phrasing with property-based tests; use them for structural invariants only. Combine property-based and golden-set tests: properties catch unknown unknowns, golden sets catch known regressions.
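One way to wire up the development and nightly budgets is with Hypothesis settings profiles, typically registered in a conftest.py. In this sketch the profile names, the HYPOTHESIS_PROFILE and HYPOTHESIS_DB_DIR environment variables, and the cache path are assumptions; the CI system would need to preserve that directory between runs for failures to be replayed.

```python
import os

from hypothesis import settings
from hypothesis.database import DirectoryBasedExampleDatabase

# Assumed cache location; CI must keep this directory between runs.
EXAMPLE_DB_DIR = os.getenv("HYPOTHESIS_DB_DIR", ".hypothesis/examples")
example_db = DirectoryBasedExampleDatabase(EXAMPLE_DB_DIR)

# Same database in both profiles, so a failure found in the nightly run
# is replayed first in the next development run that shares the directory.
settings.register_profile("dev", max_examples=200, database=example_db)
settings.register_profile("nightly", max_examples=1000, database=example_db)

# HYPOTHESIS_PROFILE is an assumed variable name; default to the dev budget.
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```

Tests that should follow the active profile must not hard-code max_examples in their own @settings decorator, because per-test settings take precedence over the loaded profile.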


 
  


  