
Output Diversity and Variance Testing for ADK Agents

Author: Venkata Sudhakar

Output diversity testing verifies that an ADK agent produces a sufficiently varied set of responses when the same intent is expressed in different ways, ensuring customers in Mumbai and Hyderabad do not receive robotic, repetitive answers that damage engagement. Conversely, variance testing ensures the agent does not produce wildly inconsistent responses to semantically identical inputs, which would indicate an unstable model or prompt.

Diversity is measured by computing the unique response ratio (distinct responses divided by total responses) across N paraphrased inputs and asserting that it exceeds a minimum diversity floor. Variance is measured across responses to semantically identical inputs: the coefficient of variation (standard deviation divided by mean) of their lengths must stay below a ceiling, and their mean pairwise similarity shows whether they still agree in substance. Together, the two tests define a quality band: diverse enough to feel natural, consistent enough to be reliable. Python's standard-library statistics module provides the mean and standard deviation for these calculations, so no additional dependencies are needed.
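For reference, the three quantities can be written out as below for N responses r_1, ..., r_N. Here ℓ_i is the character length of r_i, μ_ℓ and σ_ℓ are the mean and standard deviation of those lengths, and sim is whichever pairwise similarity measure the test uses; the article does not name one, so it is left abstract. The coefficient of variation is taken over lengths because that is what the sample output reports as "Length CV".

```latex
D = \frac{\lvert \{ r_1, \dots, r_N \} \rvert}{N}, \qquad
\mathrm{CV} = \frac{\sigma_\ell}{\mu_\ell}, \qquad
\bar{S} = \frac{2}{N(N-1)} \sum_{1 \le i < j \le N} \mathrm{sim}(r_i, r_j)
```

As a check against the sample output, five unique responses give D = 5/5 = 1.00, and σ_ℓ/μ_ℓ = 6.3/84.4 ≈ 0.075, consistent with the printed 0.0752 once the rounding of the standard deviation is allowed for.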

The example below generates responses to five paraphrases of the same product query and measures their unique response ratio for diversity. It then generates five responses to an identical input and measures how much their lengths vary and how similar they are to one another, to catch instability.


It gives the following output:

Diversity ratio: 1.00 (5/5 unique)
Length CV: 0.0752 (mean=84.4, stdev=6.3)
Mean pairwise similarity: 0.6821
3 passed in 0.06s

Calibrate MIN_DIVERSITY_RATIO and MAX_VARIANCE_CV using a sample of real production responses so the thresholds reflect what good looks like for your specific agent. Run diversity tests after every prompt change since rewritten prompts sometimes inadvertently anchor the model to a single phrasing pattern. For multi-language agents serving Hindi and English customers, run diversity tests per language separately because diversity bands can differ significantly across languages.
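One possible way to do that calibration is sketched below, assuming production responses are available as (language, intent, response) tuples. It groups responses per language and intent, computes each group's diversity ratio and length CV, then sets the floor slightly below the least diverse group and the ceiling slightly above the most variable one, separately for each language. The `calibrate` helper, the sample rows, and the 10% `MARGIN` are hypothetical choices, not part of ADK; for brevity it also reuses the same groups for both metrics, whereas the variance test itself measures repeated identical inputs.

```python
import statistics
from collections import defaultdict

MARGIN = 0.10  # hypothetical safety margin around observed production behaviour

# Made-up rows standing in for real logged (language, intent, response) tuples.
PRODUCTION_SAMPLE = [
    ("en", "battery", "The X200 battery lasts about two days."),
    ("en", "battery", "Expect roughly two days from the X200 battery."),
    ("en", "battery", "The X200 battery lasts about two days."),
    ("hi", "battery", "X200 की बैटरी लगभग दो दिन चलती है।"),
    ("hi", "battery", "X200 की बैटरी आम तौर पर दो दिन चलती है।"),
]


def diversity_ratio(responses):
    """Share of responses that are unique after lowercasing and collapsing whitespace."""
    unique = {" ".join(r.lower().split()) for r in responses}
    return len(unique) / len(responses)


def length_cv(responses):
    """Coefficient of variation (sample stdev / mean) of response lengths."""
    lengths = [len(r) for r in responses]
    return statistics.stdev(lengths) / statistics.mean(lengths)


def calibrate(samples):
    """Derive per-language thresholds from (language, intent, response) tuples."""
    groups = defaultdict(list)
    for language, intent, response in samples:
        groups[(language, intent)].append(response)

    observed = defaultdict(lambda: {"ratios": [], "cvs": []})
    for (language, _intent), responses in groups.items():
        if len(responses) < 2:  # stdev needs at least two data points
            continue
        observed[language]["ratios"].append(diversity_ratio(responses))
        observed[language]["cvs"].append(length_cv(responses))

    # Floor just below the least diverse group; ceiling just above the most variable one.
    return {
        language: {
            "MIN_DIVERSITY_RATIO": min(obs["ratios"]) * (1 - MARGIN),
            "MAX_VARIANCE_CV": max(obs["cvs"]) * (1 + MARGIN),
        }
        for language, obs in observed.items()
    }


if __name__ == "__main__":
    for language, limits in calibrate(PRODUCTION_SAMPLE).items():
        print(language, limits)
```

Taking the minimum and maximum keeps every observed group inside the band; with a larger production sample, a low or high percentile would be less sensitive to a single outlier group.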


 
  


  