Hallucination Detection Testing for ADK Agent Responses
Author: Venkata Sudhakar
Hallucination occurs when an ADK agent generates plausible-sounding but factually incorrect information. For ShopMax India, this could mean an agent citing a wrong delivery date, inventing a return policy that does not exist, or confirming stock availability that was never retrieved from the inventory system. Hallucination detection tests compare agent responses against a ground-truth dataset to catch these fabrications before they reach customers in Mumbai, Delhi, or Chennai.
The test compares each agent response against a ground-truth record using two checks: a grounding check that asserts every factual claim in the response is supported by data from the tool output, and a contradiction check that asserts no claim in the response directly contradicts the tool output. Factual claims are extracted by matching known entity patterns such as order IDs, dates, and stock quantities against both the tool output and the response.
The example below validates three ShopMax India agent responses against their corresponding ground-truth tool outputs and asserts that each response is grounded and contradiction-free.
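A minimal sketch of such a test is shown below. The ground-truth records, the pattern list, and the helper names (extract_facts, ungrounded_facts, contradictions) are illustrative assumptions, not part of the ADK API; only the three sample queries come from the scenario above.

```python
import re

# Hypothetical ground-truth dataset: each record pairs a customer query
# with the raw tool output the agent received and the agent's response.
GROUND_TRUTH = [
    {
        "query": "Track order ORD-7821 from Mumbai",
        "tool_output": "Order ORD-7821 ships from Mumbai, delivery by 2024-08-14.",
        "response": "Your order ORD-7821 will be delivered by 2024-08-14.",
    },
    {
        "query": "Is Samsung TV in stock in Delhi?",
        "tool_output": "Samsung TV: 12 units in stock at the Delhi warehouse.",
        "response": "Yes, the Samsung TV is in stock in Delhi (12 units).",
    },
    {
        "query": "What is the return window for electronics?",
        "tool_output": "Electronics return window: 7 days from delivery.",
        "response": "Electronics can be returned within 7 days of delivery.",
    },
]

# Known entity patterns: order IDs, ISO dates, and stock/time quantities.
FACT_PATTERNS = [r"ORD-\d+", r"\d{4}-\d{2}-\d{2}", r"\d+ (?:units|days)"]


def extract_facts(text):
    """Collect every entity-pattern match found in the text."""
    return {m for p in FACT_PATTERNS for m in re.findall(p, text)}


def ungrounded_facts(response, tool_output):
    """Grounding check: claims in the response with no support in the tool output."""
    return extract_facts(response) - extract_facts(tool_output)


def contradictions(response, tool_output):
    """Contradiction check: patterns where the response and tool output disagree."""
    conflicts = []
    for pattern in FACT_PATTERNS:
        resp_vals = set(re.findall(pattern, response))
        tool_vals = set(re.findall(pattern, tool_output))
        if resp_vals and tool_vals and not resp_vals & tool_vals:
            conflicts.append(pattern)
    return conflicts


def test_responses_are_grounded_and_consistent():
    for record in GROUND_TRUTH:
        missing = ungrounded_facts(record["response"], record["tool_output"])
        print(f"Query: {record['query']}")
        print(f"Ungrounded facts: {sorted(missing)}")
        assert not missing
        assert not contradictions(record["response"], record["tool_output"])


if __name__ == "__main__":
    test_responses_are_grounded_and_consistent()
```

Splitting the loop into a pytest test parametrized over GROUND_TRUTH (one case per record) yields one pass/fail result per query, matching the summary shown below.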
Running the test with pytest produces the following output:
Query: Track order ORD-7821 from Mumbai
Ungrounded facts: []
Query: Is Samsung TV in stock in Delhi?
Ungrounded facts: []
Query: What is the return window for electronics?
Ungrounded facts: []
... (3 passed in 0.01s)
In production, build the ground-truth dataset from real API responses logged during staging tests. Run the grounding check on every response before it is sent to the customer as a runtime safety filter, not just in offline tests. For high-stakes facts such as prices, delivery dates, and return windows, extend extract_facts with domain-specific patterns and add contradiction checks that explicitly compare numeric values between the tool output and the response.
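One way to sketch such a domain-specific extension for prices is shown below; the rupee pattern and the helper names (prices_in, price_contradiction) are illustrative assumptions, not ADK APIs.

```python
import re

# Hypothetical high-stakes pattern: rupee prices such as "₹45,990".
PRICE_PATTERN = r"₹\s?([\d,]+)"


def prices_in(text):
    """Extract rupee prices as integers, ignoring thousands separators."""
    return {int(m.replace(",", "")) for m in re.findall(PRICE_PATTERN, text)}


def price_contradiction(response, tool_output):
    """Numeric contradiction check: the response states a price the tool never returned."""
    return bool(prices_in(response) - prices_in(tool_output))


# Example: the tool returned ₹45,990 but the agent quoted ₹44,990.
assert price_contradiction("The TV costs ₹44,990.", "Price: ₹45,990")
assert not price_contradiction("The TV costs ₹45,990.", "Price: ₹45,990")
```

Comparing parsed integers rather than raw strings makes the check robust to formatting differences such as "₹45990" versus "₹45,990".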