Hallucination Detection Testing for ADK Agent Responses
Author: Venkata Sudhakar
Hallucination occurs when an ADK agent generates plausible-sounding but factually incorrect information. For ShopMax India, this could mean an agent citing a wrong delivery date, inventing a return policy that does not exist, or confirming stock availability that was never retrieved from the inventory system. Hallucination detection tests compare agent responses against a ground-truth dataset to catch these fabrications before they reach customers in Mumbai, Delhi, or Chennai.
The test compares each agent response against a ground-truth record using two checks: a grounding check that asserts every factual claim in the response is supported by data from the tool output, and a contradiction check that asserts no claim in the response directly contradicts the tool output. Factual claims are extracted by matching known entity patterns such as order IDs, dates, and stock quantities against both the tool output and the response.
The example below validates three ShopMax India agent responses against their corresponding ground-truth tool outputs and asserts that each response is grounded and contradiction-free.
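A minimal sketch of such a test is shown below. The ground-truth records, the pattern list, and the helper names (extract_facts, ungrounded_facts, contradictions) are illustrative assumptions, not part of the ADK API; only the three sample queries come from the scenario above.

```python
import re

# Hypothetical ground-truth dataset: each record pairs a customer query
# with the raw tool output the agent received and the agent's response.
GROUND_TRUTH = [
    {
        "query": "Track order ORD-7821 from Mumbai",
        "tool_output": "Order ORD-7821 ships from Mumbai, delivery by 2024-08-14.",
        "response": "Your order ORD-7821 will be delivered by 2024-08-14.",
    },
    {
        "query": "Is Samsung TV in stock in Delhi?",
        "tool_output": "Samsung TV: 12 units in stock at the Delhi warehouse.",
        "response": "Yes, the Samsung TV is in stock in Delhi (12 units).",
    },
    {
        "query": "What is the return window for electronics?",
        "tool_output": "Electronics return window: 7 days from delivery.",
        "response": "Electronics can be returned within 7 days of delivery.",
    },
]

# Known entity patterns: order IDs, ISO dates, and stock/time quantities.
FACT_PATTERNS = [r"ORD-\d+", r"\d{4}-\d{2}-\d{2}", r"\d+ (?:units|days)"]


def extract_facts(text):
    """Collect every entity-pattern match found in the text."""
    return {m for p in FACT_PATTERNS for m in re.findall(p, text)}


def ungrounded_facts(response, tool_output):
    """Grounding check: claims in the response with no support in the tool output."""
    return extract_facts(response) - extract_facts(tool_output)


def contradictions(response, tool_output):
    """Contradiction check: patterns where the response and tool output disagree."""
    conflicts = []
    for pattern in FACT_PATTERNS:
        resp_vals = set(re.findall(pattern, response))
        tool_vals = set(re.findall(pattern, tool_output))
        if resp_vals and tool_vals and not resp_vals & tool_vals:
            conflicts.append(pattern)
    return conflicts


def test_responses_are_grounded_and_consistent():
    for record in GROUND_TRUTH:
        missing = ungrounded_facts(record["response"], record["tool_output"])
        print(f"Query: {record['query']}")
        print(f"Ungrounded facts: {sorted(missing)}")
        assert not missing
        assert not contradictions(record["response"], record["tool_output"])


if __name__ == "__main__":
    test_responses_are_grounded_and_consistent()
```

Splitting the loop into a pytest test parametrized over GROUND_TRUTH (one case per record) yields one pass/fail result per query, matching the summary shown below.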
Running the test with pytest produces the following output:
Query: Track order ORD-7821 from Mumbai
Ungrounded facts: []
Query: Is Samsung TV in stock in Delhi?
Ungrounded facts: []
Query: What is the return window for electronics?
Ungrounded facts: []
... (3 passed in 0.01s)
In production, build the ground-truth dataset from real API responses logged during staging tests. Run the grounding check on every response before it is sent to the customer as a runtime safety filter, not just in offline tests. For high-stakes facts such as prices, delivery dates, and return windows, extend extract_facts with domain-specific patterns and add contradiction checks that explicitly compare numeric values between the tool output and the response.
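One way to sketch such a domain-specific extension for prices is shown below; the rupee pattern and the helper names (prices_in, price_contradiction) are illustrative assumptions, not ADK APIs.

```python
import re

# Hypothetical high-stakes pattern: rupee prices such as "₹45,990".
PRICE_PATTERN = r"₹\s?([\d,]+)"


def prices_in(text):
    """Extract rupee prices as integers, ignoring thousands separators."""
    return {int(m.replace(",", "")) for m in re.findall(PRICE_PATTERN, text)}


def price_contradiction(response, tool_output):
    """Numeric contradiction check: the response states a price the tool never returned."""
    return bool(prices_in(response) - prices_in(tool_output))


# Example: the tool returned ₹45,990 but the agent quoted ₹44,990.
assert price_contradiction("The TV costs ₹44,990.", "Price: ₹45,990")
assert not price_contradiction("The TV costs ₹45,990.", "Price: ₹45,990")
```

Comparing parsed integers rather than raw strings makes the check robust to formatting differences such as "₹45990" versus "₹45,990".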