Tool Selection Quality Testing for ADK Agents
Author: Venkata Sudhakar
Tool selection quality testing verifies that an ADK agent calls the correct tool for a given user intent, not merely that the tool returns the right result. ShopMax India's customer service agent has tools for order status, product search, and return initiation; a wrong tool selection (for example, initiating a return when the customer asked about delivery) causes real harm and must be caught in testing before it reaches customers in Delhi and Hyderabad.
The testing pattern instruments the agent's tool dispatch layer to record which tool was called for each input, then compares the recorded tool name against an expected tool name defined in a golden dataset. A ToolCallRecorder wraps each tool function and appends the call to a shared list. After the agent run, the test asserts that the recorded tool matches the expected one. This separates tool selection correctness from tool output correctness, making failures easier to diagnose.
The example below defines a ToolCallRecorder, wraps three ShopMax India tools, runs four test cases from a golden dataset, and asserts the correct tool was selected for each user intent.
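A minimal sketch of that test is shown here. The real ADK agent's dispatch layer is stood in for by a keyword-based `run_agent()`; the tool bodies, routing rules, and golden dataset entries are illustrative assumptions, not ShopMax India's actual implementation.

```python
# Three ShopMax India tools (bodies are illustrative stubs).
def get_order_status(order_id: str) -> str:
    return f"Order {order_id} is out for delivery"

def search_products(query: str) -> str:
    return f"Results for '{query}'"

def initiate_return(order_id: str) -> str:
    return f"Return initiated for {order_id}"

class ToolCallRecorder:
    """Wraps tool functions so each call records its tool name."""
    def __init__(self) -> None:
        self.calls: list[str] = []

    def wrap(self, fn):
        def wrapper(*args, **kwargs):
            self.calls.append(fn.__name__)  # record before delegating
            return fn(*args, **kwargs)
        wrapper.__name__ = fn.__name__
        return wrapper

recorder = ToolCallRecorder()
TOOLS = {
    "get_order_status": recorder.wrap(get_order_status),
    "search_products": recorder.wrap(search_products),
    "initiate_return": recorder.wrap(initiate_return),
}

def run_agent(user_input: str) -> str:
    """Stand-in for the agent's dispatch layer: naive keyword routing,
    used here only so the example is self-contained and runnable."""
    text = user_input.lower()
    if "return" in text:
        return TOOLS["initiate_return"]("ORD-42")
    if "order" in text or "delivery" in text:
        return TOOLS["get_order_status"]("ORD-42")
    return TOOLS["search_products"](user_input)

# Golden dataset: (intent label, user input, expected tool name).
GOLDEN = [
    ("order_status", "Where is my order?", "get_order_status"),
    ("search", "Show me wireless earbuds", "search_products"),
    ("return", "I want to return my kettle", "initiate_return"),
    ("order_status", "Has my delivery shipped yet?", "get_order_status"),
]

for intent, user_input, expected_tool in GOLDEN:
    recorder.calls.clear()  # isolate each test case
    run_agent(user_input)
    actual = recorder.calls[-1] if recorder.calls else None
    status = "OK" if actual == expected_tool else "FAIL"
    print(f"Intent={intent} -> tool={actual} {status}")
```

In a real suite each golden entry would become a parametrized test case so failures report individually, but the recording mechanism stays the same.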
It gives the following output:
Intent=order_status -> tool=get_order_status OK
Intent=search -> tool=search_products OK
Intent=return -> tool=initiate_return OK
Intent=order_status -> tool=get_order_status OK
4 passed in 0.07s
Extend the golden dataset to cover edge cases such as ambiguous intents where multiple tools could plausibly match. Track tool selection accuracy as a percentage metric in CI reports; a drop from 100% to 95% across a release is a signal worth investigating. For multi-turn conversations, record the full tool call sequence and assert that it matches the expected flow, not just the final call.
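The multi-turn check reuses the same recording idea. This sketch hand-simulates a two-turn conversation in place of a real agent run; the turns and tool names are illustrative assumptions.

```python
# Assert the ordered tool call sequence for a conversation,
# not just the final call.
recorded: list[str] = []

def record_call(tool_name: str) -> None:
    """Append each dispatched tool name; in a real run this happens
    inside the recorder wrapper around each tool."""
    recorded.append(tool_name)

# Simulated two-turn conversation.
record_call("get_order_status")  # turn 1: "Where is my order?"
record_call("initiate_return")   # turn 2: "It's damaged, return it."

expected = ["get_order_status", "initiate_return"]
assert recorded == expected, f"expected {expected}, got {recorded}"
print("sequence OK:", " -> ".join(recorded))
```

Comparing whole sequences catches ordering bugs, such as an agent that initiates a return before checking order status, that a final-call assertion would miss.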