Confidence Scoring for ADK Agent Decisions
Author: Venkata Sudhakar
Confidence scoring attaches a certainty estimate to each ADK agent decision so that low-confidence responses can be flagged for human review before being shown to customers. ShopMax India uses confidence scores on its return eligibility and refund calculation agents - decisions below 0.80 confidence are routed to a human agent in the Hyderabad support center rather than handled automatically, protecting customers from erroneous outcomes.
A confidence score is computed from signals available at decision time: the number of matching records found, whether required fields were present, and whether the input fell within the training distribution. The score is a float from 0.0 to 1.0 attached to the tool response dict. Tests verify three things: high-confidence cases produce scores above the threshold, low-confidence cases fall below it, and edge cases return a score with the correct escalation flag set.
The example below defines a return eligibility tool that computes a confidence score from order age, payment status, and item condition, then runs three test cases asserting correct confidence bands and escalation routing.
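A minimal sketch of that tool is shown here. The function name `check_return_eligibility`, the per-signal penalty weights, and the 0.80 threshold are illustrative assumptions chosen to reproduce the output below; they are not part of any specific ADK API.

```python
# Sketch: return eligibility tool with a confidence score on the response dict.
# Signal weights and threshold are illustrative assumptions.

ESCALATION_THRESHOLD = 0.80

# Penalty (in percentage points) applied when each signal fails.
PENALTIES = {
    "order older than 30 days": 40,
    "payment not confirmed": 40,
    "item not in original condition": 20,
}

def check_return_eligibility(order_age_days, payment_confirmed, original_condition):
    """Return a tool-response dict carrying a 0.0-1.0 confidence score."""
    reasons = []
    if order_age_days > 30:
        reasons.append("order older than 30 days")
    if not payment_confirmed:
        reasons.append("payment not confirmed")
    if not original_condition:
        reasons.append("item not in original condition")

    # Integer arithmetic avoids float drift; divide once at the end.
    confidence = (100 - sum(PENALTIES[r] for r in reasons)) / 100.0
    return {
        "eligible": not reasons,
        "confidence": confidence,
        "reason": ", ".join(reasons),
        "escalate": confidence < ESCALATION_THRESHOLD,
    }

def test_high_confidence():
    r = check_return_eligibility(5, True, True)
    assert r["confidence"] >= ESCALATION_THRESHOLD and not r["escalate"]
    print(f"Confidence={r['confidence']}, eligible={r['eligible']}, escalate={r['escalate']}")

def test_low_confidence():
    r = check_return_eligibility(45, False, False)
    assert r["confidence"] < ESCALATION_THRESHOLD and r["escalate"]
    print(f"Confidence={r['confidence']}, reason={r['reason']}")

def test_borderline():
    # Exactly at the threshold: scored low-ish but not escalated.
    r = check_return_eligibility(10, True, False)
    assert r["confidence"] == ESCALATION_THRESHOLD and not r["escalate"]
    print(f"Borderline: confidence={r['confidence']}, reason={r['reason']}")

if __name__ == "__main__":
    test_high_confidence()
    test_low_confidence()
    test_borderline()
```

The same file can be run under pytest, which collects the three `test_` functions directly.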
Running the tests produces the following output:
Confidence=1.0, eligible=True, escalate=False
Confidence=0.0, reason=order older than 30 days, payment not confirmed, item not in original condition
Borderline: confidence=0.8, reason=item not in original condition
3 passed in 0.04s
In production, log confidence scores alongside every agent decision to build a calibration dataset over time. Use that dataset to adjust thresholds per agent type - a refund agent should have a higher threshold than a product recommendation agent because the cost of a wrong decision is much higher. Expose the confidence score in the API response so downstream systems can apply their own escalation policies.
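One way to sketch that production setup is a per-agent-type threshold table plus a routing helper that logs each scored decision as a JSON line for later calibration. The threshold values and the `route_decision` helper below are hypothetical illustrations, not a prescribed API.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")

# Hypothetical per-agent-type thresholds, to be tuned against the
# logged calibration dataset rather than fixed up front.
THRESHOLDS = {
    "refund": 0.95,                  # wrong refunds are costly: strict
    "return_eligibility": 0.80,
    "product_recommendation": 0.50,  # low-stakes: lenient
}

def route_decision(agent_type, decision):
    """Attach the escalation verdict and log the score for calibration."""
    threshold = THRESHOLDS[agent_type]
    decision["escalate"] = decision["confidence"] < threshold
    # One JSON line per decision builds the calibration dataset over time.
    logging.info(json.dumps({"agent": agent_type, **decision}))
    return decision

# A 0.88-confidence refund decision escalates under the strict threshold.
print(route_decision("refund", {"confidence": 0.88, "amount": 1499.0}))
```

Because `escalate` is included in the returned dict, downstream systems receiving the API response can also ignore it and apply their own policy against the raw `confidence` field.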