
Confidence Scoring for ADK Agent Decisions

Author: Venkata Sudhakar

Confidence scoring attaches a certainty estimate to each ADK agent decision so that low-confidence responses can be flagged for human review before they are shown to customers. ShopMax India applies confidence scores to its return eligibility and refund calculation agents: any decision with a confidence below 0.80 is routed to a human agent in the Hyderabad support center instead of being processed automatically, which protects customers from erroneous outcomes.

A confidence score is computed from signals available at decision time: the number of matching records found, whether required fields were present, and whether the input fell within the training distribution. The score is a float from 0.0 to 1.0 attached to the tool response dict. Tests verify three things: high-confidence cases produce scores above the threshold, low-confidence cases fall below it, and edge cases return a score with the correct escalation flag set.
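As a rough illustration of how the three signals named above could be combined, the sketch below subtracts a fixed penalty for each weak signal and attaches the result to a tool response dict. The names `DecisionSignals`, `confidence_from_signals`, and `with_confidence`, and the penalty values, are assumptions made for this sketch, not calibrated weights or part of the ADK API.

```python
from dataclasses import dataclass


@dataclass
class DecisionSignals:
    matching_records: int          # records found for the lookup
    required_fields_present: bool  # all mandatory input fields were supplied
    in_distribution: bool          # input resembles cases the agent was built for


def confidence_from_signals(signals: DecisionSignals) -> float:
    """Combine decision-time signals into a score in [0.0, 1.0].

    Penalties are illustrative assumptions, kept as integer percentages
    so the arithmetic is exact before the final division.
    """
    penalty_pct = 0
    if signals.matching_records != 1:      # no match, or an ambiguous match
        penalty_pct += 50
    if not signals.required_fields_present:
        penalty_pct += 30
    if not signals.in_distribution:
        penalty_pct += 20
    return max(0, 100 - penalty_pct) / 100


def with_confidence(response: dict, signals: DecisionSignals) -> dict:
    """Return a copy of the tool response dict with the score attached."""
    return {**response, "confidence": confidence_from_signals(signals)}
```

Working in integer percentages and dividing once at the end avoids the small floating-point drift that repeated subtraction of values such as 0.4 and 0.2 can introduce.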

The example below defines a return eligibility tool that computes a confidence score from order age, payment status, and item condition, then runs three test cases asserting correct confidence bands and escalation routing.
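A minimal sketch of such a tool and its three tests might look like the following. The function name `check_return_eligibility`, its parameter names, and the penalty weights (40, 40, and 20 hundredths) are assumptions chosen for illustration. The tool is written as a plain Python function returning a dict, and the sketch treats a score exactly at the 0.80 threshold as not escalated, since only decisions below 0.80 are routed to a human.

```python
CONFIDENCE_THRESHOLD = 0.80  # decisions below this go to a human agent

# Assumed penalties in hundredths; integers avoid float rounding drift.
PENALTIES = {
    "order older than 30 days": 40,
    "payment not confirmed": 40,
    "item not in original condition": 20,
}


def check_return_eligibility(order_age_days: int, payment_confirmed: bool,
                             original_condition: bool) -> dict:
    """Return an eligibility decision with a confidence score and escalation flag."""
    reasons = []
    if order_age_days > 30:
        reasons.append("order older than 30 days")
    if not payment_confirmed:
        reasons.append("payment not confirmed")
    if not original_condition:
        reasons.append("item not in original condition")

    confidence = max(0, 100 - sum(PENALTIES[r] for r in reasons)) / 100
    return {
        "eligible": not reasons,
        "confidence": confidence,
        "escalate": confidence < CONFIDENCE_THRESHOLD,
        "reason": ", ".join(reasons),
    }


def test_high_confidence():
    r = check_return_eligibility(10, True, True)
    print(f"Confidence={r['confidence']}, eligible={r['eligible']}, escalate={r['escalate']}")
    assert r["confidence"] >= CONFIDENCE_THRESHOLD
    assert r["eligible"] and not r["escalate"]


def test_low_confidence():
    r = check_return_eligibility(45, False, False)
    print(f"Confidence={r['confidence']}, reason={r['reason']}")
    assert r["confidence"] < CONFIDENCE_THRESHOLD
    assert r["escalate"]


def test_borderline():
    r = check_return_eligibility(10, True, False)
    print(f"Borderline: confidence={r['confidence']}, reason={r['reason']}")
    assert r["confidence"] == CONFIDENCE_THRESHOLD
    assert not r["escalate"]  # exactly at the threshold is not "below" it
```

Saved as a test module and run with `pytest -s`, the print lines are shown alongside the pass count; without `-s`, pytest captures them.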


It produces the following output:

Confidence=1.0, eligible=True, escalate=False
Confidence=0.0, reason=order older than 30 days, payment not confirmed, item not in original condition
Borderline: confidence=0.8, reason=item not in original condition
3 passed in 0.04s

In production, log the confidence score alongside every agent decision to build a calibration dataset over time. Use that dataset to tune thresholds per agent type: a refund agent should have a higher threshold than a product recommendation agent because a wrong refund decision costs far more. Expose the confidence score in the API response so downstream systems can apply their own escalation policies.
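One way these recommendations could be sketched is shown below: a per-agent threshold table, a logger that appends each decision as a JSON line, and a helper that reports observed accuracy per confidence bucket once outcomes have been reviewed. The threshold values, the record fields (including `was_correct`), and all function names are hypothetical and would need to be replaced with values derived from real calibration data.

```python
import json
import time
from collections import defaultdict

# Hypothetical per-agent thresholds; higher where a wrong decision costs more.
THRESHOLDS = {"refund": 0.90, "return_eligibility": 0.80, "recommendation": 0.60}
DEFAULT_THRESHOLD = 0.80


def should_escalate(agent_type: str, confidence: float) -> bool:
    """Escalate when the score is below the threshold for this agent type."""
    return confidence < THRESHOLDS.get(agent_type, DEFAULT_THRESHOLD)


def log_decision(agent_type: str, confidence: float, decision: dict, log_file) -> dict:
    """Append one decision as a JSON line for later calibration analysis."""
    record = {
        "ts": time.time(),
        "agent_type": agent_type,
        "confidence": confidence,
        "decision": decision,
        "escalated": should_escalate(agent_type, confidence),
        "was_correct": None,  # filled in later, once the outcome is reviewed
    }
    log_file.write(json.dumps(record) + "\n")
    return record


def accuracy_by_bucket(records) -> dict:
    """Observed accuracy per 0.1-wide confidence bucket, reviewed records only."""
    totals = defaultdict(lambda: [0, 0])  # bucket -> [correct, seen]
    for rec in records:
        if rec["was_correct"] is None:
            continue
        # A score of 1.0 is folded into the top (0.9) bucket.
        bucket = min(round(rec["confidence"] * 100) // 10, 9) / 10
        totals[bucket][0] += int(rec["was_correct"])
        totals[bucket][1] += 1
    return {b: correct / seen for b, (correct, seen) in sorted(totals.items())}
```

If the observed accuracy in a bucket is well below the bucket's confidence level, that suggests the scores are overconfident for that agent type and its threshold may need to be raised.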


 
  


  