
Real-Time LLM Quality Scoring with Custom Metrics in Python

Author: Venkata Sudhakar

ShopMax India's order tracking agent sometimes produces vague or off-topic responses that frustrate customers before a human supervisor can intervene. Rather than relying on post-hoc evaluation, a real-time quality scorer can assess each response before it is delivered, flagging low-confidence answers for human review or triggering an automatic retry. This tutorial shows how to build a lightweight quality scoring pipeline using rule-based checks and an LLM-as-judge pattern.

The scoring pipeline runs three checks on every response: a keyword relevance check (does the response mention key entities from the question?), a length check (is the response too short to be useful or too long to be readable?), and an LLM-as-judge score (a second LLM call that rates the response on a 1-5 scale). A composite score decides whether to deliver, retry, or escalate to a human agent. The overhead is one small LLM call per response, adding roughly 100-150ms but preventing bad responses from reaching customers.
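The two rule-based checks described above could be sketched as follows. This is a minimal illustration, not the article's own listing: the function names, the stopword list, and the 15-word and 120-word bounds are all assumptions chosen for the example.

```python
import re

# Hypothetical stopword list (assumption); a real deployment would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "is", "my", "of", "to", "for", "and", "in", "on",
    "it", "where", "when", "what", "has", "i", "me", "you", "your", "can",
}


def tokenize(text: str) -> set:
    """Lowercase the text and return its alphanumeric tokens minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def keyword_score(question: str, response: str) -> float:
    """Fraction of the question's key tokens that also appear in the response."""
    question_tokens = tokenize(question)
    if not question_tokens:
        return 0.0
    return len(question_tokens & tokenize(response)) / len(question_tokens)


def length_score(response: str, min_words: int = 15, max_words: int = 120) -> float:
    """1.0 inside the [min_words, max_words] band, scaled down outside it.

    The band limits are illustrative assumptions, not values from the article.
    """
    n = len(response.split())
    if n == 0:
        return 0.0
    if n < min_words:
        return n / min_words      # too short: penalize proportionally
    if n > max_words:
        return max_words / n      # too long: penalize proportionally
    return 1.0


if __name__ == "__main__":
    q = "Where is my order 12345?"
    print(keyword_score(q, "Order 12345 is out for delivery."))
    print(length_score("Please wait."))
```

Both functions return a value between 0 and 1, so they can be combined directly with a normalized judge score.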

The example below shows the quality scorer for ShopMax India order queries. Responses scoring below 0.6 trigger a retry or escalation to a human support agent.
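A minimal sketch of the composite step, assuming the three component scores have already been computed. The 0.6 threshold comes from the text; the weights (0.2 / 0.2 / 0.6), the function names, and the choice to normalize the 1-5 judge rating by dividing by 5 are assumptions for illustration, so this sketch is not guaranteed to reproduce the figures in the sample output.

```python
THRESHOLD = 0.6  # from the text: scores below this trigger a retry or escalation

# Assumed weights (illustrative); they must sum to 1.0.
WEIGHTS = {"keyword": 0.2, "length": 0.2, "judge": 0.6}


def judge_to_unit(rating: int) -> float:
    """Map a 1-5 judge rating onto 0.2-1.0 by dividing by 5 (an assumption)."""
    rating = max(1, min(5, rating))  # clamp out-of-range ratings
    return rating / 5


def composite_score(keyword: float, length: float, judge: float,
                    weights: dict = WEIGHTS) -> float:
    """Weighted sum of the three component scores, each in [0, 1]."""
    return (weights["keyword"] * keyword
            + weights["length"] * length
            + weights["judge"] * judge)


def decide(score: float, threshold: float = THRESHOLD) -> str:
    """Deliver the response if it clears the threshold, otherwise retry or escalate."""
    return "DELIVER" if score >= threshold else "RETRY or ESCALATE"


def evaluate(keyword: float, length: float, judge_rating: int) -> dict:
    """Combine the component scores and attach the resulting action."""
    judge = judge_to_unit(judge_rating)
    composite = composite_score(keyword, length, judge)
    return {
        "keyword": keyword,
        "length": length,
        "judge": judge,
        "composite": composite,
        "action": decide(composite),
    }


if __name__ == "__main__":
    result = evaluate(keyword=0.25, length=0.20, judge_rating=2)
    for name in ("keyword", "length", "judge", "composite"):
        print(f"{name:<10} {result[name]:.2f}")
    print("Action:", result["action"])
```

The judge rating is passed in as a plain integer so the sketch stays independent of any particular LLM client; in practice it would come from the second LLM call.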


The example produces the following output:

Keyword score:  0.25
Length score:   0.20
LLM judge:      0.40
Composite:      0.35
Action: RETRY or ESCALATE

Tune the weights to match your business priorities. For ShopMax India order queries, LLM judge accuracy matters most, so give it 60% weight. Cache judge scores for identical responses to avoid redundant API calls. Log all scores to a time-series database such as InfluxDB, or to a data warehouse such as BigQuery, so you can track quality trends over time and alert when the 7-day rolling average composite score drops below 0.6.
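The caching and rolling-average ideas could be sketched in memory as shown below. The class names are hypothetical, `judge_fn` stands in for whatever function makes the judge LLM call, and the cache is keyed on the question as well as the response (an assumption, since the same response may deserve a different rating for a different question). In production the rolling average would more likely be a query against the score store than an in-process structure.

```python
import hashlib
from collections import deque
from datetime import datetime, timedelta
from typing import Callable, Optional


class CachedJudge:
    """Wraps a judge function and reuses scores for identical inputs."""

    def __init__(self, judge_fn: Callable[[str, str], float]):
        self._judge_fn = judge_fn  # hypothetical: performs the judge LLM call
        self._cache = {}

    def score(self, question: str, response: str) -> float:
        # Hash the pair so the cache key stays small for long responses.
        key = hashlib.sha256(f"{question}\x00{response}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._judge_fn(question, response)
        return self._cache[key]


class RollingQualityMonitor:
    """Tracks composite scores over a time window (assumes in-order timestamps)."""

    def __init__(self, window: timedelta = timedelta(days=7), threshold: float = 0.6):
        self._window = window
        self._threshold = threshold
        self._samples = deque()  # (timestamp, score) pairs, oldest first

    def record(self, timestamp: datetime, score: float) -> None:
        self._samples.append((timestamp, score))
        cutoff = timestamp - self._window
        # Drop samples that have fallen out of the window.
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()

    def average(self) -> Optional[float]:
        if not self._samples:
            return None
        return sum(score for _, score in self._samples) / len(self._samples)

    def should_alert(self) -> bool:
        avg = self.average()
        return avg is not None and avg < self._threshold
```

An unbounded dictionary is used here for brevity; a long-running service would want a size limit or expiry on the cache.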


 
  


  