Token Budget Alerts and Cost Guardrails for LLMs in Python
Author: Venkata Sudhakar
ShopMax India's AI-powered customer support handles thousands of queries per day. Without token budget controls, a single runaway conversation or a misbehaving agent can generate thousands of rupees in unexpected API costs overnight. Setting per-user token limits and cost guardrails ensures that LLM spending stays predictable. This tutorial shows how to track token usage per session, enforce daily budgets, and send alerts when thresholds are crossed.
The approach uses Redis to store cumulative token counts per user. Each LLM call records prompt_tokens and completion_tokens from the API response. A budget check runs before each call: if the user has already exceeded their daily limit, the request is blocked and a fallback response is returned instead. Alerts fire via a print statement or a webhook when usage crosses 80% of the daily budget.
The example below implements a token budget manager for ShopMax India. Each customer is capped at 10,000 tokens per day (roughly Rs 0.50 at GPT-4o-mini pricing). We use Redis to persist counts across sessions and trigger an alert when any user approaches their limit.
Running it produces output like the following:
User CUST-MUM-1042: used 147 tokens this call, 147 total today
ShopMax India accepts returns within 7 days of purchase for electronics in original packaging.
User CUST-MUM-1042: used 163 tokens this call, 310 total today
Yes, you can return items purchased at any ShopMax India store to any other store across cities.
In production, replace the print alert with a call to your alerting system - PagerDuty, Slack webhook, or email via SendGrid. Set different budgets by customer tier: free-tier users at 5,000 tokens per day, premium at 50,000. Add a circuit breaker that disables AI for a user who repeatedly hits the cap to prevent abuse. Store budget configurations in a database so they can be adjusted without code changes.
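The tier-based budgets and circuit breaker described above can be sketched as follows. The tier table and the three-strikes threshold are illustrative values; in production they would be loaded from a database rather than hard-coded.

```python
# Hypothetical per-tier budgets; in production, load these from a database
# so they can be adjusted without code changes.
TIER_BUDGETS = {"free": 5_000, "premium": 50_000}

MAX_CAP_HITS = 3   # circuit breaker: disable AI after this many cap hits

# Count of times each user has hit their daily cap (Redis in production).
_cap_hits: dict[str, int] = {}

def daily_budget(tier: str) -> int:
    """Look up the daily token budget for a customer tier, defaulting to free."""
    return TIER_BUDGETS.get(tier, TIER_BUDGETS["free"])

def register_cap_hit(user_id: str) -> bool:
    """Record one cap hit; returns True once the circuit breaker trips."""
    _cap_hits[user_id] = _cap_hits.get(user_id, 0) + 1
    return _cap_hits[user_id] >= MAX_CAP_HITS
```

When `register_cap_hit` returns True, route the user to human support and stop issuing LLM calls for them until an operator resets the counter.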