Token Budget Alerts and Cost Guardrails for LLMs in Python
Author: Venkata Sudhakar
ShopMax India's AI-powered customer support handles thousands of queries per day. Without token budget controls, a single runaway conversation or a misbehaving agent can generate thousands of rupees in unexpected API costs overnight. Setting per-user token limits and cost guardrails ensures that LLM spending stays predictable. This tutorial shows how to track token usage per session, enforce daily budgets, and send alerts when thresholds are crossed.
The approach uses Redis to store cumulative token counts per user. Each LLM call records prompt_tokens and completion_tokens from the API response. A budget check runs before each call: if the user has already exceeded their daily limit, the request is blocked and a fallback response is returned instead. Alerts fire via a print statement or a webhook when usage crosses 80% of the daily budget.
The example below implements a token budget manager for ShopMax India. Each customer is capped at 10,000 tokens per day (roughly Rs 0.50 at GPT-4o-mini pricing). We use Redis to persist counts across sessions and trigger an alert when any user approaches their limit.
Running it produces output like the following:
User CUST-MUM-1042: used 147 tokens this call, 147 total today
ShopMax India accepts returns within 7 days of purchase for electronics in original packaging.
User CUST-MUM-1042: used 163 tokens this call, 310 total today
Yes, you can return items purchased at any ShopMax India store to any other store across cities.
In production, replace the print alert with a call to your alerting system - PagerDuty, Slack webhook, or email via SendGrid. Set different budgets by customer tier: free-tier users at 5,000 tokens per day, premium at 50,000. Add a circuit breaker that disables AI for a user who repeatedly hits the cap to prevent abuse. Store budget configurations in a database so they can be adjusted without code changes.
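The tier-based budgets and circuit breaker described above can be sketched as follows. The tier table and the three-strikes threshold are illustrative values; in production they would be loaded from a database rather than hard-coded.

```python
# Hypothetical per-tier budgets; in production, load these from a database
# so they can be adjusted without code changes.
TIER_BUDGETS = {"free": 5_000, "premium": 50_000}

MAX_CAP_HITS = 3   # circuit breaker: disable AI after this many cap hits

# Count of times each user has hit their daily cap (Redis in production).
_cap_hits: dict[str, int] = {}

def daily_budget(tier: str) -> int:
    """Look up the daily token budget for a customer tier, defaulting to free."""
    return TIER_BUDGETS.get(tier, TIER_BUDGETS["free"])

def register_cap_hit(user_id: str) -> bool:
    """Record one cap hit; returns True once the circuit breaker trips."""
    _cap_hits[user_id] = _cap_hits.get(user_id, 0) + 1
    return _cap_hits[user_id] >= MAX_CAP_HITS
```

When `register_cap_hit` returns True, route the user to human support and stop issuing LLM calls for them until an operator resets the counter.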