LLM Cost and Token Monitoring with Langfuse
Author: Venkata Sudhakar
ShopMax India's AI team runs hundreds of LLM calls per hour across the chatbot, recommendation engine, and support summarizer. At Rs 0.15 per 1,000 tokens with GPT-4o, costs add up fast; without visibility, the team cannot tell which feature is driving the bill or where tokens are being wasted. Langfuse is an open-source LLM observability platform that tracks every LLM call with token counts, model costs, latency, and user session context. ShopMax India uses Langfuse to monitor daily spend, find expensive outlier queries, and compare costs across model versions before promoting to production.
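To see why visibility matters, a back-of-envelope projection at the quoted rate is useful. The call volume and tokens-per-call below are illustrative assumptions, not ShopMax India's actual figures:

```python
# Rough daily-spend projection at the quoted rate (Rs 0.15 per 1K tokens).
RATE_INR_PER_1K_TOKENS = 0.15
CALLS_PER_HOUR = 500        # "hundreds of LLM calls per hour" (assumed midpoint)
AVG_TOKENS_PER_CALL = 900   # prompt + completion, assumed

daily_tokens = CALLS_PER_HOUR * 24 * AVG_TOKENS_PER_CALL
daily_cost_inr = daily_tokens / 1000 * RATE_INR_PER_1K_TOKENS
print(f"~{daily_tokens:,} tokens/day -> Rs {daily_cost_inr:,.2f}/day")
# ~10,800,000 tokens/day -> Rs 1,620.00/day
```

Even modest per-call waste compounds at this volume, which is what makes per-feature attribution worth instrumenting.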
Langfuse works through a Python SDK that wraps LLM calls with trace and span objects. Each trace represents a user-facing request; spans represent individual LLM or tool calls within it. A call to langfuse.trace() creates the trace and attaches metadata such as user_id, session_id, and tags. Token counts and costs are captured automatically for OpenAI, Anthropic, and other providers via Langfuse's integrations, such as the langfuse.openai drop-in wrapper and the @observe decorator. The Langfuse dashboard aggregates this into daily cost charts, p50/p95 latency histograms, and per-model breakdowns filterable by tag or user.
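A minimal sketch of the trace/span pattern, in Langfuse Python SDK v2 style. The names, IDs, and token counts are illustrative, and the code assumes LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set in the environment:

```python
import os

def record_request():
    # Import deferred so the sketch stays importable without the SDK installed.
    from langfuse import Langfuse

    langfuse = Langfuse()

    # One trace per user-facing request, tagged for dashboard filtering.
    trace = langfuse.trace(
        name="chatbot-request",
        user_id="user-42",              # illustrative IDs
        session_id="session-2024-001",
        tags=["chatbot", "prod"],
    )

    # One generation (an LLM-call span) inside the trace; the usage dict
    # feeds the cost charts. Pass token counts manually when not using an
    # auto-capturing integration.
    trace.generation(
        name="answer-generation",
        model="gpt-4o",
        input=[{"role": "user", "content": "Where is my order?"}],
        output="Your order ships tomorrow.",
        usage={"input": 12, "output": 8},
    )

    langfuse.flush()  # push buffered events before the process exits

if __name__ == "__main__" and os.getenv("LANGFUSE_PUBLIC_KEY"):
    record_request()
```

In practice the generation's output and usage come from the actual provider response rather than the hard-coded values shown here.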
The example below instruments ShopMax India's support summarizer with Langfuse. It processes three customer support tickets, tracks token usage per call, logs the total cost estimate, and demonstrates how to retrieve cost aggregates via the Langfuse client API.
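A sketch of that instrumentation is below. It assumes OPENAI_API_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_SECRET_KEY are set in the environment; the ticket texts, model choice, and the flat per-token rate used for the client-side estimate are illustrative assumptions:

```python
import os

# Assumed flat rate (~$0.15 per 1M tokens) for a rough client-side estimate;
# Langfuse computes authoritative per-model costs server-side.
PRICE_USD_PER_1K_TOKENS = 0.00015

def estimate_cost_usd(total_tokens: int) -> float:
    """Rough spend estimate from a token count."""
    return total_tokens / 1000 * PRICE_USD_PER_1K_TOKENS

TICKETS = [  # illustrative placeholder ticket texts
    ("TKT-101", "Mumbai", "Samsung TV display defect, in warranty, customer frustrated."),
    ("TKT-102", "Bangalore", "OnePlus 11 battery failure, refund requested under warranty by Priya."),
    ("TKT-103", "Delhi", "LG washing machine delivery 10 days overdue, no tracking update."),
]

def summarize_tickets():
    # langfuse.openai is a drop-in wrapper around the OpenAI client that
    # logs each call's tokens and cost to Langfuse automatically.
    from langfuse.openai import openai

    for ticket_id, city, text in TICKETS:
        resp = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Summarize this support ticket in one sentence."},
                {"role": "user", "content": text},
            ],
            name="support-summarize",                      # Langfuse trace name
            metadata={"ticket_id": ticket_id, "city": city},
        )
        tokens = resp.usage.total_tokens
        print(f"Ticket {ticket_id} ({city}):")
        print(f"Summary: {resp.choices[0].message.content}")
        print(f"Tokens: {tokens} | Est. cost: ${estimate_cost_usd(tokens):.5f}")
    print("All traces logged to Langfuse dashboard.")

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    summarize_tickets()
```

Exact summaries and token counts vary between runs, since the model's responses are non-deterministic.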
Running the script gives the following output:
Ticket TKT-101 (Mumbai):
Summary: Replace Samsung TV immediately - display defect reported within warranty period by frustrated customer.
Tokens: 81 | Est. cost: $0.00001
Ticket TKT-102 (Bangalore):
Summary: Process refund for OnePlus 11 battery failure under warranty for customer Priya.
Tokens: 74 | Est. cost: $0.00001
Ticket TKT-103 (Delhi):
Summary: Escalate LG washing machine delivery failure - 10 days overdue with no tracking update.
Tokens: 78 | Est. cost: $0.00001
All traces logged to Langfuse dashboard.
In production, apply the @observe decorator to any Python function to auto-capture inputs, outputs, and token usage without manual trace management. Set up cost alerts in the Langfuse dashboard to notify the ShopMax India engineering Slack channel when daily spend exceeds a threshold. Use Langfuse's model filter to compare GPT-4o and GPT-4o-mini cost and quality side by side over the same query set before deciding which model to promote. For multi-tenant deployments, use the user_id field to attribute costs per customer or per business unit.
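The model-promotion decision ultimately rests on a cost projection over the same query set. A deterministic sketch of that arithmetic, using assumed per-1M-token list prices (verify against current OpenAI pricing before relying on these numbers):

```python
# Assumed (input, output) USD prices per 1M tokens -- check current pricing.
PRICES_USD_PER_1M = {
    "gpt-4o":      (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def monthly_cost(model: str, queries_per_day: int, in_tok: int, out_tok: int) -> float:
    """Projected 30-day spend for one model over a fixed query workload."""
    p_in, p_out = PRICES_USD_PER_1M[model]
    daily = queries_per_day * (in_tok * p_in + out_tok * p_out) / 1_000_000
    return daily * 30

# Illustrative workload: 1,000 queries/day, 600 input + 150 output tokens each.
for m in PRICES_USD_PER_1M:
    print(f"{m}: ${monthly_cost(m, 1000, 600, 150):.2f}/month")
# gpt-4o: $90.00/month
# gpt-4o-mini: $5.40/month
```

Pair a projection like this with the Langfuse side-by-side quality comparison: the cheaper model only wins if its summaries hold up over the same queries.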