LLM Cost and Token Monitoring with Langfuse
Author: Venkata Sudhakar
ShopMax India's AI team runs hundreds of LLM calls per hour across the chatbot, recommendation engine, and support summarizer. At Rs 0.15 per 1,000 tokens with GPT-4o, costs add up fast; without visibility, the team cannot tell which feature is driving the bill or where tokens are being wasted. Langfuse is an open-source LLM observability platform that tracks every LLM call with token counts, model costs, latency, and user session context. ShopMax India uses Langfuse to monitor daily spend, find expensive outlier queries, and compare costs across model versions before promoting to production.
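To see why visibility matters, a back-of-envelope projection at the quoted rate is useful. The call volume and tokens-per-call below are illustrative assumptions, not ShopMax India's actual figures:

```python
# Rough daily-spend projection at the quoted rate (Rs 0.15 per 1K tokens).
RATE_INR_PER_1K_TOKENS = 0.15
CALLS_PER_HOUR = 500        # "hundreds of LLM calls per hour" (assumed midpoint)
AVG_TOKENS_PER_CALL = 900   # prompt + completion, assumed

daily_tokens = CALLS_PER_HOUR * 24 * AVG_TOKENS_PER_CALL
daily_cost_inr = daily_tokens / 1000 * RATE_INR_PER_1K_TOKENS
print(f"~{daily_tokens:,} tokens/day -> Rs {daily_cost_inr:,.2f}/day")
# ~10,800,000 tokens/day -> Rs 1,620.00/day
```

Even modest per-call waste compounds at this volume, which is what makes per-feature attribution worth instrumenting.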
Langfuse works through a Python SDK that wraps LLM calls with trace and span objects. Each trace represents a user-facing request; spans represent individual LLM or tool calls within it. A call to langfuse.trace() creates the trace and attaches metadata such as user_id, session_id, and tags. Token counts and costs are captured automatically for OpenAI, Anthropic, and other providers via Langfuse's integrations, such as the langfuse.openai drop-in wrapper and the @observe decorator. The Langfuse dashboard aggregates this into daily cost charts, p50/p95 latency histograms, and per-model breakdowns filterable by tag or user.
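A minimal sketch of the trace/span pattern, in Langfuse Python SDK v2 style. The names, IDs, and token counts are illustrative, and the code assumes LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set in the environment:

```python
import os

def record_request():
    # Import deferred so the sketch stays importable without the SDK installed.
    from langfuse import Langfuse

    langfuse = Langfuse()

    # One trace per user-facing request, tagged for dashboard filtering.
    trace = langfuse.trace(
        name="chatbot-request",
        user_id="user-42",              # illustrative IDs
        session_id="session-2024-001",
        tags=["chatbot", "prod"],
    )

    # One generation (an LLM-call span) inside the trace; the usage dict
    # feeds the cost charts. Pass token counts manually when not using an
    # auto-capturing integration.
    trace.generation(
        name="answer-generation",
        model="gpt-4o",
        input=[{"role": "user", "content": "Where is my order?"}],
        output="Your order ships tomorrow.",
        usage={"input": 12, "output": 8},
    )

    langfuse.flush()  # push buffered events before the process exits

if __name__ == "__main__" and os.getenv("LANGFUSE_PUBLIC_KEY"):
    record_request()
```

In practice the generation's output and usage come from the actual provider response rather than the hard-coded values shown here.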
The example below instruments ShopMax India's support summarizer with Langfuse. It processes three customer support tickets, tracks token usage per call, logs the total cost estimate, and demonstrates how to retrieve cost aggregates via the Langfuse client API.
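A sketch of that instrumentation is below. It assumes OPENAI_API_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_SECRET_KEY are set in the environment; the ticket texts, model choice, and the flat per-token rate used for the client-side estimate are illustrative assumptions:

```python
import os

# Assumed flat rate (~$0.15 per 1M tokens) for a rough client-side estimate;
# Langfuse computes authoritative per-model costs server-side.
PRICE_USD_PER_1K_TOKENS = 0.00015

def estimate_cost_usd(total_tokens: int) -> float:
    """Rough spend estimate from a token count."""
    return total_tokens / 1000 * PRICE_USD_PER_1K_TOKENS

TICKETS = [  # illustrative placeholder ticket texts
    ("TKT-101", "Mumbai", "Samsung TV display defect, in warranty, customer frustrated."),
    ("TKT-102", "Bangalore", "OnePlus 11 battery failure, refund requested under warranty by Priya."),
    ("TKT-103", "Delhi", "LG washing machine delivery 10 days overdue, no tracking update."),
]

def summarize_tickets():
    # langfuse.openai is a drop-in wrapper around the OpenAI client that
    # logs each call's tokens and cost to Langfuse automatically.
    from langfuse.openai import openai

    for ticket_id, city, text in TICKETS:
        resp = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Summarize this support ticket in one sentence."},
                {"role": "user", "content": text},
            ],
            name="support-summarize",                      # Langfuse trace name
            metadata={"ticket_id": ticket_id, "city": city},
        )
        tokens = resp.usage.total_tokens
        print(f"Ticket {ticket_id} ({city}):")
        print(f"Summary: {resp.choices[0].message.content}")
        print(f"Tokens: {tokens} | Est. cost: ${estimate_cost_usd(tokens):.5f}")
    print("All traces logged to Langfuse dashboard.")

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    summarize_tickets()
```

Exact summaries and token counts vary between runs, since the model's responses are non-deterministic.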
Running the script gives the following output:
Ticket TKT-101 (Mumbai):
Summary: Replace Samsung TV immediately - display defect reported within warranty period by frustrated customer.
Tokens: 81 | Est. cost: $0.00001
Ticket TKT-102 (Bangalore):
Summary: Process refund for OnePlus 11 battery failure under warranty for customer Priya.
Tokens: 74 | Est. cost: $0.00001
Ticket TKT-103 (Delhi):
Summary: Escalate LG washing machine delivery failure - 10 days overdue with no tracking update.
Tokens: 78 | Est. cost: $0.00001
All traces logged to Langfuse dashboard.
In production, apply the @observe decorator to any Python function to auto-capture inputs, outputs, and token usage without manual trace management. Set up cost alerts in the Langfuse dashboard to notify the ShopMax India engineering Slack channel when daily spend exceeds a threshold. Use Langfuse's model filter to compare GPT-4o and GPT-4o-mini cost and quality side by side over the same query set before deciding which model to promote. For multi-tenant deployments, use the user_id field to attribute costs per customer or per business unit.
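The model-promotion decision ultimately rests on a cost projection over the same query set. A deterministic sketch of that arithmetic, using assumed per-1M-token list prices (verify against current OpenAI pricing before relying on these numbers):

```python
# Assumed (input, output) USD prices per 1M tokens -- check current pricing.
PRICES_USD_PER_1M = {
    "gpt-4o":      (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def monthly_cost(model: str, queries_per_day: int, in_tok: int, out_tok: int) -> float:
    """Projected 30-day spend for one model over a fixed query workload."""
    p_in, p_out = PRICES_USD_PER_1M[model]
    daily = queries_per_day * (in_tok * p_in + out_tok * p_out) / 1_000_000
    return daily * 30

# Illustrative workload: 1,000 queries/day, 600 input + 150 output tokens each.
for m in PRICES_USD_PER_1M:
    print(f"{m}: ${monthly_cost(m, 1000, 600, 150):.2f}/month")
# gpt-4o: $90.00/month
# gpt-4o-mini: $5.40/month
```

Pair a projection like this with the Langfuse side-by-side quality comparison: the cheaper model only wins if its summaries hold up over the same queries.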