Claude Token Counting and Cost Optimization

Author: Venkata Sudhakar

Token counting and cost optimization are critical for running Claude in production at scale. For ShopMax India, a customer support bot handling 50,000 queries per day with an average of 500 input tokens per request processes 25 million input tokens daily, which can cost thousands of rupees in API calls - small inefficiencies multiply rapidly. The Anthropic SDK exposes a count_tokens method (client.messages.count_tokens) for pre-flight cost estimation, enabling routing decisions and prompt optimization before any tokens are billed.
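To make the scale concrete, the arithmetic behind that estimate can be sketched as follows. The input rate of $3 per million tokens (the Sonnet rate used in the pricing table later in this article) and the Rs 84 per USD exchange rate are assumptions, and output tokens are ignored here.

```python
# Back-of-envelope daily input cost for the support bot described above.
# Assumptions (not measured values): an input rate of $3 per million
# tokens and an exchange rate of Rs 84 per USD.
QUERIES_PER_DAY = 50_000
AVG_INPUT_TOKENS = 500
USD_PER_INPUT_TOKEN = 3 / 1_000_000
RS_PER_USD = 84


def daily_input_cost_rs(queries: int, avg_tokens: int,
                        usd_per_token: float, rs_per_usd: float) -> float:
    """Return the estimated daily input-token spend in rupees."""
    total_tokens = queries * avg_tokens
    return round(total_tokens * usd_per_token * rs_per_usd, 2)


daily_rs = daily_input_cost_rs(QUERIES_PER_DAY, AVG_INPUT_TOKENS,
                               USD_PER_INPUT_TOKEN, RS_PER_USD)
print(f"Input tokens per day: {QUERIES_PER_DAY * AVG_INPUT_TOKENS:,}")
print(f"Estimated input cost per day: Rs {daily_rs:,.2f}")
```

Under these assumptions the input side alone comes to about Rs 6,300 per day, before any output tokens are counted.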

Tokens are not the same as words - Claude uses a byte-pair encoding tokenizer where common English words are typically 1 token, while longer words, numbers, and non-English text can be 2-4 tokens each. The count_tokens endpoint accepts the same model, system, messages, and tools inputs as messages.create but returns only the input token count without running the model. The call is free, though it is rate limited and the count should be treated as an estimate. Key optimization levers: trim redundant whitespace from prompts (this can save 5-15% on heavily padded prompts), prefer compact structured formats over prose where they are genuinely shorter (verify with count_tokens rather than assuming), and route simple queries to Haiku, the cheapest tier, while sending complex ones to Sonnet or Opus.
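As a rough illustration of the whitespace lever, the sketch below collapses redundant spaces and blank lines in a prompt. The helper name normalize_prompt is hypothetical, and the character count it prints is only a proxy: the real saving should be measured by passing both versions to count_tokens. It is not safe for prompts where whitespace is significant, such as embedded code or indented lists.

```python
import re


def normalize_prompt(text: str) -> str:
    """Collapse redundant whitespace without changing the wording."""
    # Collapse runs of spaces/tabs inside each line and trim both ends.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    # Collapse three or more newlines (multiple blank lines) into one blank line.
    collapsed = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return collapsed.strip()


raw = """
    You are ShopMax India assistant.


    Be   concise.
"""
clean = normalize_prompt(raw)
print(repr(clean))
print(f"Characters: {len(raw)} -> {len(clean)}")
```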

The following example shows ShopMax India measuring token counts across prompt variants and routing queries to the most cost-effective Claude model based on complexity:

import anthropic
import os

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

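# USD per token for input and output. Illustrative values only; verify them
# against Anthropic's current pricing page before relying on the estimates.
PRICING = {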
    "claude-haiku-4-5": {"input": 0.00000025, "output": 0.00000125},
    "claude-sonnet-4-5": {"input": 0.000003, "output": 0.000015},
    "claude-opus-4-5": {"input": 0.000015, "output": 0.000075},
}

def count_tokens(system: str, user_msg: str, model: str) -> int:
    result = client.messages.count_tokens(
        model=model,
        system=system,
        messages=[{"role": "user", "content": user_msg}]
    )
    return result.input_tokens

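def estimate_cost_rs(input_tokens: int, output_tokens: int, model: str) -> float:
    """Estimate the request cost in rupees from token counts."""
    p = PRICING[model]
    cost_usd = input_tokens * p["input"] + output_tokens * p["output"]
    # 84 is an assumed Rs-per-USD exchange rate; update it as needed.
    return round(cost_usd * 84, 6)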

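def smart_route(query: str) -> str:
    # Heuristic: a query containing any of these keywords is treated as a simple lookup.
    # This is a substring match, so "order" also matches "reorder" or "border".
    simple_keywords = ["track", "status", "delivery", "order", "when", "where"]
    is_simple = any(kw in query.lower() for kw in simple_keywords)
    # Simple lookups go to Haiku; everything else goes to Sonnet.
    model = "claude-haiku-4-5" if is_simple else "claude-sonnet-4-5"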
    system = "You are ShopMax India assistant. Be concise."
    tokens = count_tokens(system, query, model)
    est_output = 80 if is_simple else 200
    cost = estimate_cost_rs(tokens, est_output, model)
    print(f"Query: {query[:50]}")
    print(f"Model selected: {model}")
    print(f"Input tokens: {tokens}, Est. output: {est_output}")
    print(f"Est. cost: Rs {cost}")
    response = client.messages.create(
        model=model, max_tokens=est_output * 2,
        system=system,
        messages=[{"role": "user", "content": query}]
    )
    return response.content[0].text

queries = [
    "Track order ORD-MUM-4421",
    "Compare the Samsung QLED vs LG NanoCell for a home theatre setup in a "
    "medium-sized room in Mumbai. Include sound quality, picture, smart features "
    "and value for money at different price points.",
]
for q in queries:
    print(smart_route(q))
    print()

It produces output like the following (the token counts and model responses will vary):

Query: Track order ORD-MUM-4421
Model selected: claude-haiku-4-5
Input tokens: 31, Est. output: 80
Est. cost: Rs 0.009051
I can help track order ORD-MUM-4421. Please provide your registered email or
phone number to pull up the latest status from our system.

Query: Compare the Samsung QLED vs LG NanoCell for a home
Model selected: claude-sonnet-4-5
Input tokens: 68, Est. output: 200
Est. cost: Rs 0.269136
For a home theatre setup in Mumbai, both are excellent choices at different
price points. The Samsung 55-inch QLED at Rs 54,990 delivers superior brightness
and contrast with Quantum HDR, ideal for well-lit Mumbai living rooms...

For ShopMax India, implement a three-tier routing strategy: Haiku for factual lookups and order queries (responses under 100 tokens), Sonnet for product comparisons and recommendations (100-500 token responses), and Opus only for complex escalations requiring deep reasoning, such as warranty dispute analysis. Log the actual token counts from response.usage on every response and compare them against the pre-flight estimates to tune your routing thresholds. A 10% reduction in average input tokens (50 tokens per query) across 50,000 daily queries removes 2.5 million input tokens per day, which at Sonnet's input rate of $3 per million tokens and Rs 84 per USD saves roughly Rs 630 per day, or about Rs 19,000 per month.
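The tiering and the estimate-versus-actual comparison described above might be sketched as follows. The names pick_model, UsageRecord, and mean_estimate_error are hypothetical, the 100-token boundary is taken from the bands in the text, and queries expected to exceed 500 tokens without being escalations are assumed to stay on Sonnet. In a real service the actual counts would come from response.usage.input_tokens and response.usage.output_tokens.

```python
from dataclasses import dataclass

# Model IDs as used earlier in this article.
HAIKU = "claude-haiku-4-5"
SONNET = "claude-sonnet-4-5"
OPUS = "claude-opus-4-5"


def pick_model(expected_output_tokens: int, is_escalation: bool = False) -> str:
    """Choose a tier from the expected response size and an escalation flag."""
    if is_escalation:
        return OPUS            # reserved for complex escalations
    if expected_output_tokens < 100:
        return HAIKU           # factual lookups and order queries
    return SONNET              # comparisons and recommendations


@dataclass
class UsageRecord:
    estimated: int  # pre-flight estimate of a token count
    actual: int     # the matching count reported in response.usage


def mean_estimate_error(records: list[UsageRecord]) -> float:
    """Average relative gap between estimated and actual token counts."""
    errors = [abs(r.actual - r.estimated) / r.actual
              for r in records if r.actual > 0]
    return sum(errors) / len(errors) if errors else 0.0


log = [UsageRecord(estimated=80, actual=100),
       UsageRecord(estimated=200, actual=250)]
print(pick_model(80), pick_model(200), pick_model(200, is_escalation=True))
print(f"Mean estimate error: {mean_estimate_error(log):.0%}")
```

A persistently large error for one tier suggests that its threshold or its output estimate needs adjusting.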
