Claude Model Selection - Haiku vs Sonnet vs Opus

Author: Venkata Sudhakar

Choosing the right Claude model is one of the most impactful decisions in building a production AI system. Anthropic offers three tiers: Haiku (fastest and cheapest), Sonnet (balanced performance and cost), and Opus (most capable). For ShopMax India, using Opus for every query is like shipping all parcels by air courier: sometimes necessary, but mostly wasteful. A smart routing strategy matches each task to the cheapest model that can handle it well, which can cut API costs by an estimated 60-80% with little or no loss in quality, depending on the traffic mix.

Haiku excels at classification, intent detection, simple Q&A, and short text generation, where speed matters more than nuance. Sonnet handles product comparisons, multi-step reasoning, summarization, and most customer support scenarios. Reserve Opus for complex analysis that requires deep reasoning: legal document review, multi-document synthesis, or tasks where response quality directly affects revenue. Benchmark your specific tasks across models before committing; for most retail use cases, the quality gap between Sonnet and Opus is smaller than the 5x price difference would suggest.
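
The task-to-tier mapping described above can be sketched as a simple lookup table. This is a minimal illustration, not a definitive router: the task labels (such as "intent_detection") and the `pick_model` helper are hypothetical names introduced here, and falling back to Sonnet for unknown task types is an assumption.

```python
# Model identifiers as used in this article's benchmark.
HAIKU = "claude-haiku-4-5"
SONNET = "claude-sonnet-4-5"
OPUS = "claude-opus-4-5"

# Hypothetical task labels mapped to the tier the article suggests for them.
ROUTES = {
    "classification": HAIKU,
    "intent_detection": HAIKU,
    "simple_qa": HAIKU,
    "short_generation": HAIKU,
    "product_comparison": SONNET,
    "multi_step_reasoning": SONNET,
    "summarization": SONNET,
    "customer_support": SONNET,
    "legal_review": OPUS,
    "multi_document_synthesis": OPUS,
}

# Assumption: unknown task types go to the balanced tier.
DEFAULT_MODEL = SONNET


def pick_model(task_type: str) -> str:
    """Return the model for a task label, or the default for unknown labels."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("intent_detection", "summarization", "legal_review", "unknown"):
        print(task, "->", pick_model(task))
```

Keeping the mapping in data rather than in branching code makes it easy to move a task to a cheaper tier once a benchmark shows the cheaper model is good enough.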

The following example benchmarks two ShopMax India tasks across Haiku, Sonnet, and Opus to measure quality and cost tradeoffs:

import anthropic
import os
import time

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

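MODELS = ["claude-haiku-4-5", "claude-sonnet-4-5", "claude-opus-4-5"]

# Prices in USD per 1,000 tokens. Treat these as illustrative values and
# check Anthropic's pricing page for current rates before relying on them.
COST_PER_1K = {
    "claude-haiku-4-5":  {"in": 0.00025, "out": 0.00125},
    "claude-sonnet-4-5": {"in": 0.003,   "out": 0.015},
    "claude-opus-4-5":   {"in": 0.015,   "out": 0.075},
}

# Assumed USD to INR exchange rate used to report costs in rupees.
USD_TO_INR = 84

def run_task(task_name: str, prompt: str, max_tokens: int = 150) -> None:
    """Send one prompt to every model and print latency, cost, and a preview."""
    print("Task:", task_name)
    print("-" * 50)
    for model in MODELS:
        # perf_counter is a monotonic clock, better suited to timing than time.time
        start = time.perf_counter()
        resp = client.messages.create(
            model=model, max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}]
        )
        elapsed = time.perf_counter() - start
        text = resp.content[0].text
        in_tok = resp.usage.input_tokens
        out_tok = resp.usage.output_tokens
        # Cost in USD from the actual token usage, converted to rupees
        cost_usd = (in_tok * COST_PER_1K[model]["in"] +
                    out_tok * COST_PER_1K[model]["out"]) / 1000
        cost_rs = cost_usd * USD_TO_INR
        # "claude-haiku-4-5" -> "Haiku"
        short = model.split("-")[1].capitalize()
        print(f"{short}: [{elapsed:.1f}s, Rs {cost_rs:.5f}]")
        print(text[:120])  # preview only the first 120 characters
        print()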

run_task(
    "Intent Classification",
    "Classify this customer message into one category (ORDER_TRACK/RETURN/PRODUCT_QUERY/OTHER): "
    "My Samsung TV arrived with a cracked screen, I want to send it back. Order ORD-MUM-9921"
)
run_task(
    "Product Comparison",
    "Compare Samsung 4K QLED vs LG NanoCell for a budget-conscious family in Delhi. "
    "2-3 sentences max. Prices in Rs.",
    max_tokens=120
)

It produces output similar to the following (latency, cost, and wording vary from run to run):

Task: Intent Classification
--------------------------------------------------
Haiku: [0.4s, Rs 0.00021]
RETURN

Sonnet: [0.7s, Rs 0.00189]
RETURN

Opus: [1.2s, Rs 0.00945]
RETURN

Task: Product Comparison
--------------------------------------------------
Haiku: [0.6s, Rs 0.00043]
For a budget-conscious Delhi family, the LG NanoCell at Rs 44,990 offers better
color accuracy and viewing angles. The Samsung QLED at Rs 54,990 is brighter.

Sonnet: [0.9s, Rs 0.00378]
The LG NanoCell at Rs 44,990 is the smarter choice for Delhi families - better
viewing angles suit large family rooms and the price leaves budget for accessories.

Opus: [1.8s, Rs 0.01890]
For a budget-conscious Delhi family, the LG 50-inch NanoCell at Rs 44,990 offers
excellent value with accurate colors, wide viewing angles for group viewing.
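
Per-call costs like these can be rolled up into a rough savings estimate for a routed workload. The sketch below is only illustrative: the prices are copied from this article's COST_PER_1K table, while the traffic mix (60% Haiku, 30% Sonnet, 10% Opus), the token counts, and the helper names `cost_per_call` and `routing_savings` are assumptions made up for the example.

```python
# USD per 1,000 tokens, copied from the article's COST_PER_1K table
# (illustrative values; check current pricing before relying on them).
PRICES = {
    "haiku":  {"in": 0.00025, "out": 0.00125},
    "sonnet": {"in": 0.003,   "out": 0.015},
    "opus":   {"in": 0.015,   "out": 0.075},
}


def cost_per_call(model: str, in_tok: int, out_tok: int) -> float:
    """USD cost of one call with the given input and output token counts."""
    price = PRICES[model]
    return (in_tok * price["in"] + out_tok * price["out"]) / 1000


def routing_savings(mix: dict, in_tok: int, out_tok: int,
                    baseline: str = "opus") -> float:
    """Fraction of cost saved by routing traffic per `mix` instead of
    sending every call to `baseline`. Assumes identical token counts
    for every model, which real traffic will not match exactly."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    routed = sum(share * cost_per_call(model, in_tok, out_tok)
                 for model, share in mix.items())
    return 1 - routed / cost_per_call(baseline, in_tok, out_tok)


# Hypothetical traffic mix and token counts, chosen only for illustration.
mix = {"haiku": 0.6, "sonnet": 0.3, "opus": 0.1}
savings = routing_savings(mix, in_tok=200, out_tok=100)
print(f"Estimated savings vs all-Opus: {savings:.0%}")
```

Under these assumed numbers the estimate comes out at about 83%; a workload with a larger share of Opus traffic will save less, so the figure should be recomputed from real usage logs.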

For ShopMax India, apply this routing logic: use Haiku for all classification, tagging, and intent detection tasks (typically 50-100 token outputs); use Sonnet as the default for customer-facing chat and product queries (100-400 token outputs); reserve Opus for warranty dispute resolution, complex multi-product comparisons, and any task where a wrong answer has financial consequences. Measure quality with a small human-labeled evaluation set for each task type - if Haiku scores within 5% of Opus on your specific task, use Haiku and save the difference.
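
The "within 5% of Opus" rule can be sketched as a small selection function over evaluation scores. This is one possible reading, not a prescribed method: it treats 5% as a relative tolerance against the Opus score, and both the `cheapest_within_tolerance` helper and the sample scores are hypothetical.

```python
# Models ordered from cheapest to most expensive.
MODELS_BY_COST = ["claude-haiku-4-5", "claude-sonnet-4-5", "claude-opus-4-5"]


def cheapest_within_tolerance(scores: dict, tolerance: float = 0.05,
                              reference: str = "claude-opus-4-5") -> str:
    """Return the cheapest model whose score is within `tolerance`
    (relative) of the reference model's score on the same eval set."""
    threshold = scores[reference] * (1 - tolerance)
    for model in MODELS_BY_COST:
        if model in scores and scores[model] >= threshold:
            return model
    return reference


# Hypothetical accuracy scores from a human-labeled evaluation set.
intent_scores = {
    "claude-haiku-4-5": 0.93,
    "claude-sonnet-4-5": 0.95,
    "claude-opus-4-5": 0.96,
}
dispute_scores = {
    "claude-haiku-4-5": 0.70,
    "claude-sonnet-4-5": 0.90,
    "claude-opus-4-5": 0.94,
}

print("Intent detection:", cheapest_within_tolerance(intent_scores))
print("Warranty disputes:", cheapest_within_tolerance(dispute_scores))
```

Running the selection per task type, rather than once for the whole system, lets each task settle on its own cheapest acceptable tier.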
