Claude Context Window Management for Long Conversations

Author: Venkata Sudhakar

Claude's context window holds the entire conversation history: every message, tool call, and tool result. Because the Messages API is stateless, each request resends that whole history, so in ShopMax India's customer support chatbot a long session fills the context quickly. Managing it well keeps costs down, maintains response quality, and avoids hitting the 200K-token limit mid-conversation.

There are three main strategies for managing long contexts: sliding window (keep only the N most recent messages), summarization (compress older messages into a summary), and selective retention (always keep the system prompt and key facts, and drop only routine turns). Every response from the Anthropic SDK includes a usage object with input_tokens and output_tokens, so you can monitor context growth and trigger trimming before hitting the limit.
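The sliding-window and selective-retention strategies can be sketched without any API calls. This is a minimal illustration under stated assumptions: the helper names (sliding_window, extract_key_facts, selective_retention) are hypothetical, message content is assumed to be a plain string as in the example further down, and the "key facts" are assumed to be ShopMax order IDs in the ORD-5512 format. Dropped facts are pinned into the system prompt, which is one possible way to retain them.

```python
import re

# Hypothetical helpers for illustration; none of these names come from the Anthropic SDK.
ORDER_ID = re.compile(r"\bORD-\d+\b")  # assumed order-ID format, e.g. ORD-5512


def sliding_window(messages, keep):
    """Keep at most `keep` recent messages, starting the window on a user turn."""
    recent = messages[-keep:] if keep > 0 else []
    # Skip leading assistant messages so user/assistant turns still alternate.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


def extract_key_facts(messages):
    """Collect order IDs mentioned in the messages, in first-seen order, without duplicates."""
    facts = []
    for message in messages:
        for order_id in ORDER_ID.findall(message["content"]):
            if order_id not in facts:
                facts.append(order_id)
    return facts


def selective_retention(system_prompt, messages, keep):
    """Sliding window plus pinned facts: order IDs from dropped turns move into the system prompt."""
    recent = sliding_window(messages, keep)
    dropped = messages[: len(messages) - len(recent)]
    facts = extract_key_facts(dropped)
    if facts:
        system_prompt += "\nKnown order IDs from earlier in this chat: " + ", ".join(facts)
    return system_prompt, recent
```

Pinning facts in the system prompt keeps the message list a clean alternation of user and assistant turns; a regex only catches facts with a predictable shape, so free-form details such as addresses are better handled by summarization.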

The example below shows ShopMax India's support chat with automatic context trimming. When token usage exceeds a threshold, older messages are dropped while the system prompt and last few turns are always preserved.

```python
import anthropic

client = anthropic.Anthropic()  # reads the API key from the ANTHROPIC_API_KEY environment variable

MODEL = "claude-haiku-4-5"
SYSTEM = "You are ShopMax India customer support. Help customers with orders, products, and returns. Be concise."
MAX_INPUT_TOKENS = 150000  # trim well before the 200K-token context limit
KEEP_RECENT = 6            # how many of the most recent messages survive a trim

def trim_messages(messages, max_tokens):
    """Drop older messages once the prompt would exceed max_tokens."""
    if len(messages) <= KEEP_RECENT:
        return messages
    # Count the prompt's input tokens without generating a reply
    # (needs an SDK version that provides messages.count_tokens)
    count = client.messages.count_tokens(
        model=MODEL,
        system=SYSTEM,
        messages=messages,
    )
    if count.input_tokens < max_tokens:
        return messages
    recent = messages[-KEEP_RECENT:]
    # Start the trimmed history on a user turn so roles still alternate user/assistant
    if recent[0]["role"] != "user":
        recent = recent[1:]
    print("Trimming context: dropping", len(messages) - len(recent), "old messages")
    return recent

def chat(messages, user_input):
    messages.append({"role": "user", "content": user_input})
    messages = trim_messages(messages, MAX_INPUT_TOKENS)
    response = client.messages.create(
        model=MODEL,
        max_tokens=256,
        system=SYSTEM,
        messages=messages,
    )
    reply = response.content[0].text
    messages.append({"role": "assistant", "content": reply})
    print("Tokens used - input:", response.usage.input_tokens, "output:", response.usage.output_tokens)
    return messages, reply

conversation = []

customer_messages = [
    "Hi, I placed order ORD-5512 for a Samsung washing machine.",
    "I need to know the delivery date for my order.",
    "Also, can I change the delivery address to Pune?",
    "What is the return policy if I am not satisfied?",
]

for user_msg in customer_messages:
    # Print the customer line first so the output order matches the sample run below
    print("Customer:", user_msg)
    conversation, reply = chat(conversation, user_msg)
    print("Support:", reply[:80], "...")
    print()

print("Total messages in context:", len(conversation))
```

A sample run produces output like the following; Claude's replies will vary between runs:

```text
Customer: Hi, I placed order ORD-5512 for a Samsung washing machine.
Tokens used - input: 87 output: 45
Support: Hello! I can help you with order ORD-5512. Let me look that up for you. ...

Customer: I need to know the delivery date for my order.
Tokens used - input: 156 output: 38
Support: For order ORD-5512, the estimated delivery is within 5-7 business days fr ...

Customer: Also, can I change the delivery address to Pune?
Tokens used - input: 218 output: 52
Support: Yes, address changes are possible if the order has not shipped yet. Please ...

Customer: What is the return policy if I am not satisfied?
Tokens used - input: 294 output: 61
Support: ShopMax India offers a 10-day return window from delivery date. The product ...

Total messages in context: 8
```
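The per-turn usage figures in the sample run show why cost grows faster than the context itself: every request re-bills all earlier turns. The sketch below totals those figures and gives a rough projection of how many more turns fit under MAX_INPUT_TOKENS. The helper names are hypothetical, and the projection assumes each turn adds roughly the same number of tokens, which holds for short text replies but not for tool results or pasted documents.

```python
# Per-turn (input_tokens, output_tokens) copied from the sample run above.
sample_usage = [(87, 45), (156, 38), (218, 52), (294, 61)]


def billed_totals(usage):
    """Sum the input and output tokens billed across every request in a conversation."""
    total_input = sum(inp for inp, _ in usage)
    total_output = sum(out for _, out in usage)
    return total_input, total_output


def turns_until_limit(current_input, growth_per_turn, limit):
    """Rough number of further turns before input reaches `limit`, assuming linear growth."""
    if growth_per_turn <= 0:
        return None  # the context is not growing, so no limit is in sight
    remaining = limit - current_input
    return max(0, remaining // growth_per_turn)


total_in, total_out = billed_totals(sample_usage)
print(total_in, total_out)  # 755 196

# Average growth in input tokens between consecutive requests in the sample run.
growth = (sample_usage[-1][0] - sample_usage[0][0]) // (len(sample_usage) - 1)
print(growth)  # 69
print(turns_until_limit(sample_usage[-1][0], growth, 150000))  # 2169
```

Although the final request carried only 294 input tokens, the four requests together billed 755. At about 69 tokens per turn this chat is nowhere near the threshold, but each tool result or long reply raises the growth rate and shortens that runway accordingly.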

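For ShopMax India's high-volume support, combine context trimming with prompt caching to reduce costs. Mark the system prompt with cache_control so that repeat requests read it from the cache, which is faster and billed at a fraction of the normal input-token price, instead of processing it from scratch on every turn. Caching only applies above a minimum prompt length (at least 1,024 tokens, depending on the model), so it pays off for long system prompts such as full return policies or product catalogues, not for the one-line SYSTEM string in the example above.

When trimming, always keep the most recent user message plus the last 2-3 exchange pairs: follow-up questions such as "Can I change the delivery address to Pune?" only make sense alongside the turns that precede them, and dropping those turns degrades answer quality. Consider summarizing dropped turns into a single 'conversation so far' message rather than discarding them entirely, especially when customer order details were mentioned early in the chat. With KEEP_RECENT = 6, a long session would eventually drop the opening message, and with it the order number ORD-5512.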
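Summarization and caching can be combined roughly as follows. This is a sketch under assumptions: compact_history, make_claude_summarizer, build_system and SUMMARY_PROMPT are hypothetical names, and only client.messages.create and the cache_control text-block format come from the Anthropic API. Instead of a separate 'conversation so far' message, the summary goes into a second system block placed after the cached one, a swapped-in design choice that keeps user/assistant alternation intact and leaves the cached prefix unchanged when the summary changes.

```python
# Hypothetical helpers; only client.messages.create and the cache_control block come from the API.
SUMMARY_PROMPT = (
    "Summarize this customer-support chat in under 100 words. "
    "Keep order IDs, product names, addresses, and any promises made."
)


def make_claude_summarizer(client, model="claude-haiku-4-5"):
    """Return a function that asks Claude to summarize a transcript string."""
    def summarize(transcript):
        response = client.messages.create(
            model=model,
            max_tokens=300,
            system=SUMMARY_PROMPT,
            messages=[{"role": "user", "content": transcript}],
        )
        return response.content[0].text
    return summarize


def compact_history(messages, keep_recent, summarize, previous_summary=None):
    """Return (summary, recent): older turns are folded into a summary, recent ones kept."""
    recent = messages[-keep_recent:] if keep_recent > 0 else []
    # Start the kept window on a user turn so roles still alternate.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    dropped = messages[: len(messages) - len(recent)]
    if not dropped:
        return previous_summary, messages
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in dropped)
    if previous_summary:
        # Carry the earlier summary forward so repeated compactions lose nothing.
        transcript = "Earlier summary: " + previous_summary + "\n" + transcript
    return summarize(transcript), recent


def build_system(static_prompt, summary=None):
    """System blocks: a cached static prompt, then an uncached running summary."""
    blocks = [{
        "type": "text",
        "text": static_prompt,
        # Cache breakpoint: the prefix up to and including this block can be reused.
        "cache_control": {"type": "ephemeral"},
    }]
    if summary:
        blocks.append({"type": "text", "text": "Conversation so far: " + summary})
    return blocks
```

In the chat loop, one option is to call compact_history(conversation, KEEP_RECENT, make_claude_summarizer(client), summary) once the token count crosses the threshold, then pass system=build_system(SYSTEM, summary) to client.messages.create. The summarization call itself costs one extra request per compaction, so trigger it only when trimming is actually needed.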
