Claude Streaming Responses in Python

Author: Venkata Sudhakar

Streaming responses let Claude send tokens to the client as they are generated rather than waiting for the full response to complete. For ShopMax India, streaming is essential for customer-facing chatbots where users expect to see text appearing immediately - a 3-second blank wait feels broken, while the same 3 seconds with words appearing progressively feels fast and responsive. Streaming is especially valuable for long responses like detailed product comparisons or support troubleshooting guides.

The Anthropic Python SDK provides streaming via the client.messages.stream() context manager, which yields events as they arrive. The key event types are: content_block_start (a new content block begins), content_block_delta with a text_delta (a chunk of text), content_block_stop (block finished), and message_stop (entire response done). If you only need the text, the stream's text_stream iterator yields just the text chunks. The get_final_message() method on the stream object returns the complete assembled message, including usage stats, once streaming has finished.
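The event flow described above can be sketched without calling the API. The snippet below is a minimal illustration, not the SDK's own implementation: collect_text is a hypothetical helper, and the events are hand-built stand-ins that assume only that each event exposes a type attribute and that text deltas expose delta.type and delta.text.

```python
from types import SimpleNamespace


def collect_text(events):
    """Accumulate text from streaming events until message_stop is seen.

    Hypothetical helper: it assumes each event has a .type attribute and
    that content_block_delta events carry .delta.type and .delta.text.
    """
    parts = []
    finished = False
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            # A text chunk: append it to the response assembled so far.
            parts.append(event.delta.text)
        elif event.type == "message_stop":
            # The entire response is done; nothing further should arrive.
            finished = True
            break
        # content_block_start, content_block_stop and other events are ignored.
    return "".join(parts), finished


# Hand-built stand-ins for real events, used only to show the control flow.
fake_events = [
    SimpleNamespace(type="content_block_start"),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="Hello, "),
    ),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="ShopMax"),
    ),
    SimpleNamespace(type="content_block_stop"),
    SimpleNamespace(type="message_stop"),
]

text, finished = collect_text(fake_events)
print(text)
```

With a live connection, the same function could presumably be passed the stream object itself, since iterating over it inside the with block yields the events one by one.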

The following example shows ShopMax India using streaming for a product recommendation chatbot. Tokens appear in real time as Claude generates the response:

import anthropic
import os
import time

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def stream_recommendation(query: str) -> None:
    print("ShopMax India Assistant: ", end="", flush=True)
    start = time.time()
    with client.messages.stream(
        model="claude-opus-4-5",
        max_tokens=400,
        system=(
            "You are a helpful ShopMax India product assistant. "
            "Recommend products concisely. Mention prices in Rs. "
            "Keep responses under 100 words."
        ),
        messages=[{"role": "user", "content": query}]
    ) as stream:
        # text_stream yields text chunks as they arrive; a chunk is not
        # necessarily a single token, so chunks are not counted as tokens.
        for text in stream.text_stream:
            print(text, end="", flush=True)
        # Collect the assembled message and usage stats while the stream is open.
        final = stream.get_final_message()
    elapsed = time.time() - start
    print()
    print()
    print("--- Stream stats ---")
    print("Input tokens:", final.usage.input_tokens)
    print("Output tokens:", final.usage.output_tokens)
    print("Time to complete:", round(elapsed, 2), "seconds")
    print("Stop reason:", final.stop_reason)

stream_recommendation(
    "I want a good 4K TV under Rs 50000 for my Mumbai apartment living room"
)
print()
stream_recommendation(
    "Best washing machine for a family of 4 in Hyderabad, budget Rs 25000"
)

It gives output similar to the following (the generated text, token counts and timings vary from run to run):

ShopMax India Assistant: For your Mumbai living room under Rs 50,000, I recommend
the Samsung 55-inch 4K Crystal UHD TV at Rs 42,990. It delivers excellent picture
quality with HDR10+ support, built-in smart TV features, and slim bezels perfect
for apartment settings. The LG 50-inch 4K NanoCell at Rs 44,990 is another strong
option with better color accuracy.

--- Stream stats ---
Input tokens: 52
Output tokens: 74
Time to complete: 1.84 seconds
Stop reason: end_turn

ShopMax India Assistant: For a family of 4 in Hyderabad at Rs 25,000, the IFB
6.5kg Front Load Washing Machine at Rs 23,490 is an excellent choice - energy
efficient, gentle on clothes, and ideal for Hyderabad water conditions. The Samsung
6.5kg Top Load at Rs 19,990 is a budget-friendly alternative with good reliability.

--- Stream stats ---
Input tokens: 56
Output tokens: 68
Time to complete: 1.71 seconds
Stop reason: end_turn

For ShopMax India production deployments, route all customer-facing chat interfaces through streaming and reserve non-streaming calls for batch jobs like bulk product description generation. When streaming to a web frontend, use Server-Sent Events (SSE) to forward Claude's text chunks to the browser as they arrive - this avoids buffering the full response in your backend. Monitor time-to-first-token as a key SLA metric. Note that the example above measures total completion time, so time-to-first-token has to be recorded separately, at the moment the first chunk arrives. Values above roughly 800ms often point to network latency to the Anthropic API rather than model speed, although long prompts can also delay the first token. Always call get_final_message() once the stream has been consumed to capture usage stats for cost tracking, even if you discard the assembled text.
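The two recommendations above, SSE forwarding and time-to-first-token monitoring, can be sketched as a pair of small generators. This is a minimal sketch under stated assumptions: to_sse and with_first_chunk_timing are hypothetical helper names, the input is any iterable of text chunks (a plain list here, standing in for stream.text_stream), and the timer starts when iteration begins rather than when the request is sent.

```python
import time


def to_sse(chunks):
    """Wrap each text chunk in a Server-Sent Events frame.

    Hypothetical helper. SSE requires every line of a payload to carry its
    own "data:" prefix, and a blank line terminates the frame.
    """
    for chunk in chunks:
        lines = chunk.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"


def with_first_chunk_timing(chunks, stats):
    """Pass chunks through unchanged, recording the delay before the first one.

    Hypothetical helper. The timer starts when iteration begins; in a real
    service it should start just before the request is sent.
    """
    start = time.perf_counter()
    first = True
    for chunk in chunks:
        if first:
            stats["ttft_seconds"] = time.perf_counter() - start
            first = False
        yield chunk


# A plain list stands in for stream.text_stream so the sketch runs offline.
sample_chunks = ["Samsung 55-inch", " 4K TV\nRs 42,990"]
stats = {}
frames = list(to_sse(with_first_chunk_timing(sample_chunks, stats)))

for frame in frames:
    print(repr(frame))
```

Because both helpers are generators, each chunk is framed and handed on as soon as it arrives, so the backend never holds the whole response. A web framework's streaming response type would then send the frames with the text/event-stream content type.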
