|
|
Claude Streaming Responses in Python
Author: Venkata Sudhakar
Streaming responses let Claude send tokens to the client as they are generated rather than waiting for the full response to complete. For ShopMax India, streaming is essential for customer-facing chatbots where users expect to see text appearing immediately - a 3-second blank wait feels broken, while the same 3 seconds with words appearing progressively feels fast and responsive. Streaming is especially valuable for long responses like detailed product comparisons or support troubleshooting guides.
The Anthropic SDK provides streaming via the stream() context manager which yields events as they arrive. The key event types are: content_block_start (a new content block begins), content_block_delta with a text_delta (a chunk of text), content_block_stop (block finished), and message_stop (entire response done). The final_message() method on the stream object returns the complete assembled message with usage stats after streaming completes.
The following example shows ShopMax India using streaming for a product recommendation chatbot. Tokens appear in real time as Claude generates the response:
It gives the following output,
ShopMax India Assistant: For your Mumbai living room under Rs 50,000, I recommend
the Samsung 55-inch 4K Crystal UHD TV at Rs 42,990. It delivers excellent picture
quality with HDR10+ support, built-in smart TV features, and slim bezels perfect
for apartment settings. The LG 50-inch 4K NanoCell at Rs 44,990 is another strong
option with better color accuracy.
--- Stream stats ---
Input tokens: 52
Output tokens: 74
Time to complete: 1.84 seconds
Stop reason: end_turn
ShopMax India Assistant: For a family of 4 in Hyderabad at Rs 25,000, the IFB
6.5kg Front Load Washing Machine at Rs 23,490 is an excellent choice - energy
efficient, gentle on clothes, and ideal for Hyderabad water conditions. The Samsung
6.5kg Top Load at Rs 19,990 is a budget-friendly alternative with good reliability.
--- Stream stats ---
Input tokens: 56
Output tokens: 68
Time to complete: 1.71 seconds
Stop reason: end_turn
For ShopMax India production deployments, route all customer-facing chat interfaces through streaming and reserve non-streaming calls for batch jobs like bulk product description generation. When streaming to a web frontend, use Server-Sent Events (SSE) to forward Claude tokens directly to the browser - this avoids buffering the full response in your backend. Monitor time-to-first-token as a key SLA metric; values above 800ms typically indicate network latency to the Anthropic API rather than model speed. Always call get_final_message() after the stream to capture usage stats for cost tracking, even if you discard the assembled text.
|
|