Gemini Streaming Responses for Chat Apps
Author: Venkata Sudhakar
When a Gemini response takes 3-5 seconds to generate, showing a blank screen until it completes makes your application feel broken. Streaming delivers the first words to the user in under a second, with the rest flowing in continuously. For any customer-facing chat application - a support bot, a sales assistant, a product advisor - streaming is the difference between a product that feels alive and one that feels slow. Gemini 2.0 Flash is particularly quick here: time to first token is often under 300 milliseconds for short prompts, as in the runs below.
In the google-genai SDK, streaming uses client.models.generate_content_stream() instead of client.models.generate_content(). It returns an iterator of response chunks, each with a text attribute containing the next piece of generated text (a chunk often holds several tokens, not a single word). You print or yield each chunk as it arrives. There is no separate resolve() step in this SDK; that method belongs to the older google-generativeai package. Instead, response metadata such as usage_metadata (token counts) and the candidate's finish_reason arrives on the chunks themselves, with the final values typically on the last chunk. For web applications, combine the stream with FastAPI's StreamingResponse and Server-Sent Events (SSE) to push chunks to the browser in real time.
The example below builds a streaming customer support agent for an insurance company. It shows how chunks arrive progressively, measures time-to-first-token (TTFT), and demonstrates the FastAPI SSE pattern for web deployment.
It gives the following output, with words appearing progressively as they are generated:
Customer: My car was scratched in a parking lot. How do I raise a claim?
Agent: I am sorry to hear about the scratch on your car. Here is how to
raise a claim with SafeGuard: First, take clear photos of the damage from
multiple angles before moving the vehicle. Then log in to the SafeGuard app
or call 1800-SAFEGUARD within 48 hours of the incident. You will need your
policy number, the date and location of the incident, and the photos. A
survey will be arranged within 24 hours at your preferred location.
[TTFT: 0.28s | Total: 3.1s | 78 words]
Customer: I missed my premium payment by 5 days. Has my policy lapsed?
Agent: Do not worry - SafeGuard provides a 30-day grace period for premium
payments, so a 5-day delay will not lapse your policy. Your coverage remains
active during this grace period. Please make the payment as soon as possible
through the SafeGuard app, net banking, or any UPI app. If you need to set up
auto-pay to avoid this in future, I can guide you through that.
[TTFT: 0.31s | Total: 2.8s | 72 words]
# First words appear in 0.28-0.31 seconds - the customer sees an immediate response
# Without streaming: the customer would wait the full ~3 seconds seeing nothing
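A minimal sketch of a streaming loop that could produce a transcript like the one above. It assumes the google-genai package, the model name "gemini-2.0-flash", an API key in the environment, and a hypothetical SafeGuard system prompt; stream_answer, gemini_chunks and StreamStats are illustrative names, not part of the SDK. The loop accepts any iterable of objects with a text attribute, so the timing logic can be exercised without network access.

```python
import sys
import time
from dataclasses import dataclass
from typing import Any, Iterable, Optional, TextIO


@dataclass
class StreamStats:
    ttft: Optional[float]  # seconds until the first chunk that carried text
    total: float           # seconds until the stream was exhausted
    words: int             # whitespace-separated words in the full answer
    text: str              # the complete answer
    usage: Any = None      # usage_metadata from the last chunk that carried it


def stream_answer(chunks: Iterable[Any], out: TextIO = sys.stdout) -> StreamStats:
    """Print each chunk as it arrives and measure time-to-first-token."""
    start = time.perf_counter()
    ttft = None
    parts = []
    usage = None
    for chunk in chunks:
        text = getattr(chunk, "text", None) or ""  # a chunk may carry no text
        if text:
            if ttft is None:
                ttft = time.perf_counter() - start
            parts.append(text)
            print(text, end="", file=out, flush=True)  # without flush, output is buffered
        # Token counts ride on the chunks; keep the most recent value seen.
        if getattr(chunk, "usage_metadata", None) is not None:
            usage = chunk.usage_metadata
    total = time.perf_counter() - start
    print(file=out, flush=True)
    answer = "".join(parts)
    return StreamStats(ttft, total, len(answer.split()), answer, usage)


def gemini_chunks(question: str):
    """Stream from Gemini (assumes `pip install google-genai` and an API key)."""
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    return client.models.generate_content_stream(
        model="gemini-2.0-flash",
        contents=question,
        config=types.GenerateContentConfig(
            # Hypothetical prompt for the SafeGuard example.
            system_instruction="You are a helpful support agent for SafeGuard insurance.",
        ),
    )


if __name__ == "__main__":
    q = "My car was scratched in a parking lot. How do I raise a claim?"
    print(f"Customer: {q}\nAgent: ", end="", flush=True)
    s = stream_answer(gemini_chunks(q))
    ttft = s.ttft if s.ttft is not None else float("nan")
    print(f"[TTFT: {ttft:.2f}s | Total: {s.total:.1f}s | {s.words} words]")
```

TTFT is taken at the first chunk that actually carries text, so a chunk with no text is not counted as the first token.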
The FastAPI SSE endpoint streams each Gemini chunk directly to the browser as a separate event:
GET /support/chat?question=How+do+I+raise+a+claim
data: I
data: am
data: sorry
data: to
data: hear
...(tokens stream continuously)...
data: [DONE]
# Each SSE event fires onmessage in the browser
# Text accumulates chunk by chunk - a smooth typewriter effect
# No polling, no long-held connections beyond the stream duration
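One way to wire up this endpoint, sketched under assumptions: the google-genai async client (client.aio.models.generate_content_stream), FastAPI's StreamingResponse, and the hypothetical helper names sse_event, sse_stream and create_app. sse_event splits a multi-line chunk into several data: lines, because a raw newline inside an event would break the SSE framing.

```python
from typing import Any, AsyncIterable, AsyncIterator


def sse_event(data: str) -> str:
    """Frame one SSE event: every payload line gets its own 'data:' prefix
    and a blank line ends the event. EventSource strips only one space after
    'data:', so a chunk's own leading space survives."""
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    return "".join(f"data: {line}\n" for line in normalized.split("\n")) + "\n"


async def sse_stream(chunks: AsyncIterable[Any]) -> AsyncIterator[str]:
    """Turn Gemini response chunks into SSE events, ending with a [DONE] sentinel."""
    async for chunk in chunks:
        text = getattr(chunk, "text", None)
        if text:  # skip chunks that carry no text
            yield sse_event(text)
    yield sse_event("[DONE]")  # tells the browser it can close its EventSource


def create_app():
    """Build the FastAPI app (assumes `pip install fastapi google-genai`)."""
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    from google import genai

    app = FastAPI()
    client = genai.Client()  # API key read from the environment

    @app.get("/support/chat")
    async def chat(question: str):
        # The async stream call is awaited first, then iterated with `async for`.
        chunks = await client.aio.models.generate_content_stream(
            model="gemini-2.0-flash", contents=question
        )
        return StreamingResponse(sse_stream(chunks), media_type="text/event-stream")

    return app
```

With uvicorn, an app factory like this can be served with `uvicorn yourmodule:create_app --factory` (module name hypothetical). The framework imports sit inside create_app only so the framing helpers can be reused and tested without FastAPI installed.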
Streaming best practices: always flush stdout when printing chunks (print(..., flush=True)); otherwise the output is buffered, which defeats the purpose. For FastAPI, use an async generator with StreamingResponse and media_type="text/event-stream" for true SSE. For long-running streams, send a heartbeat (an SSE comment line such as ": ping") every 15 seconds so proxies do not drop the idle connection. Handle stream interruptions gracefully on the client side: close the EventSource when the [DONE] event arrives or the user leaves the chat, both to free server resources and because EventSource otherwise reconnects automatically and would re-send the question. For mobile apps, use chunked HTTP transfer encoding rather than SSE - the streaming pattern is the same, but the transport differs.
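The 15-second heartbeat can be added as a wrapper around any async generator that yields SSE strings. This is a sketch with a hypothetical name, with_heartbeat. It waits for the next event with asyncio.wait, which, unlike asyncio.wait_for, leaves the pending read running on timeout, so emitting a ping never cancels or loses a Gemini chunk.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator

_END = object()  # sentinel marking exhaustion of the source stream


async def _next_or_end(it):
    """Await the next item, mapping end-of-stream to a sentinel."""
    try:
        return await it.__anext__()
    except StopAsyncIteration:
        return _END


async def with_heartbeat(events: AsyncIterable[str], interval: float = 15.0) -> AsyncIterator[str]:
    """Re-yield SSE events, inserting a ': ping' comment after each idle `interval`."""
    it = events.__aiter__()
    pending = asyncio.create_task(_next_or_end(it))
    try:
        while True:
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                # Lines starting with ':' are SSE comments; EventSource ignores them.
                yield ": ping\n\n"
                continue
            item = pending.result()
            if item is _END:
                return
            yield item
            pending = asyncio.create_task(_next_or_end(it))
    finally:
        pending.cancel()  # e.g. the client disconnected mid-stream
```

In the endpoint, the wrapped generator goes straight into StreamingResponse(..., media_type="text/event-stream"). Because EventSource ignores comment lines, the browser code does not need to change.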