
Gemini Context Caching

Author: Venkata Sudhakar

Gemini context caching lets you upload a large piece of content once, store it on Google servers, and reference it in many subsequent API calls without resending the bytes each time. If your product chatbot sends a 50,000-token product catalogue with every customer query, you pay for those tokens every single time. With context caching, you create a cache of the catalogue once, get a cache name, and include only that name in subsequent requests. Cached tokens are billed at roughly a 75% discount compared with regular input tokens, making this one of the highest-return optimisations for high-volume Gemini applications that repeatedly send the same large context.

A cached content object is created with client.caches.create(), specifying the model, the content to cache, a system instruction, and a TTL (time-to-live). The minimum cacheable size is 32,768 tokens. When making a generate_content call, pass the cache name as cached_content inside GenerateContentConfig. The cache is model-version specific: a cache created for gemini-1.5-flash-001 works only with gemini-1.5-flash-001 calls. The usage metadata in each response includes cached_content_token_count, showing how many tokens were served from the cache at the discounted rate rather than billed at full price.

The example below shows a furniture retailer caching its product catalogue at the start of the business day and then serving customer queries against the cached content, calculating the real cost saving per query and the projected daily savings.
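A minimal sketch of the cache-creation step using the google-genai SDK. The file name, display name, system instruction, and eight-hour TTL are illustrative assumptions; a real catalogue must meet the 32,768-token minimum.

```python
import os


def ttl_string(hours: float) -> str:
    """Convert a duration in hours to the 'Ns' seconds string the API expects."""
    return f"{int(hours * 3600)}s"


def create_catalogue_cache(client, catalogue_text: str):
    """Cache the full catalogue once; later calls reference it by name."""
    from google.genai import types  # lazy import so ttl_string works without the SDK

    return client.caches.create(
        model="gemini-1.5-flash-001",  # caches are tied to this exact model version
        config=types.CreateCachedContentConfig(
            display_name="furniture-catalogue",
            system_instruction=(
                "You are a sales assistant for a furniture retailer. "
                "Answer customer questions using only the catalogue."
            ),
            contents=[catalogue_text],  # the large, static content (>= 32,768 tokens)
            ttl=ttl_string(8),          # covers a 9am-5pm business day
        ),
    )


if __name__ == "__main__" and os.environ.get("GOOGLE_API_KEY"):
    from google import genai

    client = genai.Client()  # reads GOOGLE_API_KEY from the environment
    with open("catalogue.txt") as f:
        cache = create_catalogue_cache(client, f.read())
    print(f"Cache created: {cache.name}")
    print(f"Expires: {cache.expire_time}")
```

The returned object's name field is the handle every later request uses; store it (in an environment variable or a small config table) so query-serving processes do not need the catalogue text at all.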


Querying the cached catalogue and comparing cost with and without caching,
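A sketch of the query step, assuming a cache name from the creation step. The per-million-token price and the 75% discount used in the savings arithmetic are illustrative constants chosen to match the sample output that follows, not official pricing.

```python
# Illustrative pricing assumptions; check the current Gemini price list.
INPUT_PRICE_PER_M = 18.75   # hypothetical $ per 1M regular input tokens
CACHE_DISCOUNT = 0.75       # cached tokens cost roughly 75% less


def saved_per_query(cached_tokens: int) -> float:
    """Dollars saved on one call because cached tokens are billed at a discount."""
    return cached_tokens / 1_000_000 * INPUT_PRICE_PER_M * CACHE_DISCOUNT


def ask(client, cache_name: str, question: str) -> None:
    """Answer one customer query against the cached catalogue and report savings."""
    from google.genai import types  # lazy import: the pricing helper runs without the SDK

    response = client.models.generate_content(
        model="gemini-1.5-flash-001",          # must match the cached model version
        contents=question,
        config=types.GenerateContentConfig(cached_content=cache_name),
    )
    cached = response.usage_metadata.cached_content_token_count or 0
    print(f"Q: {question}")
    print(f"A: {response.text}")
    print(f"Cached: {cached} tokens | Saved per query: ${saved_per_query(cached):.5f}")
```

With 412 cached tokens per call, saved_per_query(412) works out to about $0.00579, and at 1,000 queries a day that compounds to roughly $5.79/day; only the short question is billed at the full input rate.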


It gives the following output showing cache hits and savings,

Cache created: projects/my-project/locations/us-central1/cachedContents/abc123
Expires: 2025-04-01 17:00:00

=== CATALOGUE QUERIES WITH CONTEXT CACHE ===
Q: Do you have a sofa under Rs 20,000 for a small apartment?
A: Yes! Our Compact 2-Seater Studio Sofa at Rs 15,999 is perfect for smaller
   spaces. Available in Grey, Teal, and Mustard. It measures 155x80x82cm...
Cached: 412 tokens | Saved per query: $0.00579

Q: What is included in the 6-seater dining set and what does it cost?
A: The 6-Seater Dining Set includes the dining table and 6 matching chairs
   for Rs 42,000. Free delivery applies as this is over Rs 15,000...
Cached: 412 tokens | Saved per query: $0.00579

Q: What EMI options are available on furniture?
A: We offer 0% EMI for 6 months on orders over Rs 20,000 when paying with
   an HDFC credit card. This applies to all our furniture ranges...
Cached: 412 tokens | Saved per query: $0.00579

Total saved for 5 queries: $0.029
Projected at 1,000 queries/day: $5.79/day

# 412 cached tokens in every call - never resending the full catalogue
# At 1,000 daily queries: $5.79/day = ~Rs 15,000/month in savings
# Cache created once at 9am, valid until 5pm - zero maintenance needed

Context caching is most valuable when three conditions are met: the same large content appears in many requests, the content is static or changes infrequently (daily or weekly), and you have sufficient daily query volume to recover the cache creation cost. Ideal candidates include product catalogues, company policy documents, legal clause libraries, technical manuals, and FAQ knowledge bases. For content that changes in real time (live prices, current inventory levels) use caching only for the static portions and add the dynamic data as uncached content in each request. Monitor cache expiry and implement a scheduled job to refresh the cache before it expires to avoid cold-start gaps during business hours.
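One way to sketch that scheduled refresh, assuming the google-genai SDK's caches.update call; the 30-minute safety margin and cron cadence are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expire_time: datetime, now: datetime,
                  margin: timedelta = timedelta(minutes=30)) -> bool:
    """True when the cache expires within the safety margin and should be renewed."""
    return expire_time - now <= margin


def refresh_cache(client, cache_name: str, hours: float = 8) -> None:
    """Extend the cache TTL in place, without re-uploading the catalogue."""
    from google.genai import types  # lazy import; requires the google-genai package

    client.caches.update(
        name=cache_name,
        config=types.UpdateCachedContentConfig(ttl=f"{int(hours * 3600)}s"),
    )


# A cron job running every 15 minutes during business hours could do:
#   if needs_refresh(cache.expire_time, datetime.now(timezone.utc)):
#       refresh_cache(client, cache.name)
```

Extending the TTL keeps the same cache name, so query-serving code needs no change; recreate the cache (rather than extend it) only when the catalogue content itself has changed.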


 
  


  