
Gemini Context Caching

Author: Venkata Sudhakar

Gemini context caching lets you upload a large piece of content once, store it on Google servers, and reference it in many subsequent API calls without resending the bytes each time. If your product chatbot sends a 50,000-token product catalogue with every customer query, you pay for those tokens every single time. With context caching, you create a cache of the catalogue once, get a cache name, and include only that name in subsequent requests. Cached tokens are billed at roughly a 75% discount compared with regular input tokens, making this one of the highest-return optimisations for high-volume Gemini applications that repeatedly send the same large context.

A cached content object is created with client.caches.create(), specifying the model, the content to cache, a system instruction, and a TTL (time-to-live). The minimum cacheable size is 32,768 tokens. When making a generate_content call, pass the cache name as cached_content inside GenerateContentConfig. The cache is model-version specific: a cache created for gemini-1.5-flash-001 works only with gemini-1.5-flash-001 calls. The usage metadata in each response includes cached_content_token_count, showing how many tokens were served from the cache at the discounted rate rather than billed at full price.

The example below shows a furniture retailer caching its product catalogue at the start of the business day and then serving customer queries against the cached content, calculating the real cost saving per query and the projected daily savings.
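A minimal sketch of the cache-creation step using the google-genai SDK. The file name, display name, system instruction, and eight-hour TTL are illustrative assumptions; a real catalogue must meet the 32,768-token minimum.

```python
import os


def ttl_string(hours: float) -> str:
    """Convert a duration in hours to the 'Ns' seconds string the API expects."""
    return f"{int(hours * 3600)}s"


def create_catalogue_cache(client, catalogue_text: str):
    """Cache the full catalogue once; later calls reference it by name."""
    from google.genai import types  # lazy import so ttl_string works without the SDK

    return client.caches.create(
        model="gemini-1.5-flash-001",  # caches are tied to this exact model version
        config=types.CreateCachedContentConfig(
            display_name="furniture-catalogue",
            system_instruction=(
                "You are a sales assistant for a furniture retailer. "
                "Answer customer questions using only the catalogue."
            ),
            contents=[catalogue_text],  # the large, static content (>= 32,768 tokens)
            ttl=ttl_string(8),          # covers a 9am-5pm business day
        ),
    )


if __name__ == "__main__" and os.environ.get("GOOGLE_API_KEY"):
    from google import genai

    client = genai.Client()  # reads GOOGLE_API_KEY from the environment
    with open("catalogue.txt") as f:
        cache = create_catalogue_cache(client, f.read())
    print(f"Cache created: {cache.name}")
    print(f"Expires: {cache.expire_time}")
```

The returned object's name field is the handle every later request uses; store it (in an environment variable or a small config table) so query-serving processes do not need the catalogue text at all.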


Querying the cached catalogue and comparing cost with and without caching,
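A sketch of the query step, assuming a cache name from the creation step. The per-million-token price and the 75% discount used in the savings arithmetic are illustrative constants chosen to match the sample output that follows, not official pricing.

```python
# Illustrative pricing assumptions; check the current Gemini price list.
INPUT_PRICE_PER_M = 18.75   # hypothetical $ per 1M regular input tokens
CACHE_DISCOUNT = 0.75       # cached tokens cost roughly 75% less


def saved_per_query(cached_tokens: int) -> float:
    """Dollars saved on one call because cached tokens are billed at a discount."""
    return cached_tokens / 1_000_000 * INPUT_PRICE_PER_M * CACHE_DISCOUNT


def ask(client, cache_name: str, question: str) -> None:
    """Answer one customer query against the cached catalogue and report savings."""
    from google.genai import types  # lazy import: the pricing helper runs without the SDK

    response = client.models.generate_content(
        model="gemini-1.5-flash-001",          # must match the cached model version
        contents=question,
        config=types.GenerateContentConfig(cached_content=cache_name),
    )
    cached = response.usage_metadata.cached_content_token_count or 0
    print(f"Q: {question}")
    print(f"A: {response.text}")
    print(f"Cached: {cached} tokens | Saved per query: ${saved_per_query(cached):.5f}")
```

With 412 cached tokens per call, saved_per_query(412) works out to about $0.00579, and at 1,000 queries a day that compounds to roughly $5.79/day; only the short question is billed at the full input rate.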


It gives the following output showing cache hits and savings,

Cache created: projects/my-project/locations/us-central1/cachedContents/abc123
Expires: 2025-04-01 17:00:00

=== CATALOGUE QUERIES WITH CONTEXT CACHE ===
Q: Do you have a sofa under Rs 20,000 for a small apartment?
A: Yes! Our Compact 2-Seater Studio Sofa at Rs 15,999 is perfect for smaller
   spaces. Available in Grey, Teal, and Mustard. It measures 155x80x82cm...
Cached: 412 tokens | Saved per query: $0.00579

Q: What is included in the 6-seater dining set and what does it cost?
A: The 6-Seater Dining Set includes the dining table and 6 matching chairs
   for Rs 42,000. Free delivery applies as this is over Rs 15,000...
Cached: 412 tokens | Saved per query: $0.00579

Q: What EMI options are available on furniture?
A: We offer 0% EMI for 6 months on orders over Rs 20,000 when paying with
   an HDFC credit card. This applies to all our furniture ranges...
Cached: 412 tokens | Saved per query: $0.00579

Total saved for 5 queries: $0.029
Projected at 1,000 queries/day: $5.79/day

# 412 cached tokens in every call - never resending the full catalogue
# At 1,000 daily queries: $5.79/day = ~Rs 15,000/month in savings
# Cache created once at 9am, valid until 5pm - zero maintenance needed

Context caching is most valuable when three conditions are met: the same large content appears in many requests, the content is static or changes infrequently (daily or weekly), and you have sufficient daily query volume to recover the cache creation cost. Ideal candidates include product catalogues, company policy documents, legal clause libraries, technical manuals, and FAQ knowledge bases. For content that changes in real time (live prices, current inventory levels) use caching only for the static portions and add the dynamic data as uncached content in each request. Monitor cache expiry and implement a scheduled job to refresh the cache before it expires to avoid cold-start gaps during business hours.
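One way to sketch that scheduled refresh, assuming the google-genai SDK's caches.update call; the 30-minute safety margin and cron cadence are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expire_time: datetime, now: datetime,
                  margin: timedelta = timedelta(minutes=30)) -> bool:
    """True when the cache expires within the safety margin and should be renewed."""
    return expire_time - now <= margin


def refresh_cache(client, cache_name: str, hours: float = 8) -> None:
    """Extend the cache TTL in place, without re-uploading the catalogue."""
    from google.genai import types  # lazy import; requires the google-genai package

    client.caches.update(
        name=cache_name,
        config=types.UpdateCachedContentConfig(ttl=f"{int(hours * 3600)}s"),
    )


# A cron job running every 15 minutes during business hours could do:
#   if needs_refresh(cache.expire_time, datetime.now(timezone.utc)):
#       refresh_cache(client, cache.name)
```

Extending the TTL keeps the same cache name, so query-serving code needs no change; recreate the cache (rather than extend it) only when the catalogue content itself has changed.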


 
  


  