Gemini Structured Output with Pydantic
Author: Venkata Sudhakar
Gemini structured output constrains the model to return valid JSON matching the schema you supply. Instead of asking Gemini to "please return JSON" and then hoping the output is parseable, you pass a response_schema and Gemini is constrained to produce only JSON that conforms to it. No markdown fences, no extra explanation text, no missing required fields: just clean structured data ready to write into your database. This is the right approach for any data extraction pipeline where downstream code depends on consistent field names and types.

You define the schema as a Pydantic model or as a plain Python dict (Gemini accepts a subset of the OpenAPI/JSON Schema vocabulary). Pass it in GenerateContentConfig as response_schema alongside response_mime_type set to "application/json". Gemini enforces the schema at the generation level, not as a post-processing step, so a complete response should not produce a JSON parse error at runtime. Combined with Gemini Flash speed and pricing, this makes it practical to run structured extraction on thousands of documents per day in a near-real-time pipeline.

The example below shows a procurement team automating extraction of product details from unstructured supplier emails, pulling out product name, SKU, price, availability, and lead time into a clean record for their inventory system.
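A minimal sketch of the schema-plus-config pattern described above, assuming the google-genai SDK and Pydantic v2. The class names, field names, the model name "gemini-2.0-flash", the GEMINI_API_KEY environment variable, and the extract_quote helper are all illustrative assumptions, not the author's code.

```python
import os
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class Availability(str, Enum):
    IN_STOCK = "IN_STOCK"
    LOW_STOCK = "LOW_STOCK"
    OUT_OF_STOCK = "OUT_OF_STOCK"


class ProductQuote(BaseModel):
    """Hypothetical schema; field names are assumptions for illustration."""
    product_name: str
    sku: Optional[str]               # null when the email gives no SKU
    unit_price_inr: float
    moq_units: int
    lead_time_days: int
    availability: Availability
    supplier_name: str
    quote_valid_days: Optional[int]  # null when no validity period is stated
    notes: Optional[str]


def extract_quote(email_text: str, model: str = "gemini-2.0-flash") -> ProductQuote:
    """Ask Gemini for one quote, constrained to the ProductQuote schema."""
    # Imported here so the schema can be used and tested without the SDK.
    from google import genai
    from google.genai import types

    # Assumes the API key is stored in the GEMINI_API_KEY environment variable.
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model=model,
        contents=f"Extract the product quote from this supplier email:\n\n{email_text}",
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=ProductQuote,
        ),
    )
    # response.text holds the raw JSON string; parse it into the model.
    return ProductQuote.model_validate_json(response.text)


if __name__ == "__main__":
    # Offline check with a hand-written payload (no API call is made here).
    sample = (
        '{"product_name": "3-phase Electric Motor 2HP", "sku": null, '
        '"unit_price_inr": 12400.0, "moq_units": 1, "lead_time_days": 0, '
        '"availability": "LOW_STOCK", "supplier_name": "Electro Traders Pune", '
        '"quote_valid_days": null, "notes": "Only 3 units left"}'
    )
    quote = ProductQuote.model_validate_json(sample)
    print(quote.product_name, quote.availability.value, quote.sku)
```

The nullable fields are declared without defaults so that every key is still required in the JSON while its value may be null, which keeps all records the same shape.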
Extracting structured data from three different supplier email styles gives the following output:
=== EXTRACTED PRODUCT QUOTES ===

Quote 1:
  Product: 20mm HDPE Pipe
  SKU: HDPE-20-100
  Price: Rs 145.0 per unit
  MOQ: 500 units
  Lead time: 7 days
  Availability: IN_STOCK
  Supplier: Sharma Plastics Pvt Ltd
  Quote valid: 15 days
  Notes: 8500 metres available for immediate dispatch

Quote 2:
  Product: Stainless Steel Fasteners M8x30
  SKU: SS-M8-30-100
  Price: Rs 8.5 per unit
  MOQ: 1000 units
  Lead time: 21 days
  Availability: OUT_OF_STOCK
  Supplier: Allied Fasteners Mumbai
  Quote valid: 30 days
  Notes: Next batch expected in 21 days

Quote 3:
  Product: 3-phase Electric Motor 2HP
  SKU: not specified
  Price: Rs 12400.0 per unit
  MOQ: 1 units
  Lead time: 0 days
  Availability: LOW_STOCK
  Supplier: Electro Traders Pune
  Quote valid: not specified
  Notes: Only 3 units left, same-day dispatch before 2pm

# Three completely different email styles - all produce identical schema
# Zero parsing errors guaranteed - schema enforced at generation level
# Ready to INSERT directly into your procurement database
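
The final comment above can be made concrete with a minimal sketch that uses Python's built-in sqlite3 module. The table name, column names, and the save_quote helper are hypothetical, and a production system would likely target a different database. The input is assumed to be a dict with one key per schema field, for example the result of json.loads() on the response text.

```python
import sqlite3

# Hypothetical table layout; adapt names and types to your own database.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS product_quotes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_name TEXT NOT NULL,
    sku TEXT,
    unit_price_inr REAL NOT NULL,
    moq_units INTEGER NOT NULL,
    lead_time_days INTEGER NOT NULL,
    availability TEXT NOT NULL
        CHECK (availability IN ('IN_STOCK', 'LOW_STOCK', 'OUT_OF_STOCK')),
    supplier_name TEXT NOT NULL,
    quote_valid_days INTEGER,
    notes TEXT
)
"""

# Named placeholders map directly onto the keys of the extracted record.
INSERT_SQL = """
INSERT INTO product_quotes (
    product_name, sku, unit_price_inr, moq_units, lead_time_days,
    availability, supplier_name, quote_valid_days, notes
) VALUES (
    :product_name, :sku, :unit_price_inr, :moq_units, :lead_time_days,
    :availability, :supplier_name, :quote_valid_days, :notes
)
"""


def save_quote(conn: sqlite3.Connection, quote: dict) -> int:
    """Insert one extracted quote and return its row id."""
    with conn:  # commits on success, rolls back if the insert fails
        cursor = conn.execute(INSERT_SQL, quote)
    return cursor.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_SQL)
    row_id = save_quote(conn, {
        "product_name": "Stainless Steel Fasteners M8x30",
        "sku": "SS-M8-30-100",
        "unit_price_inr": 8.5,
        "moq_units": 1000,
        "lead_time_days": 21,
        "availability": "OUT_OF_STOCK",
        "supplier_name": "Allied Fasteners Mumbai",
        "quote_valid_days": 30,
        "notes": "Next batch expected in 21 days",
    })
    print(conn.execute(
        "SELECT sku, availability FROM product_quotes WHERE id = ?", (row_id,)
    ).fetchone())
```

The CHECK constraint mirrors the availability enum, so the database rejects any value the schema would not allow even if a record arrives from another code path.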
Structured output use cases that run well in production include extracting contact details from business cards and email signatures, parsing purchase orders into line-item records, classifying and tagging support tickets with severity and category fields, extracting patient details from referral letters (with appropriate privacy controls), pulling financial figures from earnings announcements, and converting unstructured meeting notes into action item records. The Pydantic model serves double duty as both the schema definition and the validated Python object you write to your database, so the schema and the validation rules live in one place.
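
For the ticket-tagging use case, the plain-dict form of the schema might look like the sketch below. The field names, enum values, and the check_ticket helper are assumptions for illustration. The dict uses the uppercase OpenAPI-style type names seen in Gemini SDK examples and would be passed as response_schema in place of a Pydantic model. Because the dict route has no Pydantic model to parse into, the sketch adds a small local re-check of required fields and enum values.

```python
import json

# Hypothetical schema for support-ticket tagging, written as a plain dict.
# It would be passed as:
#   types.GenerateContentConfig(
#       response_mime_type="application/json",
#       response_schema=TICKET_SCHEMA,
#   )
TICKET_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "category": {
            "type": "STRING",
            "enum": ["BILLING", "TECHNICAL", "ACCOUNT", "OTHER"],
        },
        "severity": {
            "type": "STRING",
            "enum": ["LOW", "MEDIUM", "HIGH", "CRITICAL"],
        },
        "summary": {"type": "STRING"},
    },
    "required": ["category", "severity", "summary"],
}


def check_ticket(raw_json: str, schema: dict = TICKET_SCHEMA) -> dict:
    """Parse a JSON response and re-check required fields and enum values."""
    data = json.loads(raw_json)
    for field in schema["required"]:
        if field not in data:
            raise ValueError(f"missing required field: {field}")
    for name, spec in schema["properties"].items():
        allowed = spec.get("enum")
        if allowed is not None and name in data and data[name] not in allowed:
            raise ValueError(f"unexpected value for {name}: {data[name]!r}")
    return data


if __name__ == "__main__":
    # Hand-written payload standing in for a model response.
    ticket = check_ticket(
        '{"category": "BILLING", "severity": "HIGH", "summary": "Charged twice"}'
    )
    print(ticket["category"], ticket["severity"])
```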