
Gemini Long Context Window - Analysing Large Documents

Author: Venkata Sudhakar

Gemini 2.5 Pro offers a 1 million token context window - enough to fit approximately 700 pages of text in a single API call. Instead of chunking a 300-page annual report into pieces and losing cross-document context, you can send the entire document in one call and ask questions that require understanding relationships across the full content. Revenue mentioned in the financials section can be cross-referenced with risk factors discussed hundreds of pages later - all in one coherent context. This eliminates the chunking, embedding, and retrieval plumbing that many document analysis use cases would otherwise require.

For large documents, pass content via the Gemini File API - upload once, reference by URI. For smaller text documents, you can inline the text directly in the contents. Gemini 2.0 Flash also supports a 1 million token context window and is faster and cheaper for most document Q&A tasks - reserve Gemini 2.5 Pro for tasks requiring the deepest multi-section reasoning. Both models handle PDFs natively when uploaded through the File API, making it easy to process scanned or formatted business documents without any preprocessing.
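As a minimal sketch of the inline-text path, assuming the google-genai Python SDK (pip install google-genai) and a GEMINI_API_KEY environment variable - the helper names here are illustrative, not part of the SDK:

```python
def inline_contents(document_text: str, question: str) -> list[str]:
    """Build the contents payload for a small document passed inline."""
    return [f"Document:\n{document_text}\n\nQuestion: {question}"]


def ask_about_text(document_text: str, question: str,
                   model: str = "gemini-2.0-flash") -> str:
    # Imported lazily so the prompt helper above works without the SDK installed.
    from google import genai  # pip install google-genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model=model,
        contents=inline_contents(document_text, question),
    )
    return response.text
```

For PDFs or anything large, swap the inline string for a File API upload as shown later in this article.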

The example below shows a financial analyst using Gemini to query a company annual report - asking questions that span revenue, risk factors, EV strategy, and capital allocation simultaneously.


Asking cross-document questions that span revenue, risk and strategy sections,


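A sketch of the call behind this workflow, assuming the google-genai Python SDK - the file name, model choice, and question list are illustrative:

```python
QUESTIONS = [
    "Summarise key financial improvements FY2023 to FY2024 with figures",
    "Top risks to JLR earnings? How material?",
    "Is Rs 28,000 crore capex adequate for the EV opportunity?",
    "BUY, HOLD or SELL based on this report?",
]


def build_prompt(questions: list[str]) -> str:
    """Number the analyst questions into one prompt so every answer is
    drawn from the same shared document context."""
    numbered = "\n".join(f"Q{i}: {q}" for i, q in enumerate(questions, start=1))
    return ("You are a financial analyst. Answer each question using "
            "specific figures from the attached annual report.\n" + numbered)


def analyse_report(pdf_path: str, questions: list[str]) -> str:
    # Imported lazily so build_prompt works without the SDK installed.
    from google import genai  # pip install google-genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    # Upload once via the File API; the returned object is referenced by URI.
    report = client.files.upload(file=pdf_path)
    response = client.models.generate_content(
        model="gemini-2.5-pro",  # or "gemini-2.0-flash" for speed and cost
        contents=[report, build_prompt(questions)],
    )
    return response.text


# Usage (requires the SDK and an API key):
#   print(analyse_report("annual_report.pdf", QUESTIONS))
```

Because all four questions travel in one request against one uploaded file, the model answers each of them from the same full-document context.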
It gives the following output,

Q: Summarise key financial improvements FY2023 to FY2024 with figures
A: Revenue grew 26.6% to Rs 4,37,928 crore. EBITDA margin expanded 350bps
   to 13.3%. Most strikingly, PAT swung from a Rs 2,690 crore loss to a
   Rs 31,807 crore profit. Net debt nearly halved from Rs 58,400 crore to
   Rs 30,200 crore, improving balance sheet quality significantly. JLR
   drove much of this with 30% revenue growth to GBP 29 billion.

Q: Top risks to JLR earnings? How material?
A: Three material risks: (1) Currency - JLR earns in GBP/USD but costs
   are UK-based; a 5% GBP appreciation materially hurts margins.
   (2) EV transition pace - Range Rover Electric launching 2025 into an
   uncertain charging infrastructure environment. (3) Geopolitical -
   Russia exit complete but broader export market uncertainty remains.
   All three are acknowledged but no quantified financial impact given.

Q: Is Rs 28,000 crore capex adequate for the EV opportunity?
A: Potentially stretched. With Nexon EV holding only 28% of a nascent
   market expected to reach 5-7% PV penetration, and JLR EV launch
   imminent, Rs 28,000 crore across the entire business suggests EV
   investment is being balanced against ongoing ICE platform maintenance.
   Management has not broken out EV vs ICE capex split - a gap in disclosure.

Q: BUY, HOLD or SELL based on this report?
A: BUY - with three caveats. (1) Turnaround is real: PAT swing of Rs 34,497
   crore and debt reduction of Rs 28,200 crore in one year is exceptional.
   (2) EV position in India is strong with market leadership in Nexon EV.
   (3) JLR margin trajectory toward 15% EBIT is credible given FY2024 run-rate.
   Caveats: currency exposure, EV capex disclosure gap, and commodity risk.

# All answers reference specific numbers from across the full document
# No chunking, no retrieval - entire document in one coherent context

Long context use cases where Gemini excels: full annual report analysis, entire contracts (ask "what are all the termination clauses?"), multi-chapter policy documents, complete chat history analysis (thousands of support tickets), and codebase review. The key advantage over RAG is that Gemini can find and connect information that appears in different sections of the document simultaneously - something retrieval-based approaches miss when the relevant chunks are retrieved in isolation. Use long context when your questions genuinely require understanding the whole document; use RAG when you have thousands of documents and only need the relevant few per query.
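One way to make that long-context-versus-RAG call concrete is to estimate the token count before choosing. A rough sketch - the ~4 characters per token figure is a common approximation for English prose, not an exact count (for exact counts the google-genai SDK provides client.models.count_tokens), and this is only a first-pass size filter; as noted above, RAG also wins when you have thousands of documents and only need a few per query:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: English prose averages about 4 characters per token."""
    return max(1, len(text) // 4)


def choose_strategy(document_texts: list[str],
                    context_limit: int = 1_000_000) -> str:
    """Return 'long-context' when everything fits in one call, else 'rag'."""
    total = sum(estimate_tokens(t) for t in document_texts)
    return "long-context" if total <= context_limit else "rag"
```

For example, a single 300-page report comfortably returns "long-context", while a corpus of thousands of filings tips over the limit and returns "rag".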


 
  


  