Contextual Compression in RAG - Extracting Relevant Passages

Author: Venkata Sudhakar

Contextual compression extracts only the relevant portion of each retrieved document before passing it to the LLM, rather than sending the entire chunk. ShopMax India product documents can be 500-1000 words long, covering specs, pricing, availability, warranty, and reviews. When a customer asks only about the return policy, sending the full spec sheet wastes tokens and dilutes the answer. Contextual compression trims each retrieved document to just the sentences relevant to the query.
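
To make the idea concrete before bringing in an LLM, the trimming step can be sketched with plain keyword overlap. This is only an illustration: `keyword_compress` and the `STOPWORDS` set are hypothetical names, not part of any library, and a real compressor would rely on an LLM or embedding similarity rather than exact word matches.

```python
import re

# Hypothetical stopword list, just large enough for this illustration.
STOPWORDS = {"what", "is", "the", "for", "a", "an", "of", "how", "does", "do", "to", "in"}

def keyword_compress(query: str, document: str) -> str:
    """Keep only the lines of `document` that share a non-stopword term with `query`.

    Stand-in for an LLM extractor: plain keyword overlap, nothing more.
    """
    terms = set(re.findall(r"[a-z0-9]+", query.lower())) - STOPWORDS
    kept = []
    for line in document.splitlines():
        words = set(re.findall(r"[a-z0-9]+", line.lower()))
        if terms & words:
            kept.append(line.strip())
    return "\n".join(kept)

doc = """Warranty: 1 year onsite hardware warranty.
Return Policy: 10-day return window from delivery date.
Price: Rs 135000.
Weight: 1.86kg."""

compressed = keyword_compress("What is the return policy?", doc)
print(compressed)
# prints: Return Policy: 10-day return window from delivery date.
```

Only the return policy line survives, so the answer model sees one line instead of four. Keyword overlap misses paraphrases (for example "send it back" versus "return"), which is the gap an LLM-based extractor is meant to close.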

LangChain provides a ContextualCompressionRetriever that wraps any base retriever with a document compressor. The LLMChainExtractor compressor sends each document plus the original query to an LLM with instructions to extract only the relevant content. This adds one LLM call per retrieved document but typically reduces context size by 60-80%, allowing more documents to fit within the context window and reducing answer generation cost.
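
The wrapper pattern itself does not depend on LangChain and can be sketched in a few lines. The class below is a hypothetical illustration, not LangChain's implementation: `CompressingRetriever`, `return_all`, and `sentence_filter` are made-up names, and the compressor is a substring filter standing in for the LLM call.

```python
from typing import Callable, List

Retriever = Callable[[str], List[str]]    # query -> retrieved documents
Compressor = Callable[[str, str], str]    # (query, document) -> extract

class CompressingRetriever:
    """Hypothetical wrapper: run a base retriever, then compress each hit."""

    def __init__(self, base_retriever: Retriever, compressor: Compressor):
        self.base_retriever = base_retriever
        self.compressor = compressor
        self.compressor_calls = 0

    def retrieve(self, query: str) -> List[str]:
        results = []
        for doc in self.base_retriever(query):
            self.compressor_calls += 1  # one compressor call per retrieved document
            extract = self.compressor(query, doc)
            if extract:  # drop documents with nothing relevant
                results.append(extract)
        return results

DOCS = [
    "Return Policy: 7-day return window. Price: Rs 29990. Weight: 250g.",
    "Battery: 30 hours ANC on. Connectivity: Bluetooth 5.2.",
]

def return_all(query: str) -> List[str]:
    # Trivial base retriever for the sketch: every document is a hit.
    return DOCS

def sentence_filter(query: str, doc: str) -> str:
    # Stand-in compressor: keep sentences containing any query word.
    terms = [w.lower() for w in query.split()]
    kept = [s for s in doc.split(". ") if any(t in s.lower() for t in terms)]
    return ". ".join(kept)

retriever = CompressingRetriever(return_all, sentence_filter)
hits = retriever.retrieve("return policy")
print(hits)                        # ['Return Policy: 7-day return window']
print(retriever.compressor_calls)  # 2
```

The call counter makes the cost trade-off visible: two documents retrieved means two compressor calls, even though only one extract reaches the final context.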

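The following example applies contextual compression to ShopMax India product documents. Rather than using LangChain, it implements the same pattern by hand: rank_bm25 serves as the base retriever and the Anthropic Python SDK provides the compressor. The compressor uses Claude Haiku to extract relevant sentences from each retrieved document before the main answer generation step.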

import anthropic
from rank_bm25 import BM25Okapi

# With no api_key argument, the client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

product_docs = [
    """Dell XPS 15 9530 Laptop - Full Specification
Processor: Intel Core i7-13700H, 14 cores.
RAM: 32GB DDR5 at 4800MHz. Expandable to 64GB.
Storage: 1TB NVMe SSD. Secondary slot available.
Display: 15.6 inch OLED, 3456x2160, 120Hz.
Battery: 86Wh, 6-8 hours typical.
Warranty: 1 year onsite hardware warranty. Battery warranty: 2 years.
Return Policy: 10-day return window from delivery date. Original packaging required.
Price: Rs 135000. Available in Mumbai, Bangalore, Delhi, Hyderabad.
Weight: 1.86kg. Dimensions: 344 x 230 x 18mm.""",
    """Sony WH-1000XM5 Headphones - Full Specification
Driver: 30mm dynamic. Frequency: 4Hz-40000Hz.
Battery: 30 hours ANC on. 3-hour quick charge gives 3 hours playback.
Connectivity: Bluetooth 5.2, NFC pairing, USB-C.
Noise Cancellation: Industry-leading ANC with 8 microphones.
Return Policy: 7-day return window. Opened box returns allowed.
Warranty: 1 year manufacturer warranty. Extended warranty available at Rs 2000.
Price: Rs 29990. Available in Mumbai and Bangalore.
Weight: 250g. Colors: Black, Platinum Silver."""
]
tokenized = [doc.lower().split() for doc in product_docs]
bm25 = BM25Okapi(tokenized)

def compress_document(query, document):
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=150,
        messages=[{"role": "user", "content": f"Extract only the sentences from this document that are relevant to the query. Return nothing if nothing is relevant.\nQuery: {query}\nDocument:\n{document}"}]
    )
    return msg.content[0].text.strip()

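def compressed_rag(query, top_k=2):
    # Rank documents with BM25 and keep the indices of the top_k matches.
    scores = bm25.get_scores(query.lower().split())
    idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Compress each retrieved document; drop extracts too short to be useful.
    compressed = []
    for i in idx:
        c = compress_document(query, product_docs[i])
        if c and len(c) > 20:
            compressed.append(c)
    # Nothing relevant survived compression: skip the answer generation call.
    if not compressed:
        return "No relevant information found in the product documents.", ""
    context = "\n".join(compressed)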
    msg = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=200,
        system="You are ShopMax India assistant. Answer using only the provided context.",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}]
    )
    return msg.content[0].text, context

queries = [
    "What is the return policy for the Dell laptop?",
    "How long does Sony headphone battery last?"
]

for q in queries:
    answer, ctx = compressed_rag(q)
    print(f"Q: {q}")
    print(f"Compressed context ({len(ctx)} chars): {ctx[:100]}...")
    print(f"A: {answer}")
    print()

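It gives output similar to the following. The extracted text and character counts vary between runs, since both steps call an LLM.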

Q: What is the return policy for the Dell laptop?
Compressed context (87 chars): Return Policy: 10-day return window from delivery date. Original packaging required....
A: The Dell XPS 15 9530 has a 10-day return window from the delivery date. Original packaging is required for the return.

Q: How long does Sony headphone battery last?
Compressed context (82 chars): Battery: 30 hours ANC on. 3-hour quick charge gives 3 hours playback....
A: The Sony WH-1000XM5 battery lasts 30 hours with ANC enabled. A 3-hour quick charge provides an additional 3 hours of playback.

For ShopMax India, apply contextual compression primarily to long product specification documents, where queries are likely to target specific attributes. Short FAQ documents (under 200 words) benefit less: the overhead of an extra LLM call outweighs the token savings. Monitor the compression ratio per document type, and set a minimum compressed length threshold (around 20 characters) below which an extract is discarded. When no extract passes the threshold, skip the answer generation step entirely, which further reduces hallucination risk.
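
The monitoring and skip logic described above might be sketched as follows. All names here (`CompressionStats`, `should_generate`, `MIN_COMPRESSED_CHARS`, the `"spec_sheet"` label) are hypothetical, and the sketch measures length in characters as the example does; a production system would more likely count tokens.

```python
from collections import defaultdict
from typing import List

MIN_COMPRESSED_CHARS = 20  # threshold suggested in the text; tune per deployment

class CompressionStats:
    """Hypothetical tracker for how much compression removes, per document type."""

    def __init__(self):
        self._original = defaultdict(int)
        self._compressed = defaultdict(int)

    def record(self, doc_type: str, original: str, compressed: str) -> None:
        self._original[doc_type] += len(original)
        self._compressed[doc_type] += len(compressed)

    def reduction(self, doc_type: str) -> float:
        """Fraction of characters removed for this document type (0.0 to 1.0)."""
        total = self._original[doc_type]
        return 0.0 if total == 0 else 1 - self._compressed[doc_type] / total

def should_generate(compressed_docs: List[str]) -> bool:
    """True if at least one extract is long enough to justify the answer call."""
    return any(len(c.strip()) > MIN_COMPRESSED_CHARS for c in compressed_docs)

stats = CompressionStats()
stats.record("spec_sheet", "x" * 1000, "y" * 250)
print(stats.reduction("spec_sheet"))                             # 0.75
print(should_generate(["", "N/A"]))                              # False
print(should_generate(["Return Policy: 10-day return window."]))  # True
```

A document type whose reduction stays low over time is a candidate for skipping compression altogether, since the extra LLM call is not paying for itself there.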
