Contextual Compression in RAG - Extracting Relevant Passages
Author: Venkata Sudhakar
Contextual compression extracts only the relevant portion of each retrieved document before passing it to the LLM, rather than sending the entire chunk. ShopMax India product documents can be 500-1000 words long, covering specs, pricing, availability, warranty, and reviews. When a customer asks only about the return policy, sending the full spec sheet wastes tokens and dilutes the answer. Contextual compression trims each retrieved document to just the sentences relevant to the query.
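To make the input/output contract concrete, here is a minimal sketch of sentence-level compression. It is an illustration only. It scores relevance with plain keyword overlap, which stands in for the LLM-based extraction described in this article. The function name `compress_by_keywords`, the stopword list and the sample document text are all hypothetical.

```python
import re

# Common words that say little about what the query is asking for.
STOPWORDS = {"the", "a", "an", "is", "what", "for", "of", "how", "does",
             "do", "long", "to", "and", "on"}


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def compress_by_keywords(document: str, query: str) -> str:
    """Keep only sentences that share at least one content word with the query.

    A crude stand-in for an LLM extractor: full document in, relevant
    sentences out (empty string if nothing matches).
    """
    query_terms = _tokens(query)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(s for s in sentences if _tokens(s) & query_terms)


# Hypothetical, shortened product document.
doc = (
    "Display: 15.6-inch OLED panel. Processor: Intel Core i7. "
    "Return policy: 10-day return window from delivery date. "
    "Warranty: 1 year onsite service."
)
print(compress_by_keywords(doc, "What is the return policy?"))
# prints: Return policy: 10-day return window from delivery date.
```

Keyword overlap misses paraphrases such as "send it back" versus "return". That gap is why the approach in this article asks an LLM to judge relevance instead.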
LangChain provides a ContextualCompressionRetriever that wraps any base retriever with a document compressor. The LLMChainExtractor compressor sends each document plus the original query to an LLM with instructions to extract only the relevant content. This adds one LLM call per retrieved document but typically reduces context size by 60-80%, allowing more documents to fit within the context window and reducing answer generation cost.
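The wrapper pattern can be sketched without LangChain. The sketch shows where the extra cost comes from and how empty extracts get dropped. `CompressingRetriever` is a hypothetical class and is not LangChain's `ContextualCompressionRetriever`. It takes a base retriever and a compressor as plain callables and calls the compressor once per retrieved document, which corresponds to the one extra LLM call per document mentioned above. It then keeps only non-empty extracts. The toy retriever, the toy compressor and the sample documents are placeholders for a vector store, an LLM extractor and real product data.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CompressingRetriever:
    """Hypothetical stand-in: a base retriever plus a per-document compressor."""

    base_retriever: Callable[[str], list[str]]  # query -> documents
    compressor: Callable[[str, str], str]       # (document, query) -> extract
    compressor_calls: int = field(default=0, init=False)

    def retrieve(self, query: str) -> list[str]:
        compressed = []
        for doc in self.base_retriever(query):
            # One compressor call per retrieved document; with an LLM-based
            # extractor, each of these is an extra model call.
            self.compressor_calls += 1
            extract = self.compressor(doc, query).strip()
            if extract:  # drop documents with no relevant content
                compressed.append(extract)
        return compressed


def context_reduction(original: list[str], compressed: list[str]) -> float:
    """Fraction of context characters removed by compression (0.0 to 1.0)."""
    before = sum(len(d) for d in original)
    after = sum(len(d) for d in compressed)
    return 1 - after / before if before else 0.0


# Placeholder data, not real ShopMax documents.
docs = [
    "Battery: 30 hours ANC on. Colours: black and silver. Ships in 2 days.",
    "Warranty: 1 year. Return window: 10 days from delivery.",
]


def toy_retriever(query: str) -> list[str]:
    return docs  # a real retriever would run a similarity search


def toy_compressor(doc: str, query: str) -> str:
    # Hard-coded to keep sentences mentioning "battery"; an LLM extractor
    # would decide relevance from the query instead.
    return ". ".join(s for s in doc.split(". ") if "battery" in s.lower())


retriever = CompressingRetriever(toy_retriever, toy_compressor)
result = retriever.retrieve("How long does the battery last?")
print(result)                      # ['Battery: 30 hours ANC on']
print(retriever.compressor_calls)  # 2
print(f"{context_reduction(docs, result):.0%} of context removed")
```

The second document produces an empty extract, so it never reaches the answer prompt. Its compressor call is still paid for, which is the trade-off described above.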
The following example applies contextual compression to ShopMax India product documents. The compressor uses Claude Haiku to extract relevant sentences from each retrieved document before the main answer generation step.
It produces the following output:
Q: What is the return policy for the Dell laptop?
Compressed context (87 chars): Return Policy: 10-day return window from delivery date. Original packaging required....
A: The Dell XPS 15 9530 has a 10-day return window from the delivery date. Original packaging is required for the return.
Q: How long does Sony headphone battery last?
Compressed context (82 chars): Battery: 30 hours ANC on. 3-hour quick charge gives 3 hours playback....
A: The Sony WH-1000XM5 battery lasts 30 hours with ANC enabled. A 3-hour quick charge provides an additional 3 hours of playback.
For ShopMax India, apply contextual compression primarily to long product specification documents, where queries are likely to target a specific attribute. Short FAQ documents (under 200 words) benefit less from compression, because the overhead of the extra LLM call outweighs the token savings. Monitor the compression ratio per document type. Also set a minimum compressed-length threshold (around 20 characters): when the compressed context falls below it, no relevant content was found, so skip the answer-generation step entirely. This further reduces hallucination risk.
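The rules above can be sketched as a small routing function. This is a minimal sketch that assumes the 200-word and 20-character thresholds from the paragraph above. The function `answer`, the `(doc_type, text)` input shape, the `ratio_log` dictionary and the fallback message are hypothetical. The compressor and the answer generator are injected callables standing in for the LLM calls.

```python
from collections import defaultdict
from typing import Callable

MIN_WORDS_TO_COMPRESS = 200  # shorter docs (e.g. FAQs) skip the extra LLM call
MIN_CONTEXT_CHARS = 20       # below this, treat retrieval as "nothing relevant"
FALLBACK = "Sorry, I could not find that in our product documents."

Compressor = Callable[[str, str], str]  # (document, query) -> extract
Generator = Callable[[str, str], str]   # (query, context) -> answer


def answer(
    query: str,
    docs: list[tuple[str, str]],          # (doc_type, text) pairs
    compress: Compressor,
    generate_answer: Generator,
    ratio_log: dict[str, list[float]],    # doc_type -> compression ratios
) -> str:
    parts = []
    for doc_type, text in docs:
        if len(text.split()) < MIN_WORDS_TO_COMPRESS:
            parts.append(text)  # short doc: pass through uncompressed
            continue
        extract = compress(text, query).strip()
        # Record compressed/original length so ratios can be reviewed per type.
        ratio_log[doc_type].append(len(extract) / len(text))
        if extract:
            parts.append(extract)
    context = "\n".join(parts)
    if len(context) < MIN_CONTEXT_CHARS:
        # Skip generation: with no relevant context the model could only guess.
        return FALLBACK
    return generate_answer(query, context)


# Usage with placeholder callables: the compressor finds nothing relevant.
ratios: dict[str, list[float]] = defaultdict(list)
long_spec = "filler " * 250 + "Return policy: 10-day return window."
print(answer("What is the return policy?", [("spec", long_spec)],
             lambda doc, q: "", lambda q, ctx: ctx, ratios))  # prints FALLBACK
```

Averaging each list in `ratio_log` over time shows which document types actually shrink under compression and which ones are better passed through unchanged.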