LangChain Map-Reduce Chain for Long Document Summarization
Author: Venkata Sudhakar
ShopMax India's legal and operations teams often need to summarize long supplier contracts, return policy documents, and product specification PDFs that exceed a single LLM's context window. LangChain's map-reduce summarization chain handles this by splitting the document into chunks, summarizing each chunk independently (map), and then combining all the chunk summaries into one final condensed summary (reduce).
The map step sends each document chunk to the LLM with a summarization prompt and collects individual summaries. The reduce step passes all those summaries back to the LLM with a combining prompt to produce the final output. LangChain's load_summarize_chain with chain_type='map_reduce' automates this two-step process. You can customize both the map prompt and the reduce prompt for different use cases like legal review or product specs.
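Stripped of the library, the two-step flow can be sketched in plain Python. This is only a toy illustration of the pattern: a trivial "summarizer" that keeps each text's first sentence stands in for the LLM call, and the helper names are invented for this sketch, but the shape (map over chunks, then reduce the partial summaries) is exactly what the chain automates.

```python
def split_into_chunks(text: str, chunk_size: int = 200) -> list[str]:
    """Naive fixed-size splitter (LangChain's splitters are smarter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def fake_llm_summarize(text: str) -> str:
    """Stand-in for an LLM summarization call: keep the first sentence."""
    return text.split(".")[0].strip() + "."


def map_reduce_summarize(document: str) -> str:
    chunks = split_into_chunks(document)
    # Map: summarize each chunk independently.
    partial_summaries = [fake_llm_summarize(c) for c in chunks]
    # Reduce: combine the partial summaries, then summarize the combination.
    combined = " ".join(partial_summaries)
    return fake_llm_summarize(combined)
```

Because the map calls are independent of one another, they can be parallelized, which is one reason the map-reduce chain scales well to long documents.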
The example below loads a long text document representing a supplier agreement for ShopMax India, splits it into chunks, and summarizes it using the map-reduce chain.
Running the chain on that agreement produces the following final summary:
The ShopMax India and Rajesh Electronics supplier agreement covers electronics supply with Net 30 payment terms, 7-day defect reporting, minimum 50-unit orders, and 5-day delivery SLA with 2% weekly penalties. ShopMax can reject up to 5% of shipments, prices are locked for 6 months, and Rajesh Electronics has exclusivity preventing supply of the same SKUs to competitors like Flipkart.
For very long documents (100+ pages), consider the refine chain type instead of map_reduce: it processes chunks sequentially and folds each one into a running summary, which tends to produce a more coherent final result. Set verbose=True to monitor the map and reduce steps. In production, cache the map-step outputs so unchanged chunks are not re-summarized when a document is updated. Always test with your actual document sizes to tune chunk_size and chunk_overlap for your specific content.