
LangChain Map-Reduce Chain for Long Document Summarization

Author: Venkata Sudhakar

ShopMax India's legal and operations teams often need to summarize long supplier contracts, return policy documents, and product specification PDFs that exceed a single LLM context window. LangChain's Map-Reduce summarization chain handles this by splitting the document into chunks, summarizing each chunk independently (map), and then combining all summaries into a final condensed summary (reduce).

The map step sends each document chunk to the LLM with a summarization prompt and collects individual summaries. The reduce step passes all those summaries back to the LLM with a combining prompt to produce the final output. LangChain's load_summarize_chain with chain_type='map_reduce' automates this two-step process. You can customize both the map prompt and the reduce prompt for different use cases like legal review or product specs.

The example below loads a long text document representing a supplier agreement for ShopMax India, splits it into chunks, and summarizes it using the map-reduce chain.


Because LLM output is not deterministic, the exact wording varies from run to run, but it gives output similar to the following:

The ShopMax India and Rajesh Electronics supplier agreement covers electronics supply with Net 30 payment terms, 7-day defect reporting, minimum 50-unit orders, and 5-day delivery SLA with 2% weekly penalties. ShopMax can reject up to 5% of shipments, prices are locked for 6 months, and Rajesh Electronics has exclusivity preventing supply of the same SKUs to competitors like Flipkart.

For very long documents (100+ pages), consider the refine chain type instead of map_reduce: it processes chunks sequentially and updates a running summary, which tends to produce more coherent results at the cost of losing the map step's parallelism. Set verbose=True to monitor the map and reduce steps. In production, cache the map-step outputs to avoid re-summarizing unchanged chunks when documents are updated, and always test with your actual document sizes to tune chunk_size and chunk_overlap for your content.


 
  


  