LangChain Map-Reduce Chain for Long Document Summarization
Author: Venkata Sudhakar
ShopMax India's legal and operations teams often need to summarize long supplier contracts, return policy documents, and product specification PDFs that exceed a single LLM's context window. LangChain's map-reduce summarization chain handles this by splitting the document into chunks, summarizing each chunk independently (map), and then combining all the chunk summaries into one final condensed summary (reduce).
The map step sends each document chunk to the LLM with a summarization prompt and collects individual summaries. The reduce step passes all those summaries back to the LLM with a combining prompt to produce the final output. LangChain's load_summarize_chain with chain_type='map_reduce' automates this two-step process. You can customize both the map prompt and the reduce prompt for different use cases like legal review or product specs.
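Stripped of the library, the two-step flow can be sketched in plain Python. This is only a toy illustration of the pattern: a trivial "summarizer" that keeps each text's first sentence stands in for the LLM call, and the helper names are invented for this sketch, but the shape (map over chunks, then reduce the partial summaries) is exactly what the chain automates.

```python
def split_into_chunks(text: str, chunk_size: int = 200) -> list[str]:
    """Naive fixed-size splitter (LangChain's splitters are smarter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def fake_llm_summarize(text: str) -> str:
    """Stand-in for an LLM summarization call: keep the first sentence."""
    return text.split(".")[0].strip() + "."


def map_reduce_summarize(document: str) -> str:
    chunks = split_into_chunks(document)
    # Map: summarize each chunk independently.
    partial_summaries = [fake_llm_summarize(c) for c in chunks]
    # Reduce: combine the partial summaries, then summarize the combination.
    combined = " ".join(partial_summaries)
    return fake_llm_summarize(combined)
```

Because the map calls are independent of one another, they can be parallelized, which is one reason the map-reduce chain scales well to long documents.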
The example below loads a long text document representing a supplier agreement for ShopMax India, splits it into chunks, and summarizes it using the map-reduce chain.
Running the chain on that agreement produces the following final summary:
The ShopMax India and Rajesh Electronics supplier agreement covers electronics supply with Net 30 payment terms, 7-day defect reporting, minimum 50-unit orders, and 5-day delivery SLA with 2% weekly penalties. ShopMax can reject up to 5% of shipments, prices are locked for 6 months, and Rajesh Electronics has exclusivity preventing supply of the same SKUs to competitors like Flipkart.
For very long documents (100+ pages), consider the refine chain type instead of map_reduce: it processes chunks sequentially and folds each one into a running summary, which tends to produce a more coherent final result. Set verbose=True to monitor the map and reduce steps. In production, cache the map-step outputs so unchanged chunks are not re-summarized when a document is updated. Always test with your actual document sizes to tune chunk_size and chunk_overlap for your specific content.