Microsoft GraphRAG - Community Detection and Global Summarization
Author: Venkata Sudhakar
Standard RAG retrieves text chunks relevant to a specific question but struggles with broad queries like 'What are the most common product complaints across all ShopMax India orders?' because the answer is scattered across thousands of documents. Microsoft GraphRAG solves this by extracting entities and relationships from documents, building a knowledge graph, running community detection to cluster related concepts, and generating summaries at each community level. ShopMax India can then answer global questions by querying these pre-built summaries instead of scanning every document.
GraphRAG offers two main query modes. Local search finds specific entities and their immediate graph neighborhood - good for targeted questions like 'What issues did Rahul Sharma report?' Global search aggregates community-level summaries to answer broad thematic questions. At indexing time, the pipeline runs entity and relationship extraction via LLM calls, builds a graph, applies the Leiden community detection algorithm to cluster nodes, and uses the LLM again to write a natural language summary for each community. Leiden is a refinement of the older Louvain algorithm, so the python-louvain library offers a simpler implementation that demonstrates the same concept.
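The graph-building and grouping steps can be illustrated without any LLM calls. The sketch below is a minimal illustration under stated assumptions: the triples are hypothetical hand-written stand-ins for LLM extraction output, the function names are invented for this example, and plain connected components are used as a deliberately crude substitute for Louvain or Leiden (which would further split large components by optimizing modularity). The last helper shows the kind of one-hop neighborhood lookup that local search builds on.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples standing in for LLM extraction output.
TRIPLES = [
    ("Rahul Sharma", "reported", "display flickering"),
    ("display flickering", "affects", "Samsung"),
    ("Samsung", "sold_in", "Mumbai"),
    ("OnePlus 11 5G", "has_issue", "battery drain"),
    ("OnePlus 11 5G", "sold_in", "Bangalore"),
]


def build_graph(triples):
    """Build an undirected adjacency map from (source, relation, target) triples."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def connected_components(graph):
    """Group nodes into connected components (a crude stand-in for Louvain/Leiden)."""
    seen = set()
    communities = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def local_neighborhood(graph, entity):
    """Return the entities directly connected to `entity` (one-hop neighborhood)."""
    return sorted(graph.get(entity, set()))


graph = build_graph(TRIPLES)
communities = connected_components(graph)
for index, members in enumerate(communities):
    print(f"Community {index}: {', '.join(members)}")
print("Neighbors of Samsung:", local_neighborhood(graph, "Samsung"))
```

On real ticket data, components tend to merge into one large cluster once entities such as a city or brand are shared across tickets, which is where a modularity-based algorithm becomes necessary.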
The example below simulates the GraphRAG pipeline on ShopMax India support tickets. It uses OpenAI to extract entities and relationships from five tickets, builds a networkx graph, runs community detection to group related issues, and generates a plain English summary for each community.
The example produces output like the following:
Communities detected in ShopMax India support graph:
Community 0: Samsung, Rahul Sharma, Mumbai, display flickering, overheating
Community 1: OnePlus 11 5G, Bangalore, battery drain
Community 2: LG washing machine, Delhi, Logistics team, delivery delay
Community 3: Sony headphones, Chennai, broken cable
Community 4: Samsung refrigerator, Hyderabad, compressor noise
Summary for Community 0:
Samsung products in Mumbai are experiencing hardware quality issues including
display flickering and overheating, suggesting a potential batch defect in
recent ShopMax India stock. Escalation to the Samsung vendor is recommended.
Summary for Community 1:
Multiple OnePlus 11 5G units sold through ShopMax India Bangalore are showing
abnormal battery drain, likely a firmware issue affecting a specific batch.
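Summaries like the two above come from prompting the LLM with one community's entities and relationships at a time. The sketch below shows how such a prompt might be assembled; the prompt wording, the function name `build_summary_prompt`, and the sample triples are all hypothetical and are not GraphRAG's actual prompt. The LLM call itself is left out.

```python
# Hypothetical (source, relation, target) triples; real ones would come from LLM extraction.
EXAMPLE_TRIPLES = [
    ("OnePlus 11 5G", "has_issue", "battery drain"),
    ("OnePlus 11 5G", "sold_in", "Bangalore"),
    ("Samsung", "sold_in", "Mumbai"),
]


def build_summary_prompt(members, triples):
    """Assemble a summarization prompt for one community (illustrative wording only)."""
    member_set = set(members)
    # Keep only relationships whose two endpoints both belong to this community.
    facts = [
        f"- {source} {relation.replace('_', ' ')} {target}"
        for source, relation, target in triples
        if source in member_set and target in member_set
    ]
    return (
        "Summarize the main theme of this group of related support entities "
        "in two or three sentences.\n"
        f"Entities: {', '.join(sorted(member_set))}\n"
        "Relationships:\n" + "\n".join(facts)
    )


prompt = build_summary_prompt(
    ["OnePlus 11 5G", "Bangalore", "battery drain"], EXAMPLE_TRIPLES
)
print(prompt)
```

Restricting the prompt to relationships inside the community keeps each summary focused and keeps the prompt size proportional to the community rather than to the whole graph.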
In production, replace the Louvain implementation with Microsoft GraphRAG's full pipeline (pip install graphrag), which uses the Leiden algorithm and generates multi-level community summaries automatically. GraphRAG's global query mode is expensive at scale - each query fans out across all community summaries, so cache global answers for common questions. For ShopMax India, run the extraction pipeline nightly on the previous day's support tickets and store the resulting community summaries in a vector store, so global queries read pre-built summaries rather than re-running LLM summarization each time.
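The cost of the fan-out and the benefit of caching can be seen in a small sketch. It assumes a map-reduce style global query (one LLM call per community summary, then one call to combine the partial answers) and an in-memory dictionary as the cache. The class `GlobalAnswerCache`, the prompt strings, and the `fake_llm` stub are hypothetical and only stand in for a real LLM client and cache store.

```python
import hashlib


class GlobalAnswerCache:
    """Answer global questions over community summaries, caching by normalized question."""

    def __init__(self, summaries, llm):
        self.summaries = summaries  # community id -> summary text
        self.llm = llm              # hypothetical callable: prompt str -> answer str
        self._cache = {}

    @staticmethod
    def _key(question):
        # Normalize case and whitespace so trivial variants share one cache entry.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, question):
        key = self._key(question)
        if key in self._cache:
            return self._cache[key]
        # Map step: one LLM call per community summary (the expensive fan-out).
        partials = [
            self.llm(f"Question: {question}\nCommunity summary: {text}")
            for text in self.summaries.values()
        ]
        # Reduce step: one more call that combines the partial answers.
        answer = self.llm("Combine these partial answers:\n" + "\n".join(partials))
        self._cache[key] = answer
        return answer

    def refresh(self, summaries):
        """Swap in new summaries (e.g. after a nightly run) and drop stale answers."""
        self.summaries = summaries
        self._cache.clear()


calls = []


def fake_llm(prompt):
    """Stub that records each prompt instead of calling a real model."""
    calls.append(prompt)
    return f"answer {len(calls)}"


cache = GlobalAnswerCache(
    {0: "Samsung hardware issues in Mumbai.", 1: "OnePlus battery drain in Bangalore."},
    fake_llm,
)
first = cache.ask("What are the most common product complaints?")
second = cache.ask("  what are the MOST common product complaints? ")
print(len(calls), first == second)
```

Clearing the cache in `refresh` matters for the nightly schedule: once new summaries are loaded, previously cached answers may no longer reflect the latest tickets.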