Corrective RAG - Self-Evaluation and Query Reformulation

Author: Venkata Sudhakar

Corrective RAG (CRAG) improves retrieval accuracy by adding a self-evaluation step before generating an answer. ShopMax India uses this pattern to catch cases where retrieved product documents are irrelevant to a customer query - for example, when a query about a Sony camera retrieves laptop specs instead. Rather than silently returning a wrong answer, CRAG detects the mismatch and either triggers a web search fallback or reformulates the query for a second retrieval attempt.

The CRAG pipeline works in three stages. First, the retriever fetches the top-k documents as usual. Second, an evaluator LLM scores each retrieved document as 'relevant', 'partially relevant', or 'irrelevant'. Third, the pipeline routes on those scores: if at least one document is relevant or partially relevant, it drops the irrelevant ones and proceeds to generation; if all are irrelevant, it falls back to an alternative source or returns a 'not found' response rather than hallucinating. When documents are scored individually, as in the example below, the evaluation step costs one extra LLM call per retrieved document (top-k calls per query). In exchange, it reduces hallucination on out-of-distribution queries, because the generator never sees context the evaluator rejected.
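The routing decision in the third stage can be isolated from the LLM calls. The following is a minimal sketch, not part of the original pipeline: the names `Action`, `route`, and `keep_usable` are hypothetical, and treating 'partially relevant' as usable is an assumption.

```python
from enum import Enum


class Action(Enum):
    GENERATE = "generate"   # at least one usable document: answer from it
    FALLBACK = "fallback"   # nothing usable: alternative source or 'not found'


# Assumption: partially relevant documents are good enough to generate from.
USABLE = {"relevant", "partially relevant"}


def route(ratings):
    """Decide the next step from the evaluator's per-document ratings."""
    if any(r in USABLE for r in ratings):
        return Action.GENERATE
    return Action.FALLBACK


def keep_usable(docs, ratings):
    """Drop documents the evaluator rated as irrelevant."""
    return [d for d, r in zip(docs, ratings) if r in USABLE]
```

Keeping this logic as a pure function makes the routing rule easy to unit test without calling a model.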

The following example implements a CRAG pipeline for ShopMax India product queries. The evaluator scores retrieved chunks, filters out irrelevant ones, and routes to fallback when no relevant chunks remain.

import re

import anthropic
from rank_bm25 import BM25Okapi

# Placeholder key. If api_key is omitted, the SDK reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic(api_key="sk-ant-...")

product_docs = [
    "Sony WH-1000XM5 headphones: 30-hour battery, USB-C, Rs 29990, Mumbai and Bangalore.",
    "Dell XPS 15 9530: 32GB RAM, 1TB SSD, Intel i7, Rs 135000, available pan-India.",
    "Samsung Galaxy S24: 50MP camera, 8GB RAM, Rs 79999, available in all cities."
]

def tokenize(text):
    # Lowercase and keep alphanumeric runs only, so "headphones?" in a query
    # matches "headphones:" in a document.
    return re.findall(r"[a-z0-9]+", text.lower())

tokenized_corpus = [tokenize(doc) for doc in product_docs]
bm25 = BM25Okapi(tokenized_corpus)

def evaluate_relevance(query, doc):
    # One short evaluator call per document; the model is asked for a one-word rating.
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=10,
        messages=[{"role": "user", "content": f"Is this document relevant to the query?\nQuery: {query}\nDocument: {doc}\nAnswer with only: relevant, partial, or irrelevant"}]
    )
    # Normalise the reply so that "Relevant." still matches the expected labels.
    return msg.content[0].text.strip().lower().rstrip(".")

def crag_query(query, top_k=3):
    # Stage 1: retrieve the top-k documents by BM25 score.
    scores = bm25.get_scores(tokenize(query))
    top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    retrieved = [product_docs[i] for i in top_indices]
    # Stage 2: keep only documents the evaluator rates as relevant or partial.
    relevant_docs = []
    for doc in retrieved:
        rating = evaluate_relevance(query, doc)
        if rating in ("relevant", "partial"):
            relevant_docs.append(doc)
    # Stage 3: fall back instead of generating when nothing usable remains.
    if not relevant_docs:
        return "No relevant product information found for your query. Please contact ShopMax India support.", []
    context = "\n".join(relevant_docs)
    msg = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=200,
        system="You are ShopMax India assistant. Answer using only the provided product context.",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}]
    )
    return msg.content[0].text, relevant_docs

queries = [
    "What is the battery life of Sony headphones?",
    "What are the cooking recipes for biryani?"
]

for q in queries:
    answer, docs = crag_query(q)
    print(f"Q: {q}")
    print(f"A: {answer}")
    print(f"Relevant docs used: {len(docs)}")
    print()

Running the script produces output similar to the following. The exact wording of the generated answer may vary between runs.

Q: What is the battery life of Sony headphones?
A: The Sony WH-1000XM5 headphones offer a 30-hour battery life and support USB-C charging.
Relevant docs used: 1

Q: What are the cooking recipes for biryani?
A: No relevant product information found for your query. Please contact ShopMax India support.
Relevant docs used: 0

For ShopMax India, use the smaller Claude Haiku model for the relevance evaluation step to keep costs low - it only needs to output one word. Reserve Claude Opus for the final answer generation. Log all queries that hit the fallback path; these reveal gaps in your product knowledge base that need new documents. Over time, corrective RAG transforms from a safety net into a data quality feedback loop that continuously improves your retrieval corpus.
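Logging the fallback path can be as simple as appending one record per query to a file. The sketch below assumes a JSON Lines file; the file name `crag_fallbacks.jsonl` and the helper names `log_fallback` and `top_fallback_queries` are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def log_fallback(query, path="crag_fallbacks.jsonl"):
    """Append one JSON record for a query that found no usable document."""
    record = {"ts": datetime.now(timezone.utc).isoformat(), "query": query}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def top_fallback_queries(path="crag_fallbacks.jsonl", n=5):
    """Return the n most frequent fallback queries, ignoring case and outer whitespace."""
    with open(path, encoding="utf-8") as f:
        queries = [json.loads(line)["query"].strip().lower() for line in f if line.strip()]
    return Counter(queries).most_common(n)
```

Calling `log_fallback(query)` just before the 'not found' message is returned, and reviewing `top_fallback_queries()` periodically, shows which topics customers ask about most often that the corpus does not cover.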
