Self-RAG - Adaptive Retrieval with Reflection and Grounding

Author: Venkata Sudhakar

Self-RAG lets an LLM decide whether it needs to retrieve at all, then critique both the retrieved documents and its own generated answer before returning a response. This suits ShopMax India's mixed-intent chatbot, which handles both product queries (retrieval needed) and general questions such as 'how do I clean my laptop screen?' (retrieval not needed). Self-RAG skips unnecessary retrieval calls while still grounding product-specific answers in the product documents.

The Self-RAG loop has four steps: (1) Retrieve decision - the model classifies whether the query needs retrieval; (2) Retrieve and assess - if needed, retrieve documents and score each for relevance; (3) Generate - produce an answer using the relevant documents; (4) Critique - the model scores its own answer for groundedness (is it supported by the retrieved text?) and utility (does it actually answer the question?). If scores are low, the loop retries with a reformulated query.
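
The four steps above can be sketched as a single pass with each model call injected as a plain function. This is a minimal sketch, not a reference implementation: the names `self_rag_pass`, `SelfRagResult`, `min_relevance` and the callable parameters are hypothetical, and the 0.5 relevance cut-off is an assumed value chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class SelfRagResult:
    answer: str
    mode: str                      # "direct" or "retrieved"
    groundedness: Optional[float]  # None when nothing was retrieved
    utility: Optional[float]       # None when nothing was retrieved

def self_rag_pass(
    query: str,
    needs_retrieval: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    relevance: Callable[[str, str], float],
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str, List[str]], Tuple[float, float]],
    min_relevance: float = 0.5,    # assumed cut-off, tune for your data
) -> SelfRagResult:
    # Step 1: retrieve decision.
    if not needs_retrieval(query):
        return SelfRagResult(generate(query, []), "direct", None, None)
    # Step 2: retrieve, then keep only documents scored as relevant.
    docs = [d for d in retrieve(query) if relevance(query, d) >= min_relevance]
    # Step 3: generate an answer from the surviving documents.
    answer = generate(query, docs)
    # Step 4: critique the answer for groundedness and utility.
    groundedness, utility = critique(query, answer, docs)
    return SelfRagResult(answer, "retrieved", groundedness, utility)

# Stub callables stand in for real model and retriever calls.
result = self_rag_pass(
    "What is the price of Dell XPS 15?",
    needs_retrieval=lambda q: "price" in q.lower(),
    retrieve=lambda q: ["Dell XPS 15 9530: Rs 135000", "Sony WH-1000XM5: Rs 29990"],
    relevance=lambda q, d: 1.0 if "dell" in d.lower() else 0.0,
    generate=lambda q, ds: ds[0] if ds else "general answer",
    critique=lambda q, a, ds: (1.0 if a in ds else 0.0, 1.0),
)
print(result.mode, "|", result.answer)
```

Passing the steps in as functions keeps the control flow testable without network access; the caller decides whether to retry when the returned scores are low.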

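The following example implements a simplified Self-RAG loop for ShopMax India. The pipeline decides whether to retrieve, generates an answer, and scores it for groundedness before returning it to the customer. To stay short, it omits per-document relevance scoring, the utility score, and the retry with a reformulated query.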

import re

import anthropic
from rank_bm25 import BM25Okapi

# Placeholder key; the SDK also reads ANTHROPIC_API_KEY from the environment if api_key is omitted.
client = anthropic.Anthropic(api_key="sk-ant-...")

product_docs = [
    "Sony WH-1000XM5: 30-hour battery, noise-cancelling, Rs 29990, available in Mumbai and Bangalore.",
    "Samsung Galaxy S24 Ultra: 200MP camera, 12GB RAM, Rs 134999, available pan-India.",
    "Dell XPS 15 9530: 32GB RAM, 1TB SSD, Rs 135000, available in Delhi and Mumbai."
]

def tokenize(text):
    # Lowercase and split on non-alphanumerics so that "Mumbai?" matches "Mumbai."
    return re.findall(r"[a-z0-9]+", text.lower())

bm25 = BM25Okapi([tokenize(doc) for doc in product_docs])

def llm(prompt, max_tokens=50):
    # Small, cheap model for classification, scoring, and general answers.
    msg = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}]
    )
    return msg.content[0].text.strip()

def retrieve(query, top_k=2):
    # Rank all documents by BM25 score and return the top_k.
    scores = bm25.get_scores(tokenize(query))
    idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [product_docs[i] for i in idx]

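def self_rag(query):
    # Step 1: ask the small model whether the query needs product retrieval.
    need_retrieval = llm(f"Does this question need product database retrieval? Answer yes or no only.\nQuestion: {query}")
    if "yes" not in need_retrieval.lower():
        # No retrieval: answer directly. The 1.0 is a placeholder, not a measured score.
        answer = llm(f"Answer this general question concisely: {query}", max_tokens=150)
        return answer, "direct", 1.0
    # Step 2: retrieve supporting documents (per-document relevance scoring is omitted).
    docs = retrieve(query)
    context = "\n".join(docs)
    # Step 3: generate the answer with the larger model, restricted to the context.
    answer = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=150,
        system="You are the ShopMax India assistant. Answer using only the provided context.",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}]
    ).content[0].text.strip()
    # Step 4: have the small model rate how well the context supports the answer.
    groundedness = llm(f"Is this answer fully supported by the context below? Rate 0.0 to 1.0 only.\nContext: {context}\nAnswer: {answer}")
    try:
        # Clamp to [0.0, 1.0] in case the model replies outside the requested range.
        score = min(1.0, max(0.0, float(groundedness.split()[0])))
    except (ValueError, IndexError):
        # Empty or non-numeric reply: fall back to a neutral score.
        score = 0.5
    return answer, "retrieved", score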

queries = [
    "How do I clean a laptop screen safely?",
    "What is the price of Dell XPS 15?",
    "Which Sony headphones are available in Mumbai?"
]

for q in queries:
    answer, mode, score = self_rag(q)
    print(f"Mode: {mode.upper()} | Groundedness: {score:.1f}")
    print(f"Q: {q}")
    print(f"A: {answer}")
    print()

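Running the script produces output similar to the following; the exact wording varies between runs because the answers are model-generated. The 1.0 shown for the direct answer is a fixed placeholder, since no context was retrieved to score against.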

Mode: DIRECT | Groundedness: 1.0
Q: How do I clean a laptop screen safely?
A: Use a dry microfibre cloth in circular motions. For stubborn marks, lightly dampen with distilled water. Never use alcohol or household cleaners.

Mode: RETRIEVED | Groundedness: 1.0
Q: What is the price of Dell XPS 15?
A: The Dell XPS 15 9530 is priced at Rs 1,35,000 and is available in Delhi and Mumbai.

Mode: RETRIEVED | Groundedness: 1.0
Q: Which Sony headphones are available in Mumbai?
A: The Sony WH-1000XM5 headphones are available in Mumbai at Rs 29,990.
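
The example reads the groundedness score by taking the first whitespace-separated token of the model's reply, so a reply such as "Score: 0.85." falls through to the 0.5 default. A more tolerant parser might look like the sketch below. The helper name `parse_score` is hypothetical, and the 0.5 default simply mirrors the fallback used in the example.

```python
import re

def parse_score(reply: str, default: float = 0.5) -> float:
    """Extract the first number in a model reply and clamp it to [0.0, 1.0]."""
    # Match decimals such as "0.85" or ".8" first, then plain integers.
    match = re.search(r"\d*\.\d+|\d+", reply)
    if match is None:
        return default  # no number found: fall back to a neutral score
    return min(1.0, max(0.0, float(match.group())))

print(parse_score("0.9"))           # 0.9
print(parse_score("Score: 0.85."))  # 0.85
print(parse_score("not sure"))      # 0.5
```

Clamping means a reply on a different scale (for example "7") is reported as 1.0, so the prompt should still ask for a value between 0.0 and 1.0.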

For ShopMax India at production scale, treat the groundedness score as a hard threshold: if it falls below 0.7, trigger a second retrieval with a reformulated query before responding to the customer. Use Claude Haiku for the classification and scoring steps (retrieval decision, groundedness) and reserve Claude Opus for final answer generation. This tiered approach delivers Self-RAG quality improvements while keeping per-query cost at roughly 1.3x that of a standard RAG call.
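
The threshold-and-retry policy could be wrapped around any answer function along the following lines. This is a sketch under stated assumptions: `answer_with_retry`, `answer_fn`, `reformulate_fn` and `max_retries` are hypothetical names, a single retry is assumed to match the "second retrieval" described above, and keeping the best-scoring attempt is one possible design choice rather than something the article prescribes.

```python
from typing import Callable, Tuple

GROUNDEDNESS_THRESHOLD = 0.7  # value suggested in the text

def answer_with_retry(
    query: str,
    answer_fn: Callable[[str], Tuple[str, float]],   # returns (answer, groundedness)
    reformulate_fn: Callable[[str], str],            # returns a rewritten query
    threshold: float = GROUNDEDNESS_THRESHOLD,
    max_retries: int = 1,                            # assumed: one extra attempt
) -> Tuple[str, float, int]:
    """Return (answer, score, attempts), retrying while the score is below threshold."""
    best = answer_fn(query)
    attempts = 1
    current = query
    while best[1] < threshold and attempts <= max_retries:
        current = reformulate_fn(current)
        candidate = answer_fn(current)
        attempts += 1
        # Keep whichever attempt scored higher.
        if candidate[1] > best[1]:
            best = candidate
    return best[0], best[1], attempts

# Stub functions stand in for the real pipeline and query rewriter.
def fake_answer(q: str) -> Tuple[str, float]:
    return ("grounded answer", 0.9) if "rephrased" in q else ("weak answer", 0.4)

print(answer_with_retry("price of XPS?", fake_answer, lambda q: "rephrased " + q))
```

With the `self_rag` function from the example, `answer_fn` could be `lambda q: self_rag(q)[::2]`, which keeps the answer and the score and drops the mode.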
