Self-RAG - Adaptive Retrieval with Reflection and Grounding
Author: Venkata Sudhakar
Self-RAG lets an LLM decide whether it needs to retrieve at all, then critique both the retrieved documents and its generated answer before returning a response. ShopMax India benefits from this in a mixed-intent chatbot that handles both product queries (retrieval needed) and general questions such as 'how do I clean my laptop screen?' (retrieval not needed). Self-RAG avoids unnecessary retrieval calls while still grounding product-specific answers in factual documents.
The Self-RAG loop has four steps: (1) Retrieve decision - the model classifies whether the query needs retrieval; (2) Retrieve and assess - if it does, retrieve documents and score each one for relevance; (3) Generate - produce an answer from the relevant documents; (4) Critique - the model scores its own answer for groundedness (is it supported by the retrieved text?) and utility (does it actually answer the question?). If either score is low, the loop retries with a reformulated query.
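The four steps above can be sketched in plain Python. This is a minimal illustration and not the article's own implementation: every model call is replaced by a keyword-overlap stand-in, and the names `DOCS`, `PRODUCT_TERMS`, `needs_retrieval`, `relevance`, `generate`, `groundedness`, `reformulate` and `self_rag` are hypothetical. The utility score is left out for brevity, and the thresholds are assumed values.

```python
from dataclasses import dataclass

# Hypothetical in-memory catalogue standing in for a real document store.
DOCS = [
    "Dell XPS 15 9530 laptop priced at Rs 1,35,000, available in Delhi and Mumbai.",
    "Sony WH-1000XM5 headphones priced at Rs 29,990, available in Mumbai.",
]
# Assumed trigger words; a real system would ask an LLM to classify the query.
PRODUCT_TERMS = {"price", "available", "stock", "dell", "sony"}
STOPWORDS = {"what", "is", "the", "of", "which", "are", "in", "a", "do", "i", "how"}


def tokens(text):
    # Lowercased word set with surrounding punctuation stripped.
    return {w.strip(".,?!'").lower() for w in text.split()} - {""}


def needs_retrieval(query):
    # Step 1 stand-in: retrieve only when the query mentions a product term.
    return bool(tokens(query) & PRODUCT_TERMS)


def relevance(query, doc):
    # Step 2 stand-in: fraction of query tokens that appear in the document.
    q = tokens(query)
    return len(q & tokens(doc)) / len(q) if q else 0.0


def generate(query, docs):
    # Step 3 stand-in: echo the best document, or give a canned direct answer.
    return docs[0] if docs else "General answer from the model's own knowledge."


def groundedness(answer, docs):
    # Step 4 stand-in: share of answer tokens supported by the retrieved text.
    a = tokens(answer)
    support = set().union(*(tokens(d) for d in docs))
    return len(a & support) / len(a) if a else 0.0


def reformulate(query):
    # Retry stand-in: drop filler words so content terms dominate the match.
    kept = [w for w in query.split() if w.strip(".,?!'").lower() not in STOPWORDS]
    return " ".join(kept)


@dataclass
class Result:
    mode: str
    answer: str
    groundedness: float


def self_rag(query, min_relevance=0.2, min_grounded=0.7, max_attempts=2):
    if not needs_retrieval(query):
        # Direct mode has no documents to ground against, so report 1.0.
        return Result("DIRECT", generate(query, []), 1.0)
    for _ in range(max_attempts):
        ranked = sorted(DOCS, key=lambda d: relevance(query, d), reverse=True)
        docs = [d for d in ranked if relevance(query, d) >= min_relevance]
        answer = generate(query, docs)
        score = groundedness(answer, docs) if docs else 0.0
        if score >= min_grounded:
            break
        query = reformulate(query)  # low score: retry with a rewritten query
    return Result("RETRIEVED", answer, score)


if __name__ == "__main__":
    for q in ["How do I clean a laptop screen safely?",
              "What is the price of Dell XPS 15?"]:
        r = self_rag(q)
        print(f"Mode: {r.mode} | Groundedness: {r.groundedness}")
```

Because the stand-in generator simply echoes the top document, groundedness is trivially high here. With a real model the critique step matters much more, since the generated answer can drift from the retrieved text.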
The following example implements a simplified Self-RAG loop for ShopMax India. The pipeline decides whether to retrieve, generates an answer, then scores it for groundedness before returning it to the customer.
It gives the following output:
Mode: DIRECT | Groundedness: 1.0
Q: How do I clean a laptop screen safely?
A: Use a dry microfibre cloth in circular motions. For stubborn marks, lightly dampen with distilled water. Never use alcohol or household cleaners.
Mode: RETRIEVED | Groundedness: 1.0
Q: What is the price of Dell XPS 15?
A: The Dell XPS 15 9530 is priced at Rs 1,35,000 and is available in Delhi and Mumbai.
Mode: RETRIEVED | Groundedness: 1.0
Q: Which Sony headphones are available in Mumbai?
A: The Sony WH-1000XM5 headphones are available in Mumbai at Rs 29,990.
For ShopMax India at production scale, treat the groundedness score as a hard threshold: if it falls below 0.7, trigger a second retrieval with a reformulated query before responding to the customer. Use Claude Haiku for all classification and scoring steps (retrieval decision, groundedness) and reserve Claude Opus for the final answer generation step. This tiered approach delivers Self-RAG quality improvements while keeping per-query cost at roughly 1.3x that of a standard RAG call.
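The threshold gate and model tiering described above might be wired together as in the sketch below. It assumes the retrieval, generation, scoring and reformulation steps are passed in as callables. The function name `answer_with_gate` and the tier labels `haiku-tier` and `opus-tier` are placeholders for illustration, not real API model identifiers.

```python
GROUNDEDNESS_THRESHOLD = 0.7  # hard threshold taken from the text above

# Placeholder tier labels (assumed); swap in real model identifiers as needed.
MODEL_FOR_STEP = {
    "retrieve_decision": "haiku-tier",
    "groundedness": "haiku-tier",
    "generate": "opus-tier",
}


def answer_with_gate(query, retrieve, generate, score, reformulate,
                     max_retrievals=2):
    """Retry retrieval with a reformulated query while groundedness is low."""
    calls = []  # (step, model) pairs, kept so the routing can be inspected
    for _ in range(max_retrievals):
        docs = retrieve(query)
        answer = generate(query, docs, model=MODEL_FOR_STEP["generate"])
        calls.append(("generate", MODEL_FOR_STEP["generate"]))
        grounded = score(answer, docs, model=MODEL_FOR_STEP["groundedness"])
        calls.append(("groundedness", MODEL_FOR_STEP["groundedness"]))
        if grounded >= GROUNDEDNESS_THRESHOLD:
            break
        query = reformulate(query)  # below threshold: second retrieval
    return answer, grounded, calls


if __name__ == "__main__":
    # Stub callables: the first attempt scores 0.4, the retry scores 0.9.
    fake_scores = iter([0.4, 0.9])
    answer, grounded, calls = answer_with_gate(
        "xps price",
        retrieve=lambda q: [q],
        generate=lambda q, docs, model: "answer for " + q,
        score=lambda a, docs, model: next(fake_scores),
        reformulate=lambda q: q + " (reformulated)",
    )
    print(answer, grounded, len(calls))
```

In this sketch a retry repeats the generation call on the larger model as well, so the share of queries that fall below the threshold feeds directly into the average per-query cost.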