Retrieval-Augmented Prompting - Dynamic Context Injection
Author: Venkata Sudhakar
Retrieval-augmented prompting injects relevant context into the prompt at request time instead of embedding all possible knowledge in the system prompt. At ShopMax India, a customer asking about the return policy for a specific product category gets a prompt that includes only the relevant policy section retrieved from a knowledge base, which keeps prompts focused and token costs low.
The pattern works in three steps: retrieve relevant documents or chunks based on the query (using keyword search, vector similarity, or a lookup table), inject the retrieved content into the prompt as context, then ask the LLM to answer using only the provided context. Placing the retrieved context immediately before the question gives the LLM the clearest signal about which text to answer from. Adding an instruction to say "unknown" when the context is insufficient improves reliability.
The example below shows ShopMax India dynamically injecting policy context into a customer support prompt. A simple keyword lookup retrieves the relevant policy section, which is injected into the prompt before the LLM answers the customer question.
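A minimal sketch of such a keyword-lookup flow might look like the following. Everything named here is an assumption made for illustration: the `POLICIES` and `KEYWORDS` tables, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns text). The policy wording is written to be consistent with the sample answers in this article, not copied from a real knowledge base.

```python
from typing import Callable, Optional

# Hypothetical policy sections (assumed wording, consistent with the sample answers).
POLICIES = {
    "returns": (
        "Returns: Items can be returned within 7 days of delivery for a full refund. "
        "Large appliances such as TVs require a technician inspection before "
        "return approval. Items must be in original packaging with all accessories."
    ),
    "delivery": (
        "Delivery: Standard delivery to metro cities takes 3-5 business days. "
        "Express next-day delivery is available for Rs 199 extra."
    ),
    "warranty": (
        "Warranty: Manufacturer defects are covered. "
        "Damage caused by voltage fluctuation is not covered."
    ),
}

# Hypothetical keyword table: trigger words that select a policy section.
KEYWORDS = {
    "returns": ("return", "refund"),
    "delivery": ("delivery", "deliver", "shipping"),
    "warranty": ("warranty", "covered", "damage"),
}


def retrieve(question: str) -> Optional[str]:
    """Step 1: return the first policy section whose keyword occurs in the question."""
    text = question.lower()
    for section, words in KEYWORDS.items():
        if any(word in text for word in words):
            return POLICIES[section]
    return None


def build_prompt(question: str, context: str) -> str:
    """Step 2: inject the retrieved context immediately before the question."""
    return (
        "You are a ShopMax India support assistant.\n"
        "Answer using only the context below. "
        'If the context is insufficient, say "unknown".\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Step 3: ask the LLM, or return "unknown" when nothing was retrieved."""
    context = retrieve(question)
    if context is None:
        return "unknown"
    return llm(build_prompt(question, context))


if __name__ == "__main__":
    # Stub model that echoes the prompt, so the sketch runs without an API key.
    def echo(prompt: str) -> str:
        return prompt

    print(answer("What is the return window for a Samsung TV?", echo))
```

The stub only prints the assembled prompt; replacing `echo` with a call to a real model client is what produces customer-facing answers.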
Sent to an LLM, the three customer questions give the following output:
Q: What is the return window for a Samsung TV?
A: You can return your Samsung TV within 7 days of delivery for a full refund.
Large appliances like TVs require a technician inspection before return approval.
Ensure the item is in original packaging with all accessories.
Q: How long does delivery take to Bangalore?
A: Bangalore is a metro city, so standard delivery takes 3-5 business days.
Express next-day delivery is available for Rs 199 extra.
Q: Is voltage damage covered under warranty?
A: No, voltage fluctuation damage is explicitly not covered under the ShopMax
India warranty policy. Only manufacturer defects are covered.
For production retrieval at ShopMax India, use vector embeddings instead of keyword lookup: embeddings handle paraphrased queries, so "send it back" still matches the return policy section. Chunk policies at the section level (200-500 tokens) rather than the document level for more precise retrieval. Add a fallback: if no relevant context is retrieved, route the question to a human agent rather than letting the LLM answer from its training data, which may be outdated or inaccurate for your specific policies.
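The fallback rule can be sketched as a similarity threshold on the best-matching chunk. This is an illustration under stated assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model (it does not match paraphrases the way real embeddings do), the `min_score` value of 0.2 is arbitrary, and the chunks are assumed to be already split at the section level.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in for an embedding model: word-count vector (no paraphrase matching)."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_best(
    query: str,
    chunks: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    min_score: float = 0.2,  # assumed threshold; tune on real queries
) -> Optional[Tuple[str, float]]:
    """Return (chunk, score) for the best match, or None if it scores below min_score."""
    query_vec = embed(query)
    scored = ((chunk, cosine(query_vec, embed(chunk))) for chunk in chunks)
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < min_score:
        return None
    return best


def route(query: str, chunks: List[str]) -> str:
    """Send the query to the LLM only when relevant context was retrieved."""
    return "llm" if retrieve_best(query, chunks) is not None else "human_agent"


if __name__ == "__main__":
    sections = [
        "Returns: items can be returned within 7 days of delivery for a full refund.",
        "Warranty: manufacturer defects are covered; voltage fluctuation damage is not covered.",
    ]
    print(route("Is voltage damage covered under warranty?", sections))
    print(route("Do you sell gift cards?", sections))
```

Passing a real embedding function as `embed` keeps the routing logic unchanged; only the similarity scores, and therefore a sensible `min_score`, would differ.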