
What is Retrieval-Augmented Generation (RAG)

Author: Venkata Sudhakar

Retrieval-Augmented Generation (RAG) is an architecture pattern for building AI applications that combines the reasoning ability of a Large Language Model (LLM) with retrieval of information from your own private documents, databases, or knowledge bases. A plain LLM only knows what it learned during training: it cannot answer questions about your company's internal policies, a product manual published last month, or real-time data. RAG solves this by retrieving relevant information at query time and supplying it to the LLM as context in the prompt.

The RAG pipeline has two main phases. The indexing phase (offline, re-run whenever your documents change) takes your source documents, splits them into smaller chunks, converts each chunk into an embedding vector using an embedding model, and stores those vectors in a vector database. The retrieval and generation phase (online, at query time) takes the user's question, converts it to a vector using the same embedding model, searches the vector database for the most semantically similar document chunks, and then passes those chunks together with the user's question to the LLM to generate a grounded, accurate answer. Because the relevant facts are supplied directly in the prompt, the LLM is far less likely to make them up, though retrieval quality still determines answer quality.

The example below shows the indexing phase in Python: ingesting documents, creating embeddings, and storing them in a ChromaDB vector database.
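As a minimal, self-contained sketch of this indexing phase, the snippet below uses a toy bag-of-words embedding and a plain in-memory list standing in for a real embedding model and ChromaDB. The first four document strings are taken from the sample output later in this article; the fifth is a hypothetical filler added to match the document count.

```python
import re

# Five support-KB chunks. The first four appear in the sample Q&A output
# later in this article; the fifth is a hypothetical filler document.
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase with original receipt.",
    "We offer a 14-day free trial with no credit card required for all plans.",
    "Premium plan customers get 24/7 phone support and a dedicated account manager.",
    "The API rate limit is 1000 requests per minute for Pro accounts.",
    "All plans include email support.",  # hypothetical filler
]

def embed(text):
    """Toy embedding: bag-of-words term frequencies.
    A real pipeline would call an embedding model here."""
    vec = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[token] = vec.get(token, 0) + 1
    return vec

# "Index" each chunk: store (embedding, text) pairs in memory.
# With ChromaDB this would be collection.add(documents=..., ids=...).
INDEX = [(embed(doc), doc) for doc in DOCUMENTS]
print(f"Indexed {len(INDEX)} documents.")
```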


It gives the following output:

Indexed 5 documents into ChromaDB.

The example below shows the retrieval and generation phase: taking a user question, finding relevant chunks, and generating a grounded answer.
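Continuing the same self-contained sketch: retrieval here is brute-force cosine similarity over the toy embeddings, and the LLM call is replaced by a stub that only assembles the grounded prompt (a real pipeline would send that prompt to a model API). Because the toy embedding is not a real embedding model, its chunk ranking may differ from the sample output shown below.

```python
import math
import re

DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase with original receipt.",
    "We offer a 14-day free trial with no credit card required for all plans.",
    "Premium plan customers get 24/7 phone support and a dedicated account manager.",
    "The API rate limit is 1000 requests per minute for Pro accounts.",
    "All plans include email support.",  # hypothetical filler
]

def embed(text):
    # Toy bag-of-words embedding; a real pipeline uses an embedding model.
    vec = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[token] = vec.get(token, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

INDEX = [(embed(doc), doc) for doc in DOCUMENTS]

def retrieve(question, k=3):
    """Return the k chunks most similar to the question."""
    qvec = embed(question)
    ranked = sorted(INDEX, key=lambda pair: cosine(qvec, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question, chunks):
    """Assemble the grounded prompt; a real pipeline sends this to an LLM."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, 1))
    return (
        "Answer the question using ONLY the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

question = "Can I return a product after 30 days?"
chunks = retrieve(question)
print(f"Q: {question}")
print("Retrieved chunks:")
for i, chunk in enumerate(chunks, 1):
    print(f"  [{i}] {chunk}")
```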


It gives the following output:

Q: Can I return a product after 30 days?
Retrieved chunks:
  [1] Our refund policy allows returns within 30 days of purchase with original receipt.
  [2] We offer a 14-day free trial with no credit card required for all plans.
  [3] Premium plan customers get 24/7 phone support and a dedicated account manager.
A: Our refund policy only allows returns within 30 days of purchase with the original receipt. Returns after 30 days are not covered by our policy.

Q: What is the API rate limit for Pro accounts?
Retrieved chunks:
  [1] The API rate limit is 1000 requests per minute for Pro accounts.
  [2] Premium plan customers get 24/7 phone support and a dedicated account manager.
  [3] Our refund policy allows returns within 30 days of purchase with original receipt.
A: The API rate limit for Pro accounts is 1000 requests per minute.

Q: Do I need a credit card for the trial?
Retrieved chunks:
  [1] We offer a 14-day free trial with no credit card required for all plans.
  [2] Our refund policy allows returns within 30 days of purchase with original receipt.
  [3] The API rate limit is 1000 requests per minute for Pro accounts.
A: No, you do not need a credit card to start the free trial. We offer a 14-day free trial with no credit card required for all plans.

Why RAG beats fine-tuning for knowledge-based Q&A:

Fine-tuning a model on your documents embeds knowledge into the model weights, which then becomes stale as your documents change. RAG always retrieves from the latest version of your document store, making it ideal for dynamic knowledge bases. RAG is also transparent: you can show users exactly which source chunks were retrieved to generate the answer, providing citations and explainability that fine-tuning cannot offer.
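For instance, a minimal way to surface that transparency is to return the retrieved chunks alongside the answer so the UI can render numbered citations. The payload shape and the answer string below are illustrative, not any specific library's API.

```python
# Illustrative response payload pairing the generated answer with the
# chunks that grounded it; chunk texts are from the sample output above.
retrieved_chunks = [
    "Our refund policy allows returns within 30 days of purchase with original receipt.",
    "We offer a 14-day free trial with no credit card required for all plans.",
]
response = {
    "answer": "Returns are accepted within 30 days of purchase with the original receipt. [1]",
    "citations": [
        {"id": i, "source": chunk} for i, chunk in enumerate(retrieved_chunks, 1)
    ],
}
for cite in response["citations"]:
    print(f"[{cite['id']}] {cite['source']}")
```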
