
Building a RAG Chain with LangChain

Author: Venkata Sudhakar

LangChain provides dedicated abstractions that make building RAG pipelines significantly cleaner than assembling the pieces manually. The key components are document loaders (which load data from PDFs, web pages, and databases), text splitters (which chunk documents), vector stores (which persist and query embeddings), retrievers (an interface over a vector store for similarity search), and the final RAG chain that wires everything together. The LangChain Expression Language (LCEL) makes it easy to compose these into a single, streaming-capable pipeline.

The LangChain RAG pattern follows a standard structure: build the retriever from a vector store, define a prompt that accepts context and a question, then chain the retriever output into the prompt and pass it to the LLM. The legacy RetrievalQA chain and the newer create_retrieval_chain function handle this plumbing automatically. For production systems, use a persistent vector store such as Chroma with disk storage, Pinecone, pgvector, or Weaviate instead of an in-memory store.

The example below shows the ingestion stage of the LangChain RAG pipeline: loading the documents, splitting them into chunks, and indexing the embeddings into a vector store.


It gives the following output:

Indexed 5 chunks into ChromaDB.

The example below shows the LCEL RAG chain for question answering, with source citations and streaming support.


It gives the following output:

Q: How does CDC work and which tool publishes events to Kafka?
A: CDC works by reading the database transaction log to capture changes.
[Source: cdc_guide.pdf p.1] Debezium is the most popular open-source CDC
tool and publishes these change events to Apache Kafka.
------------------------------------------------------------

Q: What is the difference between Blue-Green and Strangler Fig migration?
A: Blue-Green deployment [Source: migration_guide.pdf p.5] uses two identical
environments where traffic switches from the old (Blue) environment to the
new (Green) one via a load balancer after validation. The Strangler Fig pattern
[Source: migration_guide.pdf p.3] migrates incrementally by routing traffic
to old or new systems based on feature flags, replacing legacy features one at
a time.
------------------------------------------------------------

Q: How does Flyway track which migrations have been applied?
A: Flyway tracks migrations using the flyway_schema_history table
[Source: schema_guide.pdf p.1], which records each versioned SQL script
that has been successfully applied, ensuring each script runs exactly once.
------------------------------------------------------------

LangChain RAG enhancements for production:

Conversational memory - Use ConversationBufferMemory or LangGraph to maintain chat history, allowing follow-up questions like "Tell me more about that" to work correctly in multi-turn conversations.

Re-ranking - After vector retrieval, use a cross-encoder re-ranker (like Cohere Rerank or a local cross-encoder model) to re-score the top-k chunks by relevance to the exact question before passing them to the LLM. This significantly improves answer quality.

Hybrid search - Combine vector similarity search with keyword BM25 search. Semantic search excels at conceptual queries; keyword search excels at exact term matching. The ensemble retriever in LangChain combines both for better coverage.
