RAG with LlamaIndex - Document Loaders and Vector Index Types

Author: Venkata Sudhakar

LlamaIndex is a high-level RAG framework that handles document loading, chunking, embedding, indexing, and querying with little boilerplate code. ShopMax India can use it to prototype and deploy RAG pipelines quickly, without writing low-level vector store and retrieval code. Its document loaders support PDFs, CSVs, web pages, and databases, which covers all the product documentation sources ShopMax India uses.

LlamaIndex organizes RAG into three building blocks: Document (raw input), Node (a chunk of a document, stored in the index), and Index (the searchable data structure). The VectorStoreIndex is the most common index type and uses embeddings for semantic search. The query engine wraps the index with retrieval and synthesis logic and returns answers together with the source nodes they were drawn from. The global Settings object selects the LLM (for example, Anthropic Claude) and the embedding model (for example, OpenAI or HuggingFace).
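The relationship between the three building blocks can be sketched in plain Python. This is a conceptual illustration only, not LlamaIndex's implementation: the `Document`, `Node`, and `ToyVectorIndex` classes here are hypothetical stand-ins, and the "embedding" is a simple word count rather than a neural model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Raw input text plus metadata (stand-in, not the llama_index class)."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Node:
    """A chunk of a Document that carries its parent's metadata."""
    text: str
    metadata: dict


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Real pipelines use a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Stores one vector per node and returns the top-k most similar nodes."""

    def __init__(self, nodes):
        self.entries = [(node, embed(node.text)) for node in nodes]

    def retrieve(self, query: str, top_k: int = 2):
        q = embed(query)
        scored = [(cosine(q, vec), node) for node, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


docs = [
    Document("Sony WH-1000XM5 Headphones. Return policy: 7-day return.", {"source": "product_0"}),
    Document("Samsung Galaxy S24 Ultra. Camera: 200MP main, 12MP ultrawide.", {"source": "product_1"}),
]
# Each document here is short enough to become a single node.
nodes = [Node(d.text, d.metadata) for d in docs]
index = ToyVectorIndex(nodes)

hits = index.retrieve("What camera does the Samsung Galaxy have?", top_k=1)
top_source = hits[0][1].metadata["source"]
print(top_source)
```

Because every node keeps the metadata of the document it came from, a retrieved chunk can always be traced back to its source, which is the basis for source attribution in the query engine.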

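The following example builds a LlamaIndex RAG pipeline for ShopMax India product documentation. Documents are created from in-memory text, embedded with a HuggingFace model, and stored in a VectorStoreIndex. Queries then go through LlamaIndex's query engine, which retrieves the most similar nodes and uses Claude to synthesize an answer. Each response also exposes the retrieved nodes through response.source_nodes for source attribution; the example prints only the answer text.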

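# Requires: pip install llama-index llama-index-llms-anthropic llama-index-embeddings-huggingface
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Global defaults used by the index and query engine created below.
Settings.llm = Anthropic(model="claude-opus-4-7", api_key="sk-ant-...")  # answer synthesis
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")  # embeddings
Settings.chunk_size = 256    # maximum tokens per node
Settings.chunk_overlap = 20  # tokens shared between consecutive nodes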
product_texts = [
    """Sony WH-1000XM5 Headphones
Battery: 30 hours ANC on, 40 hours ANC off.
Price: Rs 29990. Available in Mumbai and Bangalore.
Warranty: 1 year manufacturer warranty.
Return policy: 7-day return, opened box accepted.""",
    """Samsung Galaxy S24 Ultra
Camera: 200MP main, 12MP ultrawide.
RAM: 12GB. Storage: 256GB or 512GB.
Price: Rs 134999. Pan-India delivery in 2 days.
Warranty: 1 year Samsung warranty.""",
    """Dell XPS 15 9530 Laptop
Processor: Intel Core i7-13700H.
RAM: 32GB DDR5. Storage: 1TB NVMe SSD.
Price: Rs 135000. Available in Delhi, Mumbai, Bangalore.
Warranty: 1 year onsite. Battery: 2 years."""
]

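# Wrap each product text in a Document and tag it with a source identifier.
documents = [Document(text=t, metadata={"source": f"product_{i}"}) for i, t in enumerate(product_texts)]
# Chunk, embed, and store the documents in an in-memory vector index.
index = VectorStoreIndex.from_documents(documents)
# Retrieve the 2 most similar nodes per query before synthesizing an answer.
query_engine = index.as_query_engine(similarity_top_k=2)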

queries = [
    "What is the return policy for Sony headphones?",
    "Which products are available in Delhi?",
    "What camera specs does the Samsung Galaxy S24 Ultra have?"
]

for q in queries:
    response = query_engine.query(q)
    print(f"Q: {q}")
    print(f"A: {response}")
    print()

It produces output similar to the following (the exact wording of LLM answers can vary between runs):

Q: What is the return policy for Sony headphones?
A: The Sony WH-1000XM5 has a 7-day return policy and accepts opened box returns.

Q: Which products are available in Delhi?
A: The Dell XPS 15 9530 laptop is available in Delhi, Mumbai, and Bangalore. Pan-India delivery is available for the Samsung Galaxy S24 Ultra, which also covers Delhi.

Q: What camera specs does the Samsung Galaxy S24 Ultra have?
A: The Samsung Galaxy S24 Ultra has a 200MP main camera and a 12MP ultrawide camera.
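The chunk_size and chunk_overlap settings in the example control how each document is split into nodes. The product texts there are short, so each one most likely fits in a single node, but longer spec sheets would be split into overlapping windows. The sketch below illustrates the idea with a hypothetical `chunk_words` helper that counts whitespace-separated words; LlamaIndex itself measures chunks in tokens and tries to respect sentence boundaries, so this is a simplification rather than the library's algorithm.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size words that share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


# Ten placeholder words, split into windows of 4 with 1 word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap means the last word of one chunk reappears as the first word of the next, so a fact that straddles a chunk boundary is still fully present in at least one node.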

For ShopMax India, LlamaIndex's SimpleDirectoryReader can load every product PDF spec sheet in a folder, which removes manual document preparation. Backing the index with ChromaVectorStore and a persistent Chroma client keeps the embeddings on disk, so the index survives service restarts. Setting response_mode='compact' in the query engine packs the retrieved nodes into as few LLM calls as possible before synthesis, which reduces token usage; recent LlamaIndex versions typically use this mode by default. A RetrieverQueryEngine with a custom node postprocessor can apply metadata filters, for example restricting results to products available in the customer's city based on their profile.
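The city-based filtering idea can be sketched independently of the library. The `ScoredNode` class, the `filter_by_city` function, and the `cities` and `pan_india` metadata keys below are all hypothetical (the example above stores only a `source` key), and the scores are illustrative values, not real retrieval results. Wiring this logic into LlamaIndex's postprocessor interface is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ScoredNode:
    """A retrieved chunk with its similarity score and metadata (stand-in type)."""
    text: str
    score: float
    metadata: dict = field(default_factory=dict)


def filter_by_city(nodes, city):
    """Keep nodes that list the city or are marked as deliverable pan-India."""
    kept = []
    for node in nodes:
        cities = node.metadata.get("cities", [])
        if node.metadata.get("pan_india") or city in cities:
            kept.append(node)
    return kept


# Illustrative retrieval results; availability mirrors the product texts above.
retrieved = [
    ScoredNode("Sony WH-1000XM5", 0.82, {"cities": ["Mumbai", "Bangalore"]}),
    ScoredNode("Dell XPS 15 9530", 0.79, {"cities": ["Delhi", "Mumbai", "Bangalore"]}),
    ScoredNode("Samsung Galaxy S24 Ultra", 0.75, {"pan_india": True}),
]

delhi_nodes = filter_by_city(retrieved, "Delhi")
print([node.text for node in delhi_nodes])
```

Filtering after retrieval keeps the original ranking order among the surviving nodes, but it can leave fewer than similarity_top_k results, so a production pipeline may want to retrieve more candidates than it finally needs.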
