LangChain Self-Query Retriever - Natural Language Metadata Filtering

Author: Venkata Sudhakar

LangChain's Self-Query Retriever allows ShopMax India to filter product search results using natural language queries that include metadata conditions. Instead of building separate filter UI components, customers can type queries like 'show me laptops under Rs 50000 with 16GB RAM' and the retriever automatically extracts the filters and applies them to the vector store.

The Self-Query Retriever uses an LLM to parse the query into two parts: a semantic search string and a metadata filter. You define the metadata fields (attributes) available for filtering, and the LLM translates natural language conditions into a structured filter expression, which is then converted to the filter syntax of the underlying vector store. Filters are built from comparators such as eq, lt, gt, and in, combined with logical operators such as and, or, and not. The exact set that is available depends on what the vector store supports.
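To make the two-part split concrete, here is a minimal sketch in plain Python of what a parsed query might look like and how it could be rendered as a Chroma-style filter dictionary. The classes `Comparison`, `Operation`, and `StructuredQuery`, the helper `to_where`, and the `ram_gb` field name are simplified, hypothetical stand-ins written for illustration; they are not LangChain's own classes, and the parsing step itself (done by the LLM) is hard-coded here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Comparison:
    """One metadata condition, e.g. price lt 50000."""
    attribute: str
    comparator: str  # "eq", "lt", "gt", ...
    value: object


@dataclass
class Operation:
    """A logical combination of conditions ("and" / "or")."""
    operator: str
    arguments: list


@dataclass
class StructuredQuery:
    """What the LLM is asked to produce: search text plus a filter."""
    query: str
    filter: Union[Comparison, Operation, None]


def to_where(node):
    """Render a filter tree as a Chroma-style dict with $-prefixed operators."""
    if isinstance(node, Comparison):
        return {node.attribute: {f"${node.comparator}": node.value}}
    return {f"${node.operator}": [to_where(arg) for arg in node.arguments]}


# Hand-written parse of: "show me laptops under Rs 50000 with 16GB RAM"
# (assumes metadata fields named "price" and "ram_gb")
parsed = StructuredQuery(
    query="laptops",
    filter=Operation("and", [
        Comparison("price", "lt", 50000),
        Comparison("ram_gb", "eq", 16),
    ]),
)

where = to_where(parsed.filter)
print(where)
# {'$and': [{'price': {'$lt': 50000}}, {'ram_gb': {'$eq': 16}}]}
```

The semantic part (`"laptops"`) goes to the embedding search, while the rendered dictionary restricts which documents are eligible to match.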

The example below sets up a Chroma vector store with ShopMax product data and uses a Self-Query Retriever to filter by price and category from a natural language query.
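A minimal sketch of such a setup follows. It is an illustration under stated assumptions rather than a definitive listing: apart from the Sony entry, the catalogue items and prices are made up, the model name is an arbitrary choice, and the import paths assume a LangChain release that ships `langchain.retrievers.self_query` together with the `langchain-chroma`, `langchain-openai`, and `lark` packages (paths differ between versions). The LangChain imports are deferred into `build_retriever` so that the sample data can be inspected without those packages installed.

```python
# Sample catalogue: only the Sony entry comes from the article; the rest is made up.
PRODUCTS = [
    {"text": "Sony WH-1000XM5 noise cancelling headphones",
     "metadata": {"category": "headphones", "price": 29990}},
    {"text": "Wireless over-ear headphones with studio tuning",
     "metadata": {"category": "headphones", "price": 45990}},
    {"text": "15-inch laptop with 16GB RAM and 512GB SSD",
     "metadata": {"category": "laptops", "price": 48990}},
]

# (name, description, type) for each filterable metadata field.
FIELDS = [
    ("category", "Product category, for example 'headphones' or 'laptops'", "string"),
    ("price", "Price of the product in Indian rupees", "integer"),
]


def build_retriever():
    """Build a Self-Query Retriever over an in-memory Chroma store."""
    # Deferred imports; module paths are an assumption and vary by version.
    from langchain.chains.query_constructor.schema import AttributeInfo
    from langchain.retrievers.self_query.base import SelfQueryRetriever
    from langchain_chroma import Chroma
    from langchain_core.documents import Document
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings

    docs = [Document(page_content=p["text"], metadata=p["metadata"])
            for p in PRODUCTS]
    vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
    field_info = [AttributeInfo(name=n, description=d, type=t)
                  for n, d, t in FIELDS]
    return SelfQueryRetriever.from_llm(
        ChatOpenAI(model="gpt-4o-mini", temperature=0),  # model is an assumption
        vectorstore,
        "Product listings from the ShopMax India catalogue",
        field_info,
    )


def search(query):
    """Run a natural language query and print matching products."""
    retriever = build_retriever()
    for doc in retriever.invoke(query):
        print(f"{doc.page_content} | Price: {doc.metadata['price']}")
```

Calling `search("headphones under Rs 30000")` requires an `OPENAI_API_KEY` in the environment. With this sample data the price filter should narrow the results to the Sony entry, though the exact parse depends on the LLM.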


It gives the following output:

Sony WH-1000XM5 noise cancelling headphones | Price: 29990

In production, define AttributeInfo carefully: clear descriptions help the LLM generate accurate filters. The filter syntax for Pinecone or Weaviate differs from Chroma's; SelfQueryRetriever.from_llm selects a backend-specific translator based on the vector store it is given. Test edge cases such as missing metadata fields and queries with multiple filters. Set verbose=True during development to see the parsed query and filter before they reach the vector store.
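One way to approach the edge-case testing is to compute the expected result set locally and compare it with what the retriever returns. The sketch below is plain Python and does not call LangChain; the helper `matches`, the condition tuples, and the sample items are hypothetical. It assumes that a document lacking a filtered field should not match, which is worth confirming against the actual vector store since backends may differ.

```python
import operator

# Comparator name -> function(field_value, filter_value)
COMPARATORS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda field_value, allowed: field_value in allowed,
}


def matches(metadata, conditions):
    """Return True only if every (field, comparator, value) condition holds.

    A missing field counts as a non-match (an assumption of this sketch).
    """
    for field, comparator, value in conditions:
        if field not in metadata:
            return False
        if not COMPARATORS[comparator](metadata[field], value):
            return False
    return True


catalogue = [
    {"name": "headphones A", "category": "headphones", "price": 29990},
    {"name": "headphones B", "category": "headphones"},  # price missing
    {"name": "laptop C", "category": "laptops", "price": 48990},
]

# Multiple filters combined with an implicit "and".
conditions = [("category", "eq", "headphones"), ("price", "lt", 30000)]
expected = [item["name"] for item in catalogue if matches(item, conditions)]
print(expected)  # ['headphones A']
```

If the retriever returns "headphones B" for the same query, either the store treats missing fields differently or the LLM dropped the price condition.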


 
  


  