In Browser
	StumbleUpon
	del.icio.us
	Google
	Google Buzz
	reddit
	LinkedIn

	Facebook
	Twitter
	Linkedin
	E-Mail

Generative AI > AI Observability > Monitoring Embedding Drift in RAG Pipelines

Monitoring Embedding Drift in RAG Pipelines

Author: Venkata Sudhakar

ShopMax India uses a RAG pipeline to answer customer queries about products, pricing, and policies. Over time, the product catalog changes - new items are added, prices are updated, and discontinued products are removed. If user queries drift significantly from what the retriever was built on, retrieval quality degrades silently. Embedding drift monitoring detects this degradation before it impacts customers.

Drift is measured by comparing the cosine similarity distribution of current query embeddings against a baseline set recorded at deployment time. When the distribution shifts - average similarity drops or variance increases - it signals that user queries have changed or that the knowledge base no longer covers the topics users are asking about. ShopMax India computes this metric daily and alerts if average similarity drops by more than 10 percent from baseline.

The example below establishes a baseline similarity distribution at deployment and detects drift when a new batch of queries is evaluated against it.

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")

# Baseline queries recorded at deployment
baseline_queries = [
    "Best smartphone under Rs 20000",
    "Laptop for students in Mumbai",
    "Wireless earbuds with noise cancellation",
    "4K TV for living room under Rs 50000",
    "Gaming laptop with RTX 4060"
]

# Current queries from last 24 hours
current_queries = [
    "Refund policy for broken headphones",
    "Does ShopMax deliver to Hyderabad?",
    "How to track my order from Delhi warehouse",
    "EMI options for iPhone purchase",
    "Student discount on laptops"
]

baseline_embeddings = model.encode(baseline_queries)
current_embeddings = model.encode(current_queries)

# Max cosine similarity of each current query to any baseline query
sim_matrix = cosine_similarity(current_embeddings, baseline_embeddings)
avg_sim = sim_matrix.max(axis=1).mean()

DRIFT_THRESHOLD = 0.45

print(f"Average max similarity to baseline: {avg_sim:.4f}")
if avg_sim < DRIFT_THRESHOLD:
    print("DRIFT DETECTED: Query distribution has shifted significantly.")
    print("Action: Review knowledge base coverage for new query types.")
else:
    print("No significant drift detected. RAG pipeline is aligned.")

It gives the following output,

Average max similarity to baseline: 0.3812
DRIFT DETECTED: Query distribution has shifted significantly.
Action: Review knowledge base coverage for new query types.

Run drift checks daily using a scheduled job and store the average similarity score in a time-series database to track gradual trends. When drift is detected, sample the low-similarity queries and manually review whether the knowledge base covers those topics. Use the drifted query clusters to prioritise new document ingestion into the RAG pipeline. Set a secondary alert if drift persists for more than three consecutive days without remediation.

Send your comments, suggestions or queries regarding this site to [email protected].