
Secure Multi-Tenant LLM Deployments with Tenant Isolation

Author: Venkata Sudhakar

ShopMax India operates a shared LLM platform used by its retail, B2B, and marketplace divisions. Each division is a separate tenant with its own system prompt, knowledge base, and data access permissions. In a shared LLM infrastructure, strict tenant isolation prevents one tenant's data from leaking into another tenant's responses, ensures system prompt confidentiality, and enforces per-tenant rate limits and spend caps.

Tenant isolation is enforced at three layers: the request routing layer tags every request with a tenant ID and validates it against a registry; the prompt construction layer (called the prompt injection layer here, not to be confused with prompt-injection attacks) prepends a tenant-specific system prompt that constrains what the LLM will discuss; and the output filtering layer scans responses for cross-tenant data signals before returning them. Each layer is stateless and can be deployed as FastAPI middleware without modifying the core LLM call logic.

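The example below implements the three isolation layers for ShopMax India's shared LLM platform as plain helper functions called from a single FastAPI route, demonstrating both isolation enforcement and cross-tenant content filtering. For brevity, the tenant registry in the example contains only the retail and B2B tenants.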

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import openai
import os

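app = FastAPI()
# Fail at startup with a KeyError if the key is missing, instead of
# sending an empty key and failing on the first request.
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])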

TENANTS = {
    "retail": {
        "system_prompt": "You are a ShopMax India retail assistant. Only discuss consumer electronics.",
        "forbidden": ["B2B", "wholesale", "marketplace commission"]
    },
    "b2b": {
        "system_prompt": "You are a ShopMax India B2B procurement assistant. Only discuss bulk orders.",
        "forbidden": ["consumer discount", "EMI", "home delivery"]
    }
}

class ChatRequest(BaseModel):
    message: str

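def validate_tenant(tenant_id: str) -> dict:
    # Layer 1: reject any tenant ID that is not in the registry.
    if tenant_id not in TENANTS:
        raise HTTPException(status_code=403, detail="Unknown tenant")
    return TENANTS[tenant_id]

def filter_output(text: str, forbidden: list[str]) -> str:
    # Layer 3: case-insensitive substring match against the tenant's
    # forbidden keywords. The whole response is withheld on any match.
    lowered = text.lower()
    for kw in forbidden:
        if kw.lower() in lowered:
            return "[Response filtered: cross-tenant content detected]"
    return text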

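@app.post("/chat/{tenant_id}")
def chat(tenant_id: str, req: ChatRequest):
    # Declared with plain "def" so FastAPI runs the blocking OpenAI call in
    # its thread pool rather than stalling the event loop.
    tenant = validate_tenant(tenant_id)
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            # Layer 2: the tenant-specific system prompt always comes first.
            {"role": "system", "content": tenant["system_prompt"]},
            {"role": "user", "content": req.message}
        ]
    )
    # message.content can be None, so fall back to an empty string.
    output = resp.choices[0].message.content or ""
    safe_output = filter_output(output, tenant["forbidden"])
    return {"tenant": tenant_id, "response": safe_output}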

Sample requests and responses are shown below. The exact wording of the model output varies between runs.

POST /chat/retail {"message": "Best laptops for home use?"}
{
  "tenant": "retail",
  "response": "For home use I recommend the HP Pavilion 15 (Rs 54,990)
               and the Lenovo IdeaPad Slim 5 (Rs 49,990)."
}

POST /chat/retail {"message": "Tell me about wholesale pricing"}
{
  "tenant": "retail",
  "response": "[Response filtered: cross-tenant content detected]"
}
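
The `filter_output` function in the listing uses a plain substring test, so a short keyword such as "EMI" also matches inside unrelated words like "premium". The following is a minimal sketch, not the author's implementation, of a variant that matches whole words or phrases and also reports which keyword matched, which can be useful when filter events are logged. The name `filter_output_with_match` and its tuple return value are assumptions made for illustration.

```python
import re
from typing import Optional

FILTERED_MESSAGE = "[Response filtered: cross-tenant content detected]"

def find_forbidden(text: str, forbidden: list[str]) -> Optional[str]:
    """Return the first forbidden keyword found as a whole word or phrase, else None."""
    for kw in forbidden:
        # \b word boundaries stop "EMI" from matching inside "premium".
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return kw
    return None

def filter_output_with_match(text: str, forbidden: list[str]) -> tuple[str, Optional[str]]:
    """Hypothetical variant of filter_output: returns (safe_text, matched_keyword)."""
    matched = find_forbidden(text, forbidden)
    if matched is not None:
        return FILTERED_MESSAGE, matched
    return text, None
```

A caller would return the first element to the user and, when the second element is not `None`, record it alongside the tenant ID and request ID.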

Authenticate tenant requests with signed JWTs and derive the tenant ID from a verified claim rather than from the URL path alone, because any caller can place another tenant's ID in the URL. Store per-tenant system prompts in an encrypted secrets store such as Google Cloud Secret Manager. Add per-tenant token usage counters and enforce hard caps to prevent one tenant from consuming the shared quota. Log all filter events with tenant ID, request ID, and the matched keyword for security auditing and incident response.
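
Two of these recommendations can be sketched with the standard library alone. The code below is an illustrative sketch under stated assumptions, not a production implementation: it verifies an HS256-signed JWT and reads the tenant from a claim assumed to be named `tenant_id`, and it keeps an in-memory per-tenant token counter with a hard cap. The names `tenant_from_token`, `sign_hs256`, and `TokenBudget` are hypothetical. A real deployment would more likely use a maintained JWT library and a shared store for the counters.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 token (included only to produce sample input)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"

def tenant_from_token(token: str, secret: bytes) -> str:
    """Verify signature and expiry, then return the assumed tenant_id claim."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise PermissionError("Malformed token") from exc
    # Pin the algorithm so a token cannot select a weaker one for itself.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise PermissionError("Unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("Bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if not isinstance(claims, dict):
        raise PermissionError("Malformed claims")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp < time.time():
        raise PermissionError("Token expired or missing exp")
    tenant_id = claims.get("tenant_id")
    if not isinstance(tenant_id, str):
        raise PermissionError("Missing tenant_id claim")
    return tenant_id

class TokenBudget:
    """In-memory per-tenant token counter with a hard cap (single process only)."""

    def __init__(self, caps: dict[str, int]):
        self.caps = caps
        self.used = {tenant: 0 for tenant in caps}

    def charge(self, tenant_id: str, tokens: int) -> None:
        # Refuse the charge if it would push the tenant past its cap.
        if tenant_id not in self.caps:
            raise PermissionError("Unknown tenant")
        if self.used[tenant_id] + tokens > self.caps[tenant_id]:
            raise PermissionError("Token cap exceeded")
        self.used[tenant_id] += tokens
```

In the chat handler, the tenant returned by `tenant_from_token` would replace the path parameter as the trusted tenant ID, and `charge` would be called with the token count reported for each completion.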
