Securing LLM APIs - Rate Limiting and Authentication with FastAPI
Author: Venkata Sudhakar
Exposing an LLM-powered API without proper security controls invites abuse, from uncontrolled request volume that drives up OpenAI costs to unauthorized access that leaks customer data. ShopMax India's AI product search and recommendation API, used by millions of customers across Hyderabad, Chennai, and Mumbai, must enforce API key authentication and per-key rate limiting to ensure fair use and to prevent denial-of-service scenarios caused by runaway clients or scrapers.
FastAPI provides a clean foundation for building secure LLM APIs. API key authentication is implemented as a dependency that reads the X-API-Key header and validates it against a registry of known keys. Rate limiting tracks request counts per key over a sliding window, using an in-memory store (or Redis in production). When a key exceeds its quota, the API returns HTTP 429 Too Many Requests. Each API key can have its own tier: for example, ShopMax India's mobile app gets 1000 requests per minute while third-party integrations get 100.
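To make the sliding-window idea concrete, here is a minimal standard-library sketch, independent of FastAPI. The class name SlidingWindowLimiter, the TIER_LIMITS table, and the injectable `now` argument are illustrative assumptions, not part of the original application.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60

# Hypothetical tier table: requests allowed per 60-second window.
TIER_LIMITS = {"mobile": 1000, "third_party": 100}


class SlidingWindowLimiter:
    """Tracks recent request timestamps per API key, in memory."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window = window_seconds
        self.hits = defaultdict(deque)  # api_key -> timestamps, oldest first

    def allow(self, api_key, limit, now=None):
        """Return (allowed, retry_after_seconds) for one incoming request."""
        now = time.monotonic() if now is None else now
        q = self.hits[api_key]
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= limit:
            # The oldest recorded hit leaves the window at q[0] + window.
            return False, q[0] + self.window - now
        q.append(now)
        return True, 0.0
```

Passing `now` explicitly keeps the behaviour easy to check by hand: with a limit of 2, requests at t=0 and t=10 are allowed, a third at t=20 is rejected with 40 seconds to wait, and a request at t=60 is allowed again because the t=0 hit has expired.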
The following example builds a FastAPI application with API key auth and per-key rate limiting that fronts a call to an LLM for product recommendations. It demonstrates key validation, quota tracking, and graceful rejection with informative error messages.
It gives the following output:
# Valid request
GET /recommend?product=OnePlus+12&city=Hyderabad
X-API-Key: shopmax-mobile-v1
{
  "product": "OnePlus 12",
  "city": "Hyderabad",
  "recommendations": "1. Sandstone case Rs 499\n2. 80W charger Rs 1,299\n3. Screen protector Rs 299"
}
# Invalid key
{"detail": "Invalid API key"}
# Rate limit exceeded
{"detail": "Rate limit exceeded. Try again in 60 seconds."}
In production, replace the in-memory request_counts dict with Redis, using one sorted set per key. This survives restarts and works across multiple API server instances. Add JWT-based auth for ShopMax India's internal services to avoid sharing static keys. Store API key metadata (owner, creation date, last used) in a database so you can revoke compromised keys instantly. Monitor the rate limit hit rate in Grafana: a key that consistently hits its quota is a signal that either the quota is too low or the client has a bug that makes redundant calls.
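One way the Redis sorted-set approach could look is sketched below. It assumes `client` is a redis-py `redis.Redis` instance; the function name allow_request and the ratelimit: key prefix are illustrative choices, not taken from the original code.

```python
import time
import uuid

WINDOW_SECONDS = 60


def allow_request(client, api_key, limit, now=None):
    """Sliding-window check backed by one Redis sorted set per API key.

    `client` is assumed to be a redis-py `redis.Redis` instance.
    """
    now = time.time() if now is None else now
    key = f"ratelimit:{api_key}"  # hypothetical key naming scheme
    member = f"{now}:{uuid.uuid4().hex}"  # unique even for same-timestamp hits

    pipe = client.pipeline()  # redis-py pipelines run as MULTI/EXEC by default
    pipe.zremrangebyscore(key, 0, now - WINDOW_SECONDS)  # drop expired hits
    pipe.zadd(key, {member: now})  # record this request, scored by its time
    pipe.zcard(key)  # count hits still inside the window
    pipe.expire(key, WINDOW_SECONDS)  # let idle keys expire on their own
    _, _, count, _ = pipe.execute()
    return count <= limit
```

In this sketch a rejected request is still recorded, so a client that keeps retrying while over quota stays blocked until it backs off. If rejected requests should not count, the member can be removed again with ZREM when the check fails.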