Building an LLM Observability Dashboard with Grafana and Prometheus
Author: Venkata Sudhakar
ShopMax India's LLM-powered features serve thousands of requests per hour. Operations teams need a centralised view of LLM API health - request rates, error rates, token usage, and latency percentiles. Prometheus collects these metrics from the Python application via an HTTP endpoint, and Grafana visualises them on a live dashboard. This standard observability stack integrates naturally with ShopMax India's existing infrastructure monitoring.
The Python application exposes a /metrics endpoint using the prometheus-client library. Custom counters and histograms track LLM-specific signals: total requests by feature, token usage by model, completion latency distribution, and error counts. Prometheus scrapes this endpoint every 15 seconds. Grafana queries Prometheus and renders metrics as time-series panels, gauges, and heatmaps.
The example below shows a FastAPI service that instruments LLM calls with Prometheus metrics and exposes a /metrics endpoint for scraping by the Prometheus server.
Scraping the /metrics endpoint returns output in the Prometheus text format, for example:
# HELP llm_requests_total Total LLM API calls
# TYPE llm_requests_total counter
llm_requests_total{feature="recommend",model="gpt-4o",status="success"} 142.0
# HELP llm_latency_seconds LLM completion latency
# TYPE llm_latency_seconds histogram
llm_latency_seconds_bucket{feature="recommend",le="1.0"} 89.0
llm_latency_seconds_bucket{feature="recommend",le="2.0"} 131.0
# HELP llm_tokens_total Total tokens consumed
# TYPE llm_tokens_total counter
llm_tokens_total{feature="recommend",type="prompt"} 18450.0
llm_tokens_total{feature="recommend",type="completion"} 4230.0
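The `le` buckets in this output are cumulative: 89 requests finished within 1 second and 131 within 2 seconds, so 42 took between 1 and 2 seconds. Latency percentiles are estimated from these counts by linear interpolation inside the bucket that contains the target rank. The sketch below illustrates that arithmetic in plain Python. The `estimate_quantile` helper is hypothetical, and the 5.0 and +Inf bucket counts are invented for illustration because the output above shows only two buckets.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound,
    ending with the +Inf bucket. The lowest bucket is assumed to start at 0.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The +Inf bucket has no upper edge; fall back to the last finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly within the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# The first two counts come from the scrape output; the last two are assumed.
recommend_buckets = [(1.0, 89), (2.0, 131), (5.0, 140), (math.inf, 142)]
p95 = estimate_quantile(0.95, recommend_buckets)
print(f"estimated p95 latency: {p95:.2f}s")
```

In Grafana the same estimate is normally left to PromQL, for example `histogram_quantile(0.95, sum by (le) (rate(llm_latency_seconds_bucket[5m])))`, where the 5-minute window is an assumption. Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest.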
Add a Grafana alert that fires when p95 latency exceeds 3 seconds or the error rate exceeds 1 percent. Use the feature label to build per-feature cost dashboards by multiplying token counts by the model's price per token; if the provider bills prompt and completion tokens at different rates, price each type separately and sum the two. Add a daily token budget panel that fires an alert when 80 percent of the budget is consumed. Version the Grafana dashboard JSON in Git alongside the application code so that dashboard changes are reviewed and audited.
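The cost and budget arithmetic behind those panels can be sketched in plain Python. This is an illustration under stated assumptions: the per-1,000-token prices and the 25,000-token daily budget are placeholders rather than real rates, the `ModelPrice` type and both helper functions are hypothetical, and prompt tokens are assumed to be billed alongside completion tokens. The token counts are the ones shown in the scrape output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price per 1,000 tokens; the values used below are placeholders."""
    prompt_per_1k: float
    completion_per_1k: float


def token_cost(prompt_tokens: float, completion_tokens: float, price: ModelPrice) -> float:
    """Cost of one feature's usage, pricing each token type separately."""
    return (
        prompt_tokens / 1000 * price.prompt_per_1k
        + completion_tokens / 1000 * price.completion_per_1k
    )


def budget_alert(tokens_used: float, daily_budget: float, threshold: float = 0.8) -> bool:
    """True once usage reaches the given fraction of the daily token budget."""
    return tokens_used >= threshold * daily_budget


# Token counts for the "recommend" feature, taken from the scrape output.
prompt_tokens, completion_tokens = 18450, 4230

placeholder_price = ModelPrice(prompt_per_1k=0.0025, completion_per_1k=0.01)
cost = token_cost(prompt_tokens, completion_tokens, placeholder_price)
alert = budget_alert(prompt_tokens + completion_tokens, daily_budget=25_000)
print(f"cost: {cost:.4f}, budget alert: {alert}")
```

In a dashboard the same multiplication would usually be written as a query over `llm_tokens_total`, with the prices held in a variable or recording rule so that a price change does not require editing every panel.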