OpenTelemetry for LLM Applications - Distributed Tracing with Python
Author: Venkata Sudhakar
ShopMax India's AI platform spans multiple services - a chatbot service in Bangalore, a recommendation engine in Mumbai, and a fraud detection service in Hyderabad. When a customer query is slow, the team needs distributed traces that show how long each service took, not just the final LLM call. OpenTelemetry (OTel) is a vendor-neutral observability standard whose Python SDK instruments applications with spans, metrics, and logs and exports them to any backend - Jaeger, Grafana Tempo, or Datadog. For LLM applications, OTel can trace every prompt, completion, embedding call, and retrieval step with standardized semantic attributes.
OTel tracing works by creating a Tracer from a TracerProvider configured with an exporter (OTLP, Jaeger, or console). Each LLM call is wrapped in a span with attributes like llm.model, llm.prompt_tokens, llm.completion_tokens, and llm.latency_ms. Child spans represent sub-operations like vector search or tool calls. The opentelemetry-instrumentation-openai package auto-instruments OpenAI SDK calls with a single instrument() call, so you do not have to wrap each request manually. Spans are batched and exported asynchronously so they do not block the main request path.
The example below instruments a ShopMax India product Q&A function with OTel spans, exporting to the console. It creates a parent span for the full request and child spans for the embedding lookup and LLM generation steps, recording token counts and latency as span attributes.
Running it produces the following output:
Q: What is the price of Samsung 65 QLED?
A: The Samsung 65 QLED is priced at Rs 85000 and is available at the Mumbai warehouse.
Q: Does OnePlus 11 support 5G?
A: Yes, the OnePlus 11 5G supports 5G connectivity and is available at the Bangalore warehouse.
[Console exporter prints spans with attributes:]
{
  "name": "llm-generation",
  "attributes": {
    "llm.model": "gpt-4o-mini",
    "llm.prompt_tokens": 68,
    "llm.completion_tokens": 24,
    "llm.latency_ms": 843
  }
}
In production, replace ConsoleSpanExporter with OTLPSpanExporter pointing to your collector endpoint: OTLPSpanExporter(endpoint="http://otel-collector:4317"). Use the opentelemetry-instrumentation-openai package to auto-instrument all OpenAI calls without wrapping each one manually. For ShopMax India, deploy an OTel Collector sidecar that fans out traces to both Jaeger (for developer debugging) and a long-term store like Grafana Tempo (for SLA analysis). Add baggage propagation so the trace ID from the user-facing API request flows through to the LLM span, enabling end-to-end correlation across all microservices.
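The production wiring described above can be sketched as the following configuration fragment. The collector endpoint and service name are assumptions for a ShopMax-style deployment; it requires the opentelemetry-exporter-otlp and opentelemetry-instrumentation-openai packages to be installed.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

# Name the service so traces from each microservice are distinguishable.
provider = TracerProvider(
    resource=Resource.create({"service.name": "shopmax-chatbot"})
)

# Ship spans over gRPC to the OTel Collector sidecar (endpoint is illustrative),
# which fans them out to Jaeger and Grafana Tempo.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
trace.set_tracer_provider(provider)

# Auto-instrument every OpenAI SDK call; no manual span wrapping needed.
OpenAIInstrumentor().instrument()
```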