OpenTelemetry for LLM Applications - Distributed Tracing with Python
Author: Venkata Sudhakar
ShopMax India's AI platform spans multiple services - a chatbot service in Bangalore, a recommendation engine in Mumbai, and a fraud detection service in Hyderabad. When a customer query is slow, the team needs distributed traces that show how long each service took, not just the final LLM call. OpenTelemetry (OTel) is a vendor-neutral observability standard whose Python SDK instruments applications with spans, metrics, and logs and exports them to any backend - Jaeger, Grafana Tempo, or Datadog. For LLM applications, OTel can trace every prompt, completion, embedding call, and retrieval step with standardized semantic attributes.
OTel tracing works by creating a Tracer from a TracerProvider configured with an exporter (OTLP, Jaeger, or console). Each LLM call is wrapped in a span with attributes like llm.model, llm.prompt_tokens, llm.completion_tokens, and llm.latency_ms. Child spans represent sub-operations like vector search or tool calls. The opentelemetry-instrumentation-openai package auto-instruments OpenAI SDK calls with a single instrument() call, so you do not have to wrap each request manually. Spans are batched and exported asynchronously so they do not block the main request path.
The example below instruments a ShopMax India product Q&A function with OTel spans, exporting to the console. It creates a parent span for the full request and child spans for the embedding lookup and LLM generation steps, recording token counts and latency as span attributes.
Running it produces the following output:
Q: What is the price of Samsung 65 QLED?
A: The Samsung 65 QLED is priced at Rs 85000 and is available at the Mumbai warehouse.
Q: Does OnePlus 11 support 5G?
A: Yes, the OnePlus 11 5G supports 5G connectivity and is available at the Bangalore warehouse.
[Console exporter prints spans with attributes:]
{
  "name": "llm-generation",
  "attributes": {
    "llm.model": "gpt-4o-mini",
    "llm.prompt_tokens": 68,
    "llm.completion_tokens": 24,
    "llm.latency_ms": 843
  }
}
In production, replace ConsoleSpanExporter with OTLPSpanExporter pointing to your collector endpoint: OTLPSpanExporter(endpoint="http://otel-collector:4317"). Use the opentelemetry-instrumentation-openai package to auto-instrument all OpenAI calls without wrapping each one manually. For ShopMax India, deploy an OTel Collector sidecar that fans out traces to both Jaeger (for developer debugging) and a long-term store like Grafana Tempo (for SLA analysis). Add baggage propagation so the trace ID from the user-facing API request flows through to the LLM span, enabling end-to-end correlation across all microservices.
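The production wiring described above can be sketched as the following configuration fragment. The collector endpoint and service name are assumptions for a ShopMax-style deployment; it requires the opentelemetry-exporter-otlp and opentelemetry-instrumentation-openai packages to be installed.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

# Name the service so traces from each microservice are distinguishable.
provider = TracerProvider(
    resource=Resource.create({"service.name": "shopmax-chatbot"})
)

# Ship spans over gRPC to the OTel Collector sidecar (endpoint is illustrative),
# which fans them out to Jaeger and Grafana Tempo.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
trace.set_tracer_provider(provider)

# Auto-instrument every OpenAI SDK call; no manual span wrapping needed.
OpenAIInstrumentor().instrument()
```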