Skip to content

Trace

This guide describes the current state of trace and how to use them.

Requirements

You would need to enable the trace feature with following helm option:

inferenceExtension:
  tracing:
    enabled: true
    otelExporterEndpoint: "http://localhost:4317"
    sampling:
      sampler: "parentbased_traceidratio"
      samplerArg: "0.1"
  • otelExporterEndpoint: Points to your OpenTelemetry collector endpoint.
  • sampler: Currently, only parentbased_traceidratio is supported. This sampler respects the parent span's sampling decision and applies the configured ratio for root spans.
  • samplerArg: Base sampling rate for new traces, range [0.0, 1.0]. For example, "0.1" enables 10% sampling.

Span Coverage

Currently, the inference gateway covers the entry point of the external processing request.

  • Tracer Name: gateway-api-inference-extension
  • Span Name: gateway.request

This span is the root span the entire lifecycle of an external processing request from Envoy, including header and body processing, scheduling decisions, and response handling.

Attributes

Span Attributes

Currently, the gateway.request span does not include custom attributes. It primarily serves as a container for the request's execution time and provides context for child spans in downstream services.

Context Propagation

The inference gateway supports distributed tracing by propagating the trace context to downstream services (e.g., model servers).