
Overview

Kong AI Gateway (v3.13+) is an API gateway with built-in AI proxy and OpenTelemetry support. Fiddler integrates with Kong at the gateway layer via Kong’s opentelemetry plugin, giving you full LLM observability — prompts, responses, token usage, latency — without adding any SDK to your application code.
| Capability | Notes |
| --- | --- |
| Zero instrumentation | Point your app at Kong instead of api.openai.com; no code changes needed |
| LLM span tracing | Token counts, model name, latency, and content (with log_payloads: true) |
| Multi-provider support | OpenAI, Anthropic, Cohere, Azure OpenAI, Google Gemini, and more via the ai-proxy plugin |
| Direct OTLP export | Kong exports traces directly to Fiddler over HTTPS with auth headers |

Architecture

Kong exposes an OpenAI-compatible endpoint at /openai. Your application requires no SDK — it just calls Kong instead of the provider directly.

Prerequisites

  • Fiddler account with a GenAI application already created
  • A running Kong Gateway v3.13 or later instance (Gen AI OTel attributes require 3.13)
  • A valid LLM provider API key (e.g. OPENAI_API_KEY)
  • Your Fiddler API key (found under organizational settings) and application UUID (found under application settings)

Quick Start

Step 1 — Create the Kong configuration file

Save the following as kong_fiddler_config.yaml. The ${...} values are placeholders — you’ll replace them with your actual values in Step 2. Kong does not read environment variables from its config file, so the real values must be written into the file.
_format_version: "3.0"

services:
  - name: openai-service
    host: api.openai.com
    port: 443
    protocol: https
    routes:
      - name: openai-chat-route
        paths:
          - /openai
        strip_path: true

plugins:
  - name: ai-proxy
    service: openai-service
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: "Bearer ${OPENAI_API_KEY}"
      model:
        provider: openai
        name: gpt-4o-mini
        options:
          max_tokens: 512
          temperature: 0.7
      # Set log_payloads: true to include prompt and completion text in OTel spans.
      # WARNING: payloads may contain PII. Disable in production unless needed.
      logging:
        log_statistics: true
        log_payloads: true

  # Session grouping: read the X-Fiddler-Conversation-Id request header and stamp
  # it as gen_ai.conversation.id on the request's OTel spans, so multi-turn calls
  # sharing one conversation id group into a single Fiddler Session.
  # See the "Session Grouping" section below for how this works.
  - name: pre-function
    config:
      access:
        - |
          local conv_id = kong.request.get_header("X-Fiddler-Conversation-Id")
          if conv_id and conv_id ~= "" then
            ngx.ctx.fiddler_conversation_id = conv_id
          end
      header_filter:
        - |
          local conv_id = ngx.ctx.fiddler_conversation_id
          if conv_id then
            local spans = ngx.ctx.KONG_SPANS
            if spans then
              for _, span in ipairs(spans) do
                span:set_attribute("gen_ai.conversation.id", conv_id)
              end
            end
          end

  - name: opentelemetry
    config:
      traces_endpoint: "${FIDDLER_URL}/v1/traces"
      headers:
        Authorization: "Bearer ${FIDDLER_API_KEY}"
        fiddler-application-id: "${FIDDLER_APP_ID}"
      resource_attributes:
        service.name: kong
        application.id: "${FIDDLER_APP_ID}"
      propagation:
        default_format: w3c
service.name: kong in resource_attributes is required — Fiddler uses this value to recognize and correctly process Kong spans. application.id is also required; spans without it are silently dropped.

Step 2 — Replace the placeholders with your values

Kong does not read environment variables from its declarative config, so open kong_fiddler_config.yaml and replace each ${...} placeholder with your actual value:
| Placeholder | Replace with |
| --- | --- |
| ${OPENAI_API_KEY} | Your OpenAI API key |
| ${FIDDLER_URL} | Your Fiddler instance URL (e.g. https://your-instance.fiddler.ai) |
| ${FIDDLER_API_KEY} | Your Fiddler API key |
| ${FIDDLER_APP_ID} | Your Fiddler application UUID |
The file now contains secrets — do not commit it. Apply it to your Kong instance the way you already manage Kong configuration — a DB-less declarative file, decK, the Admin API, or your Helm chart’s config.
If you already have a Kong declarative config, you do not need to replace your existing file. Copy just the three plugin entries (ai-proxy, pre-function, opentelemetry) and add them under your existing plugins: block. The services: and routes: blocks in the example above are only needed if you do not already have an OpenAI route configured.
If any ${...} placeholder is left unreplaced, Kong fails to start with errors like 'traces_endpoint': missing host in url — Kong’s declarative loader does not interpolate environment variables.
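
Filling in the placeholders by hand is easy to get wrong, so the substitution can also be scripted. The following is a minimal sketch rather than a Kong or Fiddler tool: the helper names `render_config` and `render_file` are hypothetical, and it assumes the four values are available in a dictionary (for example `os.environ`) under the same names as the placeholders. It refuses to produce output while any placeholder has no value.

```python
import re

# Matches ${NAME} placeholders as used in kong_fiddler_config.yaml.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_config(template: str, values: dict) -> str:
    """Replace every ${NAME} in template; raise if any value is missing or empty."""
    missing = sorted(
        {name for name in PLACEHOLDER.findall(template) if not values.get(name)}
    )
    if missing:
        raise KeyError("no value for placeholder(s): " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


def render_file(src_path: str, dst_path: str, values: dict) -> None:
    """Read the template at src_path and write the filled-in config to dst_path."""
    with open(src_path) as src:
        rendered = render_config(src.read(), values)
    with open(dst_path, "w") as dst:
        dst.write(rendered)
```

Calling `render_file("kong_fiddler_config.yaml", "kong_rendered.yaml", dict(os.environ))` (the output file name is only an example, and `os` must be imported) would write a filled-in copy. That copy contains secrets, so keep it out of version control as well.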

Step 3 — Enable tracing on Kong

The opentelemetry plugin only emits spans if Kong’s tracing is enabled at the process level. These three settings cannot be set in the declarative config — set them wherever your Kong reads its configuration (kong.conf, KONG_* environment variables, or your Helm chart’s env: values):
| Setting (kong.conf) | Environment variable | Value |
| --- | --- | --- |
| tracing_instrumentations | KONG_TRACING_INSTRUMENTATIONS | all |
| tracing_sampling_rate | KONG_TRACING_SAMPLING_RATE | 1.0 |
| untrusted_lua | KONG_UNTRUSTED_LUA | on |
tracing_instrumentations and tracing_sampling_rate are what make Kong produce OTel spans at all — without them no spans are emitted regardless of the opentelemetry plugin config. untrusted_lua = on is required for the pre-function session-grouping plugin to run its inline Lua.
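
If you configure Kong through `KONG_*` environment variables, a quick check before starting Kong can catch a missing setting. This is an illustrative sketch only: the helper `tracing_setting_problems` is hypothetical, it inspects whatever mapping you pass in (such as `os.environ` inside the Kong container), and it cannot see values set in kong.conf.

```python
# Expected values from the table above, keyed by environment variable name.
REQUIRED_SETTINGS = {
    "KONG_TRACING_INSTRUMENTATIONS": "all",
    "KONG_TRACING_SAMPLING_RATE": "1.0",
    "KONG_UNTRUSTED_LUA": "on",
}


def _matches(actual: str, expected: str) -> bool:
    """Compare numerically when both sides are numbers, else as lowercase text."""
    try:
        return float(actual) == float(expected)
    except ValueError:
        return actual.strip().lower() == expected


def tracing_setting_problems(env) -> list:
    """Return one message per setting that is unset or differs from the table."""
    problems = []
    for name, expected in REQUIRED_SETTINGS.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif not _matches(actual, expected):
            problems.append(f"{name}={actual} (expected {expected})")
    return problems
```

An empty list means all three variables are present with the expected values; `tracing_setting_problems(os.environ)` is the typical call.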

Step 4 — Point your application at Kong

import os
import uuid
from openai import OpenAI

KONG_URL = os.getenv("KONG_URL", "http://localhost:8000")

# api_key is not forwarded to OpenAI — Kong handles auth via ai-proxy.
client = OpenAI(base_url=f"{KONG_URL}/openai", api_key="kong-managed")

# Generate one conversation UUID per session and send it on every LLM call
# so Fiddler groups them into a single Session.
session_id = str(uuid.uuid4())

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_headers={"X-Fiddler-Conversation-Id": session_id},
)
print(response.choices[0].message.content)
Traces appear in Fiddler automatically — no SDK import, no callback registration, no changes to your application logic. To override the default Kong URL, set KONG_URL (defaults to http://localhost:8000):
export KONG_URL="http://my-kong-host:8000"

Step 5 — Verify traces are arriving

Open the Fiddler UI and navigate to your application’s Trace Explorer. You should see the trace within a few seconds of making your first completion call.

Span Type Mapping

Fiddler classifies Kong spans based on the span name and gen_ai.operation.name. Kong 3.13 names LLM spans "{operation} {model}" (e.g. "chat gpt-4o-mini"). All infrastructure spans start with "kong".
| Kong span | Fiddler treatment |
| --- | --- |
| "chat {model}", "text_completion {model}", "generate_content {model}" | llm (forwarded) |
| kong (root HTTP span) | dropped (infrastructure) |
| kong.router | dropped (infrastructure) |
| kong.access.plugin.ai-proxy | dropped (infrastructure) |
| kong.dns, kong.balancer | dropped (infrastructure) |
| Any other span starting with "kong" | dropped (infrastructure) |
Kong emits a span hierarchy per request: a root HTTP span (kong), routing and plugin spans (all starting with kong.), and the LLM Gen AI span (named "{operation} {model}"). Only the LLM span is forwarded to Fiddler; Kong’s infrastructure spans are dropped (matching AgentGateway’s LLM-only behaviour). The forwarded LLM span is re-parented to the trace root so it is not flagged as an orphan, and it carries gen_ai.conversation.id, so multi-turn calls sharing one conversation id group under a single Session.
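
The classification rules above can be expressed in a few lines. This is a sketch of the documented behaviour, not Fiddler's actual mapper code: the function name `classify_kong_span` is hypothetical, and returning "unknown" for a span that matches neither rule is an assumption made for illustration.

```python
# Operation names that appear as the first word of Kong 3.13 LLM span names.
LLM_OPERATIONS = {"chat", "text_completion", "generate_content"}


def classify_kong_span(name: str, operation_name=None) -> str:
    """Return "llm", "dropped", or "unknown" for a Kong span.

    name is the span name; operation_name is gen_ai.operation.name if present.
    """
    # Every infrastructure span name starts with "kong".
    if name.startswith("kong"):
        return "dropped"
    # LLM spans are named "{operation} {model}", e.g. "chat gpt-4o-mini".
    first_word = name.split(" ", 1)[0]
    if operation_name in LLM_OPERATIONS or first_word in LLM_OPERATIONS:
        return "llm"
    # Assumption: anything else is left unclassified.
    return "unknown"
```

For example, `classify_kong_span("chat gpt-4o-mini", "chat")` yields "llm", while `classify_kong_span("kong.router")` yields "dropped".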

Attribute Mapping

Kong follows the OTel Gen AI semantic conventions. Fiddler’s mapper normalises these automatically:
| Kong attribute | Fiddler treatment |
| --- | --- |
| gen_ai.provider.name | Passed through unchanged (e.g. "openai", "anthropic") |
| gen_ai.request.model, gen_ai.response.model | Passed through unchanged |
| gen_ai.usage.input_tokens | Mapped to fiddler.span.system.gen_ai.usage.input_tokens |
| gen_ai.usage.output_tokens | Mapped to fiddler.span.system.gen_ai.usage.output_tokens |
| gen_ai.usage.total_tokens | Computed as input + output when absent |
| gen_ai.agent.name | Defaulted to <UNKNOWN_AGENT> when absent |
| gen_ai.conversation.id | Defaulted to trace_id.hex() when absent |
| gen_ai.input.messages | Parsed into gen_ai.llm.input.system (first system message) and gen_ai.llm.input.user (last user message); the latter surfaces as the Input column (requires log_payloads: true on ai-proxy) |
| gen_ai.output.messages | Parsed into gen_ai.llm.output (assistant message); surfaces as the Output column (requires log_payloads: true on ai-proxy) |
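
To make the three "when absent" rules concrete, here is a small sketch of how such defaults could be applied to a span's attributes. It is not Fiddler's mapper: the function name `apply_defaults` is hypothetical, the trace id is assumed to arrive as raw bytes, and the renaming of token attributes and the parsing of message payloads are deliberately left out.

```python
def apply_defaults(attrs: dict, trace_id: bytes) -> dict:
    """Return a copy of attrs with the documented fallback values filled in."""
    out = dict(attrs)

    # total_tokens is computed as input + output only when it is absent.
    input_tokens = attrs.get("gen_ai.usage.input_tokens")
    output_tokens = attrs.get("gen_ai.usage.output_tokens")
    if (
        "gen_ai.usage.total_tokens" not in out
        and input_tokens is not None
        and output_tokens is not None
    ):
        out["gen_ai.usage.total_tokens"] = input_tokens + output_tokens

    # Agent name and conversation id fall back to fixed or derived values.
    out.setdefault("gen_ai.agent.name", "<UNKNOWN_AGENT>")
    out.setdefault("gen_ai.conversation.id", trace_id.hex())
    return out
```

The conversation-id fallback is the reason each ungrouped call appears as its own Session: every Kong request has a distinct trace id, so the derived value is different for every call.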

Session Grouping

Each Kong request is its own trace, so without a shared gen_ai.conversation.id Fiddler falls back to trace_id.hex() and every LLM call shows up as a separate Session. To group a multi-turn conversation into one Session, the same gen_ai.conversation.id must be set on each call’s LLM span. Kong 3.13 has no native conversation-id field — it only supports W3C traceparent for distributed tracing. The Quick Start config above solves this with the built-in pre-function plugin (this is the approach Kong Support recommends for adding a custom attribute from a request header):
  • In the access phase it reads the X-Fiddler-Conversation-Id request header and stashes it in the per-request ngx.ctx.
  • In the header_filter phase it stamps that value as gen_ai.conversation.id on the request’s OTel spans, before the opentelemetry plugin exports them in the log phase.
Because only the LLM span is forwarded to Fiddler (Kong’s infrastructure spans are dropped), stamping in header_filter is sufficient: the LLM span already exists at that point, so it always receives the conversation id. Infrastructure spans created after header_filter never receive the attribute, but that does not matter because Fiddler’s Kong mapper discards them before they reach the UI. Your application just sends the same header value on every call in a conversation:
import uuid
conversation_id = str(uuid.uuid4())  # one per conversation

client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "..."}],
    extra_headers={"X-Fiddler-Conversation-Id": conversation_id},
)
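
For a multi-turn conversation, the point is that the id is generated once and reused on every call. A minimal sketch of one way to keep that bookkeeping in one place follows; the `Conversation` class is hypothetical and only builds the keyword arguments for `client.chat.completions.create`, it does not call Kong itself.

```python
import uuid


class Conversation:
    """Accumulates chat history and reuses one conversation id for every turn."""

    def __init__(self, model: str = "gpt-4o-mini"):
        self.model = model
        self.conversation_id = str(uuid.uuid4())  # generated once per conversation
        self.messages = []

    def request_kwargs(self, user_text: str) -> dict:
        """Record a user turn and return kwargs for client.chat.completions.create."""
        self.messages.append({"role": "user", "content": user_text})
        return {
            "model": self.model,
            "messages": list(self.messages),
            "extra_headers": {"X-Fiddler-Conversation-Id": self.conversation_id},
        }

    def record_reply(self, assistant_text: str) -> None:
        """Store the assistant's answer so the next turn includes it."""
        self.messages.append({"role": "assistant", "content": assistant_text})


# Usage with the client from Step 4:
#   conv = Conversation()
#   response = client.chat.completions.create(**conv.request_kwargs("Hello"))
#   conv.record_reply(response.choices[0].message.content)
```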
Why iterate ngx.ctx.KONG_SPANS instead of kong.tracing.active_span()? The documented kong.tracing.active_span() returns the root (kong) span, which Fiddler drops as infrastructure. Fiddler keeps only Kong’s Gen AI/LLM span, so the attribute must land on that span — iterating every span guarantees it does, regardless of which span is the LLM one. This approach is verified working on Kong Gateway 3.13 (the version pinned above). ngx.ctx.KONG_SPANS is an internal Kong field rather than a stable public API, so re-verify session grouping after upgrading Kong. The pre-function plugin also requires KONG_UNTRUSTED_LUA=on and a sampling rate of 1.0 (so spans always exist when the plugin runs).
Alternative — application-side root span (no Kong plugin). If you already instrument your app with the OpenTelemetry SDK, open a parent span per conversation, tag it, and let Kong nest its spans under your trace via the propagated traceparent. This is the pattern in Kong’s Voice AI observability cookbook. It avoids the pre-function plugin but requires OTel SDK code in your application.

Troubleshooting

Kong fails to start (missing host in url or similar)

This means the config still contains ${...} placeholders. Kong does not read environment variables; open kong_fiddler_config.yaml and replace every ${...} with your actual value (see Step 2).

Traces not appearing in Fiddler

Verify kong_fiddler_config.yaml has no ${...} placeholders left (every value filled in):
grep '\${' kong_fiddler_config.yaml   # prints nothing when fully filled in
Also confirm Kong is emitting spans at all: check your Kong logs for OpenTelemetry export activity (for example, grep -i otel over your Kong proxy/error logs). Both application.id (OTel resource attribute) and fiddler-application-id (HTTP header on the export request) are required. If either is missing or does not match a valid Fiddler application UUID, spans are silently dropped.

Prompt and response content not showing

log_payloads: true is required in the ai-proxy plugin logging config. Without it, gen_ai.input.messages and gen_ai.output.messages are not captured by Kong, so Fiddler will show token counts and model info but no text content.

Spans not being emitted by Kong

Confirm tracing is enabled at the Kong process level (see Step 3): tracing_instrumentations=all and tracing_sampling_rate=1.0, set as kong.conf settings or KONG_* environment variables. These cannot be configured via the declarative config file.

Span type showing as Unknown

Fiddler routes spans to the Kong mapper when service.name == "kong" on the OTel resource. Verify the resource_attributes block in the opentelemetry plugin config has service.name: kong (exact string match, case-sensitive).
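
Several of the checks above concern the opentelemetry plugin's config block, and they can be run together before applying the file. The sketch below is illustrative only: `otel_plugin_problems` is a hypothetical helper that takes the plugin's `config` mapping as an already-parsed dictionary (load the YAML with whatever parser you use), and the equality check assumes both fields carry the same application UUID, as they do in the Quick Start config.

```python
import uuid


def otel_plugin_problems(plugin_config: dict) -> list:
    """Return a list of problems found in an opentelemetry plugin config dict."""
    problems = []
    resource = plugin_config.get("resource_attributes") or {}
    headers = plugin_config.get("headers") or {}

    # The match on service.name is exact and case-sensitive.
    if resource.get("service.name") != "kong":
        problems.append('resource_attributes "service.name" must be exactly "kong"')

    app_id = resource.get("application.id")
    header_id = headers.get("fiddler-application-id")
    for label, value in (
        ("application.id", app_id),
        ("fiddler-application-id", header_id),
    ):
        try:
            uuid.UUID(str(value))  # also rejects unreplaced ${...} placeholders
        except ValueError:
            problems.append(f"{label} is missing or not a UUID: {value!r}")

    # Assumption: both fields should hold the same application UUID.
    if app_id != header_id:
        problems.append("application.id and fiddler-application-id differ")
    return problems
```

This only checks shape; whether the UUID belongs to a real Fiddler application can only be confirmed against your Fiddler instance.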

Known Limitations

| Limitation | Details |
| --- | --- |
| Content requires payload logging | Prompt and completion text are only captured when log_payloads: true is set on the ai-proxy plugin. This may expose PII, so disable it in production unless you need it. |
| Session grouping relies on a pre-function plugin | Kong has no native conversation-id field, so the Quick Start config uses a pre-function plugin to map the X-Fiddler-Conversation-Id header onto spans (requires KONG_UNTRUSTED_LUA=on). It reads spans from the internal ngx.ctx.KONG_SPANS. This is verified working on Kong 3.13, but re-verify after Kong upgrades because it is not a stable public API. See Session Grouping. |
| Kong Gateway 3.13+ required | Gen AI semantic convention attributes (gen_ai.*) are only emitted by Kong 3.13+. Earlier versions emit only HTTP-level spans. |