Overview
Kong AI Gateway (v3.13+) is an API gateway with built-in AI proxy and OpenTelemetry support. Fiddler integrates with Kong at the gateway layer via Kong’s opentelemetry plugin, giving you full LLM observability — prompts, responses, token usage, latency — without adding any SDK to your application code.
| Capability | Notes |
|---|---|
| Zero instrumentation | Point your app at Kong instead of api.openai.com — no code changes needed |
| LLM span tracing | Token counts, model name, latency, and content (with log_payloads: true) |
| Multi-provider support | OpenAI, Anthropic, Cohere, Azure OpenAI, Google Gemini, and more via the ai-proxy plugin |
| Direct OTLP export | Kong exports traces directly to Fiddler over HTTPS with auth headers |
Architecture
Kong exposes an OpenAI-compatible endpoint at /openai. Your application requires no SDK — it just calls Kong instead of the provider directly.
Prerequisites
- Fiddler account with a GenAI application already created
- A running Kong Gateway v3.13 or later instance (Gen AI OTel attributes require 3.13)
- A valid LLM provider API key (e.g. OPENAI_API_KEY)
- Your Fiddler API key (found under organizational settings) and application UUID (found under application settings)
Quick Start
Step 1 — Create the Kong configuration file
Save the following as kong_fiddler_config.yaml. The ${...} values are placeholders — you’ll replace them with your actual values in Step 2. Kong does not read environment variables from its config file, so the real values must be written into the file.
_format_version: "3.0"
services:
  - name: openai-service
    host: api.openai.com
    port: 443
    protocol: https
    routes:
      - name: openai-chat-route
        paths:
          - /openai
        strip_path: true
plugins:
  - name: ai-proxy
    service: openai-service
    config:
      route_type: llm/v1/chat
      auth:
        header_name: Authorization
        header_value: "Bearer ${OPENAI_API_KEY}"
      model:
        provider: openai
        name: gpt-4o-mini
        options:
          max_tokens: 512
          temperature: 0.7
      # Set log_payloads: true to include prompt and completion text in OTel spans.
      # WARNING: payloads may contain PII. Disable in production unless needed.
      logging:
        log_statistics: true
        log_payloads: true
  # Session grouping: read the X-Fiddler-Conversation-Id request header and stamp
  # it as gen_ai.conversation.id on the request's OTel spans, so multi-turn calls
  # sharing one conversation id group into a single Fiddler Session.
  # See the "Session Grouping" section below for how this works.
  - name: pre-function
    config:
      access:
        - |
          local conv_id = kong.request.get_header("X-Fiddler-Conversation-Id")
          if conv_id and conv_id ~= "" then
            ngx.ctx.fiddler_conversation_id = conv_id
          end
      header_filter:
        - |
          local conv_id = ngx.ctx.fiddler_conversation_id
          if conv_id then
            local spans = ngx.ctx.KONG_SPANS
            if spans then
              for _, span in ipairs(spans) do
                span:set_attribute("gen_ai.conversation.id", conv_id)
              end
            end
          end
  - name: opentelemetry
    config:
      traces_endpoint: "${FIDDLER_URL}/v1/traces"
      headers:
        Authorization: "Bearer ${FIDDLER_API_KEY}"
        fiddler-application-id: "${FIDDLER_APP_ID}"
      resource_attributes:
        service.name: kong
        application.id: "${FIDDLER_APP_ID}"
      propagation:
        default_format: w3c
service.name: kong in resource_attributes is required — Fiddler uses this value to recognize and correctly process Kong spans. application.id is also required; spans without it are silently dropped.
Step 2 — Replace the placeholders with your values
Kong does not read environment variables from its declarative config, so open kong_fiddler_config.yaml and replace each ${...} placeholder with your actual value:
| Placeholder | Replace with |
|---|---|
| ${OPENAI_API_KEY} | Your OpenAI API key |
| ${FIDDLER_URL} | Your Fiddler instance URL (e.g. https://your-instance.fiddler.ai) |
| ${FIDDLER_API_KEY} | Your Fiddler API key |
| ${FIDDLER_APP_ID} | Your Fiddler application UUID |
The file now contains secrets — do not commit it. Apply it to your Kong instance the way you already manage Kong configuration — a DB-less declarative file, decK, the Admin API, or your Helm chart’s config.
If you already have a Kong declarative config, you do not need to replace your existing file. Copy just the three plugin entries (ai-proxy, pre-function, opentelemetry) and add them under your existing plugins: block. The services: and routes: blocks in the example above are only needed if you do not already have an OpenAI route configured.
If any ${...} placeholder is left unreplaced, Kong fails to start with errors like 'traces_endpoint': missing host in url — Kong’s declarative loader does not interpolate environment variables.
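Because the placeholders use `${NAME}` syntax, the substitution can be scripted instead of done by hand. The following is a minimal sketch, assuming the four values are available in a mapping (for example `os.environ`) under the same names as the placeholders. `render_config`, `render_file`, and the output path are illustrative names, not part of Kong or Fiddler.

```python
import os
from string import Template

# Placeholder names used in the Quick Start config.
PLACEHOLDERS = ("OPENAI_API_KEY", "FIDDLER_URL", "FIDDLER_API_KEY", "FIDDLER_APP_ID")


def render_config(template_text, values):
    """Replace every ${NAME} placeholder; raise KeyError if a value is missing."""
    missing = [name for name in PLACEHOLDERS if not values.get(name)]
    if missing:
        raise KeyError("missing values for: " + ", ".join(missing))
    # substitute() (unlike safe_substitute()) also fails on unknown placeholders,
    # so nothing is silently left unreplaced.
    return Template(template_text).substitute(values)


def render_file(src_path, dst_path, values):
    """Read the template file and write the filled-in config to dst_path."""
    with open(src_path) as src:
        rendered = render_config(src.read(), values)
    with open(dst_path, "w") as dst:
        dst.write(rendered)
```

A call such as `render_file("kong_fiddler_config.yaml", "kong_fiddler_config.rendered.yaml", os.environ)` would then produce the file to apply to Kong. The rendered file contains secrets, so it should stay out of version control just like a hand-edited one.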
Step 3 — Enable tracing on Kong
The opentelemetry plugin only emits spans if Kong’s tracing is enabled at the process level. These three settings cannot be set in the declarative config — set them wherever your Kong reads its configuration (kong.conf, KONG_* environment variables, or your Helm chart’s env: values):
| Setting (kong.conf) | Environment variable | Value |
|---|---|---|
| tracing_instrumentations | KONG_TRACING_INSTRUMENTATIONS | all |
| tracing_sampling_rate | KONG_TRACING_SAMPLING_RATE | 1.0 |
| untrusted_lua | KONG_UNTRUSTED_LUA | on |
tracing_instrumentations and tracing_sampling_rate are what make Kong produce OTel spans at all — without them no spans are emitted regardless of the opentelemetry plugin config. untrusted_lua = on is required for the pre-function session-grouping plugin to run its inline Lua.
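If Kong is configured through KONG_* environment variables, the three settings can be checked with a short script before starting Kong. This is only a sketch: it assumes it runs in the same environment as the Kong process (for example inside the container), and `check_tracing_env` is a hypothetical helper, not a Kong tool. It does not cover settings made in kong.conf.

```python
import os

# Expected values, taken from the settings table above.
REQUIRED_SETTINGS = {
    "KONG_TRACING_INSTRUMENTATIONS": "all",
    "KONG_TRACING_SAMPLING_RATE": "1.0",
    "KONG_UNTRUSTED_LUA": "on",
}


def _matches(actual, expected):
    """Compare numerically when both values are numbers, otherwise as text."""
    try:
        return float(actual) == float(expected)
    except ValueError:
        return actual.strip().lower() == expected


def check_tracing_env(env):
    """Return a list of problems; an empty list means all three settings match."""
    problems = []
    for name, expected in REQUIRED_SETTINGS.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif not _matches(actual, expected):
            problems.append(f"{name}={actual!r} (expected {expected})")
    return problems
```

Calling `check_tracing_env(os.environ)` returns one message per missing or mismatched setting, which can be printed or used to fail a deployment check.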
Step 4 — Point your application at Kong
import os
import uuid

from openai import OpenAI

KONG_URL = os.getenv("KONG_URL", "http://localhost:8000")

# api_key is not forwarded to OpenAI; Kong handles auth via ai-proxy.
client = OpenAI(base_url=f"{KONG_URL}/openai", api_key="kong-managed")

# Generate one conversation UUID per session and send it on every LLM call
# so Fiddler groups them into a single Session.
session_id = str(uuid.uuid4())

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_headers={"X-Fiddler-Conversation-Id": session_id},
)
print(response.choices[0].message.content)
Traces appear in Fiddler automatically — no SDK import, no callback registration, no changes to your application logic.
To override the default Kong URL, set KONG_URL (defaults to http://localhost:8000):
export KONG_URL="http://my-kong-host:8000"
Step 5 — Verify traces are arriving
Open the Fiddler UI and navigate to your application’s Trace Explorer. You should see the trace within a few seconds of making your first completion call.
Span Type Mapping
Fiddler classifies Kong spans based on the span name and gen_ai.operation.name.
Kong 3.13 names LLM spans "{operation} {model}" (e.g. "chat gpt-4o-mini"). All infrastructure spans start with "kong".
| Kong span | Fiddler treatment |
|---|---|
| "chat {model}", "text_completion {model}", "generate_content {model}" | llm (forwarded) |
| kong (root HTTP span) | dropped (infrastructure) |
| kong.router | dropped (infrastructure) |
| kong.access.plugin.ai-proxy | dropped (infrastructure) |
| kong.dns, kong.balancer | dropped (infrastructure) |
| Any other span starting with "kong" | dropped (infrastructure) |
Kong emits a span hierarchy per request: a root HTTP span (kong), routing and plugin spans (all starting with kong.), and the LLM Gen AI span (named "{operation} {model}"). Only the LLM span is forwarded to Fiddler; Kong’s infrastructure spans are dropped (matching AgentGateway’s LLM-only behaviour). The forwarded LLM span is re-parented to the trace root so it is not flagged as an orphan, and it carries gen_ai.conversation.id, so multi-turn calls sharing one conversation id group under a single Session.
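The classification rule in the table can be expressed as a small function. This is an illustration of the documented behaviour only, not Fiddler's actual mapper. How the span name and gen_ai.operation.name are combined, and the `None` result for spans matching neither rule, are assumptions made for the sketch.

```python
# Operation names that appear as the first word of Kong's LLM span names.
LLM_OPERATIONS = ("chat", "text_completion", "generate_content")


def classify_kong_span(name, operation_name=None):
    """Return "dropped" for infrastructure spans, "llm" for Gen AI spans, else None."""
    # Every Kong infrastructure span starts with "kong" (kong, kong.router, ...).
    if name.startswith("kong"):
        return "dropped"
    # LLM spans are named "{operation} {model}", e.g. "chat gpt-4o-mini".
    first_word = name.split(" ", 1)[0]
    if operation_name in LLM_OPERATIONS or first_word in LLM_OPERATIONS:
        return "llm"
    # Assumption: the source does not say how other spans are treated.
    return None
```

For example, `classify_kong_span("chat gpt-4o-mini")` yields `"llm"`, while `classify_kong_span("kong.router")` and `classify_kong_span("kong")` both yield `"dropped"`.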
Attribute Mapping
Kong follows the OTel Gen AI semantic conventions. Fiddler’s mapper normalises these automatically:
| Kong attribute | Fiddler treatment |
|---|---|
| gen_ai.provider.name | Passed through unchanged (e.g. "openai", "anthropic") |
| gen_ai.request.model, gen_ai.response.model | Passed through unchanged |
| gen_ai.usage.input_tokens | Mapped to fiddler.span.system.gen_ai.usage.input_tokens |
| gen_ai.usage.output_tokens | Mapped to fiddler.span.system.gen_ai.usage.output_tokens |
| gen_ai.usage.total_tokens | Computed as input + output when absent |
| gen_ai.agent.name | Defaulted to <UNKNOWN_AGENT> when absent |
| gen_ai.conversation.id | Defaulted to trace_id.hex() when absent |
| gen_ai.input.messages | Parsed into gen_ai.llm.input.system (first system message) and gen_ai.llm.input.user (last user message); the latter surfaces as the Input column (requires log_payloads: true on ai-proxy) |
| gen_ai.output.messages | Parsed into gen_ai.llm.output (assistant message); surfaces as the Output column (requires log_payloads: true on ai-proxy) |
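To make the defaulting rules concrete, here is a rough sketch of how they could be applied to one span's attributes. It is not Fiddler's mapper: the function names are hypothetical, the renaming to fiddler.span.system.* keys is omitted, and the message shape (a list of dicts with `role` and `content`) is a simplifying assumption rather than Kong's exact payload format.

```python
UNKNOWN_AGENT = "<UNKNOWN_AGENT>"


def normalize_attributes(attrs, trace_id):
    """Apply the 'when absent' defaults from the table to a copy of attrs.

    trace_id is the span's trace id as bytes.
    """
    out = dict(attrs)
    if "gen_ai.usage.total_tokens" not in out:
        out["gen_ai.usage.total_tokens"] = (
            out.get("gen_ai.usage.input_tokens", 0)
            + out.get("gen_ai.usage.output_tokens", 0)
        )
    out.setdefault("gen_ai.agent.name", UNKNOWN_AGENT)
    # Without a conversation id, each trace becomes its own Session.
    out.setdefault("gen_ai.conversation.id", trace_id.hex())
    return out


def split_input_messages(messages):
    """Return (first system message, last user message); None when absent.

    Assumes each message is a dict with "role" and "content" keys.
    """
    system = next((m["content"] for m in messages if m.get("role") == "system"), None)
    user = next(
        (m["content"] for m in reversed(messages) if m.get("role") == "user"), None
    )
    return system, user
```

The trace-id fallback in `normalize_attributes` is the reason ungrouped calls appear as separate Sessions: every Kong request has a distinct trace id.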
Session Grouping
Each Kong request is its own trace, so without a shared gen_ai.conversation.id Fiddler falls back to trace_id.hex() and every LLM call shows up as a separate Session. To group a multi-turn conversation into one Session, the same gen_ai.conversation.id must be set on each call’s LLM span.
Kong 3.13 has no native conversation-id field — it only supports W3C traceparent for distributed tracing. The Quick Start config above solves this with the built-in pre-function plugin (this is the approach Kong Support recommends for adding a custom attribute from a request header):
- In the access phase it reads the X-Fiddler-Conversation-Id request header and stashes it in the per-request ngx.ctx.
- In the header_filter phase it stamps that value as gen_ai.conversation.id on the request’s OTel spans, before the opentelemetry plugin exports them in the log phase.
Because only the LLM span is forwarded to Fiddler (Kong’s infrastructure spans are dropped), stamping in header_filter is sufficient: the LLM span already exists at that point, so it always receives the conversation id. There are no later-created infrastructure spans to worry about — they are discarded by Fiddler’s Kong mapper before reaching the UI.
Your application just sends the same header value on every call in a conversation:
import uuid

conversation_id = str(uuid.uuid4())  # one per conversation

# client is the OpenAI client created in Step 4.
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "..."}],
    extra_headers={"X-Fiddler-Conversation-Id": conversation_id},
)
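For a multi-turn conversation, one way to make sure the same id is sent on every call is to keep it next to the message history. The sketch below assumes an OpenAI-style client pointed at Kong, such as the one from Step 4. The `Conversation` class is a hypothetical helper, not part of any SDK.

```python
import uuid

CONVERSATION_HEADER = "X-Fiddler-Conversation-Id"


class Conversation:
    """Holds one conversation id and the running message history."""

    def __init__(self, client, model="gpt-4o-mini"):
        self.client = client  # OpenAI-style client whose base_url points at Kong
        self.model = model
        self.conversation_id = str(uuid.uuid4())  # generated once, reused per turn
        self.messages = []

    def ask(self, text):
        """Send one user turn with the shared conversation header."""
        self.messages.append({"role": "user", "content": text})
        response = self.client.chat.completions.create(
            model=self.model,
            messages=list(self.messages),  # copy so later turns do not mutate it
            extra_headers={CONVERSATION_HEADER: self.conversation_id},
        )
        answer = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": answer})
        return answer
```

Each `Conversation` instance maps to one Fiddler Session. Starting a new instance starts a new Session.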
Why iterate ngx.ctx.KONG_SPANS instead of kong.tracing.active_span()? The documented kong.tracing.active_span() returns the root (kong) span, which Fiddler drops as infrastructure. Fiddler keeps only Kong’s Gen AI/LLM span, so the attribute must land on that span — iterating every span guarantees it does, regardless of which span is the LLM one. This approach is verified working on Kong Gateway 3.13 (the version pinned above). ngx.ctx.KONG_SPANS is an internal Kong field rather than a stable public API, so re-verify session grouping after upgrading Kong. The pre-function plugin also requires KONG_UNTRUSTED_LUA=on and a sampling rate of 1.0 (so spans always exist when the plugin runs).
Alternative — application-side root span (no Kong plugin). If you already instrument your app with the OpenTelemetry SDK, open a parent span per conversation, tag it, and let Kong nest its spans under your trace via the propagated traceparent. This is the pattern in Kong’s Voice AI observability cookbook. It avoids the pre-function plugin but requires OTel SDK code in your application.
Troubleshooting
Kong fails to start (missing host in url or similar)
This means the config still contains ${...} placeholders. Kong’s declarative loader does not interpolate environment variables, so open kong_fiddler_config.yaml and replace every ${...} with your actual value (see Step 2).
Traces not appearing in Fiddler
Verify kong_fiddler_config.yaml has no ${...} placeholders left (every value filled in):
grep '\${' kong_fiddler_config.yaml # prints nothing when fully filled in
Also confirm Kong is emitting spans at all — check your Kong logs for OpenTelemetry export activity (for example, grep -i otel over your Kong proxy/error logs).
Both application.id (OTel resource attribute) and fiddler-application-id (HTTP header on the export request) are required. If either is missing or does not match a valid Fiddler application UUID, spans are silently dropped.
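Both checks in this section (leftover placeholders, and matching application ids) can be combined into one script. The sketch below is text-based rather than a real YAML parser, and it assumes each key sits on its own line as in the Quick Start config. `check_config` is a hypothetical helper, and it cannot confirm that the UUID actually exists in Fiddler.

```python
import re
import uuid

PLACEHOLDER = re.compile(r"\$\{[^}]*\}")
# Export header on the opentelemetry plugin.
HEADER_ID = re.compile(r'^\s*fiddler-application-id:\s*"?([^"\s]+)"?\s*$', re.M)
# OTel resource attribute on the opentelemetry plugin.
RESOURCE_ID = re.compile(r'^\s*application\.id:\s*"?([^"\s]+)"?\s*$', re.M)


def check_config(text):
    """Return a list of problems found in the config text; empty means none."""
    problems = []
    leftover = sorted(set(PLACEHOLDER.findall(text)))
    if leftover:
        problems.append("unreplaced placeholders: " + ", ".join(leftover))
    header = HEADER_ID.search(text)
    resource = RESOURCE_ID.search(text)
    if header is None:
        problems.append("fiddler-application-id header is missing")
    if resource is None:
        problems.append("application.id resource attribute is missing")
    if header and resource:
        if header.group(1) != resource.group(1):
            problems.append("fiddler-application-id and application.id differ")
        try:
            uuid.UUID(header.group(1))
        except ValueError:
            problems.append("application id is not a well-formed UUID")
    return problems
```

Running `check_config(open("kong_fiddler_config.yaml").read())` before applying the file catches the two causes of silently dropped spans described above.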
Prompt and response content not showing
log_payloads: true is required in the ai-proxy plugin logging config. Without it, gen_ai.input.messages and gen_ai.output.messages are not captured by Kong, so Fiddler will show token counts and model info but no text content.
Spans not being emitted by Kong
Confirm tracing is enabled at the Kong process level (see Step 3): tracing_instrumentations=all and tracing_sampling_rate=1.0, set as kong.conf settings or KONG_* environment variables. These cannot be configured via the declarative config file.
Span type showing as Unknown
Fiddler routes spans to the Kong mapper when service.name == "kong" on the OTel resource. Verify the resource_attributes block in the opentelemetry plugin config has service.name: kong (exact string match, case-sensitive).
Known Limitations
| Limitation | Details |
|---|---|
| Content requires payload logging | Prompt and completion text are only captured when log_payloads: true is set on the ai-proxy plugin. This may expose PII — disable in production if needed. |
| Session grouping relies on a pre-function plugin | Kong has no native conversation-id field, so the Quick Start config uses a pre-function plugin to map the X-Fiddler-Conversation-Id header onto spans (requires KONG_UNTRUSTED_LUA=on). It reads spans from the internal ngx.ctx.KONG_SPANS, which is verified working on Kong 3.13; re-verify after Kong upgrades since this is not a stable public API. See Session Grouping. |
| Kong Gateway 3.13+ required | Gen AI semantic convention attributes (gen_ai.*) are only emitted by Kong 3.13+. Earlier versions emit only HTTP-level spans. |