Custom Metrics

Overview

Custom metrics let you define measurements that match your agentic application's requirements. Whether you are tracking business KPIs, aggregating quality scores from enrichments, or computing cost and latency signals, you can tailor observability to your specific needs. Once defined, custom metrics are available in charts and dashboards.

Custom metrics for agentic applications are defined using Fiddler Query Language (FQL) and reference span attributes captured from your application's OpenTelemetry traces. This differs from ML custom metrics, which reference model schema columns.

Custom metrics can be organization-level (visible across all projects) or project-scoped (visible only within a specific project). See Metric Visibility below.

Metric Visibility: Organization vs Project

When creating a custom metric, you choose whether it is organization-level (global) or project-scoped. This choice determines who can see the metric and who can delete it, and it cannot be changed after creation.

Organization-level metrics

Created without selecting a project. Visible across all projects in the organization. The metric name is reserved org-wide — no other metric in the organization can share the same name.

| Action | Roles |
|--------|-------|
| Create | Org Admin, Org Member |
| View | Org Admin, Org Member |
| Delete | Org Admin |

Project-scoped metrics

Created with a specific project selected. Visible only to users who have access to that project. The same metric name can be reused in a different project as long as the projects don't overlap.

| Action | Roles (on that project) |
|--------|-------------------------|
| Create | Project Admin, Project Writer |
| View | Project Admin, Project Writer, Project Viewer |
| Delete | Project Admin |
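
The two role tables above can be restated as a small lookup. The Python sketch below is illustrative only: `PERMISSIONS` and `can()` are hypothetical names, not part of any Fiddler API, and project roles are assumed to be held on the project that owns the metric.

```python
# Illustrative only: restates the role tables above as data.
# PERMISSIONS and can() are hypothetical names, not a Fiddler API.
PERMISSIONS = {
    "organization": {
        "create": {"Org Admin", "Org Member"},
        "view": {"Org Admin", "Org Member"},
        "delete": {"Org Admin"},
    },
    # Project roles are assumed to be held on the metric's own project.
    "project": {
        "create": {"Project Admin", "Project Writer"},
        "view": {"Project Admin", "Project Writer", "Project Viewer"},
        "delete": {"Project Admin"},
    },
}


def can(role: str, action: str, visibility: str) -> bool:
    """Return True if `role` may perform `action` on a metric of this visibility."""
    return role in PERMISSIONS[visibility][action]


print(can("Org Member", "delete", "organization"))  # False
print(can("Project Writer", "create", "project"))   # True
```

Because visibility cannot be changed after creation, it is worth checking who needs delete rights before creating the metric.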

The attribute() Function

The attribute() function is the GenAI-specific FQL primitive for referencing span data. It replaces the column references used in ML custom metrics.

Syntax

| Parameter | Required | Description |
|-----------|----------|-------------|
| name | Yes | The attribute name as it appears in your trace data (e.g., gen_ai.usage.input_tokens) |
| scope | Yes | The attribute scope. Currently only 'span' is supported. |
| type | Yes | The attribute source: 'user' for attributes your application sets, or 'system' for attributes emitted by OpenTelemetry instrumentation (e.g., gen_ai.usage.input_tokens). |
| value | No | Filters to spans where the attribute equals this string; returns the value if matched, null otherwise. Used for categorical attributes (e.g., attribute('status', value='error')). When set, the attribute is always treated as a string. |

Type inference

Fiddler infers the attribute type from context — you do not need to declare it explicitly:

  • When used inside a numeric aggregate like sum() or average(), the attribute is treated as a number.

  • When used with string functions like length() or match(), the attribute is treated as a string.

  • When the value keyword is provided, the attribute is always treated as a string.
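
The three inference rules above can be modeled as a small decision function. This is a Python sketch of the documented behavior, not Fiddler's implementation; `infer_attribute_type` and the two function sets are hypothetical names, and only the functions named in the text are listed.

```python
# Illustrative model of the three inference rules above.
# Not Fiddler's implementation; all names here are hypothetical.
from typing import Optional

NUMERIC_AGGREGATES = {"sum", "average"}  # numeric aggregates named in the text
STRING_FUNCTIONS = {"length", "match"}   # string functions named in the text


def infer_attribute_type(enclosing_function: Optional[str],
                         has_value: bool = False) -> Optional[str]:
    """Return 'number', 'string', or None when these rules do not decide."""
    if has_value:
        # The value keyword always forces string treatment.
        return "string"
    if enclosing_function in NUMERIC_AGGREGATES:
        return "number"
    if enclosing_function in STRING_FUNCTIONS:
        return "string"
    return None


print(infer_attribute_type("average"))               # number
print(infer_attribute_type("match"))                 # string
print(infer_attribute_type("sum", has_value=True))   # string
```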

Adding a Custom Metric

  1. Navigate to the Custom Metrics section in the Fiddler UI.

  2. Click Add Custom Metric.

  3. Enter a Metric name, an optional Description, and the Metric definition.

  4. Optionally select a project to scope the metric to. If no project is selected, the metric is created as an organization-level metric visible across all projects.

  5. Click Create Metric.

[Figure: Custom metric creation form in the Fiddler UI]

Using Custom Metrics in Charts

After creating a custom metric, you can use it in chart definitions:

  1. Open or create a chart in the Fiddler UI.

  2. Set Metric Type to Custom Metric.

  3. Select your custom metric from the list.

Deleting Custom Metrics

To delete a custom metric, click the trash icon next to the metric in the Custom Metrics tab. Deletion runs as a background job that automatically:

  • Removes the metric from any charts that reference it

  • Deletes charts that have no remaining metrics after cleanup

  • Updates dashboard layouts to remove deleted charts

  • Deletes dashboards that become empty as a result
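
To see how far a deletion can reach, the cleanup steps above can be sketched against a toy in-memory model. This Python sketch is an assumption-laden illustration, not Fiddler's background job: `delete_metric`, the dictionary shapes, and the chart, dashboard, and metric names are all hypothetical.

```python
# Hypothetical in-memory model of the cleanup steps listed above.
# Not Fiddler's implementation; data shapes and names are made up.
def delete_metric(metric, charts, dashboards):
    """Apply the cascade and return new (charts, dashboards) dicts.

    charts:     {chart_name: [metric names]}
    dashboards: {dashboard_name: [chart names]}
    """
    removed_charts = set()
    new_charts = {}
    for name, metrics in charts.items():
        # Remove the metric from any chart that references it.
        remaining = [m for m in metrics if m != metric]
        if metric in metrics and not remaining:
            # The chart has no metrics left after cleanup: delete it.
            removed_charts.add(name)
        else:
            new_charts[name] = remaining

    new_dashboards = {}
    for name, chart_names in dashboards.items():
        # Update the layout to drop deleted charts.
        remaining = [c for c in chart_names if c not in removed_charts]
        if len(remaining) < len(chart_names) and not remaining:
            continue  # the dashboard became empty as a result: delete it
        new_dashboards[name] = remaining
    return new_charts, new_dashboards


charts = {"cost": ["avg_tokens"], "overview": ["avg_tokens", "p95_latency"]}
dashboards = {"finance": ["cost"], "ops": ["cost", "overview"]}
new_charts, new_dashboards = delete_metric("avg_tokens", charts, dashboards)
print(new_charts)      # {'overview': ['p95_latency']}
print(new_dashboards)  # {'ops': ['overview']}
```

In this toy example, deleting one metric removes a chart and a whole dashboard, so it is worth reviewing which charts reference a metric before deleting it.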

Examples

Custom metrics must return either an aggregate (produced by aggregate functions) or a combination of aggregates. See the FQL reference for the full list of supported operators and functions.

Average input token usage

Track the mean number of input tokens consumed per span to monitor LLM cost drivers over time.

→ Returns a Number (e.g., 312.4)

Premium user ratio

Measure the fraction of spans attributed to premium-tier users by filtering on a categorical attribute.

→ Returns a Number between 0 and 1 (e.g., 0.34)
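
The arithmetic behind such a ratio can be sketched in Python. This models the documented behavior of the value parameter (the value when the attribute matches, null otherwise) and then divides the matches by the total span count. It is not FQL and not Fiddler's implementation; the `user_tier` attribute name and the helper function are hypothetical.

```python
# Illustrative model of a value-filtered ratio; not FQL.
# The attribute name 'user_tier' is a hypothetical example.
from typing import Optional


def attribute_with_value(span: dict, name: str, value: str) -> Optional[str]:
    """Return `value` if the span's attribute equals it, otherwise None."""
    actual = span.get(name)
    if actual is not None and str(actual) == value:
        return value
    return None


spans = [
    {"user_tier": "premium"},
    {"user_tier": "free"},
    {"user_tier": "premium"},
    {},  # span without the attribute
]

matched = [attribute_with_value(s, "user_tier", "premium") for s in spans]
premium_ratio = sum(m is not None for m in matched) / len(spans)
print(premium_ratio)  # 0.5
```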

P95 response latency

Use the quantile() function to track the 95th-percentile response time. This is more robust than averages for catching tail latency issues.

→ Returns a Number in the same unit as the attribute (e.g., 1420.0 ms)
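
A small Python sketch shows why a high percentile surfaces tail latency that an average hides. It uses a simple nearest-rank percentile, which is an assumption: the interpolation method used by quantile() is not stated here and may differ. The latency values are made up.

```python
# Illustrative comparison of mean vs. 95th percentile on made-up data.
# Uses nearest-rank; Fiddler's quantile() may interpolate differently.
import math


def nearest_rank_percentile(values, percent):
    """Return the smallest value with at least `percent`% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# 18 fast spans and 2 slow ones (milliseconds).
latencies_ms = [100.0] * 18 + [1200.0, 4000.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = nearest_rank_percentile(latencies_ms, 95)
print(mean_ms)  # 350.0
print(p95_ms)   # 1200.0
```

The mean stays close to the typical request, while the percentile reflects the slow requests directly.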

Conditional cost (weighted by outcome)

Apply different weights to successful and failed spans to surface the true cost impact of errors. The if(condition, true_value, false_value) function evaluates the condition per span and returns one of two values.

→ Returns a Number (e.g., 0.0042)
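
The per-span evaluation followed by aggregation can be sketched in Python. The weights, the `cost` and `status` attribute names, and the choice of an average as the aggregate are all assumptions for illustration. This is not FQL.

```python
# Illustrative per-span weighting followed by an aggregate; not FQL.
# Weights and attribute names are hypothetical.
SUCCESS_WEIGHT = 1.0
FAILURE_WEIGHT = 2.0


def weighted_cost(span):
    """Per-span step, mirroring if(condition, true_value, false_value)."""
    weight = FAILURE_WEIGHT if span["status"] == "error" else SUCCESS_WEIGHT
    return span["cost"] * weight


spans = [
    {"cost": 0.5, "status": "ok"},
    {"cost": 0.25, "status": "error"},
    {"cost": 0.25, "status": "ok"},
    {"cost": 0.5, "status": "error"},
]

per_span = [weighted_cost(s) for s in spans]  # evaluated once per span
average_weighted_cost = sum(per_span) / len(per_span)
print(per_span)               # [0.5, 0.5, 0.25, 1.0]
print(average_weighted_cost)  # 0.5625
```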

Latency range

Track the spread of response times across spans in a time window. A widening range can signal instability or the emergence of slow outlier requests.

→ Returns a Number in the same unit as the attribute (e.g., 3850.0 ms)

Minimum token usage

Find the smallest input token count across all spans in a window. Useful for detecting unusually short requests that may indicate truncated inputs or misconfigured clients.

→ Returns a Number (e.g., 12)

Null-safe cost with markup

Apply a price adjustment to spans that have a cost attribute, while preserving null for spans where cost data is absent — avoiding accidental zero-inflation of the average.

→ Returns a Number representing the average marked-up cost, excluding null-cost spans (e.g., 0.0048)
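
The effect of preserving null can be checked with a short Python sketch. It assumes, as the description above implies, that the average skips null values. The markup factor and cost values are made up, and `None` stands in for a missing cost attribute. This is not FQL.

```python
# Illustrative comparison of a null-preserving average and a zero-filled one.
# MARKUP and the cost values are hypothetical.
MARKUP = 2.0


def average_skipping_nulls(values):
    """Average over non-null values only; None if there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


costs = [0.5, None, 0.25, None]  # None = span without a cost attribute

# Null-safe: keep None for spans that have no cost data.
null_safe = [None if c is None else c * MARKUP for c in costs]
# Zero-filled: treat missing cost as 0, which drags the average down.
zero_filled = [0.0 if c is None else c * MARKUP for c in costs]

print(average_skipping_nulls(null_safe))     # 0.75
print(sum(zero_filled) / len(zero_filled))   # 0.375
```

With half of the spans missing cost data, the zero-filled average is half the null-safe one, which is the zero-inflation the null-preserving form avoids.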

Use is_null() to test whether an attribute is absent. The null keyword is for use as a return value in expressions (e.g., if(condition, value, null)) to propagate missing data explicitly.
