Custom Metrics for Agentic Applications

Overview

Custom metrics let you define measurements that match your agentic application’s requirements. Whether you are tracking business KPIs, aggregating quality scores from enrichments, or computing cost and latency signals, they let you tailor observability to your specific needs. Once defined, they are available in charts and dashboards.

Custom metrics for agentic applications are defined using Fiddler Query Language (FQL) and reference span attributes captured from your application’s OpenTelemetry traces. This differs from ML custom metrics, which reference model schema columns.

Custom metrics can be organization-level (visible across all projects) or project-scoped (visible only within a specific project). See Metric Visibility below.

Metric Visibility: Organization vs Project

When creating a custom metric, you choose whether it is organization-level (global) or project-scoped. This choice determines who can view and delete the metric, and it cannot be changed after creation.

Organization-level metrics

Created without selecting a project. Visible across all projects in the organization. The metric name is reserved org-wide — no other metric in the organization can share the same name.

Action	Roles
Create	Org Admin, Org Member
View	Org Admin, Org Member
Delete	Org Admin

Project-scoped metrics

Created with a specific project selected. Visible only to users who have access to that project. The same metric name can be reused by project-scoped metrics in other projects.

Action	Roles
Create	Project Admin, Project Writer (on that project)
View	Project Admin, Project Writer, Project Viewer (on that project)
Delete	Project Admin (on that project)

Metric visibility is immutable — an organization-level metric cannot be changed to project-scoped or vice versa after creation.
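The visibility rules above can be summarized as a small permission and naming check. The following Python sketch is a hypothetical model built only from the two role tables and the naming rules on this page; it is not Fiddler's implementation or API, and it does not model any role inheritance the product may apply (for example, whether an Org Admin can act on project-scoped metrics without a project role).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Role tables from this page, encoded as action -> allowed roles.
# Hypothetical model for illustration only; not Fiddler's implementation.
ORG_LEVEL_ROLES = {
    "create": {"Org Admin", "Org Member"},
    "view": {"Org Admin", "Org Member"},
    "delete": {"Org Admin"},
}
PROJECT_SCOPED_ROLES = {
    "create": {"Project Admin", "Project Writer"},
    "view": {"Project Admin", "Project Writer", "Project Viewer"},
    "delete": {"Project Admin"},
}


@dataclass(frozen=True)  # frozen: visibility cannot change after creation
class Metric:
    name: str
    project: Optional[str] = None  # None means organization-level


def allowed(action: str, metric: Metric, org_role: str,
            project_roles: Dict[str, str]) -> bool:
    """Return True if the user may perform `action` on `metric`.

    `project_roles` maps a project name to the user's role on that project.
    """
    if metric.project is None:
        return org_role in ORG_LEVEL_ROLES[action]
    return project_roles.get(metric.project) in PROJECT_SCOPED_ROLES[action]


def name_available(name: str, project: Optional[str],
                   existing: Iterable[Metric]) -> bool:
    """Organization-level names are reserved org-wide; project-scoped names
    only clash with the same name in the same project or at org level."""
    for m in existing:
        if m.name != name:
            continue
        if m.project is None or project is None or m.project == project:
            return False
    return True
```

For example, an Org Member can create and view an organization-level metric but not delete it, and a project-scoped metric named `p95_latency` in one project does not block the same name in another project.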

The `attribute()` Function

The attribute() function is the GenAI-specific FQL primitive for referencing span data. It replaces the column references used in ML custom metrics.

Syntax

attribute('name', type='user', scope='span')
attribute('name', type='user', scope='span', value='category_value')
attribute('name', type='system', scope='span')

Parameter	Required	Description
`name`	Yes	The attribute name as it appears in your trace data (e.g., `gen_ai.usage.input_tokens`)
`scope`	Yes	The attribute scope. Currently only `'span'` is supported.
`type`	Yes	The attribute source: `'user'` for attributes your application sets, or `'system'` for attributes emitted by OpenTelemetry instrumentation (e.g., `gen_ai.usage.input_tokens`).
`value`	No	Filters to spans where the attribute equals this string; returns the value if matched, `null` otherwise. Use it for categorical attributes (e.g., `attribute('status', type='user', scope='span', value='error')`). When set, the attribute is always treated as a string.
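To make the `value` semantics concrete, here is a toy per-span model in Python. It assumes a span is a plain dictionary of attribute values and that a missing attribute evaluates to null (`None`); the `type` and `scope` arguments are left out because the model only holds span attributes. The function name mirrors FQL for readability but is hypothetical Python, not a Fiddler SDK call.

```python
from typing import Any, Optional


def attribute(span: dict, name: str, value: Optional[str] = None) -> Any:
    """Toy model of attribute() evaluated on a single span.

    Without `value`, return the raw attribute (None if absent).
    With `value`, compare the attribute as a string and return it only on a
    match; any other span yields None (null).
    """
    raw = span.get(name)
    if value is None:
        return raw
    if raw is None:
        return None
    text = str(raw)  # value= always treats the attribute as a string
    return text if text == value else None


spans = [{"status": "error"}, {"status": "ok"}, {}]
matches = [attribute(s, "status", value="error") for s in spans]
print(matches)  # ['error', None, None]
```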

Type inference

Fiddler infers the attribute type from context — you do not need to declare it explicitly:

When used inside a numeric aggregate like sum() or average(), the attribute is treated as a number.
When used with string functions like length() or match(), the attribute is treated as a string.
When the value keyword is provided, the attribute is always treated as a string.
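The three inference rules, including the precedence of `value`, can be written as a small decision function. This Python sketch is illustrative only: it knows just the functions named above (`sum()`, `average()`, `length()`, `match()`) and raises for anything else, whereas FQL's actual inference covers more of the language.

```python
NUMERIC_AGGREGATES = {"sum", "average"}
STRING_FUNCTIONS = {"length", "match"}


def inferred_type(enclosing_function: str, has_value_keyword: bool) -> str:
    """Return 'number' or 'string' for an attribute() call, per the rules above."""
    if has_value_keyword:
        return "string"  # value= wins over any surrounding context
    if enclosing_function in NUMERIC_AGGREGATES:
        return "number"
    if enclosing_function in STRING_FUNCTIONS:
        return "string"
    raise ValueError(f"no inference rule modeled for {enclosing_function}()")


print(inferred_type("average", has_value_keyword=False))  # number
print(inferred_type("length", has_value_keyword=False))   # string
```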

Adding a Custom Metric

Navigate to the Custom Metrics section in the Fiddler UI.
Click Add Custom Metric.
Enter a Metric name, an optional Description, and the Metric definition.
Optionally select a project to scope the metric to. If no project is selected, the metric is created as an organization-level metric visible across all projects.
Click Create Metric.

Custom metric creation form in the Fiddler UI

Using Custom Metrics in Charts

After saving a custom metric, you can use it in chart definitions:

Open or create a chart in the Fiddler UI.
Set Metric Type to Custom Metric.
Select your custom metric from the list.

Deleting Custom Metrics

To delete a custom metric, click the trash icon next to the metric in the Custom Metrics tab. Deletion runs as a background job that automatically:

Removes the metric from any charts that reference it
Deletes charts that have no remaining metrics after cleanup
Updates dashboard layouts to remove deleted charts
Deletes dashboards that become empty as a result
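The order of these cleanup steps matters, because each step can cause the next one to apply. The following Python sketch models the cascade on in-memory objects; the `Chart` and `Dashboard` classes and the `delete_metric` helper are hypothetical, chart names are assumed unique, and the background-job machinery is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chart:
    name: str
    metrics: List[str]


@dataclass
class Dashboard:
    name: str
    charts: List[str]  # chart names in layout order


def delete_metric(metric: str, charts: List[Chart],
                  dashboards: List[Dashboard]) -> Tuple[List[Chart], List[Dashboard]]:
    """Apply the cleanup steps in order and return the surviving objects."""
    deleted_charts = set()
    kept_charts = []
    for chart in charts:
        if metric in chart.metrics:
            # Step 1: remove the metric from charts that reference it.
            chart.metrics = [m for m in chart.metrics if m != metric]
            # Step 2: delete charts left with no metrics.
            if not chart.metrics:
                deleted_charts.add(chart.name)
                continue
        kept_charts.append(chart)

    kept_dashboards = []
    for dash in dashboards:
        had_charts = bool(dash.charts)
        # Step 3: remove deleted charts from dashboard layouts.
        dash.charts = [c for c in dash.charts if c not in deleted_charts]
        # Step 4: delete dashboards that became empty as a result.
        if had_charts and not dash.charts:
            continue
        kept_dashboards.append(dash)
    return kept_charts, kept_dashboards
```

Deleting a metric that is the only series on a chart removes that chart, and a dashboard whose only chart was that one is removed with it.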

Examples

Custom metrics must return either an aggregate (produced by aggregate functions) or a combination of aggregates. See the FQL reference for the full list of supported operators and functions.

Average input token usage

Track the mean number of input tokens consumed per span to monitor LLM cost drivers over time.

average(attribute('gen_ai.usage.input_tokens', type='system', scope='span'))

→ Returns a Number (e.g., 312.4)
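Conceptually, the metric evaluates the attribute on every span in a time window and averages the results. The Python sketch below reproduces that computation on a hypothetical window of spans, assuming (as the null-safe example later on this page implies) that spans where the attribute is absent are excluded rather than counted as zero. The helper name `average_attr` is hypothetical.

```python
from statistics import mean
from typing import Optional

# Hypothetical spans from one time window.
spans = [
    {"gen_ai.usage.input_tokens": 250},
    {"gen_ai.usage.input_tokens": 400},
    {"gen_ai.usage.input_tokens": 287},
    {},  # e.g. a tool-call span with no token usage
]


def average_attr(spans: list, name: str) -> Optional[float]:
    """average(attribute(name)) over one window, skipping nulls (assumed)."""
    values = [s[name] for s in spans if s.get(name) is not None]
    return mean(values) if values else None


avg_tokens = average_attr(spans, "gen_ai.usage.input_tokens")
print(round(avg_tokens, 1))  # 312.3
```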

Premium user ratio

Measure the fraction of spans attributed to premium-tier users by filtering on a categorical attribute.

count(attribute('tier', type='user', scope='span', value='premium')) / count(attribute('tier', type='user', scope='span'))

→ Returns a Number between 0 and 1 (e.g., 0.34)
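The denominator counts only spans where `tier` is set, not every span in the window, so spans without the attribute do not dilute the ratio. This Python sketch shows that effect, assuming `count()` counts non-null values and that a `value=` mismatch yields null; the helper names `attr` and `count_non_null` are hypothetical.

```python
from typing import Any, Iterable, Optional


def attr(span: dict, name: str, value: Optional[str] = None) -> Any:
    """Toy attribute(): None when absent or when value= does not match."""
    raw = span.get(name)
    if value is None or raw is None:
        return raw
    return str(raw) if str(raw) == value else None


def count_non_null(values: Iterable[Any]) -> int:
    """Assumed count() semantics: number of non-null values."""
    return sum(1 for v in values if v is not None)


spans = [{"tier": "premium"}, {"tier": "free"}, {"tier": "free"},
         {"tier": "premium"}, {}]  # last span has no tier attribute
premium = count_non_null(attr(s, "tier", value="premium") for s in spans)
tagged = count_non_null(attr(s, "tier") for s in spans)
ratio = premium / tagged
print(ratio)  # 0.5, i.e. 2 of the 4 tagged spans
```

A window with no tagged spans would divide by zero here; how FQL handles a zero denominator is not covered on this page.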

P95 response latency

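Use the quantile() function to track the 95th-percentile response time. An average can be dominated by the many fast requests, while the 95th percentile reflects the slowest 5% of spans, so it surfaces tail latency issues that an average can hide.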

quantile(attribute('response_time_ms', type='user', scope='span'), level=0.95)

→ Returns a Number in the same unit as the attribute (e.g., 1420.0 ms)
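Python's standard library can reproduce a 95th percentile for comparison with the average. The interpolation method Fiddler uses for `quantile()` is not stated on this page; this sketch assumes linear interpolation (`statistics.quantiles` with `method="inclusive"`), so Fiddler's exact values may differ slightly. The latencies are made-up sample data.

```python
from statistics import mean, quantiles

# Hypothetical response_time_ms values for the spans in one time window.
latencies = [120, 180, 200, 240, 310, 400, 520, 610, 900, 1420]

# quantiles(n=100) returns the 99 cut points P1..P99; index 94 is P95.
p95 = quantiles(latencies, n=100, method="inclusive")[94]
avg = mean(latencies)
print(f"average={avg} ms, p95={p95} ms")  # average=490 ms, p95=1186.0 ms
```

The two slowest spans pull P95 to more than twice the average, which is the tail behavior this metric is meant to expose.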

Conditional cost (weighted by outcome)

Apply different weights to successful and failed spans to surface the true cost impact of errors. The if(condition, true_value, false_value) function evaluates the condition per span and returns one of two values.

average(if(attribute('status', type='user', scope='span') == 'error', attribute('cost', type='user', scope='span') * 2, attribute('cost', type='user', scope='span')))

→ Returns a Number (e.g., 0.0042)
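The `if()` expression is evaluated once per span, and `average()` then aggregates the per-span results. This Python sketch mirrors that two-stage evaluation on hypothetical spans. Every sample span carries `cost`; a span without `status` would simply take the unweighted branch here, which is an assumption about null comparisons rather than documented FQL behavior.

```python
from statistics import mean

# Hypothetical spans from one time window.
spans = [
    {"status": "ok", "cost": 0.002},
    {"status": "error", "cost": 0.003},
    {"status": "ok", "cost": 0.004},
]


def weighted_cost(span: dict) -> float:
    """Per-span value of the if(...) expression: failed spans count double."""
    cost = span["cost"]
    return cost * 2 if span.get("status") == "error" else cost


weighted = mean(weighted_cost(s) for s in spans)
plain = mean(s["cost"] for s in spans)
print(f"weighted={weighted:.4f}, plain={plain:.4f}")  # weighted=0.0040, plain=0.0030
```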

Latency range

Track the spread of response times across spans in a time window. A widening range can signal instability or the emergence of slow outlier requests.

max(attribute('response_time_ms', type='user', scope='span')) - min(attribute('response_time_ms', type='user', scope='span'))

→ Returns a Number in the same unit as the attribute (e.g., 3850.0 ms)
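Comparing two windows shows why a widening range is a useful signal. The Python sketch below uses hypothetical data and assumes spans without the attribute are skipped by `max()` and `min()`; `latency_range` is a hypothetical helper.

```python
from typing import List, Optional


def latency_range(values: List[Optional[float]]) -> Optional[float]:
    """max(...) - min(...) over one window, ignoring missing values."""
    present = [v for v in values if v is not None]
    return max(present) - min(present) if present else None


stable_window = [210, 260, 240, 230]
outlier_window = [220, 250, 3900, 240, None]  # one slow request, one span without the attribute
print(latency_range(stable_window))   # 50
print(latency_range(outlier_window))  # 3680
```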

Minimum token usage

Find the smallest input token count across all spans in a window. Useful for detecting unusually short requests that may indicate truncated inputs or misconfigured clients.

min(attribute('gen_ai.usage.input_tokens', type='system', scope='span'))

→ Returns a Number (e.g., 12)
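One way to act on this metric is to compare each window's minimum against a cut-off. In the Python sketch below, the per-window data and the `SHORT_INPUT_THRESHOLD` value are hypothetical; pick a threshold that reflects what a normal request looks like for your application. Skipping spans without the attribute is an assumption.

```python
from typing import Dict, List, Optional

SHORT_INPUT_THRESHOLD = 50  # hypothetical cut-off in tokens


def min_tokens(values: List[Optional[int]]) -> Optional[int]:
    """min(...) over one window, ignoring spans without the attribute."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


windows: Dict[str, List[Optional[int]]] = {
    "09:00": [240, 310, 198],
    "10:00": [255, 12, 280, None],
}

flagged = []
for window, values in windows.items():
    lowest = min_tokens(values)
    if lowest is not None and lowest < SHORT_INPUT_THRESHOLD:
        flagged.append(window)
print(flagged)  # ['10:00']
```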

Null-safe cost with markup

Apply a price adjustment to spans that have a cost attribute, while preserving null for spans where cost data is absent — avoiding accidental zero-inflation of the average.

average(if(is_null(attribute('cost', type='user', scope='span')), null, attribute('cost', type='user', scope='span') * 1.15))

→ Returns a Number representing the average marked-up cost, excluding null-cost spans (e.g., 0.0048)

Use is_null() to test whether an attribute is absent. The null keyword is for use as a return value in expressions (e.g., if(condition, value, null)) to propagate missing data explicitly.
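The difference between returning `null` and returning `0` for missing costs is easiest to see with numbers. This Python sketch uses hypothetical costs, with `None` standing in for spans that have no `cost` attribute, and assumes `average()` skips nulls as described above.

```python
from statistics import mean

# Hypothetical per-span costs; None marks spans with no cost attribute.
costs = [0.004, None, 0.0044, None]

# Null-safe: if(is_null(cost), null, cost * 1.15), then average() skips nulls.
null_safe = [None if c is None else c * 1.15 for c in costs]
null_safe_avg = mean(v for v in null_safe if v is not None)

# Zero-filled alternative: missing cost becomes 0, halving the average here.
zero_filled = [0.0 if c is None else c * 1.15 for c in costs]
zero_filled_avg = mean(zero_filled)

print(f"null-safe={null_safe_avg:.4f}, zero-filled={zero_filled_avg:.4f}")
# null-safe=0.0048, zero-filled=0.0024
```

With half of the spans missing cost data, substituting zero halves the reported average even though no real span became cheaper.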

Related Resources

Fiddler Query Language (FQL) — full reference for operators, aggregate functions, and expression syntax
Custom Metrics for ML Models — custom metrics using model schema columns instead of span attributes
Agentic Observability — overview of dashboards, metrics, and integrations for agentic applications
Custom Metrics glossary entry
