ML Metrics Reference

Fiddler provides 35 built-in metrics for monitoring ML models in production. These metrics cover model performance, data drift, data integrity, traffic, and statistics. You can also define custom metrics using the Fiddler Query Language.

For LLM and GenAI application metrics, see the LLM Observability Metrics Reference.

Performance metrics

Performance metrics measure how well a model performs on its task. The available metrics depend on the model task type. For more details on performance monitoring workflows, see Performance Tracking.

Binary classification

| Metric | API ID | Score Range | Description |
| --- | --- | --- | --- |
| Accuracy | accuracy | 0 to 1 | (TP + TN) / (TP + TN + FP + FN) |
| Log Loss | log_loss | 0 to infinity | Measures the difference between the predicted probability distribution and the true distribution |
| Precision | precision | 0 to 1 | TP / (TP + FP). Requires a decision threshold. |
| Recall / True Positive Rate | recall | 0 to 1 | TP / (TP + FN). Requires a decision threshold. |
| F1 Score | f1_score | 0 to 1 | 2 * (Precision * Recall) / (Precision + Recall). Requires a decision threshold. |
| False Positive Rate | fpr | 0 to 1 | FP / (FP + TN). Requires a decision threshold. |
| AUC | auc | 0 to 1 | Area Under the ROC Curve (histogram-based calculation). See also AUROC. |
| AUROC | auroc | 0 to 1 | Area Under the Receiver Operating Characteristic curve, plotting true positive rate against false positive rate |
| Expected Calibration Error | expected_calibration_error | 0 to 1 | Measures the difference between predicted probabilities and empirical probabilities |
| Geometric Mean | geometric_mean | 0 to 1 | Square root of (Precision * Recall). Requires a decision threshold. |
| Calibrated Threshold | calibrated_threshold | 0 to 1 | A threshold that balances precision and recall at a particular operating point |
| Data Count | data_count | 0 to infinity | The number of events where target and output are both not NULL. Used as the denominator for accuracy calculations. |
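
The threshold-dependent formulas in this table can be sketched in a few lines of plain Python. This is an illustration of the formulas only, not Fiddler's implementation: the function names, the default threshold of 0.5, and the choice to return 0.0 when a denominator is zero are all assumptions made for the example.

```python
import math

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Apply a decision threshold to probabilities and count TP, TN, FP, FN."""
    tp = tn = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        elif predicted == 1 and label == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def safe_div(numerator, denominator):
    """Assumption for this sketch: an undefined ratio is reported as 0.0."""
    return numerator / denominator if denominator else 0.0

def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute the thresholded metrics from the table, keyed by API ID."""
    tp, tn, fp, fn = confusion_counts(y_true, y_prob, threshold)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1_score": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),
        "geometric_mean": math.sqrt(precision * recall),
        "data_count": tp + tn + fp + fn,
    }

# Hypothetical labels and predicted probabilities for eight events.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]
metrics = binary_metrics(y_true, y_prob)
print(metrics)
```

Changing `threshold` changes every metric marked "Requires a decision threshold" above, which is why those metrics should always be read together with the threshold in use.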

Multi-class classification

| Metric | API ID | Score Range | Description |
| --- | --- | --- | --- |
| Accuracy | accuracy | 0 to 1 | (Number of correctly classified samples) / Data Count |
| Log Loss | log_loss | 0 to infinity | Measures the difference between the predicted probability distribution and the true distribution, on a logarithmic scale |
| Log Loss Count | log_loss_count | 0 to infinity | Count of events used in the Log Loss calculation |
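
As a rough illustration of multi-class accuracy and log loss, the sketch below uses the standard definitions: the predicted class is the one with the highest probability, and log loss is the mean negative natural log of the probability assigned to the true class. The function names, the clipping constant `eps`, and the sample data are assumptions for the example and are not taken from Fiddler.

```python
import math

def multiclass_accuracy(y_true, y_probs, classes):
    """Fraction of events whose highest-probability class equals the true label."""
    correct = 0
    for label, probs in zip(y_true, y_probs):
        best_index = max(range(len(probs)), key=probs.__getitem__)
        if classes[best_index] == label:
            correct += 1
    return correct / len(y_true)

def multiclass_log_loss(y_true, y_probs, classes, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, y_probs):
        p = probs[classes.index(label)]
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)

# Hypothetical three-class example with four events.
classes = ["cat", "dog", "bird"]
y_true = ["cat", "dog", "bird", "cat"]
y_probs = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.40, 0.30],
    [0.50, 0.25, 0.25],
]
accuracy = multiclass_accuracy(y_true, y_probs, classes)
log_loss = multiclass_log_loss(y_true, y_probs, classes)
print(accuracy, log_loss)
```

Log loss penalizes confident wrong predictions heavily, so it can worsen even while accuracy stays flat.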

Regression

| Metric | API ID | Score Range | Description |
| --- | --- | --- | --- |
| Mean Absolute Error (MAE) | mae | 0 to infinity | Average of the absolute differences between predicted and true values |
| Mean Squared Error (MSE) | mse | 0 to infinity | Average of the squared differences between predicted and true values |
| Mean Absolute Percentage Error (MAPE) | mape | 0 to infinity | Average of the absolute percentage differences between predicted and true values |
| Weighted Mean Absolute Percentage Error (WMAPE) | wmape | 0 to infinity | Weighted average of the absolute percentage differences between predicted and true values |
| R-squared (R²) | r2 | -infinity to 1 | Proportion of variance in the dependent variable explained by the independent variables |
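
The regression metrics can be sketched with their textbook definitions. Two points here are assumptions rather than documented Fiddler behavior: MAPE and WMAPE are returned as fractions instead of percentages, and WMAPE is computed with the common convention of weighting each error by the magnitude of its true value (sum of absolute errors divided by sum of absolute true values).

```python
def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, MAPE, WMAPE, and R-squared, keyed by API ID."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_error_sum = sum(abs(e) for e in errors)
    squared_error_sum = sum(e * e for e in errors)

    mean_true = sum(y_true) / n
    total_variation = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "mae": abs_error_sum / n,
        "mse": squared_error_sum / n,
        # Undefined when any true value is 0; this sketch does not guard that case.
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        # Assumed convention: weight by the magnitude of the true values.
        "wmape": abs_error_sum / sum(abs(t) for t in y_true),
        # Can be negative when the model fits worse than predicting the mean.
        "r2": 1 - squared_error_sum / total_variation,
    }

# Hypothetical true and predicted values.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 330, 380]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

The negative lower bound on R² in the table follows from the last line: when the residual sum of squares exceeds the total variation, the ratio exceeds 1 and the score drops below zero.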

Ranking

| Metric | API ID | Score Range | Description |
| --- | --- | --- | --- |
| Mean Average Precision (MAP) | map | 0 to 1 | Average precision of relevant items in the top-k results. For binary relevance ranking only. Supports configurable top_k. |
| Normalized Discounted Cumulative Gain (NDCG) | ndcg_mean | 0 to 1 | Quality of the ranking by discounting relevance scores at lower ranks. Supports configurable top_k. |
| Query Count | query_count | 0 to infinity | Number of ranking queries in the time period |
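
To make the ranking metrics concrete, here is a minimal sketch of MAP and mean NDCG at a cutoff `k`. Each query is represented as a list of relevance values in ranked order. Several choices are assumptions for the example and may differ from Fiddler's calculation: average precision is normalized by the number of relevant items found in the top k, NDCG uses linear gain with a log2 discount, and queries with no relevant items score 0.

```python
import math

def average_precision_at_k(relevance, k):
    """Average of precision values at each rank where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def dcg_at_k(relevance, k):
    """Discounted cumulative gain: relevance discounted by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevance[:k], start=1))

def ndcg_at_k(relevance, k):
    """DCG divided by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal else 0.0

def mean_average_precision(queries, k):
    return sum(average_precision_at_k(q, k) for q in queries) / len(queries)

def mean_ndcg(queries, k):
    return sum(ndcg_at_k(q, k) for q in queries) / len(queries)

# Two hypothetical queries with binary relevance, listed in ranked order.
queries = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
]
print(mean_average_precision(queries, k=3))
print(mean_ndcg(queries, k=3))
print(len(queries))  # corresponds to the query count
```

Both metrics are averaged over queries, so a period with few queries produces noisier values; the query count helps judge how much weight to give them.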

Drift metrics

Drift metrics measure distributional changes between your baseline dataset and production data. High drift can indicate data pipeline issues or genuine shifts in the data distribution. Both metrics require a baseline dataset. For more details, see Data Drift.

| Metric | API ID | Score Range | Description |
| --- | --- | --- | --- |
| Jensen-Shannon Distance (JSD) | jsd | 0 to 1 | Distance between the baseline distribution and the production distribution for a given field |
| Population Stability Index (PSI) | psi | 0 to infinity | Drift metric based on multinomial classification of a variable into bins, comparing baseline and production distributions |
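
Both drift metrics compare two binned distributions. The sketch below computes them from per-bin counts using the standard definitions. It assumes the Jensen-Shannon distance is the square root of the base-2 Jensen-Shannon divergence (which gives the 0 to 1 range) and adds a small constant `eps` to every bin so that empty bins do not cause division by zero in PSI. The binning strategy, the smoothing, and the function names are assumptions for the example, not a description of Fiddler's internals.

```python
import math

def normalize(counts, eps=1e-6):
    """Turn bin counts into probabilities, smoothing empty bins with eps."""
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits (log base 2)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_distance(baseline_counts, production_counts):
    """Square root of the Jensen-Shannon divergence between two histograms."""
    p = normalize(baseline_counts)
    q = normalize(production_counts)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))

def population_stability_index(baseline_counts, production_counts):
    """Sum over bins of (production - baseline) * ln(production / baseline)."""
    p = normalize(baseline_counts)
    q = normalize(production_counts)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Hypothetical bin counts for one feature over the same three bins.
baseline = [50, 30, 20]
production = [30, 30, 40]
print(jensen_shannon_distance(baseline, production))
print(population_stability_index(baseline, production))
```

Identical histograms give 0 for both metrics. JSD is bounded, which makes it easier to compare across features, while PSI grows without bound as bins that were populated in one dataset empty out in the other.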

The drift analytics table also provides Feature Impact, Feature Drift, and Prediction Drift Impact as derived values to help identify which features contribute most to prediction drift.

Data integrity metrics

Data integrity metrics detect violations in production data compared to the schema established during model onboarding. Fiddler tracks three violation types: missing values, type mismatches, and range violations. Both raw counts and percentages are available. For more details, see Data Integrity.

Count-based

| Metric | API ID | Description |
| --- | --- | --- |
| Any Violation | any_violation_count | Count of any data integrity violation across all features |
| Missing Value Violation | null_violation_count | Count of missing value violations across all features |
| Range Violation | range_violation_count | Count of range violations across all features |
| Type Violation | type_violation_count | Count of data type violations across all features |

Percentage-based

| Metric | API ID | Description |
| --- | --- | --- |
| % Any Violation | any_violation_percentage | Percentage of events with any data integrity violation |
| % Missing Value Violation | null_violation_percentage | Percentage of events with missing value violations |
| % Range Violation | range_violation_percentage | Percentage of events with range violations |
| % Type Violation | type_violation_percentage | Percentage of events with data type violations |
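
The difference between the count-based and percentage-based metrics can be illustrated with a small schema check. This sketch assumes counts are tallied per feature value (one event can contribute several violations) while percentages are computed per event, which is one reading of the descriptions above. The schema format, function names, and sample events are hypothetical and do not reflect how Fiddler stores or evaluates a model schema.

```python
from collections import Counter

def feature_violations(event, schema):
    """Return (feature, violation_kind) pairs for one event."""
    violations = []
    for name, rule in schema.items():
        value = event.get(name)
        if value is None:
            violations.append((name, "null"))
        elif not isinstance(value, rule["type"]):
            violations.append((name, "type"))
        elif not rule["min"] <= value <= rule["max"]:
            violations.append((name, "range"))
    return violations

def integrity_summary(events, schema):
    """Return (violation counts, percentage of events with each violation kind)."""
    counts = Counter()
    events_with = Counter()
    for event in events:
        violations = feature_violations(event, schema)
        for _, kind in violations:
            counts[kind] += 1          # counted once per offending feature value
        counts["any"] += len(violations)
        for kind in {kind for _, kind in violations}:
            events_with[kind] += 1     # counted once per event
        if violations:
            events_with["any"] += 1
    kinds = ("any", "null", "range", "type")
    percentages = {k: 100.0 * events_with[k] / len(events) for k in kinds}
    return {k: counts[k] for k in kinds}, percentages

# Hypothetical schema and events.
schema = {
    "age": {"type": int, "min": 0, "max": 120},
    "income": {"type": float, "min": 0.0, "max": 1e7},
}
events = [
    {"age": 34, "income": 52000.0},    # clean
    {"age": None, "income": 61000.0},  # missing value
    {"age": 150, "income": 48000.0},   # out of range
    {"age": "41", "income": None},     # wrong type and missing value
]
counts, percentages = integrity_summary(events, schema)
print(counts)
print(percentages)
```

In this example the last event contributes two violations to the counts but is only one event in the percentages, so the two views can diverge when violations cluster in a few events.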

Traffic metrics

Traffic metrics provide visibility into the operational health of your model service. For more details, see Traffic.

| Metric | API ID | Description |
| --- | --- | --- |
| Traffic | traffic | Volume of inference requests received by the model over time |

Statistics metrics

Statistics metrics provide basic aggregations over columns. These are useful for monitoring custom metadata fields over time. For more details, see Statistics.

| Metric | API ID | Applies To | Description |
| --- | --- | --- | --- |
| Average | average | Numeric columns | Arithmetic mean of a numeric column |
| Sum | sum | Numeric columns | Sum of a numeric column |
| Frequency | frequency | Categorical / Boolean columns | Count of occurrences for each value |
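
These aggregations are simple enough to show directly. In the sketch below, the decision to skip missing (`None`) values is an assumption for the example, and the column names are hypothetical.

```python
from collections import Counter

def numeric_statistics(values):
    """Sum and arithmetic mean of a numeric column, ignoring missing values."""
    present = [v for v in values if v is not None]
    total = sum(present)
    return {
        "sum": total,
        "average": total / len(present) if present else None,
    }

def frequency(values):
    """Count of occurrences for each value in a categorical or boolean column."""
    return dict(Counter(v for v in values if v is not None))

# Hypothetical metadata columns logged alongside predictions.
order_value = [20.0, 35.5, None, 44.5]
region = ["eu", "us", "eu", "apac", "eu"]
print(numeric_statistics(order_value))
print(frequency(region))
```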

Custom metrics

In addition to the built-in metrics above, you can define custom metrics using the Fiddler Query Language (FQL). Custom metrics support aggregations, operators, and metric functions to create business-specific KPIs.

For details on creating and managing custom metrics, see:
