ML Metrics Reference
Fiddler provides 35 built-in metrics for monitoring ML models in production. These metrics cover model performance, data drift, data integrity, traffic, and statistics. You can also define custom metrics using the Fiddler Query Language.
For LLM and GenAI application metrics, see the LLM Observability Metrics Reference.
Performance metrics
Performance metrics measure how well a model performs on its task. The available metrics depend on the model task type. For more details on performance monitoring workflows, see Performance Tracking.
Binary classification
| Metric | Identifier | Range | Description |
| --- | --- | --- | --- |
| Accuracy | accuracy | 0 to 1 | (TP + TN) / (TP + TN + FP + FN) |
| Log Loss | log_loss | 0 to infinity | Measures the difference between the predicted probability distribution and the true distribution |
| Precision | precision | 0 to 1 | TP / (TP + FP). Requires a decision threshold. |
| Recall / True Positive Rate | recall | 0 to 1 | TP / (TP + FN). Requires a decision threshold. |
| F1 Score | f1_score | 0 to 1 | 2 * (Precision * Recall) / (Precision + Recall). Requires a decision threshold. |
| False Positive Rate | fpr | 0 to 1 | FP / (FP + TN). Requires a decision threshold. |
| AUC | auc | 0 to 1 | Area Under the ROC Curve (histogram-based calculation). See also AUROC. |
| AUROC | auroc | 0 to 1 | Area Under the Receiver Operating Characteristic curve, plotting true positive rate against false positive rate |
| Expected Calibration Error | expected_calibration_error | 0 to 1 | Measures the difference between predicted probabilities and empirical probabilities |
| Geometric Mean | geometric_mean | 0 to 1 | Square root of (Precision * Recall). Requires a decision threshold. |
| Calibrated Threshold | calibrated_threshold | 0 to 1 | A threshold that balances precision and recall at a particular operating point |
| Data Count | data_count | 0 to infinity | The number of events where target and output are both not NULL. Used as the denominator for accuracy calculations. |
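The threshold-dependent formulas above can be sketched in plain Python. This is an illustration of the arithmetic only, not Fiddler's implementation: the function names, the default threshold of 0.5, and the choice to return 0.0 when a denominator is zero are all assumptions made for the example. The dictionary keys reuse the metric identifiers from the table.

```python
import math


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, TN, FP, FN after applying a decision threshold."""
    tp = tn = fp = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def safe_div(numerator, denominator):
    """Assumption for this sketch: an undefined ratio is reported as 0.0."""
    return numerator / denominator if denominator else 0.0


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Compute the threshold-dependent metrics from the table above."""
    tp, tn, fp, fn = confusion_counts(y_true, y_prob, threshold)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1_score": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),
        "geometric_mean": math.sqrt(precision * recall),
    }


# Made-up labels and scores: 3 TP, 3 TN, 1 FP, 1 FN at threshold 0.5.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
metrics = threshold_metrics(y_true, y_prob)
print(metrics)
```

Changing the `threshold` argument moves events between the four confusion-matrix cells, which is why these metrics are listed as requiring a decision threshold while Log Loss and AUROC are not.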
Multi-class classification
| Metric | Identifier | Range | Description |
| --- | --- | --- | --- |
| Accuracy | accuracy | 0 to 1 | (Number of correctly classified samples) / Data Count |
| Log Loss | log_loss | 0 to infinity | Measures the difference between the predicted probability distribution and the true distribution, on a logarithmic scale |
| Log Loss Count | log_loss_count | 0 to infinity | Count of events used in the Log Loss calculation |
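As a rough sketch of how multi-class accuracy and log loss relate to the predicted probabilities, the example below uses made-up class names and scores. The function names and the clipping constant `eps` are assumptions for illustration. Natural logarithms are used here, and the number of events stands in for Data Count.

```python
import math


def multiclass_accuracy(y_true, y_prob):
    """Share of events whose highest-probability class matches the label."""
    correct = 0
    for label, probs in zip(y_true, y_prob):
        predicted = max(probs, key=probs.get)
        correct += predicted == label
    return correct / len(y_true)


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log of the probability assigned to the true class.

    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs;
    the clipping value is an assumption of this sketch.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Made-up events: the third one is misclassified ("cat" instead of "bird").
y_true = ["cat", "dog", "bird"]
y_prob = [
    {"cat": 0.7, "dog": 0.2, "bird": 0.1},
    {"cat": 0.1, "dog": 0.8, "bird": 0.1},
    {"cat": 0.5, "dog": 0.3, "bird": 0.2},
]

acc = multiclass_accuracy(y_true, y_prob)
loss = multiclass_log_loss(y_true, y_prob)
print(acc, loss)
```

Accuracy only registers that the third event is wrong. Log loss also reflects how little probability (0.2) the model placed on the true class.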
Regression
| Metric | Identifier | Range | Description |
| --- | --- | --- | --- |
| Mean Absolute Error (MAE) | mae | 0 to infinity | Average of the absolute differences between predicted and true values |
| Mean Squared Error (MSE) | mse | 0 to infinity | Average of the squared differences between predicted and true values |
| Mean Absolute Percentage Error (MAPE) | mape | 0 to infinity | Average of the absolute percentage differences between predicted and true values |
| Weighted Mean Absolute Percentage Error (WMAPE) | wmape | 0 to infinity | Weighted average of the absolute percentage differences between predicted and true values |
| R-squared (R²) | r2 | -infinity to 1 | Proportion of variance in the dependent variable explained by the independent variables |
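The regression metrics can be illustrated with a few lines of Python. This sketch makes several assumptions that the table does not specify: MAPE and WMAPE are returned as fractions rather than percentages, WMAPE weights each event by the absolute true value (so it reduces to the sum of absolute errors divided by the sum of absolute true values), and no true value is zero. The function name is hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, MAPE, WMAPE, and R-squared for paired values.

    Assumes no true value is zero (MAPE is undefined there) and that
    the true values are not all identical (R-squared is undefined there).
    """
    n = len(y_true)
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs_err) / n,
        "mse": sum(sq_err) / n,
        # Fraction, not percent: multiply by 100 for a percentage.
        "mape": sum(e / abs(t) for e, t in zip(abs_err, y_true)) / n,
        # Assumed weighting: each event weighted by |true value|.
        "wmape": sum(abs_err) / sum(abs(t) for t in y_true),
        "r2": 1 - sum(sq_err) / ss_tot,
    }


# Made-up values for illustration.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 330.0, 380.0]
result = regression_metrics(y_true, y_pred)
print(result)
```

R-squared can fall below zero when the predictions fit worse than always predicting the mean of the true values, which is why its range has no lower bound.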
Ranking
| Metric | Identifier | Range | Description |
| --- | --- | --- | --- |
| Mean Average Precision (MAP) | map | 0 to 1 | Average precision of relevant items in the top-k results. For binary relevance ranking only. Supports configurable top_k. |
| Normalized Discounted Cumulative Gain (NDCG) | ndcg_mean | 0 to 1 | Quality of the ranking by discounting relevance scores at lower ranks. Supports configurable top_k. |
| Query Count | query_count | 0 to infinity | Number of ranking queries in the time period |
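To make the effect of top_k concrete, the sketch below computes MAP and mean NDCG over two made-up queries. Several conventions exist for both metrics, and the ones used here are assumptions rather than a description of Fiddler's calculation: average precision is normalized by the number of relevant items found within the top k, and DCG uses linear gains with a log2 rank discount. The function names and query data are hypothetical.

```python
import math


def average_precision_at_k(relevance, top_k):
    """Average precision over the top_k results of one query.

    `relevance` holds binary labels in ranked order. Normalizing by the
    number of hits inside the top k is an assumption of this sketch.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance[:top_k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def dcg_at_k(relevance, top_k):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:top_k], start=1)
    )


def ndcg_at_k(relevance, top_k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), top_k)
    return dcg_at_k(relevance, top_k) / ideal if ideal else 0.0


# Made-up ranked relevance labels, one list per query.
queries = {
    "q1": [1, 0, 1, 0],
    "q2": [0, 1, 0, 0],
}
top_k = 3

map_score = sum(average_precision_at_k(r, top_k) for r in queries.values()) / len(queries)
ndcg_mean = sum(ndcg_at_k(r, top_k) for r in queries.values()) / len(queries)
query_count = len(queries)
print(map_score, ndcg_mean, query_count)
```

Both metrics are computed per query and then averaged, so Query Count indicates how many queries each reported value is based on.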
Drift metrics
Drift metrics measure distributional changes between your baseline dataset and production data. High drift can indicate data pipeline issues or genuine shifts in the data distribution. Both metrics require a baseline dataset. For more details, see Data Drift.
| Metric | Identifier | Range | Description |
| --- | --- | --- | --- |
| Jensen-Shannon Distance (JSD) | jsd | 0 to 1 | Distance between the baseline distribution and the production distribution for a given field |
| Population Stability Index (PSI) | psi | 0 to infinity | Drift metric based on multinomial classification of a variable into bins, comparing baseline and production distributions |
The drift analytics table also provides Feature Impact, Feature Drift, and Prediction Drift Impact as derived values to help identify which features contribute most to prediction drift.
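For intuition about JSD and PSI, the sketch below computes both from binned counts of a single field. It uses the textbook definitions, with base-2 logarithms so that JSD stays between 0 and 1. How Fiddler bins values and handles empty bins is not described here, so the bin counts, the smoothing constant `eps`, and the function names are assumptions for illustration.

```python
import math


def to_distribution(counts, eps=1e-6):
    """Normalize bin counts to probabilities.

    A small constant is added to every bin so empty bins do not cause
    division by zero or log(0); this smoothing is an assumption.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits (base-2 logarithm)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def jensen_shannon_distance(baseline_counts, production_counts):
    """Square root of the Jensen-Shannon divergence; ranges from 0 to 1."""
    p = to_distribution(baseline_counts)
    q = to_distribution(production_counts)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))


def population_stability_index(baseline_counts, production_counts):
    """Sum over bins of (production - baseline) * ln(production / baseline)."""
    p = to_distribution(baseline_counts)
    q = to_distribution(production_counts)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Made-up histogram counts over the same three bins.
baseline = [50, 30, 20]
production = [30, 30, 40]

jsd = jensen_shannon_distance(baseline, production)
psi = population_stability_index(baseline, production)
print(jsd, psi)
```

JSD is bounded, which makes values comparable across fields. PSI is unbounded and grows quickly when a bin that was rare in the baseline becomes common in production, or the reverse.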
Data integrity metrics
Data integrity metrics detect violations in production data compared to the schema established during model onboarding. Fiddler tracks three violation types: missing values, type mismatches, and range violations. Both raw counts and percentages are available. For more details, see Data Integrity.
Count-based
| Metric | Identifier | Description |
| --- | --- | --- |
| Any Violation | any_violation_count | Count of any data integrity violation across all features |
| Missing Value Violation | null_violation_count | Count of missing value violations across all features |
| Range Violation | range_violation_count | Count of range violations across all features |
| Type Violation | type_violation_count | Count of data type violations across all features |
Percentage-based
| Metric | Identifier | Description |
| --- | --- | --- |
| % Any Violation | any_violation_percentage | Percentage of events with any data integrity violation |
| % Missing Value Violation | null_violation_percentage | Percentage of events with missing value violations |
| % Range Violation | range_violation_percentage | Percentage of events with range violations |
| % Type Violation | type_violation_percentage | Percentage of events with data type violations |
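The difference between the count-based and percentage-based metrics can be shown with a small sketch. Counts are accumulated per feature, while percentages are taken over events, so an event with two violations adds two to a count but is only one event in a percentage. The schema format, function names, and sample events are hypothetical. Treating the "any" count as the sum of all individual violations is also an assumption of this example.

```python
def check_event(event, schema):
    """Return the set of (feature, violation_kind) pairs for one event."""
    violations = set()
    for feature, spec in schema.items():
        value = event.get(feature)
        if value is None:
            violations.add((feature, "null"))
        elif not isinstance(value, spec["type"]):
            violations.add((feature, "type"))
        elif "range" in spec and not (spec["range"][0] <= value <= spec["range"][1]):
            violations.add((feature, "range"))
    return violations


def summarize(events, schema):
    """Aggregate per-feature violation counts and per-event percentages."""
    counts = {"null": 0, "range": 0, "type": 0}
    events_with = {"null": 0, "range": 0, "type": 0}
    events_with_any = 0
    for event in events:
        found = check_event(event, schema)
        for _, kind in found:
            counts[kind] += 1            # one per violating feature
        for kind in {kind for _, kind in found}:
            events_with[kind] += 1       # one per violating event
        events_with_any += bool(found)
    n = len(events)
    summary = {
        # Assumption: the "any" count sums every individual violation.
        "any_violation_count": sum(counts.values()),
        "any_violation_percentage": 100 * events_with_any / n,
    }
    for kind in counts:
        summary[f"{kind}_violation_count"] = counts[kind]
        summary[f"{kind}_violation_percentage"] = 100 * events_with[kind] / n
    return summary


# Hypothetical schema and events for illustration only.
schema = {
    "age": {"type": int, "range": (18, 100)},
    "country": {"type": str},
}
events = [
    {"age": 34, "country": "US"},     # clean
    {"age": 150, "country": None},    # range violation + missing value
    {"age": "41", "country": "DE"},   # type violation
    {"age": None, "country": None},   # two missing values
]

summary = summarize(events, schema)
print(summary)
```

In this sample the missing value count is 3 while the missing value percentage is 50, because two of the three missing values occur in the same event.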
Traffic metrics
Traffic metrics provide visibility into the operational health of your model service. For more details, see Traffic.
| Metric | Identifier | Description |
| --- | --- | --- |
| Traffic | traffic | Volume of inference requests received by the model over time |
Statistics metrics
Statistics metrics provide basic aggregations over columns. These are useful for monitoring custom metadata fields over time. For more details, see Statistics.
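| Metric | Identifier | Applies to | Description |
| --- | --- | --- | --- |
| Average | average | Numeric columns | Arithmetic mean of a numeric column |
| Sum | sum | Numeric columns | Sum of a numeric column |
| Frequency | frequency | Categorical / Boolean columns | Count of occurrences for each value |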
Custom metrics
In addition to the built-in metrics above, you can define custom metrics using the Fiddler Query Language (FQL). Custom metrics support aggregations, operators, and metric functions to create business-specific KPIs.
For details on creating and managing custom metrics, see the Custom Metrics Guide.
Related resources
LLM Observability Metrics Reference — Enrichments for LLM application monitoring
Performance Tracking — Performance monitoring workflows
Data Drift — Drift monitoring and analysis
Data Integrity — Data quality monitoring
Custom Metrics Guide — Creating custom metrics with FQL