Fiddler provides powerful visualizations to explain model behavior. These explanations can be queried at an individual prediction level in the Explain tab, at a model level in the Analyze tab, or within the monitoring context in the Monitor tab.
Explanations are available in the UI for structured (tabular) and natural language (NLP) models, for both classification and regression. They are also supported via API using the Fiddler Python package. Explanations are available for both production and dataset queries.
Fiddler’s explanations are interactive — you can change feature inputs and immediately view an updated prediction and explanation. We have productized several popular explanation methods to work fast and at scale:
- SHAP and Fiddler, game-theory based methods that work for all models, because they only require the ability to ask a model for predictions.
- Integrated Gradients, which is particularly performant for deep learning models with large inputs. It requires the model’s prediction to be mathematically differentiable, and a prediction gradient must be made available to Fiddler.
These methods are discussed in more detail in a later section.
For tabular models, Fiddler’s Point Explanation tool shows how any given model prediction can be attributed to its individual input features.
The following is an example of an explanation for a model predicting the likelihood of customer churn:
A brief tour of the features above:
- Explanation Method: The explanation method is selected from the dropdown Explanation Type.
- Input Vector: The far left column contains the input vector. Each input can be adjusted.
- Prediction: The box in the upper-left shows the model’s prediction for this input vector.
  - If the model produces multiple outputs (e.g. probabilities in a multi-class classifier), you can click on the prediction field to select and explain any of the output components. This can be particularly useful when diagnosing misclassified examples.
- Feature Attributions: The colored bars on the right represent how the prediction is attributed to the individual feature inputs.
  - A positive value (blue bar) indicates a feature is responsible for driving the prediction in the positive direction.
  - A negative value (red bar) indicates a feature is responsible for driving the prediction in the negative direction.
- Baseline: The thin colored line just above the bars shows the difference between the baseline and the prediction. The specifics of the baseline calculation vary with the explanation method, but it is usually approximately the mean prediction of the training/reference data distribution (i.e. the dataset specified when importing the model into Fiddler).
Two numbers accompany each feature’s bar in the UI.
The first number is the attribution. The sum of these values over all features will always equal the difference between the model prediction and a baseline prediction value, which represents a typical model prediction.
The second number, the percentage in parentheses, is the feature attribution divided by the sum of the absolute values of all the feature attributions. This provides an easy-to-compare, relative measure of feature strength and directionality (notice that negative attributions have negative percentages) and is bounded by ±100%.
Note that an input box labeled “Top N” controls how many attributions are visible at once. If the values don’t add up as described above, it’s likely that weaker attributions are being filtered out by this control.
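The arithmetic behind the two numbers can be sketched in a few lines of Python. This is only an illustration: the feature names, attribution values, baseline, and the `relative_attributions` helper are all hypothetical and are not part of the Fiddler package.

```python
def relative_attributions(attributions):
    """Return each attribution as a signed percentage of the total absolute attribution."""
    total_abs = sum(abs(a) for a in attributions.values())
    if total_abs == 0:
        return {name: 0.0 for name in attributions}
    return {name: 100.0 * a / total_abs for name, a in attributions.items()}


# Hypothetical attributions for one churn prediction (illustrative values only).
attributions = {"tenure": -0.25, "num_products": 0.5, "age": 0.125, "balance": 0.125}
baseline_prediction = 0.25

# Additivity: the attributions sum to (prediction - baseline prediction).
prediction = baseline_prediction + sum(attributions.values())

# Signed percentages; their absolute values add up to 100.
percentages = relative_attributions(attributions)

# Keeping only the strongest attributions, as a "Top N" style filter might,
# means the visible values no longer sum to the full prediction difference.
top_n = 2
top = dict(sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n])
visible_sum = sum(top.values())
```

With these made-up values, the visible sum for the top two features differs from the full prediction difference, which is the effect described in the note above.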
Finally, it’s important to note that feature attributions combine model behavior with characteristics of the data distribution.
Language (NLP) Models
For language models, Fiddler’s Point Explanation provides the word-level impact on the prediction score when using perturbative methods (SHAP and Fiddler); for the Integrated Gradients method, tokenization can be customized in your model’s package.py wrapper script. The explanations are interactive: edit the text, and the explanation updates immediately.
Here is an example of an explanation of a prediction from a sentiment analysis model:
Point Explanation Methods: How to Quantify Prediction Impact of a Feature?
One strategy for explaining the prediction of a machine learning model is to measure the influence that each of its inputs has on the prediction made. This is called Feature Impact.
To measure Feature Impact, additive attribution methods can be quite powerful. Fiddler includes:
- SHAP and Fiddler, which require only the ability to ask a model for predictions, and are thus suitable across all types of models; no knowledge of the model implementation is necessary.
- Integrated Gradients, a method that takes advantage of the gradient vector of the prediction, which is typically available in deep learning models, to efficiently explain complex models with large input dimensionality.
To explain a prediction with an additive attribution method, we look at how individual features contribute to the prediction difference. The prediction difference is a comparison between the prediction as a point in feature space (we refer to this as the explain-point), and a counterfactual baseline position (or a distribution of positions), representing an uninteresting or typical model inference.
Each feature is assigned a fraction of the prediction difference for which it is responsible. This fraction is called the feature attribution, and it’s what we show in our explanations.
Additive attribution methods have the following characteristics:
- The sum of feature attributions always equals the prediction difference.
- Features that have no effect on a model’s prediction receive a feature attribution of zero.
- Features that have an identical effect receive the same attribution.
- Features with mutual information share the attribution for any effect that information has on the prediction.
Additionally, each of these methods takes into account interactions between the features — e.g. two features that have no effect individually but in combination change the model output. This is explicitly built into the Shapley Value formalism, and is captured in the path integral over gradients in Integrated Gradients.
Shapley Values and their Approximation
The Shapley value [proposed by Lloyd Shapley in 1953] is one way to derive feature attributions. Shapley values distribute the total payoff of a collaborative game to a coalition of cooperating players. They are computed by tabulating the average gain in payoff when a particular player is added to the coalition, over all coalition sizes and permutations of players.
In our case, we consider the “total gains” to be the prediction value, and a “player” is a single model feature. The collaborative “game” is all of the model features cooperating to form a prediction value.
How do we create “coalitions” with only a subset of the features? In some scenarios, it may be appropriate to replace a feature with a zero value when removed from the coalition (e.g. text models where no mask token is available). In others, like models with dense tabular inputs, values are swapped in from a reference distribution or baseline example, as a zero value may have a specific meaning (like zero income on a credit application).
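For reference, the standard Shapley value of a feature can be written compactly. The symbols are notation introduced here for illustration, not Fiddler's own: N is the set of all n features, S is a coalition that excludes feature i, and v(S) is the model prediction when the features in S keep their explain-point values and the remaining features are replaced as described above.

```latex
\phi_i \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}\,
  \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
```

The weight in front of each term is the fraction of feature orderings in which exactly the features in S precede feature i, so the sum is the average marginal gain of adding feature i over all orderings.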
Shapley values have desirable properties including:
- linearity: If two games are combined, then the total gains correspond to the gains derived from a linear combination of the gains of each game.
- efficiency: The sum of the values of all players equals the value of the grand coalition, so that all the gain is distributed among the players. In our case, the efficiency property says: the feature attributions should sum to the prediction value, measured relative to the baseline prediction (that is, to the prediction difference). The attributions can be negative or positive, since a feature can lower or raise a predicted value.
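The definition and properties above can be checked on a very small example. The sketch below computes exact Shapley values by brute force over every feature ordering, replacing absent features with a single baseline example. The toy model, the all-zero baseline, and the function names are assumptions made for illustration; this is not Fiddler's implementation, and a zero baseline is used only because it keeps the arithmetic easy to follow.

```python
from itertools import permutations
from math import factorial


def exact_shapley(predict, explain_point, baseline):
    """Exact Shapley values: average marginal gain over all feature orderings."""
    n = len(explain_point)
    totals = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)           # empty coalition: every feature at baseline
        previous = predict(current)
        for i in order:
            current[i] = explain_point[i]  # feature i joins the coalition
            value = predict(current)
            totals[i] += value - previous  # marginal gain credited to feature i
            previous = value
    return [t / factorial(n) for t in totals]


def toy_model(x):
    # Hypothetical model: features 0 and 1 matter only in combination,
    # feature 2 has an independent linear effect, feature 3 is ignored.
    return 4.0 * x[0] * x[1] + 2.0 * x[2] + 0.0 * x[3]


explain_point = [1.0, 1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]

phi = exact_shapley(toy_model, explain_point, baseline)
prediction_difference = toy_model(explain_point) - toy_model(baseline)
```

In this toy case the two interacting features split their joint effect equally, the ignored feature receives zero, and the attributions sum to the prediction difference. The loop visits n! orderings, which is why exact computation is only practical for a handful of features.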
Approximating Shapley Values
Computation of exact Shapley values can be extremely computationally expensive — in fact, exponentially so, in the number of input features. Fiddler makes two approximation methods available:
- SHAP [SHapley Additive exPlanations] approximates Shapley values by sampling coalitions according to a combinatorially weighted kernel (compensating for the number of permutations of features in coalitions of different cardinality). It samples the feature space uniformly between baseline-like feature vectors and explain-point-like feature vectors. This has the effect of downsampling behavior in the immediate vicinity of the explain-point, a region where the model may be saturated, uniform in its prediction, and attributions may not be helpful.
- Fiddler builds on the SHAP approach and is optimized for computing distributions of Shapley values for each feature by comparing the explain-point against a distribution of baselines. This makes it possible to compute confidence intervals around the mean attribution for each feature and identify clusters in attribution space where distinct, individually relevant explanations might be important (e.g. “your loan application was rejected for these reasons compared to applications in your region and for these reasons compared to applications with the same profession”).
Approximate Shapley value methods can be used to explain nearly any model, since you only need to be able to ask the model for predictions at a variety of positions in the feature space.
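To make the idea of approximation concrete, here is a minimal sketch of a generic Monte Carlo estimator that samples random feature orderings and random baseline examples, then reports a mean and standard error per feature. It is neither the kernel-weighted sampling used by SHAP nor Fiddler's algorithm; the function name, toy model, and baseline set are hypothetical.

```python
import random
import statistics


def sampled_shapley(predict, explain_point, baselines, n_samples=2000, seed=0):
    """Estimate Shapley values from random orderings and random baselines.

    Returns (means, standard_errors), one entry per feature.
    """
    rng = random.Random(seed)
    n = len(explain_point)
    samples = [[] for _ in range(n)]
    for _ in range(n_samples):
        current = list(rng.choice(baselines))  # draw one baseline example
        order = list(range(n))
        rng.shuffle(order)                     # draw one feature ordering
        previous = predict(current)
        for i in order:
            current[i] = explain_point[i]      # feature i joins the coalition
            value = predict(current)
            samples[i].append(value - previous)
            previous = value
    means = [statistics.fmean(s) for s in samples]
    errors = [statistics.stdev(s) / len(s) ** 0.5 for s in samples]
    return means, errors


def toy_model(x):
    # Hypothetical model with an interaction term and a linear term.
    return 4.0 * x[0] * x[1] + 2.0 * x[2]


explain_point = [1.0, 1.0, 1.0]
baselines = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]

means, errors = sampled_shapley(toy_model, explain_point, baselines)
mean_baseline_prediction = statistics.fmean([toy_model(b) for b in baselines])
expected_difference = toy_model(explain_point) - mean_baseline_prediction
```

The only access to the model is through `predict`, which is what makes this family of methods model-agnostic. The cost is `n_samples * (n + 1)` predictions rather than an exponential number, and the per-feature standard errors give a rough sense of how far the estimate may be from the exact value.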
Another additive attribution method: the Integrated Gradients method.
For models whose prediction is continuous and piecewise differentiable in the feature space, it can be useful to provide additional information through the gradient (slope vector) of a prediction.
Fiddler supports Integrated Gradients (IG). In this method, an approximate integral tabulates components of the slope along a linear path from baseline to explain-point, and attributes them to respective input features. This has several advantages:
- For models with very high dimensional feature volumes (e.g. images, text), where differentiable deep-learning models typically excel, this method can be very performant (O(n) vs. the O(2^n) of the Shapley methods).
- Attributions can be computed for and to intermediate layers within the model providing fine-grained model diagnostics. This is naturally extendable to models with hybrid and multimodal inputs.
- In comparison to local gradients and saliency methods, the IG path integral samples the large-scale behavior of the model and is resistant to amplifying noise in the possibly saturated region around the explain-point.
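The path integral can be sketched with a simple Riemann sum. The example below uses a hypothetical logistic model with a hand-written gradient so that it runs with the standard library alone; the weights, function names, and step count are assumptions for illustration, and in practice the gradient would come from the deep learning framework.

```python
import math

WEIGHTS = [2.0, -1.0, 0.0]  # hypothetical fixed weights; the last feature is unused


def toy_model(x):
    """Logistic unit: sigmoid of a weighted sum of the inputs."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def toy_gradient(x):
    """Analytic gradient of toy_model with respect to each input."""
    p = toy_model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(gradient, explain_point, baseline, steps=200):
    """Midpoint Riemann-sum approximation of IG along the straight-line path."""
    n = len(explain_point)
    delta = [x - b for x, b in zip(explain_point, baseline)]
    grad_sums = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th path segment
        point = [b + alpha * d for b, d in zip(baseline, delta)]
        for i, g in enumerate(gradient(point)):
            grad_sums[i] += g
    # Attribution = (input - baseline) * average gradient along the path.
    return [d * s / steps for d, s in zip(delta, grad_sums)]


explain_point = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(toy_gradient, explain_point, baseline)
prediction_difference = toy_model(explain_point) - toy_model(baseline)
```

Each step needs one gradient evaluation that covers all features at once, so the cost grows with the number of steps rather than with the number of feature subsets. Up to the discretization error of the sum, the attributions add up to the prediction difference, and the unused feature receives zero.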
- S. Lundberg and S. Lee. “A Unified Approach to Interpreting Model Predictions.” NeurIPS, 2017. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
- L. Merrick and A. Taly. “The Explanation Game: Explaining Machine Learning Models Using Shapley Values.” https://arxiv.org/abs/1909.08128
- M. Sundararajan, A. Taly, and Q. Yan. “Axiomatic Attribution for Deep Networks.” http://proceedings.mlr.press/v70/sundararajan17a/sundararajan17a.pdf