Explanations

Accumulated Local Effects (ALE)

Computes first-order feature effects of a model on a given (tabular) dataset. ALE expresses how a given feature, on average, influences the prediction of a model.
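
As an illustration, a minimal sketch of how first-order ALE could be estimated for a single feature, assuming a NumPy feature matrix and a scikit-learn-style predict function (the function name ale_first_order is purely illustrative):

```python
import numpy as np

def ale_first_order(predict, X, feature, n_bins=10):
    """Rough first-order ALE estimate for one feature of a fitted model.

    predict maps an (n, d) array to predictions; X is an (n, d) array;
    feature is the column index of the feature of interest.
    """
    x = X[:, feature]
    # Quantile-based bin edges so each bin holds roughly the same amount of data.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    effects = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        if not mask.any():
            effects.append(0.0)
            continue
        X_bin = X[mask]
        X_lo, X_hi = X_bin.copy(), X_bin.copy()
        X_lo[:, feature] = lo  # move the bin's instances to the lower edge
        X_hi[:, feature] = hi  # ... and to the upper edge
        # Average local effect of crossing this bin.
        effects.append(np.mean(predict(X_hi) - predict(X_lo)))
    ale = np.cumsum(effects)            # accumulate the local effects
    return edges[1:], ale - ale.mean()  # centered ALE value per bin edge
```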

Anchor

An “anchor” is a subset of features and their value ranges for which the model will almost always output the same prediction. Another way of saying this is that if the feature values of an anchor are satisfied, other features will very likely not affect the prediction, i.e. the prediction is “anchored”. Anchors are by design interpretable, because they clearly indicate for which feature values they apply. Such an anchor can be expressed as an if-then rule that holds only within a specific feature range. Read more...
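
The sketch below illustrates the underlying idea by checking the precision of a hand-written if-then rule; the anchor, feature names, and target class are hypothetical, and real anchor methods search for such rules automatically rather than evaluating a given one:

```python
import numpy as np

# Hypothetical anchor for a loan model: the rule holds when both conditions are met.
anchor = {"income": (60_000, np.inf), "age": (30, 50)}

def anchor_precision(predict, X, anchor, feature_names, target_class):
    """Fraction of instances satisfying the anchor that get the anchored prediction."""
    mask = np.ones(len(X), dtype=bool)
    for name, (lo, hi) in anchor.items():
        col = feature_names.index(name)
        mask &= (X[:, col] >= lo) & (X[:, col] <= hi)
    if not mask.any():
        return np.nan  # no instances satisfy the rule
    return np.mean(predict(X[mask]) == target_class)
```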

Contrastive explanations

Human explanations are often contrastive, meaning that they do not answer the indeterminate “Why?” question, but instead “Why P, rather than Q?”. For example, when a mortgage application is denied, we are not interested in a long list of tiny details that all contributed to that decision; we want a to-the-point explanation that shows what we minimally have to change to get the mortgage. The CEM method, for example, supports such an explanation by finding a minimal set of features that leads to prediction P (which resembles an anchor explanation), and additionally a minimal set of features that should be absent to maintain decision P instead of the closest class Q (which is somewhat similar to a counterfactual). Read more...
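
As a rough illustration of the first part, the greedy sketch below resets features to a baseline value for as long as the prediction stays the same; this is only in the spirit of a “pertinent positive” and not the actual CEM optimization. It assumes NumPy arrays and a scikit-learn-style predict function:

```python
import numpy as np

def pertinent_positive(predict, x, baseline):
    """Greedily reset as many features as possible to a baseline ("absent") value
    while the predicted class stays the same; the rest must be present."""
    original = predict(x[None, :])[0]
    kept = list(range(len(x)))
    current = x.astype(float).copy()
    changed = True
    while changed:
        changed = False
        for j in list(kept):
            trial = current.copy()
            trial[j] = baseline[j]  # treat feature j as "absent"
            if predict(trial[None, :])[0] == original:
                current = trial
                kept.remove(j)
                changed = True
    return kept  # features that must stay present to keep the prediction
```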

Counterfactual explanations

A clear description of counterfactual explanations, which are very important for human causal reasoning, is provided by Molnar & Dandl: a counterfactual explanation describes a causal situation in the form “If X had not occurred, Y would not have occurred”. For example: “If I hadn’t taken a sip of this hot coffee, I wouldn’t have burned my tongue”. Event Y is that I burned my tongue; cause X is that I had a hot coffee. Read more...
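
A very simple greedy search can illustrate the idea for a classifier, assuming a scikit-learn-style predict_proba function and roughly standardized numeric features; real counterfactual methods use more principled optimization with explicit distance and sparsity terms:

```python
import numpy as np

def simple_counterfactual(predict_proba, x, target_class, step=0.1, max_iter=200):
    """Greedy search: repeatedly take the single-feature step that most increases
    the target class probability, until the model predicts the target class."""
    cf = x.astype(float).copy()
    for _ in range(max_iter):
        if predict_proba(cf[None, :])[0].argmax() == target_class:
            return cf  # "had the input been cf, the prediction would have differed"
        best_gain, best_cand = -np.inf, None
        for j in range(len(cf)):
            for delta in (-step, step):
                cand = cf.copy()
                cand[j] += delta
                gain = predict_proba(cand[None, :])[0][target_class]
                if gain > best_gain:
                    best_gain, best_cand = gain, cand
        cf = best_cand
    return None  # no counterfactual found within the search budget
```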

Example-based explanations

Example-based explanations can be used either to provide more insight into a data set or into the predictions of machine learning models. The explanations themselves are (a selection of) data instances, which assumes that data instances can be presented in a meaningful way; a data instance with 1000 features, for example, is not going to provide meaningful insights for humans. Insight into data can be gained by providing prototypical examples. Read more...
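
One simple way to obtain prototypical examples, sketched below assuming a NumPy feature matrix, is to pick the data instances closest to k-means cluster centers; dedicated prototype methods such as MMD-critic are more sophisticated:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_prototypes(X, n_prototypes=5, random_state=0):
    """Pick the data instances closest to k-means cluster centers as prototypes."""
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=random_state).fit(X)
    prototype_idx = []
    for center in km.cluster_centers_:
        # The actual instance nearest to each center serves as the prototype.
        prototype_idx.append(int(np.argmin(np.linalg.norm(X - center, axis=1))))
    return X[prototype_idx]
```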

Global surrogate models

One approach to explaining the predictions of a “black box” model is to approximate them with an interpretable white box model. A global surrogate model is a model that tries to make this approximation for a larger set of instances, ideally the whole input feature range. Also see local surrogate models.
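
A minimal scikit-learn sketch of the idea, with a random forest standing in for the black box and a shallow decision tree as the interpretable surrogate:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Train the surrogate on the black box's predictions, not on the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how well the surrogate mimics the black box on the same data.
fidelity = surrogate.score(X, black_box.predict(X))
print(export_text(surrogate))
print(f"fidelity to the black box: {fidelity:.2f}")
```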

Gradient-based explanations

Gradient-based explanation methods use gradients (e.g. in deep neural networks) to evaluate the contribution of a model component to the model output. Some methods use gradients to evaluate the contribution of input features to the output, while others show the influence of a single (hidden) neuron on the output, or the influence of an input feature on a particular neuron’s activation value. In computer vision (where deep convolutional neural networks are heavily used), gradient-based information can be shown as a saliency map. Read more...
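
A framework-free sketch of the idea: in practice the gradient comes from the framework’s autodiff (backpropagation), but here it is approximated with central differences for an arbitrary scoring function:

```python
import numpy as np

def saliency(score, x, eps=1e-4):
    """Approximate the gradient of a scalar score with respect to each input feature.

    score: callable mapping a 1-D input array to a single number
    (e.g. the logit of the predicted class).
    """
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        hi = x.astype(float); hi[i] += eps
        lo = x.astype(float); lo[i] -= eps
        grad[i] = (score(hi) - score(lo)) / (2 * eps)
    return np.abs(grad)  # magnitude of influence per input feature
```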

Individual Conditional Expectation (ICE)

An ICE plot can be seen as a partial dependence plot (PDP) that, for the features other than the feature of interest, does not use the overall expectation but instead the feature values of a single data point only. The reasoning behind an ICE plot is therefore similar to a PDP; the main difference is that whereas a PDP uses a “global” expectation over the other data points, the ICE plot contains a single line for each data instance. Read more...
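
A minimal sketch, assuming a NumPy feature matrix and a scikit-learn-style predict function:

```python
import numpy as np

def ice_curves(predict, X, feature, grid=None):
    """One curve per instance: vary a single feature over a grid, keep the rest fixed."""
    if grid is None:
        grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 20)
    curves = np.empty((len(X), len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        curves[:, j] = predict(X_mod)
    return grid, curves  # the corresponding PDP is simply curves.mean(axis=0)
```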

Local surrogate models

One approach to explaining the predictions of a “black box” model is to approximate them with an interpretable white box model. A local surrogate model is a model that makes this approximation accurately only in the “local” feature space surrounding a single input. The idea is that even though a white box model may not accurately capture the behavior of a black box model globally (over the feature space of a large set of training instances), it may still do so locally. Read more...
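
A LIME-style sketch of the idea, assuming a 1-D NumPy instance, a predict function that returns a scalar score per row, and roughly standardized features; the kernel and sampling choices here are simplifications:

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(predict, x, scale=0.1, n_samples=500, random_state=0):
    """Fit a proximity-weighted linear model around a single instance."""
    rng = np.random.default_rng(random_state)
    # Sample perturbations in a neighbourhood of x.
    Z = x + rng.normal(scale=scale, size=(n_samples, len(x)))
    y = predict(Z)
    # Weight samples by closeness to x so the fit is local.
    weights = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2 / (2 * scale ** 2))
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_  # local feature effects around x
```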

Partial dependence plot (PDP)

A partial dependence plot (PDP) visualizes the marginal effect of a feature on the outcome of a machine learning model. This effect is computed by “marginalizing out” the contributions of all other features, so that the result is a function that depends only on the feature of interest; the other features’ contributions are averaged over the data. The PDP is then produced by plotting the predicted outcome (y-axis) against the values of the feature of interest (x-axis). Read more...
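
A minimal sketch, assuming a NumPy feature matrix and a scikit-learn-style predict function (libraries such as scikit-learn provide partial dependence out of the box):

```python
import numpy as np

def partial_dependence(predict, X, feature, grid=None):
    """Average prediction when one feature is fixed to each grid value for all rows."""
    if grid is None:
        grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 20)
    pdp = np.empty(len(grid))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value       # fix the feature of interest
        pdp[j] = predict(X_mod).mean()  # marginalize out the other features
    return grid, pdp  # plot pdp (y-axis) against grid (x-axis)
```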

Perturbation

Explainability methods based on perturbation explain the importance of a feature for a particular prediction by perturbing the feature and then investigating how this change affects the outcome, in particular whether the prediction error increases. If perturbing a feature strongly increases the prediction error, this is an indication of the feature’s importance. This approach includes permutation feature importance.
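
A minimal sketch of permutation feature importance, assuming a NumPy feature matrix, true labels y, and an error metric where higher is worse (e.g. mean squared error):

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, random_state=0):
    """Error increase when each feature column is shuffled independently."""
    rng = np.random.default_rng(random_state)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores.append(metric(y, predict(X_perm)))
        importances[j] = np.mean(scores) - baseline
    return importances  # larger error increase => more important feature
```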

Sensitivity Analysis

“Sensitivity analysis” is a family of techniques to determine how sensitive a model’s prediction is to particular features. For various approaches, see here.
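
A minimal one-at-a-time sketch, assuming a 1-D NumPy instance and a scikit-learn-style predict function; more systematic approaches vary features jointly or over distributions:

```python
import numpy as np

def one_at_a_time_sensitivity(predict, x, rel_change=0.05):
    """Change each feature by a small relative amount and record the output shift."""
    base = predict(x[None, :])[0]
    sensitivities = np.zeros(len(x))
    for j in range(len(x)):
        bumped = x.astype(float)
        bumped[j] *= (1 + rel_change)
        sensitivities[j] = predict(bumped[None, :])[0] - base
    return sensitivities  # per-feature change in the prediction
```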

Shapley value explanations

Methods using Shapley values are very influential and use a particular form of perturbation to explain the importance of features for predictions. The importance of a feature for a prediction is measured in terms of how much this feature “pushes” the prediction away from a relevant baseline value, such as an average over a reference set. Different methods use different forms of perturbation. One could for example look at the difference in model output between 1) when a feature is present and 2) when a feature is occluded, repeat this for all subsets of features, and then compute the average difference per feature. Read more...
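
A minimal Monte Carlo sketch of the permutation-based Shapley estimate, assuming NumPy arrays and a scikit-learn-style predict function; dedicated libraries such as SHAP use far more efficient estimators:

```python
import numpy as np

def shapley_values(predict, x, background, n_permutations=200, random_state=0):
    """Average marginal contribution of each feature over random feature orderings.

    "Absent" features are filled in with values from a background (reference) sample.
    """
    rng = np.random.default_rng(random_state)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_permutations):
        order = rng.permutation(d)
        current = background[rng.integers(len(background))].astype(float)
        prev = predict(current[None, :])[0]
        for j in order:
            current[j] = x[j]            # switch feature j "on"
            new = predict(current[None, :])[0]
            phi[j] += new - prev         # marginal contribution of feature j
            prev = new
    return phi / n_permutations
```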

White box explanations

Typically a distinction is made between “black box” models and “white box” models. White box models are models where it is meaningful to look at the model’s components. The simplest example is linear regression, where the regression coefficients can (given proper preprocessing, etc.) give insight into how much each feature contributes to the final prediction. Some explainability packages do not try to explain “black boxes”, but instead propose using white box models either as a replacement for the black box model or as a surrogate for it. Read more...
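
A minimal scikit-learn example of reading off feature contributions from a linear model; the data here is synthetic and the standardization step is what makes the coefficient magnitudes comparable:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# Standardize so coefficient magnitudes are comparable across features.
X_std = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_std, y)
for name, coef in zip(["x0", "x1", "x2"], model.coef_):
    print(f"{name}: {coef:+.2f}")  # contribution per standard deviation of the feature
```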