Counterfactual explanations

A clear description counterfactual explanations, which is very important for human causal reasoning, is provided by Molnar & Dandl:

A counterfactual explanation describes a causal situation in the form: “If X had not occurred, Y would not have occurred”. For example: “If I hadn’t taken a sip of this hot coffee, I wouldn’t have burned my tongue”. Event Y is that I burned my tongue; cause X is that I had a hot coffee. Thinking in counterfactuals requires imagining a hypothetical reality that contradicts the observed facts (for example, a world in which I have not drunk the hot coffee), hence the name “counterfactual”. The ability to think in counterfactuals makes us humans so smart compared to other animals. (Molnar & Dandl)

They also give a general approach for supporting counterfactual explanations about machine learning models:

We simply change the feature values of an instance before making the predictions and we analyze how the prediction changes. We are interested in scenarios in which the prediction changes in a relevant way, like a flip in predicted class (for example, credit application accepted or rejected) or in which the prediction reaches a certain threshold (for example, the probability for cancer reaches 10%). A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output. (Molnar & Dandl)

Alibi

Alibi is an open-source Python library that supports various interpretability techniques and a broad array of explanation types. The README already provides an overview of the supported methods and when they are applicable. The following table with supported methods is copied from the README (slightly abbreviated): Supported methods Method Models Explanations Classification Regression Tabular Text Images Categorical features ALE BB global ✔ ✔ ✔ Anchors BB local ✔ ✔ ✔ ✔ ✔ CEM BB* TF/Keras local ✔ ✔ ✔ Counterfactuals BB* TF/Keras local ✔ ✔ ✔ Prototype Counterfactuals BB* TF/Keras local ✔ ✔ ✔ ✔ Integrated Gradients TF/Keras local ✔ ✔ ✔ ✔ ✔ ✔ Kernel SHAP BB local global ✔ ✔ ✔ ✔ Tree SHAP WB local global ✔ ✔ ✔ ✔ The README also explains the keys: Read more...

DiCE: Diverse Counterfactual Explanations

From README: DiCE implements counterfactual (CF) explanations that provide this information by showing feature-perturbed versions of the same person who would have received the loan, e.g., you would have received the loan if your income was higher by $10,000. In other words, it provides “what-if” explanations for model output and can be a useful complement to other explanation methods, both for end-users and model developers. A main innovation of DiCE is that it implements a method to make producing counter-factual examples more model-agnostic: Read more...

What-If Tool

The What-If Tool (WIT) takes a pretrained model and then allows you to visualize the effect of changing e.g. classification thresholds or the data points themselves on performance, explainability and fairness metrics. Many convenient functions for gaining insight in the data set are provided, such as binning on particular features, attribution values, or inference scores, computing partial dependence plots, and typical performance indicators such as a confusion matrix or ROC curve. Read more...