Explainability is instrumental for maintaining other values and for trust in AI systems.
There is little consensus about what “explainability” precisely is.
The related concepts of “transparency” and “interpretability” are sometimes used as synonyms, sometimes distinctly.
For example, the explainability of machine learning models can be seen as one aspect of the overall need to be transparent in the use of AI (so transparency is the superconcept).
But one may also use the word “transparency” to indicate “white box” models that are in themselves interpretable.
The predictions of opaque black box models can nevertheless be explained, so in this sense “transparency” is a subconcept of “explainability”.
Technical tools assume and implement various different types of explanation.
It is important to be aware of the limitations of the particular interpretation of “explainability” that is used.
Generally speaking, there is a considerable gap between technical implementations of explainability and what humans consider to be a good explanation.
Butnaru and others associated with the HAI center at Stanford set up an Agile Ethics workflow in the form of a Trello board. From left to right, the workflow walks you through relevant ethical considerations at the various steps of a machine learning pipeline. The phases are:
Scope: Consider ethical implications of the project. Consider skill mapping (what’s the impact of AI on jobs?). Facilitates up-skilling or a change of strategy in the use of human talent.
Data audit: Led by Chief Data Officer. “Meet and plan” stage in Agile. Helpful: Data Ethics Canvas.
Train: Build stage in Agile. Consider (tools for) transparency and fairness.
Analyse: Benchmarks, including benchmarks related to e.
The AI Explainability 360 (AIX360) toolkit is a Python library that offers a wide range of explanation types as well as some explainability metrics. AIX360 offers excellent guidance material, an interactive demo as well as developer tutorials. What’s particularly good about this material is that it stimulates reflection on which type of explanation is appropriate, not only from a technical point of view, but also with respect to the target explainer and explainee.
Alibi is an open-source Python library that supports various interpretability techniques and a broad array of explanation types. The README already provides an overview of the supported methods and when they are applicable. The following table with supported methods is copied from the README (slightly abbreviated):
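Supported methods

| Method | Models | Explanations | Classification | Regression | Tabular | Text | Images | Categorical features |
|---|---|---|---|---|---|---|---|---|
| ALE | BB | global | ✔ | ✔ | ✔ | | | |
| Anchors | BB | local | ✔ | | ✔ | ✔ | ✔ | ✔ |
| CEM | BB* TF/Keras | local | ✔ | | ✔ | | ✔ | |
| Counterfactuals | BB* TF/Keras | local | ✔ | | ✔ | | ✔ | |
| Prototype Counterfactuals | BB* TF/Keras | local | ✔ | | ✔ | | ✔ | ✔ |
| Integrated Gradients | TF/Keras | local | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Kernel SHAP | BB | local, global | ✔ | ✔ | ✔ | | | ✔ |
| Tree SHAP | WB | local, global | ✔ | ✔ | ✔ | | | ✔ |

The README also explains the keys: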
Captum is a model interpretability library specifically for PyTorch. It is actively maintained at the time of writing and supports an extensive array of interpretability methods.
The Captum website also offers a large range of hands-on tutorials for various use cases.
Supported interpretability methods
Captum supports a very extensive list of interpretability algorithms. All paper references for each of the supported methods are listed in the README, so they will not be repeated here.
Dhurandhar et al. propose the Contrastive Explanation Method (CEM), which supports a type of contrastive explanation based on what they call pertinent negatives. A contrastive explanation answers the question: “Why P, rather than Q?”
CEM supports such an explanation by finding the minimal set of features that lead to prediction P (a pertinent positive, which resembles an anchor explanation), and additionally a minimal set of features that should be absent to maintain decision P instead of the decision for the closest class Q (a pertinent negative, which is somewhat similar to a counterfactual).
The Data Ethics Canvas is a tool developed by the Open Data Institute for providing ethical guidance to organizations doing any type of project involving data. That includes data collection, data sharing, and data use, for example in machine learning applications. The tool is accompanied by a white paper and a brief practical guide for its use.
Page 3 of the practical guide lists some recommendations that are also relevant when you do not use this tool.
The DeepExplain Python package for TensorFlow models and Keras models with TensorFlow backend offers two types of interpretability methods for deep convolutional neural networks: gradient-based methods and perturbation-based methods. This package does not seem to be very actively maintained anymore and support for TensorFlow V2 is limited.
Attributions
The README gives the following clear and succinct explanation of what an “attribution” is. All methods included in this approach allow visualization of how each input feature contributes to the final prediction, in terms of what a particular targeted neuron “sees”:
A brief explanation of the gradient-based interpretability method called DeepLIFT is given by Shrikumar et al. in the abstract of the linked paper:
DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its ‘reference activation’ and assigns contribution scores according to the difference.
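To make the "difference from reference" idea concrete, the following is a minimal sketch for the simplest possible case: a single linear unit, where the contribution of each input reduces to its weight times the difference between the input and its reference value. The function name `linear_deeplift` and all numbers are hypothetical and chosen for illustration; this is not the DeepExplain API, and real networks additionally need rules for propagating contributions through nonlinearities.

```python
def linear_deeplift(weights, bias, x, reference):
    """Difference-from-reference contributions for a single linear unit.

    Illustrative assumption: for a linear unit, the contribution of input i
    is weights[i] * (x[i] - reference[i]).
    """
    output = sum(w * xi for w, xi in zip(weights, x)) + bias
    ref_output = sum(w * ri for w, ri in zip(weights, reference)) + bias
    contributions = [w * (xi - ri) for w, xi, ri in zip(weights, x, reference)]
    return output, ref_output, contributions


# Hypothetical weights, input and all-zero reference input.
weights = [2.0, -1.0, 0.5]
bias = 0.1
x = [1.0, 3.0, 4.0]
reference = [0.0, 0.0, 0.0]

output, ref_output, contributions = linear_deeplift(weights, bias, x, reference)
print("contributions:", contributions)
print("sum of contributions:", sum(contributions))
print("output - reference output:", output - ref_output)
```

The contributions add up to the difference between the output and the reference output, which is the property that makes the scores readable as a decomposition of the prediction.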
DiCE implements counterfactual (CF) explanations that provide this information by showing feature-perturbed versions of the same person who would have received the loan, e.g., you would have received the loan if your income was higher by $10,000. In other words, it provides “what-if” explanations for model output and can be a useful complement to other explanation methods, both for end-users and model developers.
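The loan example can be sketched in a few lines. The code below is a toy illustration and does not use the DiCE API: the scoring rule in `approve_loan`, the threshold, and the helper `income_counterfactual` are all hypothetical, and only a single feature (income) is perturbed, whereas DiCE generates sets of counterfactuals across several features.

```python
def approve_loan(applicant):
    """Hypothetical toy model: approve when a simple score reaches a threshold."""
    score = applicant["income"] - 0.5 * applicant["debt"]
    return score >= 50_000


def income_counterfactual(applicant, step=1_000, max_increase=100_000):
    """Smallest income increase (a multiple of `step`) that turns a rejection
    into an approval. Returns 0 if already approved, None if nothing is found."""
    if approve_loan(applicant):
        return 0
    for increase in range(step, max_increase + step, step):
        candidate = dict(applicant, income=applicant["income"] + increase)
        if approve_loan(candidate):
            return increase
    return None


applicant = {"income": 45_000, "debt": 10_000}
needed = income_counterfactual(applicant)
print(f"You would have received the loan if your income was higher by ${needed:,}.")
```

The search treats the model purely as a function from inputs to decisions, which is what makes this kind of "what-if" explanation usable for end-users who have no access to model internals.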
A main innovation of DiCE is that it implements a method to make producing counterfactual examples more model-agnostic:
ELI5 (“Explain Like I’m 5”) provides model-specific support for models from scikit-learn and lightning, and for decision tree ensembles built with the xgboost, LightGBM and CatBoost libraries. ELI5 mainly provides convenient wrappers to couple the feature importance coefficients that these libraries already provide with feature names, as well as convenient ways to visualize importances, e.g. by highlighting words in a text. For Keras image classifiers an implementation of the gradient-based Grad-CAM visualizations is offered, but the TensorFlow V2 backend is not supported.
This repository by H2O.ai contains useful resources and notebooks that showcase well-known machine learning interpretability techniques. The examples use the h2o Python package with their own estimators (e.g. their own fork of XGBoost), but all code is open-source and the examples are still illustrative of the interpretability techniques. These case studies also deal with practical coding issues, preprocessing steps and pitfalls, e.g. that LIME can be unstable when there are strong correlations between input variables.
Interpret-Text is an extension of InterpretML, specifically for several text models. Three modules are provided: ClassicalTextExplainer, UnifiedInformationExplainer and IntrospectiveRationaleExplainer.
Classical Text Explainer
The ClassicalTextExplainer supports linear models from sklearn that expose a coef_ attribute and tree-based models for which feature_importances_ is defined.
ClassicalTextExplainer includes an NLP pipeline from preprocessing to hyperparameter tuning, so it accepts raw text data as input. The default pipeline uses a unigram bag-of-words model. Elements of the pipeline can be replaced if desired.
The InterpretML toolkit, developed at Microsoft, can be decomposed into two major components:
1. A set of interpretable “glassbox” models
2. Techniques for explaining black box systems
With respect to the first, InterpretML notably contains a new interpretable “glassbox” model, called an Explainable Boosting Machine, that combines Generalized Additive Models (GAMs) with machine learning techniques such as gradient boosted trees.
Other than this new interpretable model, the main utility of InterpretML is to unify existing explainability techniques under a single API.
The type of explanation LIME offers is a surrogate model that approximates a black box prediction locally. The surrogate model is a sparse linear model, which means that the surrogate model is interpretable (in this case, its weights are meaningful). This simpler model can thus help to explain the black box prediction, assuming the local approximation is sufficiently representative.
The intuition behind this is provided in the README:
Intuitively, an explanation is a local linear approximation of the model’s behaviour.
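The idea of a local linear approximation can be sketched without the LIME package itself. The example below is a simplified illustration under stated assumptions: `black_box` is a hypothetical model, the perturbations are Gaussian noise around the instance, the proximity kernel is a plain exponential kernel, and the surrogate is fitted with ordinary weighted least squares rather than the sparse linear model (with feature selection) that LIME uses. All function and parameter names are made up for this sketch.

```python
import math
import random


def black_box(x):
    """Hypothetical opaque model: nonlinear in x[0], linear in x[1]."""
    return x[0] ** 2 + 3.0 * x[1]


def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_surrogate(model, x0, n_samples=2000, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a weighted linear model [intercept, w_1, ..., w_d] around the instance x0."""
    rng = random.Random(seed)
    d = len(x0)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, scale) for _ in range(d)]
        sample = [xi + di for xi, di in zip(x0, delta)]
        # Perturbed samples closer to x0 receive a larger weight.
        weight = math.exp(-sum(di ** 2 for di in delta) / kernel_width ** 2)
        row = [1.0] + delta  # features centred on x0, so the weights are local slopes
        y = model(sample)
        # Accumulate the weighted normal equations (X^T W X) z = X^T W y.
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    return solve(xtwx, xtwy)


intercept, w1, w2 = local_surrogate(black_box, [3.0, 1.0])
print(f"local weights around (3, 1): w1={w1:.2f}, w2={w2:.2f}")
```

Around the instance (3, 1) the fitted weights should be close to the local slopes of the black box (roughly 6 for the first feature and 3 for the second). At a different instance the weight of the first feature would differ, which illustrates why such an explanation is only valid locally.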
The SHAP package is built on the concept of a Shapley value and can generate explanations model-agnostically. So it only requires input and output values, not model internals:
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. (README)
Additionally, this package also contains several model-specific implementations of Shapley values that are optimized for a particular machine learning model and sometimes even for a particular library.
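For a handful of features, Shapley values can be computed exactly by brute force, which helps to see what the package approximates or optimizes. The sketch below does not use the SHAP library: `shapley_values` and the toy `model` are hypothetical, and a feature counts as "absent" when it is set to a fixed baseline value, which is a simplifying assumption (SHAP typically averages over a background data set instead).

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    values = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from its baseline to its actual value
            output = model(current)
            values[i] += output - previous_output
            previous_output = output
    return [v / len(orderings) for v in values]


def model(features):
    """Hypothetical black box with an interaction between features 0 and 1."""
    return 2.0 * features[0] + features[0] * features[1] + features[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print("Shapley values:", phi)
print("sum:", sum(phi), "model(x) - model(baseline):", model(x) - model(baseline))
```

Only calls to `model` are needed, which is the model-agnostic setting described above. The number of orderings grows factorially with the number of features, which is why practical implementations rely on sampling or on model-specific algorithms.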
This library provides a separate predict() function for scikit-learn tree-based models (so also ensembles) that outputs a prediction with interpretable elements of the shape prediction = bias + feature_1_contribution + ... + feature_n_contribution.
That is, it turns these tree-based models into a white box, where we can inspect how much each feature contributes to the predicted value (in the case of regression) or how much it contributes to the estimated probability of a class (given classification).
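The decomposition prediction = bias + feature contributions can be illustrated on a hand-built regression tree. The sketch below is an assumption-laden illustration, not the library's implementation: the nested-dictionary tree, the feature names and the function `predict_with_contributions` are hypothetical. Each node stores the mean target of the training samples that reach it, and every split credits the change in that mean to the feature it splits on.

```python
# Hypothetical regression tree; "value" is the mean target at each node.
tree = {
    "value": 20.0, "feature": "rooms", "threshold": 3,
    "left": {"value": 12.0},
    "right": {
        "value": 28.0, "feature": "age", "threshold": 30,
        "left": {"value": 35.0},
        "right": {"value": 21.0},
    },
}


def predict_with_contributions(node, sample):
    """Walk the decision path and attribute each change in node value to the split feature."""
    bias = node["value"]  # value at the root: the prediction before looking at any feature
    contributions = {}
    while "feature" in node:  # leaves have no "feature" key
        feature = node["feature"]
        child = node["left"] if sample[feature] <= node["threshold"] else node["right"]
        contributions[feature] = contributions.get(feature, 0.0) + child["value"] - node["value"]
        node = child
    return node["value"], bias, contributions


prediction, bias, contributions = predict_with_contributions(tree, {"rooms": 4, "age": 10})
print(f"prediction = {bias} + {contributions} = {prediction}")
```

For an ensemble such as a random forest, the same decomposition could be averaged over the individual trees.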
The What-If Tool (WIT) takes a pretrained model and then allows you to visualize the effect of changing e.g. classification thresholds or the data points themselves on performance, explainability and fairness metrics.
Many convenient functions for gaining insight into the data set are provided, such as binning on particular features, attribution values, or inference scores, computing partial dependence plots, and typical performance indicators such as a confusion matrix or ROC curve.
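The effect of moving a classification threshold, which WIT lets you explore interactively, can be reproduced by hand on a small example. The scores and labels below are made up for illustration and `confusion_counts` is a hypothetical helper, not part of the What-If Tool.

```python
def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical model scores and true labels for six data points.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

results = {t: confusion_counts(scores, labels, t) for t in (0.3, 0.5, 0.7)}
for threshold, (tp, fp, fn, tn) in results.items():
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Raising the threshold trades false positives for false negatives. Computing the same counts separately per subgroup is one way to see how a single threshold can affect groups differently.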
This library is a small toolbox that offers some convenience functions for quickly visualizing imbalances in the data set, computing (permutation) feature importances and metrics such as the ROC-curve. A function to balance the data is offered through basic up- or downsampling, but other than this no fairness criteria are defined.
Compared to other libraries the XAI Toolbox is very basic, and the roadmap (which has not been updated since 2019) does not currently include any major improvements.
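Permutation feature importance, mentioned above, is simple enough to sketch from scratch: shuffle one feature column at a time and measure how much a performance metric drops. The code below is a minimal illustration with hypothetical names and synthetic data; it does not use the XAI Toolbox API.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model output equals the label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # breaks the link between feature j and the labels
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(base - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def model(row):
    """Hypothetical classifier that only looks at feature 0."""
    return int(row[0] > 0.5)


# Synthetic data: two uniform random features, labels produced by the model itself.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [model(row) for row in rows]

importances = permutation_importance(model, rows, labels, n_features=2)
print("importances:", importances)
```

Shuffling the feature that the model ignores leaves the accuracy unchanged, so its importance is zero, while shuffling the feature it relies on causes a large drop.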