Gradient-based explanations

Gradient-based explanation methods use gradients (e.g. in deep neural networks) to evaluate the contribution of a model component on the model output. Some methods use gradients for evaluating the contribution of input features on the output, but others instead show the influence of a single (hidden) neuron on the output, or instead the influence of an input feature on a particular neuron’s activation value.

In the case of computer vision (where deep convolutional neural networks are heavily used), one can show gradient-based information as a salience map.

Alibi

Alibi is an open-source Python library that supports various interpretability techniques and a broad array of explanation types. The README already provides an overview of the supported methods and when they are applicable. The following table with supported methods is copied from the README (slightly abbreviated): Supported methods Method Models Explanations Classification Regression Tabular Text Images Categorical features ALE BB global ✔ ✔ ✔ Anchors BB local ✔ ✔ ✔ ✔ ✔ CEM BB* TF/Keras local ✔ ✔ ✔ Counterfactuals BB* TF/Keras local ✔ ✔ ✔ Prototype Counterfactuals BB* TF/Keras local ✔ ✔ ✔ ✔ Integrated Gradients TF/Keras local ✔ ✔ ✔ ✔ ✔ ✔ Kernel SHAP BB local global ✔ ✔ ✔ ✔ Tree SHAP WB local global ✔ ✔ ✔ ✔ The README also explains the keys: Read more...

Captum

Captum is a model interpretability library specifically PyTorch. It is actively maintained at the moment of writing and supports an extensive array of interpretability methods. The Captum website also offers a large range of hands-on tutorials for various use cases. Supported interpretability methods Captum supports a very extensive list of interpretability algorithms. All paper references for each of the supported methods are listed in the README, so they will not be repeated here. Read more...

DeepExplain

The DeepExplain Python package for TensorFlow models and Keras models with TensorFlow backend offers two types of interpretability methods for deep convolutional neural networks: gradient-based methods and perturbation-based methods. This package does not seem to be very actively maintained anymore and support for TensorFlow V2 is limited. Attributions The README gives the following clear and succinct explanation of what an “attribution” is. All methods included in this approach allow visualization of how each input feature contributes to the final prediction, in terms of what a particular targeted neuron “sees”: Read more...

DeepLIFT

A brief explanation of the gradient-based interpretability method called DeepLIFT is given by Shrikumar et al. in the abstract of the linked paper: DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its ‘reference activation’ and assigns contribution scores according to the difference. Read more...

ELI5

ELI5 (“Explain Like I’m 5”) provides model-specific support for models from scikit-learn, lightning, decision tree ensembles using the xgboost, LightGBM, CatBoost libraries. ELI5 mainly provides convenient wrappers to couple the feature importance coefficients that these libraries already provide with feature names, as well as convenient ways to visualize importances, e.g. by highlighting words in a text. For Keras image classifiers an implementation of the gradient-based Grad-CAM visualizations is offered, but the TensorFlow V2 backend is not supported. Read more...

Interpret-Text

Interpret-Text is an extension of InterpretML , specifically for several text models. Three modules are provided: ClassicalTextExplainer, UnifiedInformationExplainer and IntrospectiveRationaleExplainer. Classical Text Explainer The ClassicalTextExplainer supports linear models from sklearn with a coefs_ call and tree-based models for which feature_importances_ is defined. ClassicalTextExplainer includes a NLP pipeline from preprocessing to hyperparameter tuning, so it accepts raw text data as input. The default pipeline uses a unigram bag-of-words model. Elements of the pipeline can be replaced if desired. Read more...

SHAP: SHapley Additive exPlanations

The SHAP package is built on the concept of a Shapley value and can generate explanations model-agnostically. So it only requires input and output values, not model internals: SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. (README) Additionally, this package also contains several model-specific implementations of Shapley values that are optimized for a particular machine learning model and sometimes even for a particular library. Read more...

What-If Tool

The What-If Tool (WIT) takes a pretrained model and then allows you to visualize the effect of changing e.g. classification thresholds or the data points themselves on performance, explainability and fairness metrics. Many convenient functions for gaining insight in the data set are provided, such as binning on particular features, attribution values, or inference scores, computing partial dependence plots, and typical performance indicators such as a confusion matrix or ROC curve. Read more...