The AI Explainability 360 (AIX360) toolkit is a Python library that offers a wide range of explanation types as well as some explainability metrics. AIX360 offers excellent guidance material, an interactive demo as well as developer tutorials. What’s particularly good about this material is that it stimulates reflection on which type of explanation is appropriate, not only from a technical point of view, but also with respect to the target explainer and explainee.
The IBM AI Fairness 360 Toolkit contains several bias mitigation algorithms that are applicable to various stages of the machine learning pipeline. The toolkit implements different notions of fairness, both on individual and the group level, and several fairness metrics for both classes of fairness. The toolkit provides additional guidance on choosing metrics and mitigation algorithms given a particular goal and application.
The following should be noted when using the fairness toolkit (and other similar toolkits, for that matter):
Alibi is an open-source Python library that supports various interpretability techniques and a broad array of explanation types. The README already provides an overview of the supported methods and when they are applicable. The following table with supported methods is copied from the README (slightly abbreviated):
Supported methods Method Models Explanations Classification Regression Tabular Text Images Categorical features ALE BB global ✔ ✔ ✔ Anchors BB local ✔ ✔ ✔ ✔ ✔ CEM BB* TF/Keras local ✔ ✔ ✔ Counterfactuals BB* TF/Keras local ✔ ✔ ✔ Prototype Counterfactuals BB* TF/Keras local ✔ ✔ ✔ ✔ Integrated Gradients TF/Keras local ✔ ✔ ✔ ✔ ✔ ✔ Kernel SHAP BB local global ✔ ✔ ✔ ✔ Tree SHAP WB local global ✔ ✔ ✔ ✔ The README also explains the keys:
Captum is a model interpretability library specifically PyTorch. It is actively maintained at the moment of writing and supports an extensive array of interpretability methods.
The Captum website also offers a large range of hands-on tutorials for various use cases.
Supported interpretability methods Captum supports a very extensive list of interpretability algorithms. All paper references for each of the supported methods are listed in the README, so they will not be repeated here.
DiCE implements counterfactual (CF) explanations that provide this information by showing feature-perturbed versions of the same person who would have received the loan, e.g., you would have received the loan if your income was higher by $10,000. In other words, it provides “what-if” explanations for model output and can be a useful complement to other explanation methods, both for end-users and model developers.
A main innovation of DiCE is that it implements a method to make producing counter-factual examples more model-agnostic:
The documentation of fairlearn is excellent and provides a good introduction to the topic of fairness in AI. It is emphasized that fairness algorithms are no plug-and-play technical solutions, but require serious thought about the context of the data and the problem at hand.
Fairness is a fundamentally sociotechnical challenge and cannot be solved with technical tools alone. They may be helpful for certain tasks such as assessing unfairness through various metrics, or to mitigate observed unfairness when training a model.
The InterpretML toolkit, developed at Microsoft, can be decomposed in two major components:
A set of interpretable “glassbox” models Techniques for explaining black box systems. W.r.t. 1, InterpretML particularly contains a new interpretable “glassbox” model that combines Generalized Additive Models (GAMs) with machine learning techniques such as gradient boosted trees, called an Explainable Boosting Machine.
Other than this new interpretable model, the main utility of InterpretML is to unify existing explainability techniques under a single API.
The type of explanation LIME offers is a surrogate model that approximates a black box prediction locally. The surrogate model is a sparse linear model, which means that the surrogate model is interpretable (in this case, it’s weights are meaningful). This simpler model can thus help to explain the black box prediction, assuming the local approximation is actually sufficiently representative.
The intuition behind this is provided in the README:
Intuitively, an explanation is a local linear approximation of the model’s behaviour.
The OpenMined community is a collaboration of several organizations, including TensorFlow, PyTorch and Keras, to create an open-source ecosystem of privacy tools that extend libraries such as PyTorch with cryptographic techniques and differential privacy. The aim is to contribute to the adaptation of privacy-preserving AI.
To this end, OpenMined offers several privacy-preserving tools on their github. A main tool is PySyft, which allows “computing on data you do not own and cannot see”.
The SHAP package is built on the concept of a Shapley value and can generate explanations model-agnostically. So it only requires input and output values, not model internals:
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. (README)
Additionally, this package also contains several model-specific implementations of Shapley values that are optimized for a particular machine learning model and sometimes even for a particular library.
TensorFlow Privacy is a library that allows you to replace default TensorFlow optimizers with optimizers that allow training with differential privacy, i.e. they implement forms of stochastic gradient descent (SGD) with differential privacy.
Because large neural networks or other differentiable models have a very large learning capacity, it can happen that the model achieves high performance on uncommon training input by simply “memorizing” the training input. If the training data is sensitive, for example information about a specific user, this is undesired behavior that may leak private information.
This library provides a separate predict() function for scikit-learn tree-based models (so also ensembles) that outputs a prediction with interpretable elements of the shape prediction = bias + feature_1_contribution + ... + feature_n_contribution.
That is, it turns these tree-based models into a white box , where we can inspect how much each feature contributes to the predicted value (in the case of regression) or how much it contributes to the estimated probability of a class (given classification).
The What-If Tool (WIT) takes a pretrained model and then allows you to visualize the effect of changing e.g. classification thresholds or the data points themselves on performance, explainability and fairness metrics.
Many convenient functions for gaining insight in the data set are provided, such as binning on particular features, attribution values, or inference scores, computing partial dependence plots, and typical performance indicators such as a confusion matrix or ROC curve.