White box explanations

Typically a distinction is made between “black box” models and “white box” models. White box models are models whose components are themselves meaningful to inspect. The simplest example is a linear regression problem, where the regression coefficients (given proper preprocessing, e.g. feature scaling) indicate how much each feature contributes to the final prediction.
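
As a minimal sketch of this idea (the feature names and data below are purely illustrative), the coefficients of a scikit-learn linear model can be read off directly once the features are put on a comparable scale:

```python
# Minimal sketch: inspecting a linear model as a "white box".
# Feature names and data are illustrative; scaling the features first
# makes the coefficient magnitudes comparable.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # hypothetical features: age, income, tenure
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

# Each coefficient shows how much a one-standard-deviation change in the
# feature moves the prediction -- a direct, "white box" explanation.
for name, coef in zip(["age", "income", "tenure"], model.coef_):
    print(f"{name}: {coef:+.3f}")
```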

Some explainability packages do not try to explain “black boxes”, but instead propose using white box models either as a replacement for the black box model or as a surrogate for it. For the latter case, see local surrogate and global surrogate, and the sketch below. A white box explanation can also be called a “direct” explanation, see for example AI Explainability 360.
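
As an illustration of the global surrogate idea (the model choices here are assumptions, not a prescription from any particular package), a shallow decision tree can be fitted to a black box model’s predictions and then inspected in its place:

```python
# Illustrative sketch of a global surrogate: a shallow decision tree (a
# white box) is fitted to the predictions of a black box model so the tree
# can be inspected instead of the black box.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

black_box = RandomForestClassifier(random_state=0).fit(X, y)

# The surrogate is trained on the black box's *predictions*, not on y.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# How well does the surrogate mimic the black box (fidelity)?
print("fidelity:", surrogate.score(X, black_box.predict(X)))
print(export_text(surrogate, feature_names=list(X.columns)))
```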

AI Explainability 360

The AI Explainability 360 (AIX360) toolkit is a Python library that offers a wide range of explanation types as well as some explainability metrics. AIX360 offers excellent guidance material, an interactive demo, and developer tutorials. What is particularly good about this material is that it stimulates reflection on which type of explanation is appropriate, not only from a technical point of view, but also with respect to the target explainer and explainee. Read more...

ELI5

ELI5 (“Explain Like I’m 5”) provides model-specific support for models from scikit-learn and lightning, as well as for decision tree ensembles from the xgboost, LightGBM and CatBoost libraries. ELI5 mainly provides convenient wrappers that couple the feature importance coefficients these libraries already expose with feature names, plus convenient ways to visualize importances, e.g. by highlighting words in a text. For Keras image classifiers, an implementation of the gradient-based Grad-CAM visualization is offered, but the TensorFlow 2 backend is not supported. Read more...
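
A hedged sketch of this wrapping for a scikit-learn text classifier (the dataset and model are illustrative; explain_weights and explain_prediction are the package’s documented entry points):

```python
# Sketch of ELI5 on a scikit-learn text classifier: explain_weights couples
# the model's coefficients with the vectorizer's feature names.
import eli5
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
vec = TfidfVectorizer()
X = vec.fit_transform(train.data)
clf = LogisticRegression(max_iter=1000).fit(X, train.target)

# Global view: the most important words per class, with readable feature names.
print(eli5.format_as_text(eli5.explain_weights(clf, vec=vec, top=10)))

# Local view: which words pushed a single document towards its prediction.
print(eli5.format_as_text(eli5.explain_prediction(clf, train.data[0], vec=vec)))
```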

InterpretML

The InterpretML toolkit, developed at Microsoft, can be decomposed into two major components:

1. a set of interpretable “glassbox” models, and
2. techniques for explaining black box systems.

Regarding (1), InterpretML notably contains a new interpretable “glassbox” model, called the Explainable Boosting Machine, that combines Generalized Additive Models (GAMs) with machine learning techniques such as gradient boosted trees. Beyond this new interpretable model, the main utility of InterpretML is to unify existing explainability techniques under a single API. Read more...
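
A rough sketch of training and inspecting an Explainable Boosting Machine (the dataset is illustrative and the calls follow the package’s published examples; treat this as a sketch rather than a definitive recipe):

```python
# Sketch: fit an Explainable Boosting Machine and inspect it as a glassbox model.
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

# Global explanation: the per-feature shape functions learned by the GAM.
show(ebm.explain_global())

# Local explanation: per-feature contributions for individual predictions.
show(ebm.explain_local(X_test[:5], y_test[:5]))
```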

SHAP: SHapley Additive exPlanations

The SHAP package is built on the concept of the Shapley value and can generate explanations model-agnostically, i.e. it only requires input and output values, not model internals: “SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.” (README) Additionally, the package contains several model-specific implementations of Shapley values that are optimized for a particular machine learning model and sometimes even for a particular library. Read more...
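
A hedged sketch of the model-agnostic route via SHAP’s KernelExplainer, assuming a scikit-learn classifier and a small background sample; only a prediction function and data are passed in:

```python
# Sketch of SHAP's model-agnostic KernelExplainer: it only needs a prediction
# function and a background dataset, not model internals.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = SVC(probability=True).fit(X, y)

# A small background sample keeps the Shapley value estimation tractable.
background = shap.sample(X, 25)
explainer = shap.KernelExplainer(model.predict_proba, background)

# Shapley values for a few instances: one additive contribution per feature,
# which together with the base value sums to the model output.
shap_values = explainer.shap_values(X.iloc[:3], nsamples=100)
print(shap_values)
```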

TreeInterpreter

This library provides a separate predict() function for scikit-learn tree-based models (including ensembles) that outputs a prediction with interpretable elements of the shape prediction = bias + feature_1_contribution + ... + feature_n_contribution. That is, it turns these tree-based models into a white box, where we can inspect how much each feature contributes to the predicted value (in the case of regression) or to the estimated probability of a class (in the case of classification). Read more...
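
A short sketch of that decomposition (the model and dataset are illustrative; the three-tuple return value follows the library’s README):

```python
# Sketch: decompose a random forest prediction into bias + feature contributions.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from treeinterpreter import treeinterpreter as ti

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

prediction, bias, contributions = ti.predict(model, X[:1])

# The decomposition is exact: prediction == bias + sum(feature contributions).
print("prediction:", prediction[0])
print("bias + contributions:", bias[0] + np.sum(contributions[0]))
```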