The DeepExplain Python package, which targets TensorFlow models and Keras models with a TensorFlow backend, offers two types of interpretability methods for deep convolutional neural networks: gradient-based methods and perturbation-based methods. The package no longer appears to be actively maintained, and its support for TensorFlow 2 is limited.


All methods included in this package visualize how each input feature contributes to the final prediction, in terms of what a particular target neuron “sees”. The README gives the following clear and succinct explanation of what an “attribution” is:

Consider a network and a specific input to this network (e.g. an image, if the network is trained for image classification). The input is multi-dimensional, made of several features. In the case of images, each pixel can be considered a feature. The goal of an attribution method is to determine a real value R(x_i) for each input feature, with respect to a target neuron of interest (for example, the activation of the neuron corresponding to the correct class).

When the attributions of all input features are arranged together to have the same shape as the input sample, we talk about attribution maps (as in the picture below), where red and blue indicate, respectively, features that contribute positively to the activation of the target output and features that have a suppressing effect on it.
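To make the definition concrete, here is a minimal sketch that does not use DeepExplain itself. It assumes a toy linear "network" over a 2×2 image, for which the contribution of each pixel to the target output is simply weight × pixel value. The names `image`, `weights`, `attribution_map` and `colour` are illustrative and do not come from the package.

```python
# Toy illustration of an attribution map.
# Assumption: a linear target neuron, so the contribution of pixel i
# to the output is exactly w_i * x_i.

image = [[1.0, 0.5],
         [0.0, 2.0]]          # 2x2 "image": each pixel is one feature x_i
weights = [[0.8, -1.0],
           [0.3, -0.25]]      # hypothetical weights of the target neuron


def attribution_map(image, weights):
    """Return R(x_i) for every pixel, arranged in the shape of the input."""
    return [[w * x for w, x in zip(w_row, x_row)]
            for w_row, x_row in zip(weights, image)]


def colour(value):
    """Map the sign of an attribution to the red/blue convention."""
    if value > 0:
        return "red"    # contributes positively to the target output
    if value < 0:
        return "blue"   # has a suppressing effect on the target output
    return "none"       # no contribution


attributions = attribution_map(image, weights)
colours = [[colour(r) for r in row] for row in attributions]

# For this linear model the attributions add up to the target output.
target_output = sum(w * x
                    for w_row, x_row in zip(weights, image)
                    for w, x in zip(w_row, x_row))
```

For a real network the contribution of a pixel is not available in closed form, which is why the methods listed in the following sections approximate it through gradients or perturbations.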

Included approaches

The package's README provides an excellent guide on which method to use. It lists the pros and cons of each approach, notes the relevant parameters to set, and describes the contexts in which each method is appropriate.

Gradient-based interpretability methods

  • Saliency maps (see Simonyan et al.)
  • Gradient * input (see Shrikumar et al.)
  • Integrated gradients (see the two papers from Sundararajan et al.)
  • DeepLIFT, restricted to the “Rescale” rule of the original implementation
    • The original implementation later added a “RevealCancel” rule.
  • Pixel-wise explanations by layer-wise relevance propagation (ε-LRP, see Bach et al.)

In this package, DeepLIFT and the LRP-based pixel-wise explanations are implemented by overriding gradient operators, i.e. they use a modified chain rule. For DeepLIFT, this is a notable difference from the original implementation.
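The first three gradient-based methods can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not DeepExplain's code: the hypothetical `model` function stands in for a target neuron, and a finite-difference `gradient` helper stands in for backpropagation. DeepLIFT and LRP are left out because they require the modified chain rule mentioned above.

```python
import math


def model(x):
    """Hypothetical target neuron: a small smooth non-linear function."""
    return math.tanh(0.5 * x[0] - 1.5 * x[1] + 0.25 * x[0] * x[2])


def gradient(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx_i (stands in for backprop)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def saliency(f, x):
    """Saliency map: absolute value of the gradient at x."""
    return [abs(g) for g in gradient(f, x)]


def gradient_times_input(f, x):
    """Gradient * input: the gradient scaled feature-wise by the input."""
    return [g * xi for g, xi in zip(gradient(f, x), x)]


def integrated_gradients(f, x, baseline, steps=200):
    """Midpoint-rule approximation of the gradient integral along the
    straight path from baseline to x, scaled by (x - baseline)."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(f, point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [1.0, 0.2, -0.5]
baseline = [0.0, 0.0, 0.0]   # assumption: an all-zero reference input

sal = saliency(model, x)
gxi = gradient_times_input(model, x)
ig = integrated_gradients(model, x, baseline)
```

A useful sanity check for integrated gradients is that the attributions sum to approximately `model(x) - model(baseline)`. Saliency maps and gradient * input do not satisfy this property in general.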

Perturbation-based interpretability methods

  • Occlusion (extension of the method explained by Zeiler et al.)
  • Shapley value sampling (see Castro et al.)
    • Cf. SHAP, where insights from this project have been integrated into the DeepExplainer.
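Both perturbation-based ideas can be illustrated without any deep learning library. The sketch below is a simplification and not the package's implementation: `model` is a hypothetical linear target neuron chosen so that the exact attributions are known, the occlusion windows do not overlap, and "removing" a feature is assumed to mean replacing it with a baseline value of zero.

```python
import random


def model(x):
    """Hypothetical target neuron: linear, so exact attributions are w_i * x_i."""
    weights = [2.0, -1.0, 0.5, 3.0]
    return sum(w * xi for w, xi in zip(weights, x))


def occlusion(f, x, window=1, baseline=0.0):
    """Replace each non-overlapping window of features with the baseline and
    assign the resulting drop in output to every feature in that window."""
    reference = f(x)
    attributions = [0.0] * len(x)
    for start in range(0, len(x), window):
        indices = range(start, min(start + window, len(x)))
        occluded = list(x)
        for i in indices:
            occluded[i] = baseline
        delta = reference - f(occluded)
        for i in indices:
            attributions[i] = delta
    return attributions


def shapley_sampling(f, x, baseline=0.0, samples=50, seed=0):
    """Monte Carlo Shapley estimate: average marginal contribution of each
    feature over randomly sampled feature orderings."""
    rng = random.Random(seed)
    n = len(x)
    totals = [0.0] * n
    for _ in range(samples):
        order = list(range(n))
        rng.shuffle(order)
        current = [baseline] * n     # start from the fully "removed" input
        previous = f(current)
        for i in order:
            current[i] = x[i]        # reveal feature i
            value = f(current)
            totals[i] += value - previous
            previous = value
    return [t / samples for t in totals]


x = [1.0, 2.0, -4.0, 0.5]
occ = occlusion(model, x)
shapley = shapley_sampling(model, x)
```

For a linear model both methods recover `w_i * x_i` for every feature. On a non-linear model they generally differ, and the Shapley estimate depends on the number of sampled orderings.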