DiCE: Diverse Counterfactual Explanations


From the README:

DiCE implements counterfactual (CF) explanations that provide this information [e.g., why a loan application was rejected] by showing feature-perturbed versions of the same person who would have received the loan, e.g., you would have received the loan if your income was higher by $10,000. In other words, it provides “what-if” explanations for model output and can be a useful complement to other explanation methods, both for end-users and model developers.

A key innovation of DiCE is that it implements methods for generating counterfactual examples in a model-agnostic way:

Barring simple linear models, however, it is difficult to generate CF examples that work for any machine learning model. DiCE is based on recent research that generates CF explanations for any ML model. The core idea is to set up finding such explanations as an optimization problem, similar to finding adversarial examples.
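
Concretely, the loss-based formulation in Mothilal et al. (2020) searches for a set of k counterfactuals c_1, …, c_k that are valid, close to the query instance x, and mutually diverse:

$$
C(x) = \arg\min_{c_1,\ldots,c_k} \frac{1}{k}\sum_{i=1}^{k} \mathrm{yloss}\big(f(c_i), y\big) + \frac{\lambda_1}{k}\sum_{i=1}^{k} \mathrm{dist}(c_i, x) - \lambda_2\,\mathrm{dpp\_diversity}(c_1,\ldots,c_k)
$$

Here f is the ML model, y the desired outcome, dist a proximity measure, dpp_diversity a determinantal-point-process diversity term, and λ₁, λ₂ trade-off hyperparameters.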

DiCE supports several model-agnostic methods for finding counterfactual examples (a usage sketch follows the list):

  • Randomized sampling
  • KD-tree algorithm
  • Genetic algorithm
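
To make this concrete, here is a minimal sketch using the dice-ml package's documented interface; the synthetic dataframe and classifier below are placeholders invented for illustration, not part of the original text:

```python
import dice_ml
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic stand-in for a real dataset (illustrative only).
rng = np.random.default_rng(0)
df = pd.DataFrame({'age': rng.integers(18, 80, 500),
                   'hours_per_week': rng.integers(10, 80, 500)})
df['income'] = (df['age'] * 1000 + df['hours_per_week'] * 500 > 60000).astype(int)

clf = RandomForestClassifier().fit(df[['age', 'hours_per_week']], df['income'])

# Wrap the data and model in DiCE's interfaces.
d = dice_ml.Data(dataframe=df,
                 continuous_features=['age', 'hours_per_week'],
                 outcome_name='income')
m = dice_ml.Model(model=clf, backend='sklearn')

# method can be 'random', 'kdtree', or 'genetic'
exp = dice_ml.Dice(d, m, method='random')
cf = exp.generate_counterfactuals(df[['age', 'hours_per_week']].iloc[0:1],
                                  total_CFs=4, desired_class="opposite")
cf.visualize_as_dataframe()
```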

Additionally, gradient-based methods are provided for differentiable models such as neural networks; a sketch follows the list.

  • Loss-based method from Mothilal et al. (2020)
  • Method based on a variational auto-encoder, Mahajan et al. (2019)
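
A corresponding sketch for a differentiable model, assuming a trained tf.keras model `ann_model` and reusing `d` and `df` from the sketch above (all assumptions; the weights shown are illustrative):

```python
import dice_ml

# `ann_model` is assumed to be a tf.keras model trained on the same data.
m_grad = dice_ml.Model(model=ann_model, backend='TF2')

# With a deep-learning backend, DiCE applies its gradient-based method.
exp_grad = dice_ml.Dice(d, m_grad)

cf = exp_grad.generate_counterfactuals(
    df[['age', 'hours_per_week']].iloc[0:1],
    total_CFs=4, desired_class="opposite",
    proximity_weight=0.5,    # weight on closeness to the query instance
    diversity_weight=1.0)    # weight on diversity among the CF set
```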

An interesting detail is that DiCE does not necessarily need access to the full training data set: it can be configured with feature metadata alone (feature names plus valid ranges or categories), making it possible to generate counterfactuals while keeping the data private.
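
A minimal sketch of this mode, based on the metadata-only constructor described in the DiCE documentation (the feature names, ranges, and categories below are illustrative assumptions):

```python
import dice_ml

# Build the Data interface from feature metadata only -- no dataframe.
# Feature names, ranges, and categories are illustrative placeholders.
d_private = dice_ml.Data(
    features={'age': [17, 90],
              'hours_per_week': [1, 99],
              'education': ['HS-grad', 'Bachelors', 'Masters']},
    outcome_name='income')

# d_private can then be paired with a dice_ml.Model as in the earlier sketches.
```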