Keras

AdvBox

Advbox offers a number of AI model security toolkits. AdversialBox allows zero-coding generation of adversial examples for a wide range of neural network frameworks. An overview of the supported attacks and defenses can be found here and the corresponding code here. It requires some effort to find all attacks mentioned on the homepage in the code base. Generally speaking, the documentation of AdvBox is incomplete and not very user-friendly. ODD: Object Detector Deception showcases a specific attack for object detection networks such as YOLO, but is not mentioned in the README. Read more...

Alibi

Alibi is an open-source Python library that supports various interpretability techniques and a broad array of explanation types. The README already provides an overview of the supported methods and when they are applicable. The following table with supported methods is copied from the README (slightly abbreviated): Supported methods Method Models Explanations Classification Regression Tabular Text Images Categorical features ALE BB global ✔ ✔ ✔ Anchors BB local ✔ ✔ ✔ ✔ ✔ CEM BB* TF/Keras local ✔ ✔ ✔ Counterfactuals BB* TF/Keras local ✔ ✔ ✔ Prototype Counterfactuals BB* TF/Keras local ✔ ✔ ✔ ✔ Integrated Gradients TF/Keras local ✔ ✔ ✔ ✔ ✔ ✔ Kernel SHAP BB local global ✔ ✔ ✔ ✔ Tree SHAP WB local global ✔ ✔ ✔ ✔ The README also explains the keys: Read more...

DeepExplain

The DeepExplain Python package for TensorFlow models and Keras models with TensorFlow backend offers two types of interpretability methods for deep convolutional neural networks: gradient-based methods and perturbation-based methods. This package does not seem to be very actively maintained anymore and support for TensorFlow V2 is limited. Attributions The README gives the following clear and succinct explanation of what an “attribution” is. All methods included in this approach allow visualization of how each input feature contributes to the final prediction, in terms of what a particular targeted neuron “sees”: Read more...

DeepLIFT

A brief explanation of the gradient-based interpretability method called DeepLIFT is given by Shrikumar et al. in the abstract of the linked paper: DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its ‘reference activation’ and assigns contribution scores according to the difference. Read more...

OpenMined (PySyft)

The OpenMined community is a collaboration of several organizations, including TensorFlow, PyTorch and Keras, to create an open-source ecosystem of privacy tools that extend libraries such as PyTorch with cryptographic techniques and differential privacy. The aim is to contribute to the adaptation of privacy-preserving AI. To this end, OpenMined offers several privacy-preserving tools on their github. A main tool is PySyft, which allows “computing on data you do not own and cannot see”. Read more...

SHAP: SHapley Additive exPlanations

The SHAP package is built on the concept of a Shapley value and can generate explanations model-agnostically. So it only requires input and output values, not model internals: SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. (README) Additionally, this package also contains several model-specific implementations of Shapley values that are optimized for a particular machine learning model and sometimes even for a particular library. Read more...

TensorFlow Privacy

TensorFlow Privacy is a library that allows you to replace default TensorFlow optimizers with optimizers that allow training with differential privacy, i.e. they implement forms of stochastic gradient descent (SGD) with differential privacy. Because large neural networks or other differentiable models have a very large learning capacity, it can happen that the model achieves high performance on uncommon training input by simply “memorizing” the training input. If the training data is sensitive, for example information about a specific user, this is undesired behavior that may leak private information. Read more...