Most AI applications nowadays rely on large amounts of data, and regulations like the European Data Protection Regulation (GDPR) have to be taken into account.
Tools for privacy in AI can for example be practical guidelines for implementing privacy by design into AI applications, or technical methods to prevent extraction of privacy-sensitive information from trained models.
Butnaru and others associated with the HAI center at Stanford set up a Agile Ethics workflow in the form of a Trello board. From left to right, the workflow walks you through relevant ethical considerations at the various steps of a machine learning pipeline. The phases are:
Scope Consider ethical implications of the project Consider skill mapping (what’s the impact of AI on jobs)? Facilitates up-skilling or a change of strategy in the use of human talent Data audit Led by Chief Data Officer “Meet and plan” stage in Agile Helpful: Data Ethics Canvas Train Build stage in Agile Consider (tools for) transparency and fairness Analyse Benchmarks, including benchmarks related to e.
The Data Ethics Canvas is a tool developed by the Open Data Institute for providing ethical guidance to organizations doing any type of project involving data. That includes data collection, sharing, and its usage for example in machine learning applications. The tool is accompanied with a white paper and a brief practical guide for its usage.
Page 3 of the practical guide lists some recommendations that are also relevant when you do not use this tool.
The OpenMined community is a collaboration of several organizations, including TensorFlow, PyTorch and Keras, to create an open-source ecosystem of privacy tools that extend libraries such as PyTorch with cryptographic techniques and differential privacy. The aim is to contribute to the adaptation of privacy-preserving AI.
To this end, OpenMined offers several privacy-preserving tools on their github. A main tool is PySyft, which allows “computing on data you do not own and cannot see”.
TensorFlow Privacy is a library that allows you to replace default TensorFlow optimizers with optimizers that allow training with differential privacy, i.e. they implement forms of stochastic gradient descent (SGD) with differential privacy.
Because large neural networks or other differentiable models have a very large learning capacity, it can happen that the model achieves high performance on uncommon training input by simply “memorizing” the training input. If the training data is sensitive, for example information about a specific user, this is undesired behavior that may leak private information.