differential-privacy-library

Diffprivlib: The IBM Differential Privacy Library

https://github.com/ibm/differential-privacy-library

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Keywords

data-privacy differential-privacy machine-learning python

Keywords from Contributors

hack bruteforce
Last synced: 6 months ago

Repository

Diffprivlib: The IBM Differential Privacy Library

Basic Info
Statistics
  • Stars: 885
  • Watchers: 31
  • Forks: 205
  • Open Issues: 7
  • Releases: 14
Topics
data-privacy differential-privacy machine-learning python
Created over 6 years ago · Last pushed 10 months ago
Metadata Files
Readme License Code of conduct Citation

README.md

Diffprivlib v0.6

Diffprivlib is a general-purpose library for differential privacy (DP). Use diffprivlib if you are looking to:

  • Experiment with differential privacy
  • Explore the impact of differential privacy on machine learning and data analytics applications
  • Prototype your own differential privacy algorithms

Since its initial release in 2019, diffprivlib has proven to be an invaluable resource for the DP community, with hundreds of citations, stars, forks and deployments. The library has lowered the barrier to entry for new scientists and engineers working in and learning about DP, spawned new research, and served as a benchmark for new algorithms and libraries.

Note: The public release of diffprivlib is intended for research and education purposes only. Please reach out to us if you are interested in using diffprivlib in a production environment.

Getting started: Machine learning with differential privacy in 30 seconds

We're using the Iris dataset, so let's load it and perform an 80/20 train/test split.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)
```

Now, let's train a differentially private naive Bayes classifier. Our classifier runs just like an sklearn classifier, so you can get up and running quickly.

`diffprivlib.models.GaussianNB` can be run without any parameters, although this will raise a warning (specify the `bounds` parameter to avoid it). The privacy level is controlled by the parameter `epsilon`, which is passed to the classifier at initialisation (e.g. `GaussianNB(epsilon=0.1)`). The default is `epsilon = 1.0`.

```python
from diffprivlib.models import GaussianNB

clf = GaussianNB()
clf.fit(X_train, y_train)
```
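As an aside on the `bounds` parameter mentioned above: the noise a differentially private computation needs depends on how far a single record can move the result, and known bounds are what pin that down. The sketch below is not diffprivlib's implementation. It is a minimal standard-library illustration, with hypothetical helpers `laplace_noise` and `dp_mean`, of how clipping values to fixed bounds gives a mean a known sensitivity. It assumes the number of records is public and is not hardened against floating-point attacks.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng):
    """Hypothetical helper: noisy mean of values clipped to [lower, upper].

    Assumes the number of records is public and that neighbouring datasets
    differ by replacing one record.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Replacing one clipped record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


# Illustrative sepal lengths, clipped to the first-feature bounds (4.3, 7.9)
# that this README passes to GaussianNB further on.
sepal_lengths = [5.1, 4.9, 6.3, 7.0, 5.8, 6.1]
rng = random.Random(0)  # seeded only so the demo is repeatable
print(dp_mean(sepal_lengths, 4.3, 7.9, epsilon=1.0, rng=rng))
```

Tighter bounds mean a smaller sensitivity and therefore less noise, provided the bounds are chosen from domain knowledge rather than read off the private data.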

We can now classify unseen examples, knowing that the trained model is differentially private and preserves the privacy of the 'individuals' in the training set (flowers are entitled to their privacy too!).

```python
clf.predict(X_test)
```

Every time the model is trained with .fit(), a different model is produced due to the randomness of differential privacy. The accuracy will therefore change, even if it's re-trained with the same training data. Try it for yourself to find out!

```python
print("Test accuracy: %f" % clf.score(X_test, y_test))
```

We can easily evaluate the accuracy of the model for various epsilon values and plot it with matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt

epsilons = np.logspace(-2, 2, 50)
bounds = ([4.3, 2.0, 1.1, 0.1], [7.9, 4.4, 6.9, 2.5])
accuracy = list()

for epsilon in epsilons:
    clf = GaussianNB(bounds=bounds, epsilon=epsilon)
    clf.fit(X_train, y_train)

    accuracy.append(clf.score(X_test, y_test))

plt.semilogx(epsilons, accuracy)
plt.title("Differentially private Naive Bayes accuracy")
plt.xlabel("epsilon")
plt.ylabel("Accuracy")
plt.show()
```

Figure: Differentially private naive Bayes (accuracy against epsilon, log-scaled x-axis)
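Accuracy generally improves as epsilon grows because less noise is needed to meet a weaker privacy guarantee. The following is a minimal standard-library sketch of that trade-off, assuming a simple Laplace-noised count (sensitivity 1) rather than the library's `GaussianNB`. The helper names `noisy_count` and `mean_abs_error` are hypothetical.

```python
import random


def noisy_count(true_count, epsilon, rng):
    """Laplace-noised count; a count has sensitivity 1, so the scale is 1 / epsilon."""
    rate = epsilon  # exponential rate = 1 / Laplace scale
    return true_count + rng.expovariate(rate) - rng.expovariate(rate)


def mean_abs_error(epsilon, trials=2000, seed=0):
    """Average absolute error of the noisy count over repeated releases."""
    rng = random.Random(seed)
    total = sum(abs(noisy_count(100, epsilon, rng) - 100) for _ in range(trials))
    return total / trials


for epsilon in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"epsilon={epsilon:>6}: mean absolute error ~ {mean_abs_error(epsilon):.3f}")
```

For this mechanism the expected absolute error is 1 / epsilon, so each tenfold increase in epsilon cuts the error by roughly a factor of ten.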

Congratulations, you've completed your first differentially private machine learning task with the Differential Privacy Library! Check out more examples in the notebooks directory, or dive straight in.

Contents

Diffprivlib comprises four major components:

1. Mechanisms: These are the building blocks of differential privacy, and are used in all models that implement differential privacy. Mechanisms have few or no default settings, and are intended for use by experts implementing their own models. They can, however, be used outside models for separate investigations, etc.
2. Models: This module includes machine learning models with differential privacy. Diffprivlib currently has models for clustering, classification, regression, dimensionality reduction and pre-processing.
3. Tools: Diffprivlib comes with a number of generic tools for differentially private data analysis. This includes differentially private histograms, following the same format as Numpy's histogram function.
4. Accountant: The BudgetAccountant class can be used to track privacy budget and calculate total privacy loss using advanced composition techniques.
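To make the idea of budget accounting concrete, here is a minimal sketch of a tracker that uses basic sequential composition, where the epsilons of successive releases simply add up. It is not the library's `BudgetAccountant`, which is described above as using advanced composition. `SimpleBudget` and its methods are hypothetical names chosen for illustration.

```python
class SimpleBudget:
    """Hypothetical budget tracker using basic sequential composition."""

    def __init__(self, epsilon):
        self.epsilon = epsilon  # total budget available
        self.spent = 0.0        # sum of epsilons released so far

    def remaining(self):
        return self.epsilon - self.spent

    def spend(self, epsilon):
        """Record a query costing `epsilon`; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining() + 1e-12:  # small tolerance for float rounding
            raise ValueError("privacy budget exceeded")
        self.spent += epsilon
        return self.remaining()


budget = SimpleBudget(epsilon=1.0)
budget.spend(0.4)  # e.g. a noisy mean
budget.spend(0.5)  # e.g. a noisy histogram
print(f"remaining: {budget.remaining():.2f}")
try:
    budget.spend(0.2)  # would push the total past 1.0
except ValueError as err:
    print("refused:", err)
```

Basic composition is the most conservative bookkeeping. Tighter composition bounds allow more queries for the same overall guarantee, at the cost of more involved arithmetic.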

Setup

Installation with pip

The library is designed to run with Python 3 and can be installed from the PyPI repository using pip (or pip3):

```bash
pip install diffprivlib
```

Manual installation

For the most recent version of the library, either download the source code or clone the repository in your directory of choice:

```bash
git clone https://github.com/IBM/differential-privacy-library
```

To install diffprivlib, do the following in the project folder (alternatively, you can run `python3 -m pip install .`):

```bash
pip install .
```

The library comes with a basic set of unit tests for pytest. To check your install, you can run all the unit tests by calling pytest in the install folder:

```bash
pytest
```

Citing diffprivlib

If you use diffprivlib for research, please consider citing the following reference paper:

```bibtex
@article{diffprivlib,
  title={Diffprivlib: the {IBM} differential privacy library},
  author={Holohan, Naoise and Braghin, Stefano and Mac Aonghusa, P{\'o}l and Levacher, Killian},
  year={2019},
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  volume = {1907.02444 [cs.CR]},
  primaryClass = "cs.CR",
  month = jul
}
```

Acknowledgement

Work in this repository was partially supported by the European Union's Horizon research and innovation programme under grant numbers 951911 (AI4Media) and 101070473 (FLUIDOS).

Owner

  • Name: International Business Machines
  • Login: IBM
  • Kind: organization
  • Email: awesome@ibm.com
  • Location: United States of America

Citation (CITATION.bib)

@article{diffprivlib,
  title={Diffprivlib: the {IBM} differential privacy library},
  author={Holohan, Naoise and Braghin, Stefano and Mac Aonghusa, P{\'o}l and Levacher, Killian},
  year={2019},
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  volume = {1907.02444 [cs.CR]},
  primaryClass = "cs.CR",
  month = jul
}

GitHub Events

Total
  • Issues event: 8
  • Watch event: 58
  • Issue comment event: 7
  • Push event: 7
  • Pull request review event: 4
  • Pull request review comment event: 3
  • Pull request event: 3
  • Fork event: 10
  • Create event: 3
Last Year
  • Issues event: 8
  • Watch event: 58
  • Issue comment event: 7
  • Push event: 7
  • Pull request review event: 4
  • Pull request review comment event: 3
  • Pull request event: 3
  • Fork event: 10
  • Create event: 3

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 549
  • Total Committers: 18
  • Avg Commits per committer: 30.5
  • Development Distribution Score (DDS): 0.443
Past Year
  • Commits: 14
  • Committers: 3
  • Avg Commits per committer: 4.667
  • Development Distribution Score (DDS): 0.5
Top Committers
Name Email Commits
Naoise Holohan n****e@n****e 306
Naoise Holohan n****e@i****m 205
Naoise Holohan 5****h@u****m 8
Mahammad Ismayilzada m****a@g****m 7
Stefano Braghin s****b@i****m 5
Stefano s****n@g****m 3
Naoise Holohan n****n@i****m 2
STEFANO BRAGHIN S****B@i****m 2
Swastik Banerjee s****7@y****n 2
Dan Ristea d****a@p****m 1
Daniel Gorelik d****k@g****m 1
Mete Ismayil m****d@i****i 1
Franziskus Kiefer f****r@g****m 1
ImgBotApp I****p@g****m 1
Steve Martinelli 4****r@u****m 1
MichaelYangg 5****g@u****m 1
NAOISE HOLOHAN N****n@i****m 1
Stefano Braghin 5****1@u****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 47
  • Total pull requests: 57
  • Average time to close issues: 5 months
  • Average time to close pull requests: 21 days
  • Total issue authors: 37
  • Total pull request authors: 14
  • Average comments per issue: 1.64
  • Average comments per pull request: 1.18
  • Merged pull requests: 50
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 6
  • Pull requests: 4
  • Average time to close issues: 11 days
  • Average time to close pull requests: 8 days
  • Issue authors: 5
  • Pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.25
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
  • naoise-h (5)
  • Hiramdu (3)
  • justanotherlad (3)
  • PaulineMauryL (2)
  • nadavaviv (2)
  • benjaminlackey (1)
  • ramongonze (1)
  • kayakalison (1)
  • daniel-vos (1)
  • amanjeev (1)
  • andreamquiroz (1)
  • hwj-wik (1)
  • ibad321 (1)
  • Zeyu-Shen (1)
  • TedTed (1)
Pull Request Authors
  • naoise-h (40)
  • stefano81 (3)
  • justanotherlad (2)
  • renovate[bot] (2)
  • franziskuskiefer (1)
  • sanika-bharambe (1)
  • danrr (1)
  • dgorelik (1)
  • imgbot[bot] (1)
  • ramongonze (1)
  • MichaelYangg (1)
  • mrunmayeewaykar (1)
  • UPstartDeveloper (1)
  • mismayil (1)
Top Labels
Issue Labels
bug (7) enhancement (4) question (1)
Pull Request Labels
enhancement (6) bug (2) documentation (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 10,536 last-month
  • Total docker downloads: 9
  • Total dependent packages: 9
    (may contain duplicates)
  • Total dependent repositories: 33
    (may contain duplicates)
  • Total versions: 23
  • Total maintainers: 1
pypi.org: diffprivlib

IBM Differential Privacy Library

  • Versions: 17
  • Dependent Packages: 9
  • Dependent Repositories: 33
  • Downloads: 10,536 Last month
  • Docker Downloads: 9
Rankings
Dependent packages count: 1.0%
Stargazers count: 2.3%
Dependent repos count: 2.5%
Average: 2.9%
Downloads: 3.7%
Forks count: 3.7%
Docker downloads count: 4.1%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: diffprivlib
  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Forks count: 12.4%
Stargazers count: 14.1%
Average: 27.9%
Dependent repos count: 34.0%
Dependent packages count: 51.2%
Last synced: 6 months ago

Dependencies

.github/workflows/code.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
.github/workflows/codeql.yml actions
  • actions/checkout v3 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/autobuild v2 composite
  • github/codeql-action/init v2 composite
.github/workflows/deploy.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/general.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/libraries.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
setup.py pypi
  • joblib *
  • numpy *
  • scikit-learn *
  • scipy *
  • setuptools *