https://github.com/scimorph/secureml

Easy-to-use utilities to build privacy-preserving AI.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.1%) to scientific vocabulary

Keywords

ai click compliance federated-learning flower gdpr hashicorp-vault jinja2 ml numpy opacus pandas privacy python pytorch sdv tensorflow weasyprint yaml

Last synced: 5 months ago

Repository


Basic Info

Host: GitHub
Owner: scimorph
License: mit
Language: Python
Default Branch: master
Homepage: https://secureml.readthedocs.io
Size: 2.24 MB

Statistics

Stars: 9
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 8


Created 11 months ago · Last pushed 8 months ago

Metadata Files

Readme License

SecureML is an open-source Python library that integrates with popular machine learning frameworks like TensorFlow and PyTorch. It provides developers with easy-to-use utilities to ensure that AI agents handle sensitive data in compliance with data protection regulations.

Key Features

Data Anonymization Utilities:
- K-anonymity implementation with adaptive generalization
- Pseudonymization with format-preserving encryption
- Configurable data masking with statistical property preservation
- Hierarchical data generalization with taxonomy support
- Automatic sensitive data detection
Privacy-Preserving Training Methods:
- Differential privacy integration with PyTorch (via Opacus) and TensorFlow (via TF Privacy)
- Federated learning with Flower, allowing training on distributed data without centralization
- Support for secure aggregation and privacy-preserving federated learning
Compliance Checkers: Tools to analyze datasets and model configurations for potential privacy risks
Synthetic Data Generation:
- Multiple generation methods including statistical modeling, GANs, and copulas
- SDV integration with Gaussian Copula, CTGAN, and TVAE synthesizers
- Automatic sensitive data detection and special handling
- Preservation of statistical properties and correlations between variables
- Support for mixed data types (numeric, categorical, datetime)
- Configurable privacy-utility tradeoff controls
- Tabular data synthesis with relation preservation
Regulation-Specific Presets:
- Pre-configured YAML settings aligned with major regulations (GDPR, CCPA, HIPAA, LGPD)
- Detailed compliance requirements for each regulation
- Customizable identifiers for personal data and sensitive information
- Integration with compliance checking functionality
Audit Trails and Reporting:
- Comprehensive audit logging of data access, transformations, and model operations
- Detailed event tracking for privacy-related operations with timestamps and contexts
- Function-level auditing through decorators
- Automated compliance reports in HTML and PDF formats
- Visual dashboards with charts showing privacy metrics and event distributions
- Integration with compliance checkers for continuous monitoring
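The "function-level auditing through decorators" feature listed above can be pictured with a generic Python decorator. This is only a minimal sketch using the standard library: the names `audited`, `audit_log`, and `mask_email` are hypothetical and are not SecureML's API, and a real audit trail would write to durable storage rather than an in-memory list.

```python
import functools
from datetime import datetime, timezone

# In-memory event list, for illustration only (hypothetical, not SecureML's storage).
audit_log = []


def audited(operation):
    """Record one audit event per call of the decorated function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            event = {
                "operation": operation,
                "function": func.__name__,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Log the failure, then let the caller see the original error.
                audit_log.append({**event, "status": "failed", "error": repr(exc)})
                raise
            audit_log.append({**event, "status": "succeeded"})
            return result
        return wrapper
    return decorator


@audited("data_transformation")
def mask_email(email):
    """Keep the first character of the local part and the full domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


print(mask_email("john.doe@example.com"))  # j***@example.com
print(audit_log[-1]["function"], audit_log[-1]["status"])  # mask_email succeeded
```

The event deliberately stores the function name and outcome but not the arguments, so the audit trail does not itself become a copy of the sensitive data.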

Installation

With pip (Python 3.11-3.12):

```bash
pip install secureml
```

Optional Dependencies

```bash
# For generating PDF reports for compliance and audit trails
pip install secureml[pdf]

# For secure key management with HashiCorp Vault
pip install secureml[vault]

# For all optional components
pip install secureml[pdf,vault]
```

Quick Start

Data Anonymization

Anonymizing a dataset to comply with privacy regulations:

```python
import pandas as pd
from secureml import anonymize

# Load your dataset
data = pd.DataFrame({
    "name": ["John Doe", "Jane Smith", "Bob Johnson"],
    "age": [32, 45, 28],
    "email": ["john.doe@example.com", "jane.smith@example.com", "bob.j@example.com"],
    "ssn": ["123-45-6789", "987-65-4321", "456-78-9012"],
    "zip_code": ["10001", "94107", "60601"],
    "income": [75000, 82000, 65000],
})

# Anonymize using k-anonymity
anonymized_data = anonymize(
    data,
    method="k-anonymity",
    k=2,
    sensitive_columns=["name", "email", "ssn"],
)

print(anonymized_data)
```
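To make the `k` parameter concrete: a table is k-anonymous when every combination of quasi-identifier values is shared by at least `k` rows. The check below is a small standard-library sketch of that definition, not part of SecureML; the helper names and the generalized sample records are hypothetical.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(row[col] for col in quasi_identifiers) for row in rows
    )
    # An empty table has no groups; report 0 so the caller can decide what that means.
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, quasi_identifiers, k):
    """True when every quasi-identifier combination appears in at least k rows."""
    return smallest_group_size(rows, quasi_identifiers) >= k


# Hypothetical records after generalizing age and ZIP code.
records = [
    {"age": "30-39", "zip_code": "100**", "income": 75000},
    {"age": "30-39", "zip_code": "100**", "income": 82000},
    {"age": "20-29", "zip_code": "606**", "income": 65000},
]

print(is_k_anonymous(records, ["age", "zip_code"], k=2))  # False
```

The last record forms a group of one, so the sample fails the check for `k=2`; further generalization or suppression of that row would be needed.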

Compliance Checking with Regulation Presets

SecureML includes built-in presets for major regulations (GDPR, CCPA, HIPAA, LGPD) that define the compliance requirements specific to each regulation:

```python
import pandas as pd
from secureml import check_compliance

# Load your dataset
data = pd.read_csv("your_dataset.csv")

# Model configuration
model_config = {
    "model_type": "neural_network",
    "input_features": ["age", "income", "zip_code"],
    "output": "purchase_likelihood",
    "training_method": "standard_backprop",
}

# Check compliance with GDPR
report = check_compliance(
    data=data,
    model_config=model_config,
    regulation="GDPR",
)

# View compliance issues
if report.has_issues():
    print("Compliance issues found:")
    for issue in report.issues:
        print(f"- {issue['component']}: {issue['issue']} ({issue['severity']})")
        print(f"  Recommendation: {issue['recommendation']}")
```
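As a rough mental model of how a preset can drive such a check, the sketch below flags dataset columns whose names match a list of personal-data identifiers and returns issues using the same four keys as the report above. Everything here is an assumption made for illustration: `EXAMPLE_PRESET`, its contents, and `find_issues` are hypothetical and do not reflect SecureML's actual preset format or detection logic.

```python
# Hypothetical preset, for illustration only (not SecureML's YAML schema or contents).
EXAMPLE_PRESET = {
    "personal_data_identifiers": ["name", "email", "ssn", "zip_code"],
}


def find_issues(columns, preset):
    """Flag columns whose names match an identifier listed in the preset."""
    identifiers = set(preset["personal_data_identifiers"])
    return [
        {
            "component": "dataset",
            "issue": f"column '{column}' may contain personal data",
            "severity": "high",
            "recommendation": "anonymize or drop the column before training",
        }
        for column in columns
        if column.lower() in identifiers
    ]


issues = find_issues(["age", "income", "zip_code", "email"], EXAMPLE_PRESET)
for issue in issues:
    print(f"- {issue['component']}: {issue['issue']} ({issue['severity']})")
```

Matching on column names alone is a deliberately naive choice; it misses personal data stored under unexpected names, which is why content-based detection is usually layered on top.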

Privacy-Preserving Machine Learning

Train a model with differential privacy guarantees:

```python
import torch.nn as nn
import pandas as pd
from secureml import differentially_private_train

# Create a simple PyTorch model
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
    nn.Softmax(dim=1),
)

# Load your dataset
data = pd.read_csv("your_dataset.csv")

# Train with differential privacy
private_model = differentially_private_train(
    model=model,
    data=data,
    epsilon=1.0,    # Privacy budget
    delta=1e-5,     # Privacy delta parameter
    epochs=10,
    batch_size=64,
)
```
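For intuition about what differentially private training does, libraries such as Opacus implement DP-SGD, whose core step clips each example's gradient to a fixed norm and adds Gaussian noise before averaging. The sketch below shows only that step in plain Python. The function names and sample gradients are hypothetical, and translating a noise level into an (epsilon, delta) guarantee requires a privacy accountant, which is not shown here.

```python
import math
import random


def clip(vector, max_norm):
    """Scale the vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vector]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum them, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


rng = random.Random(0)  # seeded only to make the illustration repeatable
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # hypothetical per-example gradients
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can move the update, and the noise hides what remains of that influence; a smaller privacy budget generally means more noise and lower model accuracy.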

Synthetic Data Generation

Generate synthetic data that maintains the statistical properties of the original data:

```python
import pandas as pd
from secureml import generate_synthetic_data

# Load your dataset
data = pd.read_csv("your_dataset.csv")

# Generate synthetic data
synthetic_data = generate_synthetic_data(
    template=data,
    num_samples=1000,
    method="statistical",  # Options: simple, statistical, sdv-copula, gan
    sensitive_columns=["name", "email", "ssn"],
)

print(synthetic_data.head())
```
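One simple way to check whether statistical properties were preserved is to compare summary statistics of a numeric column before and after generation. The helper below is a standard-library sketch under that assumption; `summary_gap` and the sample values are hypothetical and are not part of SecureML.

```python
import statistics


def summary_gap(original, synthetic):
    """Absolute differences in mean and sample standard deviation."""
    return {
        "mean_gap": abs(statistics.mean(original) - statistics.mean(synthetic)),
        "stdev_gap": abs(statistics.stdev(original) - statistics.stdev(synthetic)),
    }


original_income = [75000, 82000, 65000]
synthetic_income = [74000, 83000, 65000]  # hypothetical generated values

gap = summary_gap(original_income, synthetic_income)
print(f"mean gap: {gap['mean_gap']:.1f}, stdev gap: {gap['stdev_gap']:.1f}")
```

Small gaps on each column are necessary but not sufficient: correlations between columns also need to be compared before the synthetic data can be treated as a faithful stand-in.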

Documentation

For detailed documentation, examples, and the API reference, see https://secureml.readthedocs.io.

Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue. Our current focus is expanding support to regulations beyond GDPR, CCPA, HIPAA, and LGPD, and help with that is especially appreciated.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Owner

Name: scimorph
Login: scimorph
Kind: organization

Repositories: 1
Profile: https://github.com/scimorph

You make groundbreaking AI, we take care of regulations. Created and maintained by @EnzoFanAccount

GitHub Events

Total

Release event: 4
Watch event: 6
Delete event: 7
Public event: 1
Push event: 54
Pull request event: 12
Create event: 6

Last Year

Release event: 4
Watch event: 6
Delete event: 7
Public event: 1
Push event: 54
Pull request event: 12
Create event: 6

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 10
Average time to close issues: N/A
Average time to close pull requests: 9 days
Total issue authors: 0
Total pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.1
Merged pull requests: 7
Bot issues: 0
Bot pull requests: 9

Past Year

Issues: 0
Pull requests: 10
Average time to close issues: N/A
Average time to close pull requests: 9 days
Issue authors: 0
Pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.1
Merged pull requests: 7
Bot issues: 0
Bot pull requests: 9


Top Authors

Issue Authors

Pull Request Authors

dependabot[bot] (16)
EnzoFanAccount (2)

Top Labels

Issue Labels

Pull Request Labels

dependencies (16) python (16)

Packages

Total packages: 1
Total downloads:
- pypi 63 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 16
Total maintainers: 1

pypi.org: secureml

A Python library for privacy-preserving machine learning

Homepage: https://github.com/scimorph/secureml
Documentation: https://secureml.readthedocs.io
License: MIT
Latest release: 0.3.1 (published 8 months ago)

Versions: 16
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 63 Last month

Rankings

Dependent packages count: 9.4%

Average: 31.0%

Dependent repos count: 52.7%

Maintainers (1)

EnzoFanAccount

Last synced: 6 months ago
