https://github.com/scimorph/secureml
Easy-to-use utilities to build privacy-preserving AI.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: scimorph
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://secureml.readthedocs.io
- Size: 2.24 MB
Statistics
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 8
Metadata Files
README.md
Documentation
SecureML is an open-source Python library that integrates with popular machine learning frameworks like TensorFlow and PyTorch. It provides developers with easy-to-use utilities to ensure that AI agents handle sensitive data in compliance with data protection regulations.
Key Features
- Data Anonymization Utilities:
  - K-anonymity implementation with adaptive generalization
  - Pseudonymization with format-preserving encryption
  - Configurable data masking with statistical property preservation
  - Hierarchical data generalization with taxonomy support
  - Automatic sensitive data detection
- Privacy-Preserving Training Methods:
  - Differential privacy integration with PyTorch (via Opacus) and TensorFlow (via TF Privacy)
  - Federated learning with Flower, allowing training on distributed data without centralization
  - Support for secure aggregation and privacy-preserving federated learning
- Compliance Checkers: Tools to analyze datasets and model configurations for potential privacy risks
- Synthetic Data Generation:
  - Multiple generation methods including statistical modeling, GANs, and copulas
  - SDV integration with Gaussian Copula, CTGAN, and TVAE synthesizers
  - Automatic sensitive data detection and special handling
  - Preservation of statistical properties and correlations between variables
  - Support for mixed data types (numeric, categorical, datetime)
  - Configurable privacy-utility tradeoff controls
  - Tabular data synthesis with relation preservation
- Regulation-Specific Presets:
  - Pre-configured YAML settings aligned with major regulations (GDPR, CCPA, HIPAA, LGPD)
  - Detailed compliance requirements for each regulation
  - Customizable identifiers for personal data and sensitive information
  - Integration with compliance checking functionality
- Audit Trails and Reporting:
  - Comprehensive audit logging of data access, transformations, and model operations
  - Detailed event tracking for privacy-related operations with timestamps and contexts
  - Function-level auditing through decorators
  - Automated compliance reports in HTML and PDF formats
  - Visual dashboards with charts showing privacy metrics and event distributions
  - Integration with compliance checkers for continuous monitoring
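The function-level auditing listed above can be pictured with a plain-Python decorator. The sketch below is not SecureML's API: `audited`, `AUDIT_LOG`, and the event fields are hypothetical names chosen for illustration, and it only assumes that an audit event records an operation label, a timestamp, and an outcome.

```python
import functools
import time

# Hypothetical in-memory audit trail; a real system would write to durable storage.
AUDIT_LOG = []

def audited(operation):
    """Record each call of the wrapped function as an audit event (illustrative only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            event = {
                "operation": operation,
                "function": func.__name__,
                "timestamp": time.time(),
                "status": "started",
            }
            try:
                result = func(*args, **kwargs)
                event["status"] = "succeeded"
                return result
            except Exception as exc:
                event["status"] = "failed"
                event["error"] = repr(exc)
                raise
            finally:
                # The event is stored whether the call succeeded or raised.
                AUDIT_LOG.append(event)
        return wrapper
    return decorator

@audited("data_transformation")
def drop_column(rows, column):
    """Remove one column from a list of row dictionaries."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]

cleaned = drop_column([{"name": "John Doe", "age": 32}], "name")
print(cleaned)
print(AUDIT_LOG[-1]["operation"], AUDIT_LOG[-1]["status"])
```

Appending the event in `finally` means failed calls are logged too, which matters for an audit trail that has to show attempted operations as well as completed ones.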
Installation
With pip (Python 3.11-3.12):
```bash
pip install secureml
```
Optional Dependencies
```bash
# For generating PDF reports for compliance and audit trails
pip install secureml[pdf]

# For secure key management with HashiCorp Vault
pip install secureml[vault]

# For all optional components
pip install secureml[pdf,vault]
```
Quick Start
Data Anonymization
Anonymizing a dataset to comply with privacy regulations:
```python
import pandas as pd
from secureml import anonymize

# Load your dataset
data = pd.DataFrame({
    "name": ["John Doe", "Jane Smith", "Bob Johnson"],
    "age": [32, 45, 28],
    "email": ["john.doe@example.com", "jane.smith@example.com", "bob.j@example.com"],
    "ssn": ["123-45-6789", "987-65-4321", "456-78-9012"],
    "zip_code": ["10001", "94107", "60601"],
    "income": [75000, 82000, 65000]
})

# Anonymize using k-anonymity
anonymized_data = anonymize(
    data,
    method="k-anonymity",
    k=2,
    sensitive_columns=["name", "email", "ssn"]
)
print(anonymized_data)
```
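To make the `k=2` setting concrete: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k rows. The following is a minimal standard-library sketch of that property, not SecureML's implementation. The helper names (`smallest_group_size`, `is_k_anonymous`, `generalize_age`) and the sample rows are hypothetical, and the fixed-width generalization is a simplification of the adaptive generalization the library advertises.

```python
from collections import Counter

def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return min(groups.values())

def is_k_anonymous(rows, quasi_identifiers, k):
    """True when every quasi-identifier combination appears in at least k rows."""
    return smallest_group_size(rows, quasi_identifiers) >= k

def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Made-up sample rows for illustration.
raw = [
    {"age": 32, "zip_code": "10001"},
    {"age": 45, "zip_code": "94107"},
    {"age": 38, "zip_code": "10002"},
    {"age": 41, "zip_code": "94110"},
]

# Coarsen ages into decades and keep only the first three ZIP digits.
generalized = [
    {"age": generalize_age(r["age"]), "zip_code": r["zip_code"][:3] + "**"}
    for r in raw
]

print(is_k_anonymous(raw, ["age", "zip_code"], k=2))          # False: every row is unique
print(is_k_anonymous(generalized, ["age", "zip_code"], k=2))  # True: two groups of two rows
```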
Compliance Checking with Regulation Presets
SecureML includes built-in presets for major regulations (GDPR, CCPA, HIPAA, LGPD) that define the compliance requirements specific to each regulation:
```python
import pandas as pd
from secureml import check_compliance

# Load your dataset
data = pd.read_csv("your_dataset.csv")

# Model configuration
model_config = {
    "model_type": "neural_network",
    "input_features": ["age", "income", "zip_code"],
    "output": "purchase_likelihood",
    "training_method": "standard_backprop"
}

# Check compliance with GDPR
report = check_compliance(
    data=data,
    model_config=model_config,
    regulation="GDPR"
)

# View compliance issues
if report.has_issues():
    print("Compliance issues found:")
    for issue in report.issues:
        print(f"- {issue['component']}: {issue['issue']} ({issue['severity']})")
        print(f"  Recommendation: {issue['recommendation']}")
```
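The loop above reads four keys from each issue: `component`, `issue`, `severity`, and `recommendation`. As a rough illustration of how a preset of identifiers could produce entries of that shape, here is a standalone sketch. It is not SecureML's checker: `PRESET_IDENTIFIERS`, `check_columns`, and the messages are hypothetical, and real presets are described in the README as YAML files with much richer rules.

```python
# Hypothetical preset: column names treated as direct personal identifiers.
PRESET_IDENTIFIERS = frozenset({"name", "email", "ssn"})

def check_columns(columns, identifiers=PRESET_IDENTIFIERS):
    """Flag columns whose names match a preset identifier (illustrative only)."""
    issues = []
    for column in columns:
        if column.lower() in identifiers:
            issues.append({
                "component": f"column:{column}",
                "issue": "Direct identifier present in the dataset",
                "severity": "high",
                "recommendation": "Anonymize or drop this column before training",
            })
    return issues

issues = check_columns(["name", "age", "email", "income"])
for issue in issues:
    print(f"- {issue['component']}: {issue['issue']} ({issue['severity']})")
    print(f"  Recommendation: {issue['recommendation']}")
```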
Privacy-Preserving Machine Learning
Train a model with differential privacy guarantees:
```python
import torch.nn as nn
import pandas as pd
from secureml import differentially_private_train

# Create a simple PyTorch model
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
    nn.Softmax(dim=1)
)

# Load your dataset
data = pd.read_csv("your_dataset.csv")

# Train with differential privacy
private_model = differentially_private_train(
    model=model,
    data=data,
    epsilon=1.0,  # Privacy budget
    delta=1e-5,  # Privacy delta parameter
    epochs=10,
    batch_size=64
)
```
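Differentially private training of this kind generally follows the DP-SGD recipe: clip each example's gradient to a fixed norm, sum the clipped gradients, add Gaussian noise scaled to that norm, and average. The sketch below shows one such aggregation step using only the standard library. It is not the code SecureML or Opacus runs. `clip`, `private_mean_gradient`, `max_norm`, and `noise_multiplier` are names assumed for illustration, and the step of deriving a noise multiplier from `epsilon` and `delta` (the job of a privacy accountant) is left out.

```python
import math
import random

def clip(gradient, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]

def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [value / len(per_example_grads) for value in noisy]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
print(clip(grads[0], max_norm=1.0))  # rescaled to norm 1.0
print(clip(grads[1], max_norm=1.0))  # already within the bound, unchanged
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single record can move the update, which is what lets the added noise translate into a formal privacy guarantee.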
Synthetic Data Generation
Generate synthetic data that maintains the statistical properties of the original data:
```python
import pandas as pd
from secureml import generate_synthetic_data

# Load your dataset
data = pd.read_csv("your_dataset.csv")

# Generate synthetic data
synthetic_data = generate_synthetic_data(
    template=data,
    num_samples=1000,
    method="statistical",  # Options: simple, statistical, sdv-copula, gan
    sensitive_columns=["name", "email", "ssn"]
)
print(synthetic_data.head())
```
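As a rough picture of what a statistical generation method does, the sketch below fits a simple summary per column (mean and standard deviation for numbers, value frequencies for categories) and samples new rows from it. This is an assumption-laden illustration, not SecureML's algorithm: `fit_column`, `generate`, and the `plan` column are hypothetical, and because each column is sampled independently, correlations between variables are not preserved here, unlike the copula- and GAN-based methods the README lists.

```python
import random
import statistics
from collections import Counter

def fit_column(values):
    """Summarize one column: mean/std for numbers, value frequencies otherwise."""
    if all(isinstance(v, (int, float)) for v in values):
        return ("numeric", statistics.mean(values), statistics.pstdev(values))
    counts = Counter(values)
    return ("categorical", list(counts.keys()), list(counts.values()))

def generate(rows, num_samples, rng):
    """Draw each column independently from its fitted summary."""
    models = {col: fit_column([row[col] for row in rows]) for col in rows[0]}
    samples = []
    for _ in range(num_samples):
        sample = {}
        for col, model in models.items():
            if model[0] == "numeric":
                sample[col] = rng.gauss(model[1], model[2])
            else:
                sample[col] = rng.choices(model[1], weights=model[2])[0]
        samples.append(sample)
    return samples

# Made-up template rows; no real records are reproduced in the output.
template = [
    {"age": 32, "income": 75000, "plan": "basic"},
    {"age": 45, "income": 82000, "plan": "premium"},
    {"age": 28, "income": 65000, "plan": "basic"},
]
synthetic = generate(template, num_samples=5, rng=random.Random(42))
for row in synthetic:
    print(row)
```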
Documentation
For detailed documentation, examples, and API reference, visit https://secureml.readthedocs.io.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request or open an Issue. Our current focus is expanding support to regulations beyond GDPR, CCPA, HIPAA, and LGPD, and help with that is especially appreciated.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Owner
- Name: scimorph
- Login: scimorph
- Kind: organization
- Repositories: 1
- Profile: https://github.com/scimorph
You make groundbreaking AI, we take care of regulations. Created and maintained by @EnzoFanAccount
GitHub Events
Total
- Release event: 4
- Watch event: 6
- Delete event: 7
- Public event: 1
- Push event: 54
- Pull request event: 12
- Create event: 6
Last Year
- Release event: 4
- Watch event: 6
- Delete event: 7
- Public event: 1
- Push event: 54
- Pull request event: 12
- Create event: 6
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: 9 days
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.1
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 9
Past Year
- Issues: 0
- Pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: 9 days
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.1
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 9
Top Authors
Issue Authors
Pull Request Authors
- dependabot[bot] (16)
- EnzoFanAccount (2)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - pypi: 63 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 16
- Total maintainers: 1
pypi.org: secureml
A Python library for privacy-preserving machine learning
- Homepage: https://github.com/scimorph/secureml
- Documentation: https://secureml.readthedocs.io
- License: MIT
- Latest release: 0.3.1 (published 8 months ago)