https://github.com/scimorph/secureml

Easy-to-use utilities to build privacy-preserving AI.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.1%) to scientific vocabulary

Keywords

ai click compliance federated-learning flower gdpr hashicorp-vault jinja2 ml numpy opacus pandas privacy python pytorch sdv tensorflow weasyprint yaml
Last synced: 5 months ago

Repository

Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 8
Created 11 months ago · Last pushed 8 months ago
Metadata Files
Readme License

README.md

SecureML is an open-source Python library that integrates with popular machine learning frameworks like TensorFlow and PyTorch. It provides developers with easy-to-use utilities to ensure that AI agents handle sensitive data in compliance with data protection regulations.

Key Features

  • Data Anonymization Utilities:
    • K-anonymity implementation with adaptive generalization
    • Pseudonymization with format-preserving encryption
    • Configurable data masking with statistical property preservation
    • Hierarchical data generalization with taxonomy support
    • Automatic sensitive data detection
  • Privacy-Preserving Training Methods:
    • Differential privacy integration with PyTorch (via Opacus) and TensorFlow (via TF Privacy)
    • Federated learning with Flower, allowing training on distributed data without centralization
    • Support for secure aggregation and privacy-preserving federated learning
  • Compliance Checkers: Tools to analyze datasets and model configurations for potential privacy risks
  • Synthetic Data Generation:
    • Multiple generation methods including statistical modeling, GANs, and copulas
    • SDV integration with Gaussian Copula, CTGAN, and TVAE synthesizers
    • Automatic sensitive data detection and special handling
    • Preservation of statistical properties and correlations between variables
    • Support for mixed data types (numeric, categorical, datetime)
    • Configurable privacy-utility tradeoff controls
    • Tabular data synthesis with relation preservation
  • Regulation-Specific Presets:
    • Pre-configured YAML settings aligned with major regulations (GDPR, CCPA, HIPAA, LGPD)
    • Detailed compliance requirements for each regulation
    • Customizable identifiers for personal data and sensitive information
    • Integration with compliance checking functionality
  • Audit Trails and Reporting:
    • Comprehensive audit logging of data access, transformations, and model operations
    • Detailed event tracking for privacy-related operations with timestamps and contexts
    • Function-level auditing through decorators
    • Automated compliance reports in HTML and PDF formats
    • Visual dashboards with charts showing privacy metrics and event distributions
    • Integration with compliance checkers for continuous monitoring
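
To illustrate what function-level auditing through decorators can look like in general, here is a minimal sketch using only the standard library. The `audited` decorator, the in-memory `audit_log` list, and `mask_email` are hypothetical names invented for this example; this README does not show SecureML's actual decorator name or signature.

```python
import functools
import json
from datetime import datetime, timezone

audit_log = []  # in-memory stand-in for a real audit sink (assumption)


def audited(operation):
    """Record a timestamped event each time the wrapped function is called."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            event = {
                "operation": operation,
                "function": func.__name__,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            try:
                result = func(*args, **kwargs)
                event["status"] = "success"
                return result
            except Exception as exc:
                event["status"] = f"error: {type(exc).__name__}"
                raise
            finally:
                # Runs on both success and failure, so every call is logged.
                audit_log.append(event)
        return wrapper
    return decorator


@audited("data_transformation")
def mask_email(email):
    """Keep the domain, hide the local part."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain


print(mask_email("john.doe@example.com"))
print(json.dumps(audit_log, indent=2))
```

Appending in the `finally` clause means failed calls are logged as well as successful ones, which is usually what an audit trail needs.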

Installation

With pip (Python 3.11-3.12): `pip install secureml`

Optional Dependencies

```bash
# For generating PDF reports for compliance and audit trails
pip install secureml[pdf]

# For secure key management with HashiCorp Vault
pip install secureml[vault]

# For all optional components
pip install secureml[pdf,vault]
```

Quick Start

Data Anonymization

Anonymizing a dataset to comply with privacy regulations:

```python
import pandas as pd
from secureml import anonymize

# Load your dataset
data = pd.DataFrame({
    "name": ["John Doe", "Jane Smith", "Bob Johnson"],
    "age": [32, 45, 28],
    "email": ["john.doe@example.com", "jane.smith@example.com", "bob.j@example.com"],
    "ssn": ["123-45-6789", "987-65-4321", "456-78-9012"],
    "zip_code": ["10001", "94107", "60601"],
    "income": [75000, 82000, 65000]
})

# Anonymize using k-anonymity
anonymized_data = anonymize(
    data,
    method="k-anonymity",
    k=2,
    sensitive_columns=["name", "email", "ssn"]
)

print(anonymized_data)
```
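
To make the `k=2` setting concrete: a table is k-anonymous with respect to a set of quasi-identifier columns when every combination of their values appears in at least k rows. The sketch below checks that property with the standard library only. `is_k_anonymous`, the toy records, and the choice of `age` and `zip_code` as quasi-identifiers are assumptions for illustration, not part of the SecureML API.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k rows."""
    # Hypothetical helper for illustration; not a SecureML function.
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Toy records whose quasi-identifiers have already been generalized.
generalized = [
    {"age": "30-39", "zip_code": "100**", "income": 75000},
    {"age": "30-39", "zip_code": "100**", "income": 68000},
    {"age": "40-49", "zip_code": "941**", "income": 82000},
    {"age": "40-49", "zip_code": "941**", "income": 91000},
]

print(is_k_anonymous(generalized, ["age", "zip_code"], k=2))  # True: each group has 2 rows
print(is_k_anonymous(generalized, ["age", "zip_code"], k=3))  # False: no group has 3 rows
```

For a pandas DataFrame, `df.to_dict("records")` yields rows in the shape this helper expects.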

Compliance Checking with Regulation Presets

SecureML includes built-in presets for major regulations (GDPR, CCPA, HIPAA, LGPD) that define the compliance requirements specific to each regulation:

```python
import pandas as pd
from secureml import check_compliance

# Load your dataset
data = pd.read_csv("your_dataset.csv")

# Model configuration
model_config = {
    "model_type": "neural_network",
    "input_features": ["age", "income", "zip_code"],
    "output": "purchase_likelihood",
    "training_method": "standard_backprop"
}

# Check compliance with GDPR
report = check_compliance(
    data=data,
    model_config=model_config,
    regulation="GDPR"
)

# View compliance issues
if report.has_issues():
    print("Compliance issues found:")
    for issue in report.issues:
        print(f"- {issue['component']}: {issue['issue']} ({issue['severity']})")
        print(f"  Recommendation: {issue['recommendation']}")
```

Privacy-Preserving Machine Learning

Train a model with differential privacy guarantees:

```python
import torch.nn as nn
import pandas as pd
from secureml import differentially_private_train

# Create a simple PyTorch model
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
    nn.Softmax(dim=1)
)

# Load your dataset
data = pd.read_csv("your_dataset.csv")

# Train with differential privacy
private_model = differentially_private_train(
    model=model,
    data=data,
    epsilon=1.0,  # Privacy budget
    delta=1e-5,  # Privacy delta parameter
    epochs=10,
    batch_size=64
)
```
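
For intuition about what differentially private training does under the hood: libraries such as Opacus implement DP-SGD, in which each example's gradient is clipped to a fixed norm and Gaussian noise is added before the update. The sketch below shows only that single step in plain Python. The function names, `max_norm`, `noise_multiplier`, and the gradient values are illustrative assumptions, not SecureML or Opacus APIs, and the privacy accounting that relates a noise level to a concrete (epsilon, delta) is omitted.

```python
import math
import random


def clip(gradient, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [n / len(clipped) for n in noisy]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # made-up per-example gradients
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any one record can move the update, and the noise scale is tied to that bound, which is why the two parameters appear together.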

Synthetic Data Generation

Generate synthetic data that maintains the statistical properties of the original data:

```python
import pandas as pd
from secureml import generate_synthetic_data

# Load your dataset
data = pd.read_csv("your_dataset.csv")

# Generate synthetic data
synthetic_data = generate_synthetic_data(
    template=data,
    num_samples=1000,
    method="statistical",  # Options: simple, statistical, sdv-copula, gan
    sensitive_columns=["name", "email", "ssn"]
)

print(synthetic_data.head())
```
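
As a rough illustration of what "maintaining statistical properties" means, the sketch below fits a normal distribution to one numeric column and samples new values from it, then compares summary statistics. This is a deliberately naive stand-in, not a description of how SecureML's `statistical` method works; `fit_and_sample` and the income values are made up for the example, and the approach ignores correlations between columns.

```python
import random
import statistics


def fit_and_sample(values, num_samples, rng):
    """Fit a normal distribution to one numeric column and draw new values from it."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(num_samples)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
income = [75000, 82000, 65000, 71000, 90000]  # made-up original column
synthetic_income = fit_and_sample(income, num_samples=1000, rng=rng)

print("original mean/stdev: ", statistics.mean(income), round(statistics.stdev(income), 1))
print("synthetic mean/stdev:", round(statistics.mean(synthetic_income), 1),
      round(statistics.stdev(synthetic_income), 1))
```

Comparing means and standard deviations this way is also a quick sanity check to run on the output of any generator before using the synthetic data downstream.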

Documentation

For detailed documentation, examples, and API reference, visit our documentation.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue. Our current focus is extending support to regulations beyond GDPR, CCPA, HIPAA, and LGPD, and help with that is especially appreciated.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Owner

  • Name: scimorph
  • Login: scimorph
  • Kind: organization

You make groundbreaking AI, we take care of regulations. Created and maintained by @EnzoFanAccount

GitHub Events

Total
  • Release event: 4
  • Watch event: 6
  • Delete event: 7
  • Public event: 1
  • Push event: 54
  • Pull request event: 12
  • Create event: 6
Last Year
  • Release event: 4
  • Watch event: 6
  • Delete event: 7
  • Public event: 1
  • Push event: 54
  • Pull request event: 12
  • Create event: 6

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 10
  • Average time to close issues: N/A
  • Average time to close pull requests: 9 days
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.1
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 9
Past Year
  • Issues: 0
  • Pull requests: 10
  • Average time to close issues: N/A
  • Average time to close pull requests: 9 days
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.1
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 9
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (16)
  • EnzoFanAccount (2)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (16)
  • python (16)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 63 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 16
  • Total maintainers: 1
pypi.org: secureml

A Python library for privacy-preserving machine learning

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 63 Last month
Rankings
Dependent packages count: 9.4%
Average: 31.0%
Dependent repos count: 52.7%
Maintainers (1)
Last synced: 6 months ago