cedskmeans
K-means clustering for differentially private power grid advanced metering infrastructure (AMI) synthetic customer data
Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: Organization lbnl-cybersecurity has institutional domain (secpriv.lbl.gov)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: lbnl-cybersecurity
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://secpriv.lbl.gov/project/ceds-privacy/
- Size: 863 KB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CEDS DP K-Means Clustering
This repository contains the code for the CEDS Data Product K-Means Clustering. It follows the methodology proposed in papers [1] and [2].
[1] Ravi, Nikhil, et al. "Differentially Private K-Means Clustering Applied to Meter Data Analysis and Synthesis." IEEE Transactions on Smart Grid 13.6 (2022): 4801-4814.
[2] Ravi, Nikhil, Anna Scaglione, and Sean Peisert. "Colored noise mechanism for differentially private clustering." arXiv preprint arXiv:2111.07850 (2021).
This software applies K-means clustering to power grid advanced metering infrastructure (AMI) data within the framework of differential privacy, producing customer labels and centroids for a set of load time series. These outputs can be used to generate differentially private synthetic load data consistent with the labeled data.
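To make the idea concrete, here is a minimal sketch of a noisy K-means loop in plain NumPy. It is a generic illustration that adds Gaussian noise to per-cluster sums and counts; it is not the colored noise mechanism of [2] and not this package's implementation. The function names (`gaussian_sigma`, `noisy_kmeans`), the unit-norm clipping, and the even split of the privacy budget across iterations are all assumptions made for the example.

```python
import numpy as np


def gaussian_sigma(sensitivity, epsilon, delta):
    # Classical Gaussian-mechanism noise scale (stated for 0 < epsilon < 1).
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def assign(X, centers):
    # Index of the nearest center for every row of X.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def noisy_kmeans(X, n_clusters, epsilon, delta, n_iter=5, seed=0):
    """Hypothetical sketch: Lloyd iterations on noisy sums and counts."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]

    # Clip rows to unit L2 norm so one row changes a cluster sum by at most 1.
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)

    # Data-independent initialization: random points on or inside the unit ball.
    centers = rng.normal(size=(n_clusters, n_features))
    centers /= np.maximum(np.linalg.norm(centers, axis=1, keepdims=True), 1.0)

    # Adding or removing one row changes the stacked (sums, counts) vector by
    # at most sqrt(2) in L2 norm; the budget is split evenly over iterations.
    sigma = gaussian_sigma(np.sqrt(2.0), epsilon / n_iter, delta / n_iter)

    for _ in range(n_iter):
        one_hot = np.eye(n_clusters)[assign(X, centers)]
        sums = one_hot.T @ X + rng.normal(scale=sigma, size=(n_clusters, n_features))
        counts = one_hot.sum(axis=0) + rng.normal(scale=sigma, size=n_clusters)
        # Guard against tiny or negative noisy counts before dividing.
        centers = sums / np.maximum(counts, 1.0)[:, None]

    # Labels depend directly on each row, so they are for local use only.
    return centers, assign(X, centers)


demo_rng = np.random.default_rng(1)
demo_X = demo_rng.normal(size=(200, 4))
demo_centers, demo_labels = noisy_kmeans(demo_X, n_clusters=3, epsilon=0.9, delta=1e-5)
```

With a small privacy budget the noise dominates unless each cluster holds many rows, which is one motivation for the more careful noise design in the cited papers.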
Installation
```shell
pip install git+https://github.com/lbnl-cybersecurity/CEDSKMeans.git
```
Centralized Usage
```python
from cedskmeans import DPKMeans

# Import the data
X = "Import data here in the form of a numpy ndarray"

# Create a CEDSKMeans object
kmeans = DPKMeans(n_clusters=6, epsilon=0.1, delta=1e-5, max_iter=1000)
kmeans.fit(X)

# Access the labels
labels = kmeans.labels_

# Access the centroids
centroids = kmeans.cluster_centers_

# Access the true labels
true_labels = kmeans.true_labels_

# Access the true centroids
true_centroids = kmeans.true_cluster_centers_
```
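The placeholder `X` above has to be replaced with a real array. As a rough sketch of one way to build it, the snippet below reshapes a single meter's hourly readings into one row per day. The helper name `to_daily_profiles`, the 24-steps-per-day layout, and the convention that rows are samples and columns are time steps are assumptions; the README does not state the expected shape.

```python
import numpy as np


def to_daily_profiles(readings, steps_per_day=24):
    """Hypothetical helper: reshape a 1-D series into (n_days, steps_per_day)."""
    readings = np.asarray(readings, dtype=float)
    n_days = readings.size // steps_per_day
    # Drop a trailing partial day so the reshape is exact.
    return readings[: n_days * steps_per_day].reshape(n_days, steps_per_day)


# Synthetic stand-in for one week of hourly readings plus 5 leftover hours.
rng = np.random.default_rng(0)
hourly = rng.gamma(shape=2.0, scale=0.5, size=24 * 7 + 5)

X = to_daily_profiles(hourly)
print(X.shape)  # (7, 24)
```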
Distributed Map Reduce Usage (requires ray)
```python
from cedskmeans import run_kmean_map_reduce
import ray

# Import the data
X = "Import data here in the form of a pandas dataframe"

ray.init()

# Create a CEDSKMeans object
kmeans = run_kmean_map_reduce.remote(
    X=X, n_clusters=3, n_mappers=2, max_iter=1000, epsilon=0.1, delta=1e-5
)
kmeans = ray.get(kmeans)

# Access the labels
labels = kmeans.labels_

# Access the centroids
centroids = kmeans.cluster_centers_
```
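For readers unfamiliar with the map-reduce formulation of K-means, the following sketch shows the general pattern in plain NumPy, without ray and without any privacy noise. It is an illustration of the technique, not the code this package runs; `map_partial_stats` and `reduce_centers` are hypothetical names. Each mapper computes per-cluster sums and counts on its own partition, and the reducer adds them up to get the new centers.

```python
import numpy as np


def map_partial_stats(X_part, centers):
    """Mapper: per-cluster sums and counts for one data partition."""
    dists = np.linalg.norm(X_part[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    one_hot = np.eye(len(centers))[labels]
    return one_hot.T @ X_part, one_hot.sum(axis=0)


def reduce_centers(partials, old_centers):
    """Reducer: combine partial statistics into updated centers."""
    sums = sum(p[0] for p in partials)
    counts = sum(p[1] for p in partials)
    new_centers = old_centers.copy()
    nonempty = counts > 0  # keep the old center for any empty cluster
    new_centers[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_centers


def run_iterations(partitions, centers, n_iter=5):
    for _ in range(n_iter):
        partials = [map_partial_stats(part, centers) for part in partitions]
        centers = reduce_centers(partials, centers)
    return centers


# Two well-separated synthetic blobs, split into two partitions
# (mirroring the two mappers requested in the call above).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
init = np.array([[1.0, 1.0], [4.0, 4.0]])

mr_centers = run_iterations(np.array_split(X_demo, 2), init)
single_centers = run_iterations([X_demo], init)
```

Because sums and counts are additive, the partitioned run gives the same centers as a single-partition run, which is what makes the mapper work easy to distribute.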
Copyright Notice
CEDS Differential Privacy (CEDSDP) Copyright (c) 2023, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and Cornell University. All rights reserved.
If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Intellectual Property Office at IPO@lbl.gov.
NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.
Owner
- Name: LBNL Cybersecurity R&D
- Login: lbnl-cybersecurity
- Kind: organization
- Location: Berkeley, CA
- Website: https://secpriv.lbl.gov
- Twitter: speisert
- Repositories: 5
- Profile: https://github.com/lbnl-cybersecurity
Berkeley Lab does cybersecurity R&D for scientific computing and energy applications
Citation (citation.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Ravi
given-names: Nikhil
// orcid: https://orcid.org/1234-5678-9101-1121
- family-names: Vastola-Lunghino
given-names: Brent
- family-names: Scaglione
given-names: Anna
- family-names: Peisert
given-names: Sean
title: "CEDS K-Means using Differentially Private K-Means Clustering Applied to Meter Data Analysis and Synthesis"
version: 0.1.0
doi: 10.1109/TSG.2022.3184252
date-released: 2022-06-17
url: "https://github.com/lbnl-cybersecurity/CEDSKMeans/tree/main"
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- 133 dependencies
- fsspec ^2023.6.0
- joblib ^1.2.0
- numpy ^1.24.2
- pandas ^2.0.1
- pyspark ^3.4.0
- python 3.10.9
- pytz ^2023.3
- ray ^2.5.1
- scikit-learn ^1.2.2
- scipy ^1.10.1
- sympy ^1.11.1
- tabulate ^0.9.0
- tensorboard ^2.13.0
- tensorboardx ^2.6.1
- tqdm ^4.65.0