cedskmeans

K-means clustering for differentially private power grid advanced metering infrastructure (AMI) synthetic customer data

https://github.com/lbnl-cybersecurity/cedskmeans

Science Score: 52.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization lbnl-cybersecurity has institutional domain (secpriv.lbl.gov)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed 12 months ago

README.md

CEDS DP K-Means Clustering

This repository contains the code for the CEDS Data Product K-Means Clustering. It follows the methodology proposed in papers [1] and [2].

[1] Ravi, Nikhil, et al. "Differentially Private K-Means Clustering Applied to Meter Data Analysis and Synthesis." IEEE Transactions on Smart Grid 13.6 (2022): 4801-4814.
[2] Ravi, Nikhil, Anna Scaglione, and Sean Peisert. "Colored noise mechanism for differentially private clustering." arXiv preprint arXiv:2111.07850 (2021).

This software applies K-means clustering to power grid advanced metering infrastructure (AMI) data under the framework of differential privacy, producing customer labels and centroids for a set of load time series. These outputs can then be used to generate differentially private synthetic load data consistent with the labeled data.
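To give a feel for what "applying differential privacy" to a centroid computation can mean, here is a minimal sketch of a noisy cluster sum using the classic Gaussian mechanism. It is not the colored noise mechanism of [2] and not this package's implementation: the helper names `gaussian_sigma`, `clip`, and `private_sum`, the clipping bound, and the choice of sensitivity are all illustrative assumptions, and the standard library is used only to keep the sketch dependency-free.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classic Gaussian mechanism (valid for 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def clip(profile, bound):
    """Rescale a load profile so that its L2 norm is at most `bound`."""
    norm = math.sqrt(sum(v * v for v in profile))
    if norm <= bound:
        return list(profile)
    return [v * bound / norm for v in profile]


def private_sum(profiles, epsilon, delta, bound, rng):
    """Noisy sum of clipped profiles (assumes `profiles` is non-empty).

    Adding or removing one customer changes the clipped sum by at most
    `bound` in L2 norm, so `bound` is used as the sensitivity (assumption).
    """
    dim = len(profiles[0])
    sigma = gaussian_sigma(epsilon, delta, sensitivity=bound)
    total = [0.0] * dim
    for profile in profiles:
        for i, v in enumerate(clip(profile, bound)):
            total[i] += v
    # Independent Gaussian noise on every coordinate of the sum
    return [t + rng.gauss(0.0, sigma) for t in total]


# Toy cluster of three 4-sample load profiles (hypothetical values)
rng = random.Random(0)
cluster = [[0.2, 0.4, 0.9, 0.3], [0.1, 0.5, 1.0, 0.2], [0.3, 0.3, 0.8, 0.4]]
noisy_total = private_sum(cluster, epsilon=0.1, delta=1e-5, bound=2.0, rng=rng)
```

A private centroid would additionally need a privatized cluster count to divide by, with the privacy budget split across both releases and across iterations; that accounting is deliberately left out of this sketch.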

Installation

```shell
pip install git+https://github.com/lbnl-cybersecurity/CEDSKMeans.git
```

Centralized Usage

```python
from cedskmeans import DPKMeans

# Import the data
X = "Import data here in the form of a numpy ndarray"

# Create a DPKMeans object and fit it
kmeans = DPKMeans(
    n_clusters=6,
    epsilon=0.1,
    delta=1e-5,
    max_iter=1000,
)
kmeans.fit(X)

# Access the labels
labels = kmeans.labels_

# Access the centroids
centroids = kmeans.cluster_centers_

# Access the true labels
true_labels = kmeans.true_labels_

# Access the true centroids
true_centroids = kmeans.true_cluster_centers_
```
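Because the fitted model exposes both the released labels and the "true" labels (presumably from the non-private fit), a natural sanity check is to measure how much the two labelings agree. The sketch below uses a plain Rand index, which ignores how clusters are numbered, since this README does not say whether cluster indices line up between the two results. The function `rand_index` is a hypothetical helper, not part of `cedskmeans`, and its pairwise loop is only practical for small samples.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree.

    A pair agrees when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0:
        return 1.0
    agree = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / total_pairs


# Identical partitions with swapped cluster numbers still agree fully
score = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Passing the released labels and the true labels of a fitted model to `rand_index` gives a value between 0 and 1, where values near 1 suggest that the privacy noise changed few cluster assignments.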

Distributed Map Reduce Usage (requires ray)

```python
import ray

from cedskmeans import run_kmean_map_reduce

# Import the data
X = "Import data here in the form of a pandas dataframe"

ray.init()

# Launch the distributed K-means job as a Ray remote task
kmeans = run_kmean_map_reduce.remote(
    X=X,
    n_clusters=3,
    n_mappers=2,
    max_iter=1000,
    epsilon=0.1,
    delta=1e-5,
)
# Block until the task finishes and fetch the result
kmeans = ray.get(kmeans)

# Access the labels
labels = kmeans.labels_

# Access the centroids
centroids = kmeans.cluster_centers_
```
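For readers unfamiliar with the map-reduce formulation of K-means, the following sketch shows one iteration in plain Python, without Ray and without any privacy noise. It only illustrates the general pattern (each mapper summarizes its shard as per-cluster sums and counts, and a reducer merges them into new centroids); how this package distributes the work or where it injects noise is not described in this README, and the names `nearest`, `map_step`, and `reduce_step` are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_step(shard, centroids):
    """Mapper: per-cluster partial sums and counts for one data shard."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in shard:
        k = nearest(point, centroids)
        counts[k] += 1
        for i, v in enumerate(point):
            sums[k][i] += v
    return sums, counts


def reduce_step(partials, centroids):
    """Reducer: merge all mapper outputs into updated centroids."""
    dim = len(centroids[0])
    updated = []
    for k, old in enumerate(centroids):
        count = sum(counts[k] for _, counts in partials)
        if count == 0:
            updated.append(list(old))  # leave an empty cluster where it was
            continue
        total = [sum(sums[k][i] for sums, _ in partials) for i in range(dim)]
        updated.append([t / count for t in total])
    return updated


# Two shards (one per mapper) and two clusters of toy 2-D points
shards = [[[0.0, 0.0], [10.0, 10.0]], [[2.0, 2.0], [12.0, 12.0]]]
centroids = [[1.0, 1.0], [9.0, 9.0]]
partials = [map_step(shard, centroids) for shard in shards]
centroids = reduce_step(partials, centroids)
```

Only the small per-cluster summaries travel from mappers to the reducer, which is what makes the pattern attractive for large meter datasets; in a Ray setting each `map_step` call would typically become a remote task.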

Copyright Notice

CEDS Differential Privacy (CEDSDP) Copyright (c) 2023, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and Cornell University. All rights reserved.

If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Intellectual Property Office at IPO@lbl.gov.

NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.

Owner

  • Name: LBNL Cybersecurity R&D
  • Login: lbnl-cybersecurity
  • Kind: organization
  • Location: Berkeley, CA

Berkeley Lab does cybersecurity R&D for scientific computing and energy applications

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Ravi
    given-names: Nikhil
    # orcid: https://orcid.org/1234-5678-9101-1121
  - family-names: Vastola-Lunghino
    given-names: Brent
  - family-names: Scaglione
    given-names: Anna
  - family-names: Peisert
    given-names: Sean
title: "CEDS K-Means using Differentially Private K-Means Clustering Applied to Meter Data Analysis and Synthesis"
version: 0.1.0
doi: 10.1109/TSG.2022.3184252
date-released: 2022-06-17
url: "https://github.com/lbnl-cybersecurity/CEDSKMeans/tree/main"

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2

Dependencies

.github/workflows/deploy-docs.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
poetry.lock pypi
  • 133 dependencies
pyproject.toml pypi
  • fsspec ^2023.6.0
  • joblib ^1.2.0
  • numpy ^1.24.2
  • pandas ^2.0.1
  • pyspark ^3.4.0
  • python 3.10.9
  • pytz ^2023.3
  • ray ^2.5.1
  • scikit-learn ^1.2.2
  • scipy ^1.10.1
  • sympy ^1.11.1
  • tabulate ^0.9.0
  • tensorboard ^2.13.0
  • tensorboardx ^2.6.1
  • tqdm ^4.65.0