io.github.andrewquijano:level-site-ppdt

Enhanced Outsourced and Secure Inference for Tall Sparse Decision Trees

https://github.com/adwise-fiu/level-site-ppdt

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.7%) to scientific vocabulary

Keywords

decision-trees java privacy-preserving-data-mining privacy-preserving-machine-learning socialist-millionaires weka
Last synced: 4 months ago

Repository

Enhanced Outsourced and Secure Inference for Tall Sparse Decision Trees

Basic Info
  • Host: GitHub
  • Owner: adwise-fiu
  • License: mit
  • Language: Java
  • Default Branch: main
  • Homepage:
  • Size: 14.4 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 2
  • Open Issues: 2
  • Releases: 3
Topics
decision-trees java privacy-preserving-data-mining privacy-preserving-machine-learning socialist-millionaires weka
Created about 3 years ago · Last pushed 4 months ago
Metadata Files
Readme License Citation

README.md

Level-Site-PPDT

Implementation of the PPDT in the paper "Evaluating Outsourced Decision Trees by a Level-Based Approach"

Installation

It is a requirement to install SDKMAN to install Gradle. You need to install the following packages to ensure everything works as expected:

```bash
sudo apt-get install -y default-jdk default-jre graphviz curl python3-pip
pip3 install pyyaml
pip3 install configobj
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
```

In a new terminal, run this command:

```bash
sdk install gradle
```
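
To confirm the toolchain is ready before building, a quick version check helps; this step is not in the original instructions, just a common sanity check:

```bash
# Verify the JDK and Gradle installations
java -version
gradle --version
```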

Run this command, and all future commands, from the Level-Site-PPDT folder. Run the following command once to install Docker and Minikube. Reboot your machine, then re-run the command to finish installing Minikube:

```bash
bash setup.sh
```

Also, remember to install Sealed Secrets:

```bash
sudo apt-get install jq

# Fetch the latest sealed-secrets version using the GitHub API
KUBESEAL_VERSION=$(curl -s https://api.github.com/repos/bitnami-labs/sealed-secrets/tags | jq -r '.[0].name' | cut -c 2-)

# Check if the version was fetched successfully
if [ -z "$KUBESEAL_VERSION" ]; then
    echo "Failed to fetch the latest KUBESEAL_VERSION"
    exit 1
fi

wget "https://github.com/bitnami-labs/sealed-secrets/releases/download/v${KUBESEAL_VERSION}/kubeseal-${KUBESEAL_VERSION}-linux-amd64.tar.gz"
tar -xvzf kubeseal-"${KUBESEAL_VERSION}"-linux-amd64.tar.gz kubeseal
sudo install -m 755 kubeseal /usr/local/bin/kubeseal
rm kubeseal*

# Install Helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
rm ./get_helm.sh

# Add the Sealed Secrets controller to the cluster
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm install sealed-secrets -n kube-system --set-string fullnameOverride=sealed-secrets-controller sealed-secrets/sealed-secrets
```
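
Before moving on, you may want to confirm the controller pod came up; this check is a standard kubectl pattern rather than part of the original instructions:

```bash
# The controller runs in kube-system under the name set via fullnameOverride
kubectl get pods -n kube-system | grep sealed-secrets-controller
```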

Before you run the PPDT, make sure to create your keystore; this is necessary because the level-sites use TLS sockets. Either run the create_keystore.sh script and make sure the password is consistent with the Kubernetes secret, or just use the provided Sealed Secret.
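
If you want a sense of what keystore creation involves, here is a minimal keytool sketch; the alias, distinguished name, and validity below are illustrative assumptions, and create_keystore.sh remains the authoritative script:

```bash
# Hypothetical keystore creation; create_keystore.sh is authoritative.
# KEYSTORE_PASS must match the keystore-pass value in the Kubernetes secret.
keytool -genkeypair -alias ppdt -keyalg RSA -keysize 2048 \
    -keystore keystore -storepass "$KEYSTORE_PASS" \
    -dname "CN=level-site" -validity 365
```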

Running PPDT locally

  1. Check that the config.properties file is set to your needs. Currently:
    1. It assumes level-site 0 uses port 9000, level-site 1 uses port 9001, etc.
      1. If you modify this, provide a comma-separated string of the ports for each level-site.
      2. Currently, it assumes ports 9000–9009 will be used.
    2. key_size corresponds to the key size of both the DGK and Paillier keys.
    3. precision controls how accurately decimal thresholds are measured. If a value is 100.1, a precision of 1 would scale it to 1001.
    4. The data property points to the directory containing the answers.csv file and all the training and testing data.
  2. Currently, the test file will read from the data/answers.csv file (a sketch of its layout follows this list).
    1. The first column is the training data set; it must be a .arff file to be compatible with Weka. Alternatively, you can pass a .model file, which is a pre-trained Weka model; it is assumed to be a J48 classifier tree model.
    2. The second column is the name of an input file, tab-separated, containing feature names and values.
    3. The third column is the expected classification given the input from the second column. If there is a mismatch, an assertion error is raised.
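
The precise file formats are defined by the repository's data folder; below is only a plausible illustration of the three-column layout described above, assuming comma separators (it is a .csv) and hypothetical class labels:

```bash
# Hypothetical contents of data/answers.csv:
#   column 1: training data (.arff) or pre-trained J48 model (.model)
#   column 2: tab-separated file of feature names and values
#   column 3: expected classification
head -n 2 data/answers.csv
# data/hypothyroid.arff,data/hypothyroid.values,negative
# data/spambase.model,data/spambase.values,1
```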

To run the end-to-end test, run the following:

```bash
sh gradlew build
```

When the testing is done, you will have an output directory containing both the DT model and a text file describing how to draw your tree. Input the contents of the text file into the website linked here to get a drawing of what the DT looks like.
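
Since graphviz is installed during setup, you can likely also render the tree locally; this sketch assumes the text file is in DOT format and uses a hypothetical file name:

```bash
# Render a DOT description of the tree to a PNG (file name is hypothetical)
dot -Tpng output/tree.dot -o output/tree.png
```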

If you want to analyze the level of each classification in a pre-trained decision tree from the data folder, run the following (where the argument is the name of the dataset):

```bash
./gradlew run -PchooseRole=weka.finito.utils.depth_analysis --args spambase
```

This will read the DT in data/spambase.model, which was trained from the data set data/spambase.arff. It will classify all the data in the training set and get the level (1, ..., d) of each classification within the DT model. In the paper, I used this to argue that, assuming most training data resembles testing data, you will rarely need to traverse the whole tree.

Running PPDT on Kubernetes clusters

To make it easier to deploy on the cloud, we also provide a method to export our system to Kubernetes. This assumes one execution rather than multiple executions.

Option 1 - Using Minikube

You will need to start and configure minikube. When writing the paper, we provided 8 CPUs and 20 GB of memory; set these arguments to fit your computer's specs.

```bash
minikube start --cpus 8 --memory 20000
eval $(minikube docker-env)
```
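
Before deploying anything, you can optionally confirm the cluster is up; these are standard minikube/kubectl checks rather than steps from the original README:

```bash
# Confirm minikube is running and kubectl can reach the node
minikube status
kubectl get nodes
```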

Option 2 - Running it on an EKS Cluster

  • First, install eksctl.

  • Create a user with sufficient permissions. Go to IAM, select Users, Create User, and Attach Policies directly; for a quick experiment, select all permissions.

  • Obtain the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY of the user account. See the documentation provided here.

  • Run aws configure to input the access key ID and secret access key.

  • Run the following command to create the cluster:

```bash
eksctl create cluster --config-file eks-config/single-cluster.yaml
```

  • Confirm the EKS cluster exists using the following:

```bash
eksctl get clusters --region us-east-1
```

  • Once you confirm the cluster is created, you need to register the cluster with kubectl (a quick sanity check follows this list):

```bash
aws eks update-kubeconfig --name ppdt --region us-east-1
```
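
As a sanity check that kubectl now points at the EKS cluster, you can inspect the active context and nodes; these are generic kubectl commands, not part of the original instructions:

```bash
# The current context should reference the ppdt cluster in us-east-1
kubectl config current-context
kubectl get nodes
```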

Using/Creating a Kubernetes Sealed Secret

It is suggested that you use the existing sealed secret. The password in this secret matches the one in the keystore.

```bash
kubectl apply -f ppdt-sealedsecret.yaml
```

Alternatively, you can create a new sealed secret as follows:

```bash
kubectl create secret generic ppdt-secrets --from-literal=keystore-pass=<SECRET_VALUE>
kubectl get secret ppdt-secrets -o yaml | kubeseal --scope cluster-wide > ppdt-sealedsecret.yaml
```

However, if you make a new sealed secret, you should re-make the keystore as well. As a heads-up, sealed secrets do not work across multiple clusters by default.
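
Once applied, the Sealed Secrets controller unseals the resource into an ordinary Secret in the cluster. Verifying this is a standard check rather than an original README step, and it assumes the SealedSecret keeps the ppdt-secrets name:

```bash
# The controller should have created a plain Secret from the SealedSecret
kubectl get sealedsecret ppdt-secrets
kubectl get secret ppdt-secrets
```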

Running Kubernetes Commands

The next step is to start deploying all the components running the following:

```bash
kubectl apply -f k8/server
kubectl apply -f k8/level_sites
kubectl apply -f k8/client
```

You will then need to wait until all the level-sites are launched. To verify this, run the following command; all the pods named level-site should have the status Running.

```bash
kubectl get pods
```

The output of kubectl get pods would look something like:

```
NAME                                         READY   STATUS    RESTARTS        AGE
ppdt-level-site-01-deploy-7dbf5b4cdd-wz6q7   1/1     Running   1 (2m39s ago)   16h
ppdt-level-site-02-deploy-69bb8fd5c6-wjjbs   1/1     Running   1 (2m39s ago)   16h
ppdt-level-site-03-deploy-74f7d95768-r6tn8   1/1     Running   1 (16h ago)     16h
ppdt-level-site-04-deploy-6d99df8d7b-d6qlj   1/1     Running   1 (2m39s ago)   16h
ppdt-level-site-05-deploy-855b649896-82hlm   1/1     Running   1 (2m39s ago)   16h
ppdt-level-site-06-deploy-6578fc8c9b-ntzhn   1/1     Running   1 (16h ago)     16h
ppdt-level-site-07-deploy-6f57496cdd-hlggh   1/1     Running   1 (16h ago)     16h
ppdt-level-site-08-deploy-6d596967b8-mh9hz   1/1     Running   1 (2m39s ago)   16h
ppdt-level-site-09-deploy-8555c56976-752pn   1/1     Running   1 (16h ago)     16h
ppdt-level-site-10-deploy-67b7c5689b-rkl6r   1/1     Running   1 (2m39s ago)   16h
```

It does take time for the level-sites to be able to accept connections. Run the following commands, and wait for the line LEVEL SITE SERVER STARTED! in standard output. Use CTRL+C to exit the pod.

```bash
kubectl logs -f $(kubectl get pod -l "pod=ppdt-level-site-01-deploy" -o name)
kubectl logs -f $(kubectl get pod -l "pod=ppdt-level-site-10-deploy" -o name)
```
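
As an alternative to tailing logs, kubectl wait can block until a pod reports Ready; the label selector is taken from the commands above, and this shortcut is not from the original README. Note that pod readiness does not guarantee the LEVEL SITE SERVER STARTED! line has been printed, so the log check remains the definitive signal:

```bash
# Block (up to 5 minutes) until the first level-site pod is Ready
kubectl wait --for=condition=ready pod -l "pod=ppdt-level-site-01-deploy" --timeout=300s
```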

Next, you need to run the server to create the decision tree and split the model among the level-sites. You can run it by connecting to the pod via a terminal using the commands below.

```bash
kubectl exec -i -t $(kubectl get pod -l "pod=ppdt-server-deploy" -o name) -- /bin/bash
gradle run -PchooseRole=weka.finito.server --args <TRAINING-FILE>
```

Alternatively, you can combine the above commands as follows:

```bash
kubectl exec -i -t $(kubectl get pod -l "pod=ppdt-server-deploy" -o name) -- bash -c "gradle run -PchooseRole=weka.finito.server --args <TRAINING-FILE>"
```

Once you see the output Server ready to get public keys from client-site, you need to run the client.

In a NEW terminal, start the client and run the following commands to complete an evaluation. Point <VALUES-FILE> to something like /data/hypothyroid.values.

```bash
kubectl exec -i -t $(kubectl get pod -l "pod=ppdt-client-deploy" -o name) -- /bin/bash
gradle run -PchooseRole=weka.finito.client --args <VALUES-FILE>

# Test WITHOUT level-sites
gradle run -PchooseRole=weka.finito.client --args '<VALUES-FILE> --server'
```

Alternatively, you can combine both commands in one go as follows:

```bash
kubectl exec -i -t $(kubectl get pod -l "pod=ppdt-client-deploy" -o name) -- bash -c "gradle run -PchooseRole=weka.finito.client --args <VALUES-FILE>"

# Test WITHOUT level-sites
kubectl exec -i -t $(kubectl get pod -l "pod=ppdt-client-deploy" -o name) -- bash -c "gradle run -PchooseRole=weka.finito.client --args '<VALUES-FILE> --server'"
```

Re-running with different experiments

If you are just re-running the client with the same or a different values file, simply re-run the above command. However, if you want to test with another data set, it is best to rebuild the environment by deleting everything first:

```bash
kubectl delete -f k8/client
kubectl delete -f k8/server
kubectl delete -f k8/level_sites
```

Then repeat the instructions in the previous section.

Clean up

Destroy the EKS cluster using the following:

```bash
eksctl delete cluster --config-file eks-config/single-cluster.yaml --wait
```

Destroy the Minikube environment as follows:

```bash
minikube delete
```

Authors and Acknowledgement

Code Authors: Andrew Quijano, Spyros T. Halkidis, Kevin Gallagher

Kevin Gallagher is supported by NOVA LINCS ref. UIDB/04516/2020 and ref. UIDP/04516/2020 with the financial support of FCT.IP.

License

MIT

Project status

The project is fully tested.

Current Issues

  1. Not sure why the encryption library seems to have a bug in some specific comparisons in spambase and hypothyroid. I will debug these soon, but overall this works like a charm.
  2. TLS sockets do not work on EKS, but I will fix this eventually. It works on all connections except when level-site 1 reaches out to the client for evaluation.
  3. A much bigger issue: on the first few runs of this application on EKS, the comparisons are pretty fast, taking about 0.5 seconds each. But after 10+ comparisons, comparison performance drops off a cliff to about 1 second. The only way I have found to restore the original performance is to rebuild the EKS cluster. I have NO idea why this performance drop occurs; I have tried deleting and rebuilding the pods, and even restarting the EC2 instances.

Owner

  • Name: Advanced Wireless and Security Lab
  • Login: adwise-fiu
  • Kind: organization
  • Location: United States of America

ADWISE laboratory at Florida International University - Department of Electrical and Computer Engineering.

Citation (CITATION.cff)

cff-version: 1.0.0
message: "If you use this software, please cite the paper."
authors:
- family-names: "Quijano"
  given-names: "Andrew"
  orcid: "https://orcid.org/0000-0002-6673-4934"
- family-names: "Halkidis"
  given-names: "Spyros T."
  orcid: "https://orcid.org/0000-0001-9983-1012"
- family-names: "Gallagher"
  given-names: "Kevin"
  orcid: "https://orcid.org/0000-0002-2714-7841"
- family-names: "Akkaya"
  given-names: "Kemal"
  orcid: "https://orcid.org/0000-0002-7103-4545"
- family-names: "Samaras"
  given-names: "Nikolaos"
  orcid: "https://orcid.org/0000-0001-8201-7081"
title: "Enhanced Outsourced and Secure Inference for Tall Sparse Decision Trees"
version: 2.0.0
doi: TBD
date-released: 2024-03-23
url: "https://github.com/AndrewQuijano/Level-Site-PPDT"
preferred-citation:
  type: conference-paper
  authors:
  - family-names: "Quijano"
    given-names: "Andrew"
    orcid: "https://orcid.org/0000-0002-6673-4934"
  - family-names: "Halkidis"
    given-names: "Spyros T."
    orcid: "https://orcid.org/0000-0001-9983-1012"
  - family-names: "Gallagher"
    given-names: "Kevin"
    orcid: "https://orcid.org/0000-0002-2714-7841"
  - family-names: "Akkaya"
    given-names: "Kemal"
    orcid: "https://orcid.org/0000-0002-7103-4545"
  - family-names: "Samaras"
    given-names: "Nikolaos"
    orcid: "https://orcid.org/0000-0001-8201-7081"
  journal: "ESORICS 2024"
  month: 09
  start: 1 # First page number
  end: 19 # Last page number
  title: "Enhanced Outsourced and Secure Inference for Tall Sparse Decision Trees"
  issue: 1
  volume: 1
  year: 2024

GitHub Events

Total
  • Release event: 3
  • Delete event: 2
  • Issue comment event: 6
  • Push event: 22
  • Pull request review event: 5
  • Pull request event: 18
  • Fork event: 2
  • Create event: 6
Last Year
  • Release event: 3
  • Delete event: 2
  • Issue comment event: 6
  • Push event: 22
  • Pull request review event: 5
  • Pull request event: 18
  • Fork event: 2
  • Create event: 6

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 0
  • Total pull requests: 8
  • Average time to close issues: N/A
  • Average time to close pull requests: 14 minutes
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.25
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 8
  • Average time to close issues: N/A
  • Average time to close pull requests: 14 minutes
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.25
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • msthilaire5 (6)
  • AndrewQuijano (3)
Top Labels
Issue Labels
Pull Request Labels
documentation (5)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
repo1.maven.org: io.github.andrewquijano:level-site-ppdt

This JAR file implements the Level-Site Privacy-Preserving Decision Trees (PPDT). See the paper "Enhanced Outsourced and Secure Inference for Tall Sparse Decision Trees", which describes the implementation and the performance gained by avoiding the conversion to a complete binary tree required in the Joye and Salehi paper.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 32.4%
Average: 39.4%
Dependent packages count: 46.3%
Last synced: 4 months ago

Dependencies

.github/workflows/build-gradle-project.yml actions
  • actions/checkout v3 composite
  • actions/setup-java v3 composite
  • codecov/codecov-action v3 composite
  • docker/build-push-action v4 composite
  • docker/login-action v2 composite
  • docker/setup-buildx-action v2 composite
Dockerfile docker
  • gradle latest build
build.gradle maven
  • commons-io:commons-io 2.14.0 implementation
  • junit:junit 4.13.1 implementation
  • org.junit.jupiter:junit-jupiter-api 5.8.1 testImplementation
  • org.junit.jupiter:junit-jupiter-engine 5.8.1 testRuntimeOnly