p3forecast

A personalized, privacy-preserving cloud workload prediction framework based on Federated Generative Adversarial Networks (GANs), which allows cloud providers with non-IID workload data to collaboratively train the workload prediction models of their choice while protecting data privacy.

https://github.com/liyan2015/p3forecast

Science Score: 41.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: ieee.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.0%) to scientific vocabulary

Keywords

data-augmentation federated-learning gan privacy-protection
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: liyan2015
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 3.11 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
data-augmentation federated-learning gan privacy-protection
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme Citation

README.md

$P^3Forecast$

This repository provides the implementation of the paper "P3Forecast: Personalized Privacy-Preserving Cloud Workload Prediction based on Federated Generative Adversarial Networks", published in the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2025). In this paper, we propose $P^{3}Forecast$, a $\underline{\textbf{P}}ersonalized$ $\underline{\textbf{P}}rivacy-\underline{\textbf{P}}reserving$ cloud workload prediction framework based on Federated Generative Adversarial Networks (GANs), which allows cloud providers with Non-IID workload data to collaboratively train workload prediction models as preferred while protecting privacy. Compared with the state-of-the-art, our framework improves workload prediction accuracy by 19.5\%-46.7\% on average over all cloud providers, while ensuring the fastest convergence in Federated GAN training.

Figures (see paper): (1) testing accuracy on different cloud providers with a uniform $\mu$ (the constant controlling the impact of training-dataset size on learning rate); (2) testing accuracy on different cloud providers with the optimal $\mu$; (3) convergence performance of Federated GAN training.

$P^{3}Forecast$ consists of the following three key components:

1. Workload Data Synthesis Quality Assessment

The code in the file `lib/similarity.py` determines the similarity between the synthesized workload data and the original workload data. Our proposed *pattern-aware DTW*, a novel Dynamic Time Warping (DTW)-based data synthesis quality assessment method, is implemented in the function `fastpdtw`.
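
For intuition, the classic DTW distance that pattern-aware DTW builds upon can be sketched as follows. This is the standard textbook algorithm with an absolute-difference cost, not the paper's pattern-aware variant implemented in `fastpdtw`:

```python
def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute-difference cost.

    A smaller distance means the synthesized series aligns more closely
    with the original series. The repository's pattern-aware DTW extends
    this baseline idea.
    """
    n, m = len(x), len(y)
    INF = float("inf")
    # dp[i][j] = minimal cumulative cost aligning x[:i+1] with y[:j+1]
    dp = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = abs(x[i] - y[j])
            if i == 0 and j == 0:
                dp[i][j] = cost
            else:
                best = INF
                if i > 0:
                    best = min(best, dp[i - 1][j])
                if j > 0:
                    best = min(best, dp[i][j - 1])
                if i > 0 and j > 0:
                    best = min(best, dp[i - 1][j - 1])
                dp[i][j] = cost + best
    return dp[n - 1][m - 1]

print(dtw_distance([1, 2, 3], [1, 2, 3]))  # identical series -> 0
```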

2. Federated GAN Training based on Data Synthesis Quality

The code in the file `lib/fedgan_training.py` implements the federated GAN training procedure.
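
At a high level, the server aggregates locally trained GAN parameters using weights derived from each client's data-synthesis quality (the `-w pdtw` option). A minimal FedAvg-style sketch of quality-weighted parameter averaging, with hypothetical names and a simplified parameter layout (the actual weighting scheme is defined in the paper):

```python
def aggregate(local_params, quality_scores):
    """Weighted averaging of per-client parameters (FedAvg-style).

    local_params: list of dicts mapping parameter name -> list of floats
    quality_scores: per-client data-synthesis quality scores (higher is
    better), normalized into aggregation weights. This mirrors the idea
    behind pdtw-based weighting; the paper's exact formula differs.
    """
    total = sum(quality_scores)
    weights = [s / total for s in quality_scores]
    global_params = {}
    for name in local_params[0]:
        size = len(local_params[0][name])
        global_params[name] = [
            sum(w * p[name][k] for w, p in zip(weights, local_params))
            for k in range(size)
        ]
    return global_params

clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
print(aggregate(clients, [1.0, 1.0]))  # equal weights -> elementwise mean
```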

3. Post-Training of Local Workload Prediction Models

The code in the file `lib/post_training.py` implements the post-training of local workload prediction models.
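
Post-training here means each provider continues training the globally initialized predictor on its own local data. A toy illustration of that personalization step, using a one-parameter linear model and plain SGD (illustrative only; the repository uses GRU/TCN/LSTM predictors in PyTorch):

```python
def post_train(w, data, lr=0.05, epochs=50):
    """Continue training a global model parameter w on local data.

    data: list of (x, y) pairs; model is y_hat = w * x,
    loss = (y_hat - y)^2, minimized by per-sample gradient steps.
    """
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d(loss)/dw for one sample
            w -= lr * grad
    return w

# Local data follows y = 2x; a global w far from 2 adapts toward 2.0.
w_local = post_train(0.0, [(1.0, 2.0), (2.0, 4.0)])
```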

Run

`main.py` is the program entry point.

The code can be executed with `run.sh`, which accepts the following parameters:

```bash
(pytorch) user@host:~/P3Forecast$ bash run.sh -h
Usage: bash run.sh [run_times] [args]

Parameters required to run the program

optional arguments:
  -h, --help            show this help message and exit
  -g GPU, --gpu GPU     gpu device; if -1, use cpu; if None, use all gpus
  -c COLUMNS, --columns COLUMNS
                        dataset columns
  -n NOTE, --note NOTE  note of this run
  -s SEED, --seed SEED  customize seed
  -id ID, --id ID       choose gan models with id
  -cs CLOUDS, --clouds CLOUDS
                        cloud indexes 0-6 (Alibaba, Azure, Google, Alibaba-AI,
                        HPC-KS, HPC-HF, HPC-WZ), e.g. 0,1,2,3,4,5,6
  -p PROBABILITY, --probability PROBABILITY
                        the probability of a cloud being selected
  -pd, --preprocessdataset
                        if set, preprocess the dataset; otherwise use the
                        preprocessed dataset directly
  -rf, --refresh        if set, refresh historical output data
  -ghs GANHIDDENSIZE, --ganhiddensize GANHIDDENSIZE
                        hidden size of timegan
  -gln GANLAYERNUM, --ganlayernum GANLAYERNUM
                        layer number of timegan
  -glr GANLEARNINGRATE, --ganlearningrate GANLEARNINGRATE
                        learning rate of timegan
  -gle GANLOCALEPOCHS, --ganlocalepochs GANLOCALEPOCHS
                        local training epochs of Federated GAN
  -cr ROUNDS, --rounds ROUNDS
                        communication rounds of Federated GAN
  -gnt, --gannottrain   if set, do not train the GAN
  -gip, --ganispre      if set, use the pre-trained GAN
  -w {pdtw,dtw,datasize,avg}, --weight {pdtw,dtw,datasize,avg}
                        aggregating weight
  -m {GRU,TCN,LSTM}, --model {GRU,TCN,LSTM}
                        predictor: GRU, TCN, or LSTM
  -l SEQLEN, --seqlen SEQLEN
                        sequence length
  -b BATCHSIZE, --batchsize BATCHSIZE
                        batch size of predictor
  -hd HIDDENSIZE, --hiddensize HIDDENSIZE
                        hidden size of predictor
  -ln LAYERNUM, --layernum LAYERNUM
                        layer number of predictor
  -e EPOCHS, --epochs EPOCHS
                        total training epochs of predictor
  -lep LOCALEPOCHSPOST, --localepochspost LOCALEPOCHSPOST
                        local epochs of post-training
  -lr LEARNINGRATE, --learningrate LEARNINGRATE
                        learning rate of predictor
  -lrs {fixed,adaptive}, --learningratestrategy {fixed,adaptive}
                        learning rate strategy of post-training
  -mc {rmse,smape}, --metric {rmse,smape}
                        test-accuracy metric of post-training
  -mu MU, --mu MU       parameter mu to control the learning rate
  -nq, --not_query      if set, do not query
  -sh, --show           if set, show plots
```

For example, to run the code with the default parameters, you can execute the following command:

```bash
bash run.sh 1 "-cs 0,1,2,3 -c cpu_util,mem_util -gle 500 -w pdtw -rf -sh -lrs adaptive -n pdtw,full_workflow"
```

where `1` is the number of times to run `main.py`.

We have provided pre-processed data from four public datasets (Alibaba, Azure, Google, Alibaba-AI) in the data folder. More details about the datasets can be found in the paper.

Additionally, if you do not use the datasets in data, you should set some parameters about the dataset in the file data/parameters.py. We also provide the code for preprocessing the datasets in the file lib/preprocess.py. Please ensure that your dataset meets the following requirements:

  • Format: CSV file.
  • Filename Convention: The filename must follow the format {cloud_type}_{freq}.csv.
  • Required Columns: The CSV should include time and workload data. (For example, refer to the format of Alibaba_1T.csv.)
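
A small helper to sanity-check a candidate file against these requirements might look like this. The column names are assumptions based on the Alibaba example (`time`, `cpu_util`, `mem_util`); adjust them to your dataset:

```python
import re

# {cloud_type}_{freq}.csv, e.g. Alibaba_1T.csv or Alibaba-AI_1T.csv
FILENAME_PATTERN = re.compile(
    r"^(?P<cloud_type>[A-Za-z0-9-]+)_(?P<freq>[A-Za-z0-9]+)\.csv$"
)

def check_dataset_file(filename, header):
    """Check the {cloud_type}_{freq}.csv naming rule and required columns.

    header: list of column names from the CSV's first row; we assume a
    'time' column plus at least one workload column (e.g. cpu_util).
    """
    match = FILENAME_PATTERN.match(filename)
    if not match:
        return False, "filename must follow {cloud_type}_{freq}.csv"
    if "time" not in header:
        return False, "CSV must contain a 'time' column"
    if len(header) < 2:
        return False, "CSV must contain at least one workload column"
    return True, f"ok: cloud_type={match['cloud_type']}, freq={match['freq']}"

print(check_dataset_file("Alibaba_1T.csv", ["time", "cpu_util", "mem_util"]))
```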

Finally, we offer a logging feature to facilitate the management of historical output data. All log entries (including arguments, model parameters, and other information) are stored in the log.json file within the output directory. You can remove specific timestamp entries from this file, and then run the program with the -rf parameter to clear the corresponding output data.
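
The exact schema of log.json is defined by the repository; assuming it is a JSON object keyed by run timestamps (an assumption, not documented here), removing one entry before re-running with `-rf` could be scripted like this:

```python
import json

def drop_run(log_path, timestamp):
    """Remove one timestamped run entry from log.json.

    Assumes log.json is a JSON object keyed by timestamp strings; returns
    True if the entry existed and was removed, False otherwise.
    """
    with open(log_path) as f:
        log = json.load(f)
    removed = log.pop(timestamp, None)  # None if the timestamp is absent
    with open(log_path, "w") as f:
        json.dump(log, f, indent=2)
    return removed is not None
```

After pruning the entry, run the program with `-rf` to clear the corresponding output data.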

Prerequisites

To run the code, the following libraries are needed:

  • Python >= 3.9
  • fastdtw==0.3.4
  • scikit_learn>=1.2.2
  • scipy>=1.10.1
  • PyTorch>=1.12.1
  • torchvision>=0.13

Check environment.yaml for details. To install dependencies, run:

```bash
pip install -r requirements.txt
```

Citing

If you use this repository, please cite:

```bibtex
@inproceedings{kuangp3forecast,
  title={{P\textsuperscript{3}Forecast: Personalized Privacy-Preserving Cloud Workload Prediction Based on Federated Generative Adversarial Networks}},
  author={Kuang, Yu and Yan, Li and Li, Zhuozhao},
  booktitle={Proc. of IEEE International Parallel \& Distributed Processing Symposium},
  year={2025},
}
```

Owner

  • Login: liyan2015
  • Kind: user

Citation (CITATION.bib)

@inproceedings{kuangp3forecast,
  title={{P\textsuperscript{3}Forecast: Personalized Privacy-Preserving Cloud Workload Prediction Based on Federated Generative Adversarial Networks}},
  author={Kuang, Yu and Yan, Li and Li, Zhuozhao},
  booktitle={Proc. of IEEE International Parallel \& Distributed Processing Symposium},
  year={2025},
}

GitHub Events

Total
  • Delete event: 1
  • Public event: 1
  • Push event: 10
  • Fork event: 2
Last Year
  • Delete event: 1
  • Public event: 1
  • Push event: 10
  • Fork event: 2

Dependencies

environment.yaml pypi
requirements.txt pypi
  • fastdtw ==0.3.4
  • matplotlib ==3.7.1
  • numpy ==1.24.3
  • pandarallel ==1.6.4
  • pandas ==1.5.3
  • scikit_learn ==1.2.2
  • scipy ==1.10.1
  • torch ==1.12.1
  • torchvision ==0.13.1
  • tqdm ==4.64.1