streamflow-fl

Federated Learning with StreamFlow

https://github.com/alpha-unito/streamflow-fl

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: alpha-unito
  • License: lgpl-3.0
  • Language: Python
  • Default Branch: master
  • Size: 25.4 KB
Statistics
  • Stars: 2
  • Watchers: 6
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed almost 2 years ago

README.md

Federated Learning with StreamFlow

This repository contains a StreamFlow Federated Learning (FL) pipeline based on PyTorch. The workflow trains a VGG16 model with Group Normalization over two datasets:

  • A standard version of MNIST;
  • A grayscaled version of SVHN.
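
The README does not say how SVHN was converted to grayscale. As a minimal sketch of one common choice, the snippet below applies the ITU-R BT.601 luma weights to each RGB pixel so that SVHN images end up with a single channel, like MNIST. The function name `to_grayscale` and the choice of weights are assumptions for illustration, not taken from the repository.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def to_grayscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert an RGB image (rows of (R, G, B) tuples in 0-255) to one channel."""
    # ITU-R BT.601 luma weights: an assumption, the repository may use another method.
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


if __name__ == "__main__":
    # A 1x3 toy image: white, black, and pure red pixels.
    print(to_grayscale([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))
```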

The workflow is described with an extended version of CWL that introduces support for the Loop construct, necessary to describe the training-aggregate iteration of FL workloads.
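
The training-aggregate iteration that the Loop construct expresses can be pictured in plain Python. The sketch below assumes sample-weighted federated averaging (FedAvg) as the aggregation step, which the README does not specify. The names `federated_average`, `run_rounds`, and the toy clients are hypothetical and do not come from the repository or from StreamFlow.

```python
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, List[float]]
# A client takes the global weights and returns (local weights, sample count).
Client = Callable[[Weights], Tuple[Weights, int]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count."""
    total = sum(n for _, n in updates)
    reference = updates[0][0]
    return {
        name: [
            sum(w[name][i] * n for w, n in updates) / total
            for i in range(len(values))
        ]
        for name, values in reference.items()
    }


def run_rounds(global_weights: Weights, clients: List[Client], rounds: int) -> Weights:
    """Repeat the train step on every client, then the aggregate step."""
    for _ in range(rounds):
        updates = [train(global_weights) for train in clients]
        global_weights = federated_average(updates)
    return global_weights


def make_toy_client(shift: float, samples: int) -> Client:
    """Toy stand-in for local training: adds a constant to every weight."""
    def train(weights: Weights) -> Tuple[Weights, int]:
        return {k: [v + shift for v in vs] for k, vs in weights.items()}, samples
    return train


if __name__ == "__main__":
    clients = [make_toy_client(1.0, 1), make_toy_client(3.0, 3)]
    print(run_rounds({"w": [0.0]}, clients, rounds=2))
```

In the actual pipeline each client call corresponds to a training step on a remote facility, and the loop is expressed in the workflow description rather than in a Python `for` statement.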

Datasets have been placed on two different HPC facilities:

  • Training on MNIST has run on the EPITO cluster at the University of Torino (one 80-core Arm Neoverse N1, 512 GB RAM, and two NVIDIA A100 GPUs per node);
  • Training on SVHN has run on the CINECA MARCONI100 cluster in Bologna (two 16-core IBM POWER9 AC922, 256 GB RAM, and four NVIDIA V100 GPUs per node).

Since HPC worker nodes cannot access the Internet through outbound connections, this workload cannot be managed by FL frameworks that require direct bidirectional connections between worker and aggregator nodes. Conversely, StreamFlow relies on a pull-based data transfer mechanism that overcomes this limitation.
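
The difference between pushing and pulling results can be illustrated with a purely conceptual toy model. It does not use StreamFlow's API and does not describe how StreamFlow moves data internally. The classes `Worker` and `Aggregator` and their methods are hypothetical names introduced only for this sketch.

```python
from typing import Dict, List


class Aggregator:
    """Collects results; in the pull model it initiates every transfer."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def pull(self, worker: "Worker", round_id: int) -> None:
        # The connection is opened from the aggregator side, so the worker
        # never needs outbound Internet access.
        self.received.append(worker.outbox[round_id])


class Worker:
    """A compute node whose outbound connections are blocked by default."""

    def __init__(self, name: str, outbound_allowed: bool = False) -> None:
        self.name = name
        self.outbound_allowed = outbound_allowed
        self.outbox: Dict[int, str] = {}

    def train(self, round_id: int) -> None:
        # Stand-in for local training: store a result for later collection.
        self.outbox[round_id] = f"weights-from-{self.name}-round-{round_id}"

    def push(self, aggregator: Aggregator, round_id: int) -> None:
        # Push model: the worker opens the connection, which fails here.
        if not self.outbound_allowed:
            raise ConnectionError(f"{self.name}: outbound connections are blocked")
        aggregator.received.append(self.outbox[round_id])


if __name__ == "__main__":
    aggregator, worker = Aggregator(), Worker("hpc-node")
    worker.train(round_id=0)
    try:
        worker.push(aggregator, round_id=0)
    except ConnectionError as err:
        print("push failed:", err)
    aggregator.pull(worker, round_id=0)
    print(aggregator.received)
```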

To allow a direct comparison between StreamFlow and the Intel OpenFL framework, the pipeline has also been executed on two VMs (8 cores, 32 GB RAM, and 1 NVIDIA T4 GPU each) hosted on the HPC4AI Cloud at the University of Torino, acting as workers. In every configuration, the aggregation plane has been placed on the Cloud.

If you want to cite this work, please use the reference below:

```bibtex
@inproceedings{22:ml4astro,
  location  = {Catania, Italy},
  author    = {Iacopo Colonnelli and Bruno Casella and Gianluca Mittone and Yasir Arfat and Barbara Cantalupo and Roberto Esposito and Alberto Riccardo Martinelli and Doriana Medi\'{c} and Marco Aldinucci},
  booktitle = {Astrophysics and Space Science Proceedings},
  doi       = {10.1007/978-3-031-34167-0_39},
  editor    = {Filomena Bufano and Simone Riggi and Eva Sciacca and Francesco Schillir\`{o}},
  isbn      = {978-3-031-34167-0},
  pages     = {193--199},
  publisher = {Springer},
  address   = {Cham, Switzerland},
  title     = {Federated Learning meets {HPC} and cloud},
  volume    = {60},
  year      = {2023}
}
```

Usage

To run the experiment as is, clone this repository on the aggregator node and use the following commands:

```bash
python -m venv venv
source venv/bin/activate
pip install "streamflow==0.2.0.dev2"
pip install -r requirements.txt
streamflow run streamflow.yml
```

Reproducing the experiments in the same environment requires access to both HPC facilities and the HPC4AI Cloud. However, interested users can run the same pipeline on their preferred infrastructure by changing the deployment definitions in the streamflow.yml file and the corresponding Slurm/SSH scripts inside the environments folder.

Also, note that the Python dependencies listed in the requirements.txt file must be installed manually at every involved location (both the workers and the aggregator), and the datasets are expected to be already present on the worker nodes.
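
Because dependencies are installed by hand at each location, a quick check on every node can catch a missing or mismatched package before a run. The sketch below is one possible way to do this with the standard library. The helpers `parse_pins` and `find_mismatches` are hypothetical names, and the script handles only exact `name==version` pins, so any other specifier would need extra logic.

```python
import sys
from importlib import metadata
from typing import Dict, List


def parse_pins(lines: List[str]) -> Dict[str, str]:
    """Extract exact 'name==version' pins, skipping comments and blank lines."""
    pins: Dict[str, str] = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, str]:
    """Return a message for every pin that is missing or has another version."""
    problems: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # Usage: python check_deps.py requirements.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        issues = find_mismatches(parse_pins(handle.read().splitlines()))
    for package, message in issues.items():
        print(f"{package}: {message}")
    sys.exit(1 if issues else 0)
```

Running the script on the aggregator and on each worker before launching the workflow gives a non-zero exit code whenever a pinned package is absent or differs from the listed version.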

Contributors

Iacopo Colonnelli iacopo.colonnelli@unito.it
Bruno Casella bruno.casella@unito.it
Marco Aldinucci marco.aldinucci@unito.it

Owner

  • Name: Parallel programming: Alpha group
  • Login: alpha-unito
  • Kind: organization
  • Location: Torino, IT

Parallel Computing research cluster, Department of Computer Science, University of Torino

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you want to cite StreamFlow FL, please refer to the article below."
authors:
  - family-names: "Colonnelli"
    given-names: "Iacopo"
    orcid: "https://orcid.org/0000-0001-9290-2017"
title: "Federated Learning meets HPC and cloud"
version: 0.1
url: "https://github.com/alpha-unito/xffl"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: "Colonnelli"
      given-names: "Iacopo"
      orcid: "https://orcid.org/0000-0001-9290-2017"
    - family-names: "Casella"
      given-names: "Bruno"
      orcid: "https://orcid.org/0000-0002-9513-6087"
    - family-names: "Mittone"
      given-names: "Gianluca"
      orcid: "https://orcid.org/0000-0002-1887-6911"
    - family-names: "Arfat"
      given-names: "Yasir"
      orcid: "https://orcid.org/0000-0002-6330-0399"
    - family-names: "Cantalupo"
      given-names: "Barbara"
      orcid: "https://orcid.org/0000-0001-7575-3902"
    - family-names: "Esposito"
      given-names: "Roberto"
      orcid: "https://orcid.org/0000-0003-4708-6860"
    - family-names: "Martinelli"
      given-names: "Alberto Riccardo"
      orcid: "https://orcid.org/0000-0002-3707-7015"
    - family-names: "Medić"
      given-names: "Doriana"
      orcid: "https://orcid.org/0000-0002-7163-5375"
    - family-names: "Aldinucci"
      given-names: "Marco"
      orcid: "https://orcid.org/0000-0001-8788-0829"
  doi: 10.1007/978-3-031-34167-0_39
  collection-title: "Astrophysics and Space Science Proceedings"
  editors:
    - family-names: "Bufano"
      given-names: "Filomena"
      orcid: "https://orcid.org/0000-0002-3429-2481"
    - family-names: "Riggi"
      given-names: "Simone"
      orcid: "https://orcid.org/0000-0001-6368-8330"
    - family-names: "Sciacca"
      given-names: "Eva"
      orcid: "https://orcid.org/0000-0002-5574-2787"
    - family-names: "Schillirò"
      given-names: "Francesco"
      orcid: "https://orcid.org/0000-0001-5106-2277"
  isbn: 978-3-031-34167-0
  publisher:
    name: Springer
    city: Cham
    country: CH
  start: 193
  end: 199
  title: "Federated Learning meets HPC and cloud"
  volume: 60
  year: 2023

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Dependencies

openfl/requirements.txt pypi
  • Babel ==2.9.1
  • Jinja2 ==3.1.1
  • Keras-Preprocessing ==1.1.2
  • Markdown ==3.3.6
  • MarkupSafe ==2.1.1
  • Pillow ==9.2.0
  • PyYAML ==6.0
  • Pygments ==2.11.2
  • Pympler ==1.0.1
  • Send2Trash ==1.8.0
  • Werkzeug ==2.1.0
  • absl-py ==1.0.0
  • anyio ==3.5.0
  • argon2-cffi ==21.3.0
  • argon2-cffi-bindings ==21.2.0
  • asttokens ==2.0.5
  • astunparse ==1.6.3
  • attrs ==21.4.0
  • backcall ==0.2.0
  • beautifulsoup4 ==4.10.0
  • bleach ==4.1.0
  • brotlipy ==0.7.0
  • cachetools ==5.0.0
  • certifi ==2021.10.8
  • click ==8.0.1
  • cloudpickle ==2.0.0
  • colorama ==0.4.4
  • commonmark ==0.9.1
  • cycler ==0.11.0
  • debugpy ==1.6.0
  • decorator ==5.1.1
  • defusedxml ==0.7.1
  • docker ==5.0.3
  • dynaconf ==3.1.7
  • entrypoints ==0.4
  • executing ==0.8.3
  • flatten-json ==0.1.13
  • fonttools ==4.34.4
  • gast ==0.3.3
  • google-auth ==2.6.2
  • google-auth-oauthlib ==0.4.6
  • google-pasta ==0.2.0
  • grpcio ==1.34.1
  • grpcio-tools ==1.34.1
  • h5py ==2.10.0
  • importlib-metadata ==4.11.3
  • importlib-resources ==5.6.0
  • ipykernel ==6.11.0
  • ipython ==8.2.0
  • ipython-genutils ==0.2.0
  • jedi ==0.18.1
  • joblib ==1.1.0
  • json5 ==0.9.6
  • jsonschema ==4.4.0
  • jupyter-client ==7.2.1
  • jupyter-core ==4.9.2
  • jupyter-server ==1.16.0
  • jupyterlab ==3.3.2
  • jupyterlab-pygments ==0.1.2
  • jupyterlab-server ==2.12.0
  • keras ==2.8.0
  • kiwisolver ==1.4.4
  • matplotlib ==3.5.2
  • matplotlib-inline ==0.1.3
  • mistune ==0.8.4
  • mkl-fft ==1.3.1
  • mkl-service ==2.4.0
  • nbclassic ==0.3.7
  • nbclient ==0.5.13
  • nbconvert ==6.4.5
  • nbformat ==5.2.0
  • nest-asyncio ==1.5.4
  • notebook ==6.4.10
  • notebook-shim ==0.1.0
  • numpy ==1.18.5
  • oauthlib ==3.2.0
  • opencv-python ==4.6.0.66
  • openfl ==1.3
  • opt-einsum ==3.3.0
  • packaging ==21.3
  • pandas ==1.4.1
  • pandocfilters ==1.5.0
  • parso ==0.8.3
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • pip ==22.1.2
  • prometheus-client ==0.13.1
  • prompt-toolkit ==3.0.28
  • protobuf ==3.19.4
  • psutil ==5.9.0
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pyparsing ==3.0.7
  • pyrsistent ==0.18.1
  • python-dateutil ==2.8.2
  • pytz ==2022.1
  • pyzmq ==22.3.0
  • requests-oauthlib ==1.3.1
  • rich ==9.1.0
  • rsa ==4.8
  • scikit-learn ==1.0.2
  • scipy ==1.8.0
  • seaborn ==0.11.2
  • setuptools ==61.2.0
  • sniffio ==1.2.0
  • soupsieve ==2.3.1
  • stack-data ==0.2.0
  • tensorboard ==2.8.0
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • tensorboardX ==2.5
  • tensorflow ==2.3.1
  • tensorflow-estimator ==2.3.0
  • termcolor ==1.1.0
  • terminado ==0.13.3
  • testpath ==0.6.0
  • threadpoolctl ==3.1.0
  • torch ==1.11.0
  • torchaudio ==0.11.0
  • torchsummary ==1.5.1
  • torchvision ==0.12.0
  • tornado ==6.1
  • tqdm ==4.63.1
  • traitlets ==5.1.1
  • typing-extensions ==3.10.0.2
  • wcwidth ==0.2.5
  • webencodings ==0.5.1
  • websocket-client ==1.3.2
  • wheel ==0.37.1
  • wrapt ==1.14.0
  • zipp ==3.7.0
requirements.txt pypi
  • scipy ==1.9.
  • torch ==1.12.
  • torchvision ==0.13.