hivemind

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

https://github.com/learning-at-home/hivemind

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 1 DOI reference(s) in README
✓
Academic publication links
Links to: arxiv.org
✓
Committers with academic emails
1 of 34 committers (2.9%) from academic institutions
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (15.6%) to scientific vocabulary

Keywords

asynchronous-programming asyncio deep-learning dht distributed-systems distributed-training hivemind machine-learning mixture-of-experts neural-networks pytorch volunteer-computing

Keywords from Contributors

cryptocurrency transformer jax cryptography interactive sequences network-simulation testing-tools hacking observability

Last synced: 6 months ago · JSON representation ·

Repository

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

Basic Info

Host: GitHub
Owner: learning-at-home
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 12.2 MB

Statistics

Stars: 2,248
Watchers: 56
Forks: 199
Open Issues: 89
Releases: 26

Topics

asynchronous-programming asyncio deep-learning dht distributed-systems distributed-training hivemind machine-learning mixture-of-experts neural-networks pytorch volunteer-computing

Created almost 6 years ago · Last pushed 7 months ago

Metadata Files

Readme Contributing License Citation

Hivemind: decentralized deep learning in PyTorch

Codecov

Hivemind is a PyTorch library for decentralized deep learning across the Internet. Its intended usage is training one large model on hundreds of computers from different universities, companies, and volunteers.

Key Features

Distributed training without a master node: Distributed Hash Table allows connecting computers in a decentralized network.
Fault-tolerant backpropagation: forward and backward passes succeed even if some nodes are unresponsive or take too long to respond.
Decentralized parameter averaging: iteratively aggregate updates from multiple workers without the need to synchronize across the entire network (paper).
Train neural networks of arbitrary size: parts of their layers are distributed across the participants with the Decentralized Mixture-of-Experts (paper).

To learn more about the ideas behind this library, see the full list of our papers below.

Example Use Cases

This section lists projects that leverage hivemind for decentralized training. If you have successfully trained a model or created a downstream repository with the help of our library, feel free to submit a pull request that adds your project to this list.

Petals (webpage, code) — a decentralized platform for inference and fine-tuning of 100B+ language models.
Training Transformers Together (webpage, code) — a NeurIPS 2021 demonstration that trained a collaborative text-to-image Transformer model.
CALM (webpage, code) — a masked language model trained on a combination of Arabic datasets.
sahajBERT (blog post, code) — a collaboratively pretrained ALBERT-xlarge for the Bengali language.
PyTorch Lightning Integration (docs). Integration into PyTorch Lightning allows adapting your existing pipelines to training over slow network with unreliable peers.

Installation

Before installing, make sure that your environment has Python 3.8+ and PyTorch 1.9.0 or newer. They can be installed either natively or with Anaconda.

You can get the latest release with pip or build hivemind from source.

With pip

If your versions of Python and PyTorch match the requirements, you can install hivemind from pip:

pip install hivemind

Also, if you want to use blockwise 8-bit compression from bitsandbytes during data transfer, you can install it with pip install hivemind[bitsandbytes]. After that, you can use the BlockwiseQuantization class in hivemind.compression

From source

To install hivemind from source, simply run the following:

git clone https://github.com/learning-at-home/hivemind.git cd hivemind pip install .

If you would like to verify that your installation is working properly, you can install with pip install .[dev] instead. Then, you can run the tests with pytest tests/.

By default, hivemind uses the precompiled binary of the go-libp2p-daemon library. If you face compatibility issues or want to build the binary yourself, you can recompile it by running pip install . --global-option="--buildgo". Before running the compilation, please ensure that your machine has a recent version of Go toolchain (1.15 or 1.16 are supported).

System requirements

Linux is the default OS for which hivemind is developed and tested. We recommend Ubuntu 18.04+ (64-bit), but other 64-bit distros should work as well. Legacy 32-bit is not recommended.
macOS is partially supported. If you have issues, you can run hivemind using Docker instead. We recommend using our Docker image.
Windows 10+ (experimental) can run hivemind using WSL. You can configure WSL to use GPU by following sections 1–3 of this guide by NVIDIA. After that, you can simply follow the instructions above to install with pip or from source.

Documentation

The quickstart tutorial walks through installation and a training a simple neural network with several peers.
examples/albert contains the starter kit and instructions for training a Transformer masked language model collaboratively.
The Mixture-of-Experts tutorial covers the usage of Decentralized Mixture-of-Experts layers.
API reference and additional tutorials are available at learning-at-home.readthedocs.io

If you have any questions about installing and using hivemind, feel free to ask them in our Discord chat or file an issue.

Contributing

Hivemind is currently at the active development stage, and we welcome all contributions. Everything, from bug fixes and documentation improvements to entirely new features, is appreciated.

If you want to contribute to hivemind but don't know where to start, take a look at the unresolved issues. Open a new issue or join our chat room in case you want to discuss new functionality or report a possible bug. Bug fixes are always welcome, but new features should be preferably discussed with maintainers beforehand.

If you want to start contributing to the source code of hivemind, please see the contributing guidelines first. To learn more about other ways to contribute, read our guide.

Collaborators and Sponsorship

Prime Intellect sponsoring compute resources over Modal for CI

Citation

If you found hivemind or its underlying algorithms useful for your research, please cite the following source:

bibtex @misc{hivemind, title = {{H}ivemind: {D}ecentralized {D}eep {L}earning in {P}y{T}orch}, author = {Max Ryabinin and Alexander Borzunov and Michael Diskin and Anton Gusev and Denis Mazur and Vsevolod Plokhotnyuk and Alexey Bukhtiyarov and Pavel Samygin and Anton Sinitsin and Artem Chumachenko}, month = apr, year = 2020, address = {Online}, url = {https://github.com/learning-at-home/hivemind} }

Also, you can cite the paper that inspired the creation of this library (prototype implementation of hivemind available at mryab/learning-at-home):

bibtex @inproceedings{ryabinin2020crowdsourced, title = {Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts}, author = {Ryabinin, Max and Gusev, Anton}, year = 2020, booktitle = {Advances in Neural Information Processing Systems}, volume = 33, url = {https://proceedings.neurips.cc/paper/2020/file/25ddc0f8c9d3e22e03d3076f98d83cb2-Paper.pdf} }

Additional publications

["Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices"](https://arxiv.org/abs/2103.03239) ```bibtex @inproceedings{ryabinin2021moshpit, title = {Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices}, author = {Ryabinin, Max and Gorbunov, Eduard and Plokhotnyuk, Vsevolod and Pekhimenko, Gennady}, year = 2021, booktitle = {Advances in Neural Information Processing Systems}, volume = 34, url = {https://proceedings.neurips.cc/paper/2021/file/97275a23ca44226c9964043c8462be96-Paper.pdf} } ``` ["Distributed Deep Learning in Open Collaborations"](https://arxiv.org/abs/2106.10207) ```bibtex @inproceedings{diskin2021distributed, title = {Distributed Deep Learning In Open Collaborations}, author = {Michael Diskin and Alexey Bukhtiyarov and Max Ryabinin and Lucile Saulnier and Quentin Lhoest and Anton Sinitsin and Dmitry Popov and Dmitriy Pyrkin and Maxim Kashirin and Alexander Borzunov and Albert Villanova del Moral and Denis Mazur and Ilia Kobelev and Yacine Jernite and Thomas Wolf and Gennady Pekhimenko}, year = 2021, booktitle = {Advances in Neural Information Processing Systems}, url = {https://openreview.net/forum?id=FYHktcK-7v} } ``` ["Secure Distributed Training at Scale"](https://arxiv.org/abs/2106.11257) ```bibtex @inproceedings{gorbunov2022secure, title = {Secure Distributed Training at Scale}, author = {Gorbunov, Eduard and Borzunov, Alexander and Diskin, Michael and Ryabinin, Max}, year = 2022, month = {17--23 Jul}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, series = {Proceedings of Machine Learning Research}, volume = 162, url = {https://proceedings.mlr.press/v162/gorbunov22a.html} } ``` ["Training Transformers Together"](https://arxiv.org/abs/2207.03481) ```bibtex @misc{borzunov2022training, title = {Training Transformers Together}, author = {Alexander Borzunov and Max Ryabinin and Tim Dettmers and Quentin Lhoest and Lucile Saulnier and Michael Diskin and Yacine Jernite and Thomas Wolf}, year = 2022, eprint = {2207.03481}, archiveprefix = {arXiv}, primaryclass = {cs.LG} } ``` ["Petals: Collaborative Inference and Fine-tuning of Large Models"](https://arxiv.org/abs/2209.01188) ```bibtex @inproceedings{borzunov-etal-2023-petals, title = {Petals: Collaborative Inference and Fine-tuning of Large Models}, author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Ryabinin, Max and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin}, year = 2023, month = jul, booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)}, publisher = {Association for Computational Linguistics}, address = {Toronto, Canada}, pages = {558--568}, doi = {10.18653/v1/2023.acl-demo.54}, url = {https://aclanthology.org/2023.acl-demo.54}, editor = {Bollegala, Danushka and Huang, Ruihong and Ritter, Alan}, } ``` ["SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient"](https://arxiv.org/abs/2301.11913) ```bibtex @inproceedings{ryabinin2023swarm, title = {{SWARM} Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient}, author = {Ryabinin, Max and Dettmers, Tim and Diskin, Michael and Borzunov, Alexander}, year = 2023, month = {23--29 Jul}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, publisher = {PMLR}, series = {Proceedings of Machine Learning Research}, volume = 202, pages = {29416--29440}, url = {https://proceedings.mlr.press/v202/ryabinin23a.html}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, pdf = {https://proceedings.mlr.press/v202/ryabinin23a/ryabinin23a.pdf} } ``` ["Distributed Inference and Fine-tuning of Large Language Models Over The Internet"](https://arxiv.org/abs/2312.08361) ```bibtex @inproceedings{borzunov2023distributed, title = {Distributed Inference and Fine-tuning of Large Language Models Over The Internet}, author = {Alexander Borzunov and Max Ryabinin and Artem Chumachenko and Dmitry Baranchuk and Tim Dettmers and Younes Belkada and Pavel Samygin and Colin Raffel}, year = 2023, booktitle = {Thirty-seventh Conference on Neural Information Processing Systems}, url = {https://openreview.net/forum?id=XmN7ZNbUAe} } ```

We also maintain a list of related projects and acknowledgements.

Owner

Name: learning@home
Login: learning-at-home
Kind: organization

Repositories: 11
Profile: https://github.com/learning-at-home

Citation (CITATION.cff)

cff-version: "1.2.0"
date-released: 2020-04
message: "If you use this software, please cite it as below."
title: "Hivemind: A Library For Decentralized Deep Learning"
url: "https://github.com/learning-at-home/hivemind"
authors:
  - family-names: Ryabinin
    given-names: Max
  - family-names: Borzunov
    given-names: Alexander
  - family-names: Diskin
    given-names: Michael
  - family-names: Gusev
    given-names: Anton
  - family-names: Mazur
    given-names: Denis
  - family-names: Plokhotnyuk
    given-names: Vsevolod
  - family-names: Bukhtiyarov
    given-names: Alexey
  - family-names: Samygin
    given-names: Pavel
  - family-names: Sinitsin
    given-names: Anton
  - family-names: Chumachenko
    given-names: Artem

GitHub Events

Total

Create event: 19
Release event: 1
Issues event: 18
Watch event: 196
Delete event: 17
Issue comment event: 40
Push event: 144
Pull request review comment event: 7
Pull request review event: 23
Pull request event: 37
Fork event: 37

Last Year

Create event: 19
Release event: 1
Issues event: 18
Watch event: 196
Delete event: 17
Issue comment event: 40
Push event: 144
Pull request review comment event: 7
Pull request review event: 23
Pull request event: 37
Fork event: 37

Committers

Last synced: 9 months ago

All Time

Total Commits: 584
Total Committers: 34
Avg Commits per committer: 17.176
Development Distribution Score (DDS): 0.461

Past Year

Commits: 24
Committers: 7
Avg Commits per committer: 3.429
Development Distribution Score (DDS): 0.333

Top Committers

Name	Email	Commits
justheuristic	j**c@g**m	315
Max Ryabinin	m**0@g**m	88
Alexander Borzunov	h**a@g**m	81
Michael Diskin	y****2	17
Anton Gusev	u**n@m**u	13
Denis Mazur	d**8@g**m	12
Vsevolod-pl	V****l	10
Alexey Bukhtiyarov	a**v@y**u	7
Anton Sinitsin	3****t	3
Artem Chumachenko	a**k@g**m	3
Ink	L**k@p**m	3
foksly	3****y	3
Pavel Samygin	4****y	3
Dmitry Baranchuk	d**k@g**m	2
Egiazarian Vage	V**7@y**u	2
MaximKsh	k**x@g**m	2
Roman Zhytar	s**2@g**m	2
romakail	3****l	2
xloem	0**m@g**m	1
samsja	5****a	1
ploshkin	a**n@g**m	1
mponty	h**h@g**m	1
dependabot[bot]	4****]	1
cleong110	1****0	1
cirquit	a**o@p**m	1
Serge Rogatch	s**h@g**m	1
Restyled.io	c**s@r**o	1
Raphael Udoka Chinenye	4****r	1
MuXauJl11110	4****0	1
Kai Gao	k**o@s**n	1
and 4 more...

Committer Domains (Top 20 + Academic)

yandex.ru: 2 scu.edu.cn: 1 restyled.io: 1 mail.ru: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 62
Total pull requests: 142
Average time to close issues: 7 months
Average time to close pull requests: 25 days
Total issue authors: 38
Total pull request authors: 29
Average comments per issue: 1.42
Average comments per pull request: 1.13
Merged pull requests: 108
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 14
Pull requests: 31
Average time to close issues: N/A
Average time to close pull requests: about 1 month
Issue authors: 14
Pull request authors: 8
Average comments per issue: 0.86
Average comments per pull request: 0.55
Merged pull requests: 23
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

justheuristic (13)
borzunov (4)
elricwan (3)
chavinlo (3)
samsja (3)
poedator (2)
Vectorrent (2)
cirquit (2)
ysn100ysn (1)
amerfarooq (1)
stoneyang (1)
gilpluralis (1)
BlinkDL (1)
yuanluw (1)
yanpluralis (1)

Pull Request Authors

mryab (55)
borzunov (34)
justheuristic (17)
Vectorrent (11)
samsja (6)
dvmazur (4)
ikmckenz (3)
Vahe1994 (2)
chiangmaioneluv (2)
GreenFatGuy (2)
hayotensor (2)
emiapwil (2)
kitty121 (2)
IAL32 (2)
dbaranchuk (2)

Top Labels

Issue Labels

bug (25) enhancement (19) help wanted (15) good first issue (4) server (4) refactor (2) p2p (2) ci (1) mixture-of-experts (1) discussion (1)

Pull Request Labels

bug (1) refactor (1)

Packages

Total packages: 2
Total downloads:
- pypi 124,671 last-month
Total docker downloads: 1,220

Total dependent packages: 6
(may contain duplicates)
Total dependent repositories: 17
(may contain duplicates)
Total versions: 29
Total maintainers: 3

pypi.org: hivemind

Decentralized deep learning in PyTorch

Homepage: https://github.com/learning-at-home/hivemind
Documentation: https://hivemind.readthedocs.io/
License: MIT
Latest release: 1.1.11
published 10 months ago

Versions: 27
Dependent Packages: 6
Dependent Repositories: 17
Downloads: 124,671 Last month
Docker Downloads: 1,220

Rankings

Dependent packages count: 1.4%

Stargazers count: 1.7%

Docker downloads count: 2.1%

Average: 2.9%

Dependent repos count: 3.5%

Forks count: 4.3%

Downloads: 4.3%

Maintainers (3)

justheuristic mryab vsevolod_pl

Last synced: 6 months ago

proxy.golang.org: github.com/learning-at-home/hivemind

Documentation: https://pkg.go.dev/github.com/learning-at-home/hivemind#section-documentation
License: mit
Latest release: v0.8.0
published over 5 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 6.5%

Average: 6.7%

Dependent repos count: 7.0%

Last synced: 6 months ago

Dependencies

examples/albert/requirements.txt pypi

datasets ==1.5.0
nltk ==3.6.7
requests *
sentencepiece *
torch_optimizer ==0.1.0
transformers ==4.6.0
wandb ==0.10.26

requirements-dev.txt pypi

black ==22.3.0 development
isort ==5.10.1 development
psutil * development
pytest-asyncio ==0.16.0 development
pytest-cov * development
pytest-forked * development
scikit-learn * development
torchvision * development
tqdm * development

requirements-docs.txt pypi

docutils ==0.16
recommonmark ==0.5.0
sphinx ==4.2.0
sphinx_rtd_theme ==0.4.3

requirements.txt pypi

PyYAML *
configargparse >=1.2.3
cryptography >=3.4.6
grpcio-tools >=1.33.2
msgpack >=0.5.6
multiaddr >=0.0.9
numpy >=1.17
prefetch_generator >=1.0.1
protobuf >=3.12.2
pydantic >=1.8.1
pymultihash >=0.8.2
scipy >=1.2.1
sortedcontainers *
torch >=1.6.0
uvloop >=0.14.0

.github/workflows/check-style.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
codespell-project/actions-codespell v1 composite
isort/isort-action master composite
psf/black stable composite

.github/workflows/push-docker-image.yml actions

actions/checkout v2 composite
crazy-max/ghaction-docker-meta v2 composite
docker/build-push-action v2 composite
docker/login-action v1 composite
docker/setup-buildx-action v1 composite

.github/workflows/run-benchmarks.yml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run-tests.yml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/setup-go v3 composite
actions/setup-python v2 composite
codecov/codecov-action v1 composite

Dockerfile docker

nvcr.io/nvidia/cuda 10.2-cudnn8-devel-ubuntu18.04 build

pyproject.toml pypi

setup.py pypi

hivemind

Science Score: 77.0%

Keywords

Keywords from Contributors

Repository

Basic Info

Statistics

Topics

Metadata Files

README.md

Hivemind: decentralized deep learning in PyTorch

Key Features

Example Use Cases

Installation

With pip

From source

System requirements

Documentation

Contributing

Collaborators and Sponsorship

Citation

Owner

Citation (CITATION.cff)

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

pypi.org: hivemind

Rankings

Maintainers (3)

proxy.golang.org: github.com/learning-at-home/hivemind

Rankings

Dependencies