https://github.com/bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file
- codemeta.json file: found codemeta.json file
- .zenodo.json file
- DOI references
- Academic publication links: links to arxiv.org
- Committers with academic emails
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Keywords
Keywords from Contributors
Repository
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Basic Info
- Host: GitHub
- Owner: bigscience-workshop
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://petals.dev
- Size: 4.06 MB
Statistics
- Stars: 9,788
- Watchers: 103
- Forks: 572
- Open Issues: 111
- Releases: 11
Topics
Metadata Files
README.md

Run large language models at home, BitTorrent-style.
Fine-tuning and inference up to 10x faster than offloading
Generate text with distributed Llama 3.1 (up to 405B), Mixtral (8x22B), Falcon (40B+) or BLOOM (176B) and fine-tune them for your own tasks, right from your desktop computer or Google Colab:
```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Choose any model available at https://health.petals.dev
model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"

# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...
```
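For multi-step generation (e.g., a chat loop), the client can keep a single inference session open so that attention caches stay on the servers between calls. Below is a minimal sketch, assuming the `inference_session()` context manager and the `session=` argument of `generate()` behave as described in the Petals docs, and reusing `tokenizer` and `model` from the snippet above:

```python
# Sketch: multi-step generation over one persistent inference session.
# Reuses `tokenizer` and `model` from the snippet above.
with model.inference_session(max_length=512) as session:
    for chunk in ["A cat sat", " on a mat and"]:
        inputs = tokenizer(chunk, return_tensors="pt")["input_ids"]
        outputs = model.generate(inputs, session=session, max_new_tokens=5)
        print(tokenizer.decode(outputs[0]))
```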
🚀 Try now in Colab
🦙 Want to run Llama? Request access to its weights, then run `huggingface-cli login` in the terminal before loading the model. Or just try it in our chatbot app.
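If you prefer to stay inside Python rather than the terminal, the standard `huggingface_hub` client offers an equivalent `login()` helper (this is Hugging Face tooling, not part of Petals itself; the token below is a placeholder):

```python
# Programmatic equivalent of `huggingface-cli login`.
# Replace the placeholder with your own Hugging Face access token.
from huggingface_hub import login

login(token="hf_xxx_your_access_token")
```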
🔏 Privacy. Your data will be processed with the help of other people in the public swarm. Learn more about privacy here. For sensitive data, you can set up a private swarm among people you trust.
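As a rough illustration of the private-swarm option, the client can be pointed at your own bootstrap peers instead of the public network; the multiaddress below is a made-up placeholder, and the `initial_peers` argument should be double-checked against the private-swarm guide:

```python
# Sketch: connect the client to a private swarm instead of the public one.
# The multiaddress below is a placeholder for your own bootstrap peer.
from petals import AutoDistributedModelForCausalLM

PRIVATE_PEERS = ["/ip4/10.0.0.2/tcp/31337/p2p/QmYourBootstrapPeerID"]

model = AutoDistributedModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-405B-Instruct",
    initial_peers=PRIVATE_PEERS,  # only peers you trust handle your activations
)
```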
💬 Any questions? Ping us in our Discord!
Connect your GPU and increase Petals capacity
Petals is a community-run system: we rely on people sharing their GPUs. You can help serve one of the available models or host a new model from the 🤗 Model Hub!
As an example, here is how to host a part of Llama 3.1 (405B) Instruct on your GPU:
🦙 Want to host Llama? Request access to its weights, then run `huggingface-cli login` in the terminal before loading the model.
🐧 Linux + Anaconda. Run these commands for NVIDIA GPUs (or follow this for AMD):
```bash
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install git+https://github.com/bigscience-workshop/petals
python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
```
🪟 Windows + WSL. Follow this guide on our Wiki.
🐋 Docker. Run our Docker image for NVIDIA GPUs (or follow this for AMD):
```bash
sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 meta-llama/Meta-Llama-3.1-405B-Instruct
```
🍏 macOS + Apple M1/M2 GPU. Install Homebrew, then run these commands:
```bash
brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
```
📚 Learn more (how to use multiple GPUs, start the server on boot, etc.)
🔒 Security. Hosting a server does not allow others to run custom code on your computer. Learn more here.
💬 Any questions? Ping us in our Discord!
🏆 Thank you! Once you load and host 10+ blocks, we can show your name or link on the swarm monitor as a way to say thanks. You can specify them with `--public_name YOUR_NAME`.
How does it work?
- You load a small part of the model, then join a network of people serving the other parts. Single-batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B), which is enough for chatbots and interactive apps.
- You can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of PyTorch and 🤗 Transformers.
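For example, because gradients flow through the remote layers back to parameters kept on your machine, a plain PyTorch loop can drive prompt tuning or adapter training. A minimal sketch, where `train_loader`, the labels, and the learning rate are placeholders rather than part of the Petals API:

```python
# Sketch: fine-tuning with a standard PyTorch loop.
# Only locally stored trainable parameters (e.g., prompts or adapters) are updated;
# the remote server weights stay frozen. `train_loader` is a placeholder.
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for input_ids, labels in train_loader:
    outputs = model(input_ids)
    loss = F.cross_entropy(outputs.logits.flatten(0, 1), labels.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```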
📜 Read paper · 📋 See FAQ
📚 Tutorials, examples, and more
Basic tutorials:
- Getting started: tutorial
- Prompt-tune Llama-65B for text semantic classification: tutorial
- Prompt-tune BLOOM to create a personified chatbot: tutorial
Useful tools:
- Chatbot web app (connects to Petals via an HTTP/WebSocket endpoint): source code
- Monitor for the public swarm: source code
Advanced guides:
Benchmarks
Please see Section 3.3 of our paper.
🛠️ Contributing
Please see our FAQ on contributing.
📜 Citations
Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel. Petals: Collaborative Inference and Fine-tuning of Large Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). 2023.
```bibtex
@inproceedings{borzunov2023petals,
  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},
  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  pages = {558--568},
  year = {2023},
  url = {https://arxiv.org/abs/2209.01188}
}
```
Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, and Colin Raffel. Distributed inference and fine-tuning of large language models over the Internet. Advances in Neural Information Processing Systems 36 (2023).
```bibtex
@inproceedings{borzunov2023distributed,
  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},
  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},
  booktitle = {Advances in Neural Information Processing Systems},
  volume = {36},
  pages = {12312--12331},
  year = {2023},
  url = {https://arxiv.org/abs/2312.08361}
}
```
This project is a part of the BigScience research workshop.
Owner
- Name: BigScience Workshop
- Login: bigscience-workshop
- Kind: organization
- Email: bigscience-contact@googlegroups.com
- Website: https://bigscience.huggingface.co
- Twitter: BigScienceW
- Repositories: 28
- Profile: https://github.com/bigscience-workshop
Research workshop on large language models - The Summer of Language Models 21
GitHub Events
Total
- Issues event: 10
- Watch event: 616
- Issue comment event: 12
- Pull request event: 4
- Fork event: 64
Last Year
- Issues event: 10
- Watch event: 617
- Issue comment event: 12
- Pull request event: 4
- Fork event: 64
Committers
Last synced: 6 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Alexander Borzunov | b****r@g****m | 219 |
| justheuristic | j****c@g****m | 184 |
| Dmitry Baranchuk | d****k@g****m | 26 |
| Alexander Borzunov | h****a@g****m | 24 |
| Artem Chumachenko | a****k@g****m | 22 |
| Max Ryabinin | m****0@g****m | 16 |
| Dmitry Baranchuk | d****k@q****u | 4 |
| Anton Sinitsin | 3****t@u****m | 3 |
| Pavel Samygin | 4****y@u****m | 2 |
| Vadim Peretokin | v****n@h****m | 2 |
| Denis Mazur | d****8@g****m | 1 |
| Dmitry Baranchuk | d****k@z****t | 1 |
| Egiazarian Vage | V****7@y****u | 1 |
| FYY | t****6@g****m | 1 |
| Guocheng | n****a@o****m | 1 |
| Ikko Eltociear Ashimine | e****r@g****m | 1 |
| Ink | L****k@p****m | 1 |
| Muhtasham Oblokulov | m****7@g****m | 1 |
| Pavel Samygin | 4****y@u****m | 1 |
| Priyanshupareek | 3****k@u****m | 1 |
| Shuchang Zhou | s****u@g****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 167
- Total pull requests: 181
- Average time to close issues: 24 days
- Average time to close pull requests: 8 days
- Total issue authors: 124
- Total pull request authors: 23
- Average comments per issue: 2.57
- Average comments per pull request: 0.34
- Merged pull requests: 129
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 10
- Pull requests: 6
- Average time to close issues: 6 days
- Average time to close pull requests: 25 minutes
- Issue authors: 9
- Pull request authors: 4
- Average comments per issue: 0.4
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- borzunov (17)
- justheuristic (8)
- ryanshrott (5)
- slush0 (3)
- Ted-developer (3)
- artek0chumak (3)
- pass-pass-pass (2)
- nrs-status (2)
- worldpeaceenginelabs (2)
- sa1utyeggs (2)
- mberman84 (2)
- lbgws2 (2)
- Rohit-03 (2)
- oldcpple (2)
- Thomasbomb (2)
Pull Request Authors
- borzunov (98)
- justheuristic (30)
- artek0chumak (12)
- xtinkt (10)
- mryab (8)
- dvmazur (8)
- jmikedupont2 (7)
- vadi2 (3)
- mandlinsarah (3)
- Vectorrent (2)
- RomaA2000 (2)
- kyoungbinkim (2)
- jhancock1975 (2)
- Bakobiibizo (2)
- iateadonut (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 3
- Total downloads: 511 last month (pypi)
- Total docker downloads: 129
- Total dependent packages: 4 (may contain duplicates)
- Total dependent repositories: 2 (may contain duplicates)
- Total versions: 29
- Total maintainers: 3
pypi.org: petals
Easy way to efficiently run 100B+ language models without high-end GPUs
- Homepage: https://github.com/bigscience-workshop/petals
- Documentation: https://petals.readthedocs.io/
- License: MIT License
- Latest release: 2.2.0 (published over 2 years ago)
Rankings
Maintainers (3)
proxy.golang.org: github.com/bigscience-workshop/petals
- Documentation: https://pkg.go.dev/github.com/bigscience-workshop/petals#section-documentation
- License: mit
- Latest release: v2.2.0+incompatible (published over 2 years ago)
Rankings
pypi.org: test-petals
Easy way to efficiently run 100B+ language models without high-end GPUs
- Homepage: https://github.com/bigscience-workshop/petals
- Documentation: https://test-petals.readthedocs.io/
- License: MIT License
- Latest release: 2.2.0.post1 (published over 2 years ago)
Rankings
Maintainers (1)
Dependencies
- black ==22.3.0 development
- isort ==5.10.1 development
- psutil * development
- pytest-asyncio ==0.16.0 development
- pytest-forked * development
- accelerate ==0.10.0
- huggingface-hub ==0.7.0
- torch ==1.12.0