https://github.com/bigscience-workshop/petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○ CITATION.cff file
  • ✓ codemeta.json file (found codemeta.json file)
  • ○ .zenodo.json file
  • ○ DOI references
  • ✓ Academic publication links (links to: arxiv.org)
  • ○ Committers with academic emails
  • ○ Institutional organization owner
  • ○ JOSS paper metadata
  • ○ Scientific vocabulary similarity (low similarity of 11.0% to scientific vocabulary)

Keywords

bloom chatbot deep-learning distributed-systems falcon gpt guanaco language-models large-language-models llama machine-learning mixtral neural-networks nlp pipeline-parallelism pretrained-models pytorch tensor-parallelism transformer volunteer-computing

Keywords from Contributors

distributed-training asynchronous-programming asyncio dht hivemind mixture-of-experts jax diffusion multi-agents agents
Last synced: 5 months ago

Repository

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Basic Info
  • Host: GitHub
  • Owner: bigscience-workshop
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage: https://petals.dev
  • Size: 4.06 MB
Statistics
  • Stars: 9,788
  • Watchers: 103
  • Forks: 572
  • Open Issues: 111
  • Releases: 11
Topics
bloom chatbot deep-learning distributed-systems falcon gpt guanaco language-models large-language-models llama machine-learning mixtral neural-networks nlp pipeline-parallelism pretrained-models pytorch tensor-parallelism transformer volunteer-computing
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md


Run large language models at home, BitTorrent-style.
Fine-tuning and inference up to 10x faster than offloading


Generate text with distributed Llama 3.1 (up to 405B), Mixtral (8x22B), Falcon (40B+) or BLOOM (176B) and fine‑tune them for your own tasks — right from your desktop computer or Google Colab:

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Choose any model available at https://health.petals.dev
model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"

# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...
```

🚀 Try now in Colab

🦙 Want to run Llama? Request access to its weights, then run `huggingface-cli login` in the terminal before loading the model. Or just try it in our chatbot app.

🔏 Privacy. Your data will be processed with the help of other people in the public swarm. Learn more about privacy here. For sensitive data, you can set up a private swarm among people you trust.
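
If you do set up a private swarm, the client connects to your own bootstrap peers instead of the public network. Here is a minimal sketch, assuming the `initial_peers` argument described in the private swarm guide; the multiaddr below is a placeholder you would replace with the address announced by your own bootstrap node.

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Placeholder address of your swarm's bootstrap peer (replace with the
# multiaddr printed by the first node you start in the private swarm).
INITIAL_PEERS = ["/ip4/10.0.0.2/tcp/31337/p2p/QmBootstrapPeerID"]

model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Point the client at the private swarm instead of the public one.
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_name, initial_peers=INITIAL_PEERS
)
```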

💬 Any questions? Ping us in our Discord!

Connect your GPU and increase Petals capacity

Petals is a community-run system — we rely on people sharing their GPUs. You can help by serving one of the available models or hosting a new model from 🤗 Model Hub!

As an example, here is how to host a part of Llama 3.1 (405B) Instruct on your GPU:

🦙 Want to host Llama? Request access to its weights, then run `huggingface-cli login` in the terminal before loading the model.

🐧 Linux + Anaconda. Run these commands for NVIDIA GPUs (or follow this for AMD):

```bash
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install git+https://github.com/bigscience-workshop/petals
python -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
```

🪟 Windows + WSL. Follow this guide on our Wiki.

🐋 Docker. Run our Docker image for NVIDIA GPUs (or follow this for AMD):

```bash
sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 meta-llama/Meta-Llama-3.1-405B-Instruct
```

🍏 macOS + Apple M1/M2 GPU. Install Homebrew, then run these commands:

```bash
brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct
```

📚 Learn more (how to use multiple GPUs, start the server on boot, etc.)

🔒 Security. Hosting a server does not allow others to run custom code on your computer. Learn more here.

💬 Any questions? Ping us in our Discord!

🏆 Thank you! Once you load and host 10+ blocks, we can show your name or link on the swarm monitor as a way to say thanks. You can specify it with `--public_name YOUR_NAME`.

How does it work?

  • You load a small part of the model, then join a network of people serving the other parts. Single‑batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B), enough for chatbots and interactive apps.
  • You can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of PyTorch and 🤗 Transformers.
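
For interactive apps, it helps to keep one generation session open so the servers can reuse their attention caches between steps instead of re-processing the whole prefix. Below is a minimal sketch of that pattern, assuming the `inference_session` context manager and the `session` argument to `generate()` used in the chatbot tutorial; the prompt and sampling settings are illustrative only.

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Keep a single inference session so remote servers reuse attention caches
# across successive generate() calls (useful for chat-style apps).
with model.inference_session(max_length=256) as session:
    prompt = "A human talks to an AI assistant.\nHuman: Hi!\nAI:"
    inputs = tokenizer(prompt, return_tensors="pt")["input_ids"]
    outputs = model.generate(
        inputs,
        max_new_tokens=32,
        do_sample=True,      # standard Hugging Face sampling flags
        temperature=0.7,
        top_p=0.9,
        session=session,
    )
    print(tokenizer.decode(outputs[0]))
```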

📜 Read paper · 📚 See FAQ

📚 Tutorials, examples, and more

Basic tutorials:

  • Getting started: tutorial
  • Prompt-tune Llama-65B for text semantic classification: tutorial
  • Prompt-tune BLOOM to create a personified chatbot: tutorial
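
The prompt-tuning tutorials above keep the distributed model weights frozen and train only a small set of prompt embeddings on your machine. A rough sketch of that training loop is shown below, assuming the `tuning_mode` and `pre_seq_len` keyword arguments used in the tutorials and the standard Transformers labels-based loss; the model name, data, and hyperparameters are placeholders.

```python
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Meta-Llama-3.1-405B-Instruct"  # any model at https://health.petals.dev
tokenizer = AutoTokenizer.from_pretrained(model_name)

# "ptune" prepends a handful of trainable prompt embeddings to the input;
# the remote transformer blocks themselves stay frozen.
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_name, tuning_mode="ptune", pre_seq_len=16
)

opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-3)

batch = tokenizer("An example training sentence.", return_tensors="pt")["input_ids"]
for _ in range(10):  # toy loop; iterate over a real dataset in practice
    loss = model(input_ids=batch, labels=batch).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```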

Useful tools:

Advanced guides:

  • Launch a private swarm: guide
  • Run a custom model: guide

Benchmarks

Please see Section 3.3 of our paper.

🛠️ Contributing

Please see our FAQ on contributing.

📜 Citations

Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, and Colin Raffel. Petals: Collaborative Inference and Fine-tuning of Large Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations). 2023.

```bibtex
@inproceedings{borzunov2023petals,
  title = {Petals: Collaborative Inference and Fine-tuning of Large Models},
  author = {Borzunov, Alexander and Baranchuk, Dmitry and Dettmers, Tim and Riabinin, Maksim and Belkada, Younes and Chumachenko, Artem and Samygin, Pavel and Raffel, Colin},
  booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  pages = {558--568},
  year = {2023},
  url = {https://arxiv.org/abs/2209.01188}
}
```

Alexander Borzunov, Max Ryabinin, Artem Chumachenko, Dmitry Baranchuk, Tim Dettmers, Younes Belkada, Pavel Samygin, and Colin Raffel. Distributed inference and fine-tuning of large language models over the Internet. Advances in Neural Information Processing Systems 36 (2023).

```bibtex
@inproceedings{borzunov2023distributed,
  title = {Distributed inference and fine-tuning of large language models over the {I}nternet},
  author = {Borzunov, Alexander and Ryabinin, Max and Chumachenko, Artem and Baranchuk, Dmitry and Dettmers, Tim and Belkada, Younes and Samygin, Pavel and Raffel, Colin},
  booktitle = {Advances in Neural Information Processing Systems},
  volume = {36},
  pages = {12312--12331},
  year = {2023},
  url = {https://arxiv.org/abs/2312.08361}
}
```


This project is a part of the BigScience research workshop.

Owner

  • Name: BigScience Workshop
  • Login: bigscience-workshop
  • Kind: organization
  • Email: bigscience-contact@googlegroups.com

Research workshop on large language models - The Summer of Language Models 21

GitHub Events

Total
  • Issues event: 10
  • Watch event: 616
  • Issue comment event: 12
  • Pull request event: 4
  • Fork event: 64
Last Year
  • Issues event: 10
  • Watch event: 617
  • Issue comment event: 12
  • Pull request event: 4
  • Fork event: 64

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 513
  • Total Committers: 21
  • Avg Commits per committer: 24.429
  • Development Distribution Score (DDS): 0.573
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Alexander Borzunov | b****r@g****m | 219 |
| justheuristic | j****c@g****m | 184 |
| Dmitry Baranchuk | d****k@g****m | 26 |
| Alexander Borzunov | h****a@g****m | 24 |
| Artem Chumachenko | a****k@g****m | 22 |
| Max Ryabinin | m****0@g****m | 16 |
| Dmitry Baranchuk | d****k@q****u | 4 |
| Anton Sinitsin | 3****t@u****m | 3 |
| Pavel Samygin | 4****y@u****m | 2 |
| Vadim Peretokin | v****n@h****m | 2 |
| Denis Mazur | d****8@g****m | 1 |
| Dmitry Baranchuk | d****k@z****t | 1 |
| Egiazarian Vage | V****7@y****u | 1 |
| FYY | t****6@g****m | 1 |
| Guocheng | n****a@o****m | 1 |
| Ikko Eltociear Ashimine | e****r@g****m | 1 |
| Ink | L****k@p****m | 1 |
| Muhtasham Oblokulov | m****7@g****m | 1 |
| Pavel Samygin | 4****y@u****m | 1 |
| Priyanshupareek | 3****k@u****m | 1 |
| Shuchang Zhou | s****u@g****m | 1 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 167
  • Total pull requests: 181
  • Average time to close issues: 24 days
  • Average time to close pull requests: 8 days
  • Total issue authors: 124
  • Total pull request authors: 23
  • Average comments per issue: 2.57
  • Average comments per pull request: 0.34
  • Merged pull requests: 129
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 10
  • Pull requests: 6
  • Average time to close issues: 6 days
  • Average time to close pull requests: 25 minutes
  • Issue authors: 9
  • Pull request authors: 4
  • Average comments per issue: 0.4
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • borzunov (17)
  • justheuristic (8)
  • ryanshrott (5)
  • slush0 (3)
  • Ted-developer (3)
  • artek0chumak (3)
  • pass-pass-pass (2)
  • nrs-status (2)
  • worldpeaceenginelabs (2)
  • sa1utyeggs (2)
  • mberman84 (2)
  • lbgws2 (2)
  • Rohit-03 (2)
  • oldcpple (2)
  • Thomasbomb (2)
Pull Request Authors
  • borzunov (98)
  • justheuristic (30)
  • artek0chumak (12)
  • xtinkt (10)
  • mryab (8)
  • dvmazur (8)
  • jmikedupont2 (7)
  • vadi2 (3)
  • mandlinsarah (3)
  • Vectorrent (2)
  • RomaA2000 (2)
  • kyoungbinkim (2)
  • jhancock1975 (2)
  • Bakobiibizo (2)
  • iateadonut (1)
Top Labels
Issue Labels
help wanted (9), 1day (7), good first issue (7), enhancement (4), bug (4), documentation (3), development (2), research (1)
Pull Request Labels

Packages

  • Total packages: 3
  • Total downloads:
    • pypi: 511 last month
  • Total docker downloads: 129
  • Total dependent packages: 4
    (may contain duplicates)
  • Total dependent repositories: 2
    (may contain duplicates)
  • Total versions: 29
  • Total maintainers: 3
pypi.org: petals

Easy way to efficiently run 100B+ language models without high-end GPUs

  • Versions: 18
  • Dependent Packages: 4
  • Dependent Repositories: 2
  • Downloads: 501 last month
  • Docker Downloads: 129
Rankings
Stargazers count: 0.3%
Forks count: 2.8%
Dependent packages count: 3.2%
Docker downloads count: 4.6%
Average: 4.8%
Downloads: 6.4%
Dependent repos count: 11.5%
Maintainers (3)
Last synced: 6 months ago
proxy.golang.org: github.com/bigscience-workshop/petals
  • Versions: 10
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 6 months ago
pypi.org: test-petals

Easy way to efficiently run 100B+ language models without high-end GPUs

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 10 last month
Rankings
Dependent packages count: 10.0%
Average: 38.8%
Dependent repos count: 67.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements-dev.txt pypi
  • black ==22.3.0 development
  • isort ==5.10.1 development
  • psutil * development
  • pytest-asyncio ==0.16.0 development
  • pytest-forked * development
requirements.txt pypi
  • accelerate ==0.10.0
  • huggingface-hub ==0.7.0
  • torch ==1.12.0