Recent Releases of https://github.com/bigscience-workshop/petals

https://github.com/bigscience-workshop/petals - v2.2.0: Falcon, macOS support, and more

Highlights

🦅 Falcon support. Petals now supports all models based on Falcon, including Falcon 180B released today. We improved the 🤗 Transformers FalconModel implementation to be up to 40% faster on recent GPUs. Our chatbot app runs Falcon 180B-Chat at ~2 tokens/sec.

Falcon-40B is licensed under Apache 2.0, so you can load it by specifying tiiuae/falcon-40b or tiiuae/falcon-40b-instruct as the model name. Falcon-180B is licensed under a custom license, and it is not clear if we can provide a Python interface for inference and fine-tuning of this model. Right now, it is only available in the chatbot app, and we are waiting for further clarifications from TII on this issue.
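A minimal client sketch for the Apache 2.0 Falcon checkpoints named above, following the usual Petals client pattern. It assumes `petals` and `transformers` are installed and that the public swarm is currently serving the model; the helper name `generate_with_falcon` is illustrative, not part of Petals.

```python
MODEL_NAME = "tiiuae/falcon-40b"  # or "tiiuae/falcon-40b-instruct"


def generate_with_falcon(prompt: str, max_new_tokens: int = 5) -> str:
    """Generate a short continuation using Falcon blocks hosted by the swarm."""
    # Imported lazily so this file can be imported without Petals installed.
    from transformers import AutoTokenizer
    from petals import AutoDistributedModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # Only the embeddings are loaded locally; transformer blocks run on remote servers.
    model = AutoDistributedModelForCausalLM.from_pretrained(MODEL_NAME)

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0])
```

Calling `generate_with_falcon("A falcon is")` needs network access and enough online servers, so treat it as a starting point rather than a guaranteed-working script.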

🍏 Native macOS support. You can run Petals clients and servers on macOS natively. Just install Homebrew and run these commands:

```bash
brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server petals-team/StableBeluga2
```

If your computer has an Apple M1/M2 chip, the Petals server will use the integrated GPU automatically. We recommend hosting only Llama-based models, since other supported architectures do not work efficiently on M1/M2 chips yet. We also recommend using Python 3.10+ on macOS (installed by Homebrew automatically).

🔌 Serving custom models. Custom models now automatically show up at https://health.petals.dev as "not officially supported" models. As a reminder, you are not limited to models available at https://health.petals.dev and can run a server hosting any model based on BLOOM, Llama, or Falcon architecture (given that it's allowed by the model license), or even add support for a new architecture yourself. We also improved Petals compatibility with some popular Llama-based models (e.g., models from NousResearch) in this release.

🐞 Bug fixes. This release also fixes inference of prefix-tuned models, which was broken in Petals 2.1.0.

What's Changed

  • Require transformers>=4.32.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/479
  • Fix requiring transformers>=4.32.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/480
  • Rewrite MemoryCache alloc_timeout logic by @justheuristic in https://github.com/bigscience-workshop/petals/pull/434
  • Refactor readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/482
  • Support macOS natively by @borzunov in https://github.com/bigscience-workshop/petals/pull/477
  • Remove no-op process in PrioritizedTaskPool by @borzunov in https://github.com/bigscience-workshop/petals/pull/484
  • Fix .generate(input_ids=...) by @borzunov in https://github.com/bigscience-workshop/petals/pull/485
  • Wait for DHT storing state OFFLINE on shutdown by @borzunov in https://github.com/bigscience-workshop/petals/pull/486
  • Fix race condition in MemoryCache by @borzunov in https://github.com/bigscience-workshop/petals/pull/487
  • Replace dots in repo names when building DHT prefixes by @borzunov in https://github.com/bigscience-workshop/petals/pull/489
  • Create model index in DHT by @borzunov in https://github.com/bigscience-workshop/petals/pull/491
  • Force use_cache=True by @borzunov in https://github.com/bigscience-workshop/petals/pull/496
  • Force use_cache=True in config only by @borzunov in https://github.com/bigscience-workshop/petals/pull/497
  • Add Falcon support by @borzunov in https://github.com/bigscience-workshop/petals/pull/499
  • Fix prompt tuning after #464 by @borzunov in https://github.com/bigscience-workshop/petals/pull/501
  • Optimize the Falcon block for inference by @mryab in https://github.com/bigscience-workshop/petals/pull/500

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v2.1.0...v2.2.0

- Python
Published by borzunov over 2 years ago

https://github.com/bigscience-workshop/petals - v2.1.0: 🤗 .generate(), faster loading, responsive inference, and more

Highlights

🔌 Compatibility with 🤗 Transformers generation utils. Petals models now directly use the 🤗 Transformers .generate() implementation instead of custom generation code. This means that you can use a variety of generation methods and constraints implemented in 🤗 Transformers (e.g., repetition_penalty or beam search) and expect an exact match between Petals and a model running locally.

Most common methods are compatible with reusing inference sessions, so that you can run .generate() multiple times without reprocessing the dialogue history from scratch:

```python
with model.inference_session(max_length=100):
    outputs1 = model.generate(user_prompt1, repetition_penalty=1.2)
    outputs2 = model.generate(user_prompt2, repetition_penalty=1.2)
```

⚡ Faster loading of Stable Beluga 2. We repacked Stable Beluga 2, the most popular model at the moment, to increase its loading speed and minimize RAM and disk space requirements. The repacked version can be loaded from the petals-team/StableBeluga2 repository and is fully compatible with clients and servers using the standard repository (stabilityai/StableBeluga2).

Now, clients need to download only 1.05 GB of data to run Stable Beluga 2 (instead of ~20 GB needed before) and require only 4 GB of RAM (instead of ~20 GB required before). Servers need to download and store 2x less data and load the model from disk significantly faster. If you're switching from the old repository, don't forget to remove the old cache in the ~/.cache/petals/models--stabilityai--StableBeluga2 directory to save disk space.
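The cleanup step can be scripted. The sketch below removes only the old repository's directory under the cache path quoted above; the function name `remove_old_cache` is hypothetical, and it assumes the cache lives in the default location unless another root is passed.

```python
import shutil
from pathlib import Path

OLD_REPO_DIR = "models--stabilityai--StableBeluga2"
DEFAULT_CACHE_ROOT = Path.home() / ".cache" / "petals"


def remove_old_cache(cache_root: Path = DEFAULT_CACHE_ROOT) -> bool:
    """Delete the old repository's cache directory.

    Returns True if a directory was removed, False if there was nothing to do.
    """
    old_dir = cache_root / OLD_REPO_DIR
    if not old_dir.is_dir():
        return False
    shutil.rmtree(old_dir)
    return True
```

Stop any running Petals server before calling `remove_old_cache()`, since it may still be reading files from that directory.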

⏱️ More responsive inference. In older versions, servers could become unresponsive for a few seconds while processing large prefixes (thousands of tokens) on inference. This release lets servers perform small inference requests (a few tokens) in the middle of processing a large request, thus avoiding freezes during token-by-token inference caused by someone processing a large prefix.
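The idea can be illustrated with a toy priority queue. This is a sketch of the scheduling principle only, not the Petals server code: the class name, the two thresholds, and the two priority levels are all assumptions made for the example.

```python
import heapq
import itertools
from typing import List, Tuple

SHORT_REQUEST_TOKENS = 16  # hypothetical threshold for a "small" request
CHUNK_TOKENS = 512         # hypothetical chunk size for long prefixes


class ToyScheduler:
    """Orders work so that short requests run between chunks of a long one."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, int]] = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def submit(self, name: str, num_tokens: int) -> None:
        if num_tokens <= SHORT_REQUEST_TOKENS:
            # Priority 0: short requests go ahead of any pending chunk.
            heapq.heappush(self._heap, (0, next(self._counter), name, num_tokens))
            return
        # Priority 1: a long request is queued as separate chunks.
        for start in range(0, num_tokens, CHUNK_TOKENS):
            size = min(CHUNK_TOKENS, num_tokens - start)
            heapq.heappush(self._heap, (1, next(self._counter), name, size))

    def run_next(self) -> Tuple[str, int]:
        """Pop the next unit of work as (request name, tokens processed)."""
        _, _, name, size = heapq.heappop(self._heap)
        return name, size
```

With this ordering, a one-token request submitted while a 1200-token prefix is being processed waits for at most one chunk instead of the whole prefix.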

🔒 Minor improvements. This release adds support for loading weights in the safetensors format on servers and adds the blocked_servers client option to avoid a given set of servers:

```python
from petals import AutoDistributedModelForCausalLM

blocked_servers = ["12D3KooWA6g...", "12D3KooWGyD..."]  # Full peer IDs from https://health.petals.dev
model = AutoDistributedModelForCausalLM.from_pretrained(model_name, blocked_servers=blocked_servers)
```

🐞 Bug fixes. This release also includes a variety of bug fixes that speed up the chatbot app and fine-tuning, better bypass recently disconnected servers, improve the rebalancing algorithm and usability of benchmarks, and fix throughput measurements and installation on ARM CPUs.

We also fixed Petals compatibility with the latest releases of 🤗 Transformers, Accelerate, and PEFT libraries.

Breaking changes

📖 Default inference sessions. If you run .generate() or forward passes inside an .inference_session() context, they now use the opened session by default. These snippets are now equivalent:

```python
# Using default session
with model.inference_session(max_length=100):
    output_ids = model.generate(input_ids, max_new_tokens=3)

# Explicitly specifying a session
with model.inference_session(max_length=100) as sess:
    output_ids = model.generate(input_ids, max_new_tokens=3, session=sess)
```

Earlier, the first snippet created a new session, which confused most people and led to bugs.
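The new behaviour can be pictured with a toy model that remembers the session opened by the enclosing `with` block. `ToyModel` and `ToySession` are stand-ins written for this note and do not reproduce the real Petals classes; they only mirror the method names used in the snippets above.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class ToySession:
    def __init__(self, max_length: int) -> None:
        self.max_length = max_length
        self.position = 0  # tokens already processed in this session


class ToyModel:
    def __init__(self) -> None:
        self._active_session: Optional[ToySession] = None

    @contextmanager
    def inference_session(self, max_length: int) -> Iterator[ToySession]:
        session = ToySession(max_length)
        previous, self._active_session = self._active_session, session
        try:
            yield session
        finally:
            self._active_session = previous  # restore on exit, even after errors

    def generate(self, num_input_tokens: int, max_new_tokens: int,
                 session: Optional[ToySession] = None) -> ToySession:
        if session is None:
            # New default: reuse the session opened by the enclosing `with` block.
            session = self._active_session
        if session is None:
            # Outside any context, fall back to a one-off session.
            session = ToySession(max_length=num_input_tokens + max_new_tokens)
        session.position += num_input_tokens + max_new_tokens
        if session.position > session.max_length:
            raise ValueError("session max_length exceeded")
        return session
```

Inside `with model.inference_session(max_length=100) as sess:`, both `model.generate(5, 3)` and `model.generate(5, 3, session=sess)` return the same `sess` object, which is the equivalence the release notes describe.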

➡️ Renaming. We renamed SequenceManagerConfig to petals.ClientConfig and petals.dht_utils to petals.utils.dht. The old names now lead to DeprecationWarnings and will be removed in Petals 2.2.0+.

What's Changed

  • Fix stale link by @bot66 in https://github.com/bigscience-workshop/petals/pull/418
  • Add Discord badge and more Discord links to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/422
  • Add connect_timeout by @borzunov in https://github.com/bigscience-workshop/petals/pull/423
  • Add Stable Beluga 2 to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/424
  • Penalize servers that use relays during rebalancing by @borzunov in https://github.com/bigscience-workshop/petals/pull/428
  • Fix petals.utils.ping for servers with client-mode DHT by @borzunov in https://github.com/bigscience-workshop/petals/pull/430
  • Fix typo and make blocks message more informative by @vadi2 in https://github.com/bigscience-workshop/petals/pull/437
  • Update Discord links from channels to forums by @borzunov in https://github.com/bigscience-workshop/petals/pull/440
  • Remove distracting links from readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/441
  • Remove deprecated comment in fine-tuning notebook by @borzunov in https://github.com/bigscience-workshop/petals/pull/443
  • Use bitsandbytes 0.41.1 by @borzunov in https://github.com/bigscience-workshop/petals/pull/442
  • [Refactor] extract block forward, backward and inference into a separate file by @justheuristic in https://github.com/bigscience-workshop/petals/pull/435
  • Override float32 in config to bfloat16 by @borzunov in https://github.com/bigscience-workshop/petals/pull/431
  • Prefer longer servers for fine-tuning, exclude unreachable by @borzunov in https://github.com/bigscience-workshop/petals/pull/448
  • Force using --new_swarm instead of empty --initial_peers by @borzunov in https://github.com/bigscience-workshop/petals/pull/451
  • Test Llama, rebalancing, throughput eval, and all CLI scripts by @borzunov in https://github.com/bigscience-workshop/petals/pull/452
  • benchmarks: Aggregate speed among workers, set default dtype torch32 by @borzunov in https://github.com/bigscience-workshop/petals/pull/454
  • Use torch.cuda.synchronize for compute throughput by @justheuristic in https://github.com/bigscience-workshop/petals/pull/456
  • Prioritize short inference, unmerge pools for long inference by @borzunov in https://github.com/bigscience-workshop/petals/pull/458
  • Bump version to 2.0.1.post2 by @borzunov in https://github.com/bigscience-workshop/petals/pull/459
  • Add blocked_servers argument by @borzunov in https://github.com/bigscience-workshop/petals/pull/462
  • Add customizable input tensors by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/445
  • Move SequenceManagerConfig -> ClientConfig, petals.dht_utils -> petals.utils.dht by @borzunov in https://github.com/bigscience-workshop/petals/pull/463
  • Make client compatible with transformers' GenerationMixin by @borzunov in https://github.com/bigscience-workshop/petals/pull/464
  • Temporarily require peft<0.5.0, transformers<4.32.0 by @justheuristic in https://github.com/bigscience-workshop/petals/pull/470
  • Support transformers 4.32.x by @justheuristic in https://github.com/bigscience-workshop/petals/pull/471
  • Change transformers version assert by @justheuristic in https://github.com/bigscience-workshop/petals/pull/472
  • Support loading weights from Safetensors on server by @borzunov in https://github.com/bigscience-workshop/petals/pull/473
  • Update peft to 0.5.0 version by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/475
  • Hide excess key message by @borzunov in https://github.com/bigscience-workshop/petals/pull/476
  • Bump version to 2.1.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/474
  • Don't install cpufeature on non-x86_64 machines by @borzunov in https://github.com/bigscience-workshop/petals/pull/478

New Contributors

  • @bot66 made their first contribution in https://github.com/bigscience-workshop/petals/pull/418

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v2.0.1...v2.1.0

Published by borzunov almost 3 years ago

https://github.com/bigscience-workshop/petals - v2.0.1: Inference of longer sequences, Python 3.11 support, bug fixes

Highlights

🛣️ Inference of longer sequences. We extended the max sequence length to 8192 tokens for Llama 2 and added chunking to avoid server out-of-memory errors (which happened when processing long prefixes). This became possible thanks to multi-query attention used in Llama 2, which uses 8x less GPU memory for attention caches. Now you can process longer sequences using a Petals client and have dialogues of up to 8192 tokens at https://chat.petals.dev
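The 8x figure and the chunking idea can be checked with a little arithmetic. The sketch below assumes the published Llama 2 70B configuration (64 query heads sharing 8 key/value heads, head dimension 128) and 16-bit cache values; those numbers, the function names, and the chunk size of 3000 tokens are not taken from these notes.

```python
from typing import List, Tuple


def kv_cache_bytes_per_token(num_kv_heads: int, head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes one transformer block caches per token: a key and a value vector per KV head."""
    return 2 * num_kv_heads * head_dim * bytes_per_value


def split_into_chunks(num_tokens: int, max_chunk_tokens: int) -> List[Tuple[int, int]]:
    """Return [start, end) ranges covering num_tokens, each at most max_chunk_tokens long."""
    if max_chunk_tokens <= 0:
        raise ValueError("max_chunk_tokens must be positive")
    return [(start, min(start + max_chunk_tokens, num_tokens))
            for start in range(0, num_tokens, max_chunk_tokens)]


# One KV head per query head versus 8 shared KV heads (assumed Llama 2 70B shape).
full = kv_cache_bytes_per_token(num_kv_heads=64, head_dim=128)
shared = kv_cache_bytes_per_token(num_kv_heads=8, head_dim=128)
print(full // shared)                 # 8
print(split_into_chunks(8192, 3000))  # [(0, 3000), (3000, 6000), (6000, 8192)]
```

Processing a long prefix chunk by chunk bounds the size of the intermediate tensors a server must hold at once, which is what avoids the out-of-memory errors mentioned above.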

🐍 Python 3.11 support. Petals clients and servers now work on Python 3.11.

🐞 Bug fixes. We fixed the server's --token argument (used to provide your 🤗 Model Hub access token for loading Llama 2), possible deadlocks in the server, issues with fine-tuning speed (servers available via relays are deprioritized) and other minor load balancing issues.

🪟 Running server on Windows. We made a better guide for running a server in WSL (Windows Subsystem for Linux).

📦 Running server on Runpod. We added a guide for using a Petals template on Runpod.

What's Changed

  • Update to petals.dev by @justheuristic in https://github.com/bigscience-workshop/petals/pull/390
  • Bump version to 2.0.0.post3 by @borzunov in https://github.com/bigscience-workshop/petals/pull/391
  • Fix --attn_cache_tokens default by @borzunov in https://github.com/bigscience-workshop/petals/pull/392
  • Fix deadlocks in MemoryCache by @borzunov in https://github.com/bigscience-workshop/petals/pull/396
  • Support Python 3.11 by @borzunov in https://github.com/bigscience-workshop/petals/pull/393
  • Fix routing through relay, default network RPS, --token, logging, readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/399
  • If speedtest fails, assume network speed of 100 Mbit/s by @borzunov in https://github.com/bigscience-workshop/petals/pull/404
  • Split long sequences into chunks by @justheuristic in https://github.com/bigscience-workshop/petals/pull/403
  • Add Llama 2, WSL instructions to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/406
  • Update README.md by @borzunov in https://github.com/bigscience-workshop/petals/pull/407
  • Update commands for hosting Llama 2 in readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/409
  • Update --update_period and --expiration defaults by @borzunov in https://github.com/bigscience-workshop/petals/pull/410
  • Bump version to 2.0.1 by @borzunov in https://github.com/bigscience-workshop/petals/pull/411

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v2.0.0.post1...v2.0.1

Published by borzunov almost 3 years ago

https://github.com/bigscience-workshop/petals - v2.0.0: LLaMA 1 and 2, Guanaco, 4-bit, shortest-path routing, direct server-to-server communication

We're excited to announce Petals 2.0.0, the largest Petals release to date!

Highlights

🦙 Support for LLaMA and LLaMA 2. We've added support for inference and fine-tuning of any model based on 🤗 Transformers LlamaModel, including all variants of LLaMA and LLaMA 2, one of the strongest open-source models available today. The public swarm hosts the largest variants of these models, LLaMA-65B and LLaMA 2 (70B and 70B-Chat), providing inference at up to 5-6 tokens/sec.

🗜️ 4-bit quantization. We've integrated efficient 4-bit (NF4) quantization from the recent "QLoRA: Efficient Finetuning of Quantized LLMs" paper. Compared to the 8-bit quantization we previously used, it needs ~40% less GPU memory (and thus ~40% fewer servers) to fit all model blocks and gives a ~2x speedup for token-by-token inference, with relatively small quality loss.

🔌 Pre-loading LoRA adapters, such as Guanaco. We've also added the ability to pre-load LoRA adapters compatible with the 🤗 PEFT library, which may add extra functionality to the model you host. These adapters are activated at a client's request: specifically, the client may specify .from_pretrained(..., active_adapter="adapter_repo") when loading a distributed model. One example of this is Guanaco, an instruction-finetuned adapter for LLaMA that turns it into a helpful chatbot that carefully follows the user's instructions. You can try using LLaMA with this adapter in our chatbot app.

➡️ Direct server-to-server communication. Previously, servers didn't send tensors to each other directly due to specifics of our fault-tolerant inference algorithm. This update changes that, which saves round-trip time between servers and a client and leads to substantial speedups for clients located far away from servers they're using.

🛣️ Shortest-path routing for inference. Previously, a client didn't properly choose geographically close and fast servers, so it could pick a slow inference chain, especially if the swarm had many servers located far away from it. Now, the client builds a full graph of client-server and server-server latencies, as well as server inference speeds, to find the fastest chain of servers for inference among all possible ones. It also considers the amount of GPU memory left for attention caches, so that we don't choose a close server that doesn't actually have memory for our request.
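A simplified version of this search can be written as Dijkstra's algorithm over states "(next block to run, node holding the activations)". The sketch is illustrative only: the data model (`Server`, the `latency_ms` table, integer millisecond costs) and the function name are assumptions, and the real client's graph and cost estimates are more detailed.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

CLIENT = "client"  # hypothetical node name for the client itself


@dataclass(frozen=True)
class Server:
    start: int              # first hosted block (inclusive)
    end: int                # end of the hosted range (exclusive)
    ms_per_block: int       # inference time for one block, in milliseconds
    cache_tokens_left: int  # free attention-cache capacity, in tokens


def fastest_chain(servers: Dict[str, Server],
                  latency_ms: Dict[Tuple[str, str], int],
                  num_blocks: int,
                  max_length: int) -> Tuple[int, List[Tuple[str, int, int]]]:
    """Return (total time in ms, [(server, first block, end block), ...])."""
    start_state, goal_state = (0, CLIENT), (num_blocks, CLIENT)
    best = {start_state: 0}
    parent = {}
    heap = [(0, start_state)]
    while heap:
        cost, state = heapq.heappop(heap)
        if cost > best.get(state, float("inf")):
            continue  # stale heap entry
        if state == goal_state:
            break
        pos, node = state
        moves = []
        if pos == num_blocks:
            # All blocks are done: send the result back to the client.
            moves.append((goal_state, latency_ms[node, CLIENT]))
        else:
            for name, srv in servers.items():
                usable = srv.start <= pos < srv.end and srv.cache_tokens_left >= max_length
                if name == node or not usable:
                    continue
                hop = latency_ms[node, name]
                for stop in range(pos + 1, min(srv.end, num_blocks) + 1):
                    moves.append(((stop, name), hop + (stop - pos) * srv.ms_per_block))
        for nxt, step in moves:
            if cost + step < best.get(nxt, float("inf")):
                best[nxt] = cost + step
                parent[nxt] = state
                heapq.heappush(heap, (cost + step, nxt))
    if goal_state not in parent:
        raise ValueError("no chain of servers covers all blocks")
    # Walk back from the goal to recover the spans each server processed.
    chain, state = [], parent[goal_state]
    while state != start_state:
        prev = parent[state]
        chain.append((state[1], prev[0], state[0]))
        state = prev
    return best[goal_state], chain[::-1]


def symmetric(pairs: Dict[Tuple[str, str], int]) -> Dict[Tuple[str, str], int]:
    """Expand one-directional latency entries into both directions."""
    return {**pairs, **{(b, a): ms for (a, b), ms in pairs.items()}}


servers = {
    "near_slow": Server(0, 4, ms_per_block=60, cache_tokens_left=4096),
    "far_fast_a": Server(0, 2, ms_per_block=10, cache_tokens_left=4096),
    "far_fast_b": Server(2, 4, ms_per_block=10, cache_tokens_left=4096),
    "near_full": Server(0, 4, ms_per_block=1, cache_tokens_left=0),  # no cache memory left
}
latency_ms = symmetric({
    (CLIENT, "near_slow"): 10, (CLIENT, "far_fast_a"): 100, (CLIENT, "far_fast_b"): 100,
    ("near_slow", "far_fast_a"): 100, ("near_slow", "far_fast_b"): 100,
    ("far_fast_a", "far_fast_b"): 10,
})
total_ms, chain = fastest_chain(servers, latency_ms, num_blocks=4, max_length=2048)
print(total_ms, chain)  # 250 [('far_fast_a', 0, 2), ('far_fast_b', 2, 4)]
```

In this example the nearby server loses twice: `near_full` is skipped because it has no cache memory for the request, and `near_slow` would take 260 ms in total, so the two distant but fast servers form the quickest chain.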

🌎 Loading models directly from 🤗 Model Hub and Auto classes. Starting from Petals 2.0.0, models do not need to be converted to a special format to be hosted by Petals. Instead, both clients and servers can load models directly from 🤗 Model Hub, fetching only the shards they need to host their part of the model. Furthermore, you can write code supporting multiple architectures at once using Auto classes, such as AutoDistributedConfig.from_pretrained(...) and AutoDistributedModelForCausalLM.from_pretrained(...). The guide for adding new model architectures to Petals also became much simpler due to generalizing Petals code to multiple architectures and the absence of the model conversion step.

🏋️ Fine-tuning examples. We've switched most examples to LLaMA-65B and fixed previously reported bugs. In particular, the "Getting started" notebook now includes a simple example of deep prompt tuning on a dummy task, and the sequence classification notebook uses LLaMA-65B and improved hyperparameters for stable training.

🖥️ Upgraded swarm monitor. The swarm monitor now contains much more info about the server, including pre-loaded LoRA adapters, detailed performance info, latencies to potential next servers, and so on. All this info is published to the DHT, so you don't need to ping each server to fetch it. We've also added a "Contributor" column, so that contributors hosting 10+ blocks get a chance to publish their name, advertise their company or a social media account in exchange for hosting a server for Petals. A name (or a link) shown there may be specified using the server's --public_name argument.

What's Changed

  • Remove unused imports and attributes by @mryab in https://github.com/bigscience-workshop/petals/pull/324
  • Determine block dtype in a unified manner by @mryab in https://github.com/bigscience-workshop/petals/pull/325
  • Use number of tokens for attn_cache_size by @mryab in https://github.com/bigscience-workshop/petals/pull/286
  • Add LLaMA support by @borzunov in https://github.com/bigscience-workshop/petals/pull/323
  • Add AutoDistributed{Model, ModelForCausalLM, ModelForSequenceClassification} by @borzunov in https://github.com/bigscience-workshop/petals/pull/329
  • Fix llama's lm_head.weight.requires_grad by @borzunov in https://github.com/bigscience-workshop/petals/pull/330
  • Show license links when loading models by @borzunov in https://github.com/bigscience-workshop/petals/pull/332
  • Add benchmark scripts by @borzunov in https://github.com/bigscience-workshop/petals/pull/319
  • Fix warmup steps and minor issues in benchmarks by @borzunov in https://github.com/bigscience-workshop/petals/pull/334
  • Require pydantic < 2.0 (2.0 is incompatible with hivemind 1.1.8) by @borzunov in https://github.com/bigscience-workshop/petals/pull/337
  • Support loading blocks in 4-bit (QLoRA NF4 format, disabled by default) by @borzunov in https://github.com/bigscience-workshop/petals/pull/333
  • Allow free_disk_space_for() remove arbitrary files from Petals cache by @borzunov in https://github.com/bigscience-workshop/petals/pull/339
  • Implement direct server-to-server communication by @borzunov in https://github.com/bigscience-workshop/petals/pull/331
  • Use 4-bit for llama by default, use bitsandbytes 0.40.0.post3 by @borzunov in https://github.com/bigscience-workshop/petals/pull/340
  • Delete deprecated petals.cli scripts by @borzunov in https://github.com/bigscience-workshop/petals/pull/336
  • Use bitsandbytes 0.40.0.post4 with bias hotfix by @borzunov in https://github.com/bigscience-workshop/petals/pull/342
  • Support peft LoRA adapters by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/335
  • Fix convergence issues and switch to LLaMA in the SST-2 example by @mryab in https://github.com/bigscience-workshop/petals/pull/343
  • Mention LLaMA in readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/344
  • Import petals.utils.peft only when needed to avoid unnecessary import of bitsandbytes by @borzunov in https://github.com/bigscience-workshop/petals/pull/345
  • Fix Docker build by avoiding Python 3.11 by @borzunov in https://github.com/bigscience-workshop/petals/pull/348
  • Support LLaMA repos without "-hf" suffix by @borzunov in https://github.com/bigscience-workshop/petals/pull/349
  • Estimate adapter memory overhead in choose_num_blocks() by @justheuristic in https://github.com/bigscience-workshop/petals/pull/346
  • Spam less in server logs by @borzunov in https://github.com/bigscience-workshop/petals/pull/350
  • Remove unused import os by @justheuristic in https://github.com/bigscience-workshop/petals/pull/352
  • Test that bitsandbytes is not imported when it's not used by @borzunov in https://github.com/bigscience-workshop/petals/pull/351
  • Fix bugs in choose_num_blocks() added in #346 by @borzunov in https://github.com/bigscience-workshop/petals/pull/354
  • Switch adapters slightly faster by @justheuristic in https://github.com/bigscience-workshop/petals/pull/353
  • Share more info about a server in DHT by @borzunov in https://github.com/bigscience-workshop/petals/pull/355
  • Make a server ping next servers by @borzunov in https://github.com/bigscience-workshop/petals/pull/356
  • Use bitsandbytes 0.40.1.post1 by @borzunov in https://github.com/bigscience-workshop/petals/pull/357
  • Update readme and "Getting started" link by @borzunov in https://github.com/bigscience-workshop/petals/pull/360
  • Report inference, forward, and network RPS separately by @borzunov in https://github.com/bigscience-workshop/petals/pull/358
  • Fix typo in generation_algorithms.py by @eltociear in https://github.com/bigscience-workshop/petals/pull/364
  • Implement shortest-path routing for inference by @borzunov in https://github.com/bigscience-workshop/petals/pull/362
  • Update readme to show new models by @borzunov in https://github.com/bigscience-workshop/petals/pull/365
  • Require transformers < 4.31.0 until we're compatible by @borzunov in https://github.com/bigscience-workshop/petals/pull/369
  • Fix AssertionError on rebalancing by @borzunov in https://github.com/bigscience-workshop/petals/pull/370
  • Update transformers to 4.31.0 and peft to 0.4.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/371
  • Fix readme code example, require Python < 3.11 until supported by @borzunov in https://github.com/bigscience-workshop/petals/pull/374
  • Fix handler memory leak, get rid of mp.Manager by @justheuristic in https://github.com/bigscience-workshop/petals/pull/373
  • Inherit bitsandbytes compute dtype correctly (override peft quirk) by @justheuristic in https://github.com/bigscience-workshop/petals/pull/377
  • Fix --token arg by @borzunov in https://github.com/bigscience-workshop/petals/pull/378
  • Support Llama 2 by @borzunov in https://github.com/bigscience-workshop/petals/pull/379
  • Require accelerate>=0.20.3 as transformers do by @borzunov in https://github.com/bigscience-workshop/petals/pull/383
  • Bump version to 2.0.0.post1 by @borzunov in https://github.com/bigscience-workshop/petals/pull/384

New Contributors

  • @eltociear made their first contribution in https://github.com/bigscience-workshop/petals/pull/364

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v1.1.5...v2.0.0.post1

Published by borzunov almost 3 years ago

https://github.com/bigscience-workshop/petals - v1.1.5: Faster fine-tuning, bug fixes, and more

Highlights

⏱ Faster fine-tuning. Fine-tuning uses ~2x less traffic (tensors are now sent in bfloat16 by default) and builds routes using a heuristic maximizing the swarm's throughput. This should address timeout errors that could happen during fine-tuning.
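The ~2x figure follows from the element size alone, assuming tensors were previously sent as 32-bit floats (the notes do not state the earlier format). The batch shape below is an arbitrary example, and 14336 is the hidden size of BLOOM-176B.

```python
BYTES_PER_VALUE = {"float32": 4, "bfloat16": 2}


def activation_bytes(batch_size: int, seq_len: int, hidden_size: int, dtype: str) -> int:
    """Size of one hidden-state tensor of shape (batch_size, seq_len, hidden_size)."""
    return batch_size * seq_len * hidden_size * BYTES_PER_VALUE[dtype]


fp32 = activation_bytes(batch_size=32, seq_len=128, hidden_size=14336, dtype="float32")
bf16 = activation_bytes(batch_size=32, seq_len=128, hidden_size=14336, dtype="bfloat16")
print(fp32 // 2**20, "MiB vs", bf16 // 2**20, "MiB")  # 224 MiB vs 112 MiB
```

The same saving applies to every hop between servers in both the forward and backward pass, which is why it matters most for fine-tuning.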

🐞 Bug fixes. On servers, this release fixes out-of-memory errors and freezing network throughput evals. On clients, it fixes issues with slicing RemoteSequential and silently ignoring unsupported .generate() kwargs. Also, this release fixes warnings originating from hivemind.p2p and hivemind.compression.

🛣️ Updated throughput formula. We have updated the throughput formula to reflect that servers hosting many blocks still run forward and backward passes through only one block at a time. Don't be surprised if your throughput became smaller than in 1.1.4: these numbers are not directly comparable!
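One way to read the new formula is sketched below. It assumes that a request passes through a uniformly random number of the server's blocks (so the average is `(num_blocks + 1) / 2`) and that the result is capped by network throughput; both assumptions and the function name are illustrative and are not spelled out in these notes.

```python
def effective_throughput(forward_rps: float, network_rps: float, num_blocks: int) -> float:
    """Estimate requests/sec for a server that runs its blocks one at a time."""
    if num_blocks < 1:
        raise ValueError("a server must host at least one block")
    # Assumed: requests use Uniform{1, ..., num_blocks} blocks on average.
    average_blocks_used = (num_blocks + 1) / 2
    return min(forward_rps / average_blocks_used, network_rps)


print(effective_throughput(forward_rps=100.0, network_rps=50.0, num_blocks=9))  # 20.0
print(effective_throughput(forward_rps=100.0, network_rps=50.0, num_blocks=1))  # 50.0
```

Under this reading, a server hosting more blocks reports a lower per-request throughput for the same GPU, which matches the warning that numbers from 1.1.4 and 1.1.5 are not directly comparable.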

🖼️ Improved lower-level interfaces. We have refactored lower-level interfaces, such as RemoteSequential and RemoteSequenceManager, to be more reliable (e.g. when doing retries) and much easier to use. Some rarely used low-level functions in petals.dht_utils were removed.

What's Changed

  • Fix OOMs happening in case of accelerate >= 0.16.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/310
  • Refactor RemoteSequenceManager by @borzunov in https://github.com/bigscience-workshop/petals/pull/309
  • Update hivemind to 1.1.8, enable efficient bfloat16 encoding by @borzunov in https://github.com/bigscience-workshop/petals/pull/311
  • Replace .make_sequence(..., mode="random") with mode="max_throughput" by @borzunov in https://github.com/bigscience-workshop/petals/pull/313
  • Divide compute throughput by average no. of used blocks by @borzunov in https://github.com/bigscience-workshop/petals/pull/314
  • Raise error for unexpected .generate() kwargs by @borzunov in https://github.com/bigscience-workshop/petals/pull/315
  • Abort speedtest if it runs too long by @borzunov in https://github.com/bigscience-workshop/petals/pull/316
  • Bump version to 1.1.5 by @borzunov in https://github.com/bigscience-workshop/petals/pull/312

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v1.1.4...v1.1.5

Published by borzunov about 3 years ago

https://github.com/bigscience-workshop/petals - v1.1.4: Extended GPU support, faster startup, and more

Highlights

🏗️ 8-bit servers support more GPUs. A bitsandbytes update brings 8-bit support to older generations of NVIDIA GPUs, as well as the GeForce 16 GPU series (e.g. 1660 Ti). Please try Petals 1.1.4 if you previously had errors like "Your GPU does not support Int8 Matmul!" and "cublasLt ran into an error!" on some GPUs. This version also loads weights in 8-bit by default when tensor parallelism is enabled.

⏱️ Servers start faster. Servers take ~2x less time to load block weights from the disk cache to the GPU memory. The next release will also reduce the time it takes to download the weights from the Internet, since they will be downloaded in 8-bit instead of 16-bit.

🧵 Multi-threaded clients work faster. Earlier, multi-threaded clients were actually performing only one network request at a time due to a bug in hivemind. This bug was recently fixed in hivemind. This significantly improves the speed of the chat.petals.ml app when multiple users chat concurrently.

⏱️ Clients start faster. Clients take ~10% less time to load the model, since they build a route through remote servers in parallel with loading the local part of the model (input/output embeddings).

🌳 Relaxed dependency requirements. We relaxed version requirements for transformers and other Hugging Face libraries, so you can update them independently of Petals. In particular, Petals works with PyTorch 2.0 and the latest transformers release. Also, we fixed a bug where the client loaded a model in float32 by default (instead of bfloat16/float16) in some transformers releases. Please try Petals 1.1.4 if you previously had out-of-memory errors when running the client.

What's Changed

  • Speed up loading blocks using init with meta weights by @mryab in https://github.com/bigscience-workshop/petals/pull/285
  • Add benchmarks to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/284
  • Fix invalid author email in setup.cfg by @borzunov in https://github.com/bigscience-workshop/petals/pull/287
  • Hotfix: Increase daemon_startup_timeout by @borzunov in https://github.com/bigscience-workshop/petals/pull/292
  • Update bitsandbytes, hivemind, transformers by @justheuristic in https://github.com/bigscience-workshop/petals/pull/290
  • Fix deps, enable 8-bit by default for TP by @borzunov in https://github.com/bigscience-workshop/petals/pull/298
  • Add Python 3.10 to CI by @borzunov in https://github.com/bigscience-workshop/petals/pull/299
  • Remove CustomLinear8bitLt by @borzunov in https://github.com/bigscience-workshop/petals/pull/297
  • Remove use_auto_relay=True in client by @borzunov in https://github.com/bigscience-workshop/petals/pull/300
  • Start SequenceManager's thread only after first .make_sequence() by @borzunov in https://github.com/bigscience-workshop/petals/pull/301
  • Require bitsandbytes == 0.38.0.post2, hivemind == 1.1.7 by @borzunov in https://github.com/bigscience-workshop/petals/pull/302
  • Suggest commands for Docker first by @borzunov in https://github.com/bigscience-workshop/petals/pull/304
  • Relax the rest of Hugging Face dependencies by @borzunov in https://github.com/bigscience-workshop/petals/pull/305
  • Force transformers to use config.torch_dtype by default by @borzunov in https://github.com/bigscience-workshop/petals/pull/307
  • Bump version to 1.1.4 by @borzunov in https://github.com/bigscience-workshop/petals/pull/306

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v1.1.3...v1.1.4

Published by borzunov about 3 years ago

https://github.com/bigscience-workshop/petals - v1.1.3: Bug fixes

Highlights

🐞 Bug fixes. We have fixed a variety of minor issues related to timeout errors in the client, fine-tuning, and tensor parallelism.

⚙️ New options in the client. Added allowed_servers and max_retries options:

  • allowed_servers restricts the set of servers a client can use for its requests (e.g., to only use the servers trusted to process your data).
  • max_retries limits the number of retries a client makes before raising an exception (previously, clients continued retrying indefinitely).
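The semantics of the two options can be sketched in plain Python. This is not the Petals client code: the helper names and the exponential backoff schedule are assumptions, while the 15-minute cap comes from the "Limit max delay between retries to 15 min" change listed for this release.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
MAX_DELAY_SEC = 15 * 60  # cap on the delay between retries


def usable_servers(discovered: Iterable[str],
                   allowed_servers: Optional[Iterable[str]] = None) -> List[str]:
    """Keep only peers from the allow-list; None means every server may be used."""
    if allowed_servers is None:
        return list(discovered)
    allowed = set(allowed_servers)
    return [peer_id for peer_id in discovered if peer_id in allowed]


def call_with_retries(request: Callable[[], T],
                      max_retries: Optional[int] = None,
                      min_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run request(), retrying on failure with (assumed) exponential backoff.

    max_retries=None retries indefinitely, like the old behaviour; otherwise the
    last exception is re-raised once max_retries retries have failed.
    """
    attempt = 0
    while True:
        try:
            return request()
        except Exception:
            if max_retries is not None and attempt >= max_retries:
                raise
            sleep(min(min_delay * 2 ** attempt, MAX_DELAY_SEC))
            attempt += 1
```

With `max_retries=2`, a request that always fails is attempted three times in total (one initial call plus two retries) before the exception reaches the caller.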

📚 FAQ. We have released the FAQ page that covers common questions about running clients and servers, as well as troubleshooting common problems.

What's Changed

  • Fix typo in prompt-tuning-sst2.ipynb by @borzunov in https://github.com/bigscience-workshop/petals/pull/245
  • Minor changes to examples/prompt-tuning notebooks by @justheuristic in https://github.com/bigscience-workshop/petals/pull/247
  • Fix examples/sst, add cls_model embeddings by @justheuristic in https://github.com/bigscience-workshop/petals/pull/248
  • Fix TP crashing when hypo_ids are used by @borzunov in https://github.com/bigscience-workshop/petals/pull/249
  • Add allowed_servers, max_retries options to the client, improve logs by @borzunov in https://github.com/bigscience-workshop/petals/pull/235
  • Lower payload size threshold for stream handlers by @borzunov in https://github.com/bigscience-workshop/petals/pull/251
  • Improve reachability logs by @borzunov in https://github.com/bigscience-workshop/petals/pull/253
  • Link FAQ in readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/260
  • Show visible maddrs for public swarm too by @borzunov in https://github.com/bigscience-workshop/petals/pull/263
  • Limit max delay between retries to 15 min by @borzunov in https://github.com/bigscience-workshop/petals/pull/264
  • Use get_logger(__name__) instead of get_logger(__file__) by @borzunov in https://github.com/bigscience-workshop/petals/pull/265
  • Improve "connect your GPU" message by @borzunov in https://github.com/bigscience-workshop/petals/pull/266
  • Fix use_chunked_forward="auto" on non-x86_64 machines by @borzunov in https://github.com/bigscience-workshop/petals/pull/267
  • Use inference mode in _MergedInferenceStep by @justheuristic in https://github.com/bigscience-workshop/petals/pull/275
  • Increase default request_timeout by @borzunov in https://github.com/bigscience-workshop/petals/pull/276
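
The entry "Limit max delay between retries to 15 min" in the list above can be pictured as a capped exponential backoff. The sketch below is only an illustration: the 15-minute cap comes from that entry, while the 1-second base delay, the doubling factor, and the function name retry_delay are assumptions rather than values taken from Petals.

```python
MAX_DELAY_SECONDS = 15 * 60.0  # the 15-minute cap named in the changelog entry


def retry_delay(attempt: int, base_delay: float = 1.0, max_delay: float = MAX_DELAY_SECONDS) -> float:
    """Delay before retry number `attempt` (0-based): doubles each time, then saturates."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    exponent = min(attempt, 32)  # clamp so very long retry sequences cannot overflow
    return min(base_delay * 2 ** exponent, max_delay)
```

Under these assumed defaults, the delay reaches the cap at the eleventh retry (attempt 10) and stays there.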

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v1.1.2...v1.1.3

- Python
Published by borzunov over 3 years ago

https://github.com/bigscience-workshop/petals - v1.1.2: Faster inference, new model, and more

Highlights

🏃‍♀️ Faster inference. We've shipped server-side changes improving the inference speed by up to 30%. This is a result of profiling the server's inference performance (see details in #224 and #225). The public swarm will become faster once everyone upgrades to the latest Petals version and restarts their servers.

🐞 Prompt-tuning bug fixes. We've shipped bug fixes for prompt-tuning notebooks (see details in #231).

🧑‍🏫 New pretrained model. We've added a new model, BLOOMZ-176B by BigScience, to the public swarm. You can run it (or host its blocks) by specifying bigscience/bloomz-petals as the model name.

  • BLOOMZ is a version of BLOOM fine-tuned to follow human instructions in the zero-shot regime. See details in its model card and paper.
  • The chatbot app now uses BLOOMZ by default. You can ask it to generate text or code, or to perform various tasks. It responds better than the regular BLOOM, which often went off-topic instead of doing the task you asked for.

What's Changed

  • Choose --num_blocks automatically for all models by @borzunov in https://github.com/bigscience-workshop/petals/pull/217
  • Add one more link to the "Getting started" tutorial by @borzunov in https://github.com/bigscience-workshop/petals/pull/218
  • Mention BLOOMZ in readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/221
  • Fix a typo in error message. by @zsc in https://github.com/bigscience-workshop/petals/pull/227
  • Merge inference pools into one to increase inference speed by @justheuristic in https://github.com/bigscience-workshop/petals/pull/225
  • Add citation to readme by @Muhtasham in https://github.com/bigscience-workshop/petals/pull/219
  • Fix dtype error in fine-tuning notebooks by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/231
  • Prompt-tuning notebooks: suggest to use a smaller model for faster prototyping by @borzunov in https://github.com/bigscience-workshop/petals/pull/234
  • Bump version to 1.1.2 by @borzunov in https://github.com/bigscience-workshop/petals/pull/244

New Contributors

  • @zsc made their first contribution in https://github.com/bigscience-workshop/petals/pull/227
  • @Muhtasham made their first contribution in https://github.com/bigscience-workshop/petals/pull/219

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v1.1.1...v1.1.2

- Python
Published by borzunov over 3 years ago

https://github.com/bigscience-workshop/petals - v1.1.1: More stable and fast

Highlights

⛰️ Stability. This release improves the stability and performance of the Petals DHT in the presence of many servers joined via NAT traversal & relays. The DHT now prefers to store keys on directly reachable peers, so that all peers can access them faster and with fewer failures. This release also contains a minor fix to the block reassignment algorithm that reduces the excess reassignments that led to swarm downtime in the past.

🌎 Basic routing. We have improved the routing algorithm for inference, so that clients weakly prefer servers holding more blocks to minimize latency and increase inference speed. This is only a basic algorithm, and we are working on smarter routing (taking into account latency, throughput, etc.) for both inference and fine-tuning in future releases. This release also makes the servers share more technical information about themselves (their version, free cache, etc.), so that it can be used by smarter routing algorithms in the future and shown at http://health.petals.ml for debugging purposes.
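
One way to read "weakly prefer servers holding more blocks" is length-weighted sampling, the term used in the changelog below. The following is a minimal sketch under that reading, not the Petals implementation: pick_server, the spans mapping, and the half-open block ranges are hypothetical names and conventions chosen for illustration.

```python
import random
from typing import Dict, Optional, Tuple


def pick_server(
    spans: Dict[str, Tuple[int, int]],
    block: int,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a server holding `block`, with probability proportional to span length.

    spans maps a server ID to the half-open block range [start, end) it holds.
    """
    rng = rng or random.Random()
    candidates = [
        (server_id, end - start)
        for server_id, (start, end) in spans.items()
        if start <= block < end
    ]
    if not candidates:
        raise LookupError(f"no server holds block {block}")
    ids, weights = zip(*candidates)
    # Longer spans are more likely to be chosen, but short ones are never excluded.
    return rng.choices(ids, weights=weights, k=1)[0]
```

Because the choice is random rather than greedy, a server holding 30 blocks is picked about three times as often as one holding 10, while the smaller server still receives some traffic.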

What's Changed

  • Fix fine-tuning notebooks intros by @borzunov in https://github.com/bigscience-workshop/petals/pull/194
  • Ignore network RPS if we failed to measure it by @borzunov in https://github.com/bigscience-workshop/petals/pull/198
  • Make client ignore blacklist if all servers holding a block are blacklisted by @borzunov in https://github.com/bigscience-workshop/petals/pull/197
  • Increase tolerances in test_tp_block by @justheuristic in https://github.com/bigscience-workshop/petals/pull/196
  • Fix --no_auto_relay help by @borzunov in https://github.com/bigscience-workshop/petals/pull/199
  • Use length-weighted sampling in routing for inference by @justheuristic in https://github.com/bigscience-workshop/petals/pull/204
  • Return available cache size in rpc_info() by @justheuristic in https://github.com/bigscience-workshop/petals/pull/191
  • Add service checking direct reachability from peers by @justheuristic in https://github.com/bigscience-workshop/petals/pull/195
  • Report server version and dht.client_mode in rpc_info(), check for updates on startup by @borzunov in https://github.com/bigscience-workshop/petals/pull/209
  • Don't switch blocks if it makes swarm disjoint by @borzunov in https://github.com/bigscience-workshop/petals/pull/210
  • Fix output shape when resuming generation by @borzunov in https://github.com/bigscience-workshop/petals/pull/211
  • Improve errors in case of missing blocks, suggest to join your own server by @borzunov in https://github.com/bigscience-workshop/petals/pull/212
  • CI: Convert model only when convert_model.py or setup.cfg change by @borzunov in https://github.com/bigscience-workshop/petals/pull/213
  • CI: Update deprecated actions, don't measure network RPS by @borzunov in https://github.com/bigscience-workshop/petals/pull/215
  • Bump version to 1.1.1 by @borzunov in https://github.com/bigscience-workshop/petals/pull/214

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v1.1.0...v1.1.1

- Python
Published by borzunov over 3 years ago

https://github.com/bigscience-workshop/petals - v1.1.0: NAT traversal, relays, and more

Highlights

🏠 NAT traversal & relays. Servers can now join the swarm automatically even if your machine is located behind a NAT or a firewall, or has a dynamic IP address. You don't have to manually set up port forwarding or provide any arguments to make it work.

  • Please upgrade the Petals package and restart all your servers & clients to use this feature or access servers joined via relays:

    pip install --upgrade petals

  • How does it work? If the server learns that it can't accept incoming connections due to a NAT or firewall, it opens a long-lived outgoing connection to one of the relay nodes, and the relay node then forwards all requests to this server through this connection. In turn, any server with a public IP may serve as a relay node if necessary. We use libp2p circuit relays under the hood: https://docs.libp2p.io/concepts/nat/circuit-relay/
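
The relay flow described above can be mimicked with a toy in-memory model. This is a sketch of the idea only: it does not use libp2p, plain function calls stand in for network connections, and the Relay and Server classes with their methods are hypothetical names invented for illustration.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class Relay:
    """Toy stand-in for a relay node with a public address."""

    def __init__(self) -> None:
        self._connections: Dict[str, Handler] = {}

    def register(self, server_id: str, handler: Handler) -> None:
        # Called by the server over the outgoing connection it opened itself,
        # so no inbound port on the server's side is required.
        self._connections[server_id] = handler

    def forward(self, server_id: str, request: str) -> str:
        # Clients contact the relay; the relay passes the request down the
        # connection that the server keeps open.
        if server_id not in self._connections:
            raise KeyError(f"{server_id} is not connected to this relay")
        return self._connections[server_id](request)


class Server:
    def __init__(self, server_id: str, publicly_reachable: bool) -> None:
        self.server_id = server_id
        self.publicly_reachable = publicly_reachable

    def handle(self, request: str) -> str:
        return f"{self.server_id} processed {request}"

    def join(self, relay: Relay) -> str:
        """Return how clients should reach this server."""
        if self.publicly_reachable:
            return "direct"
        relay.register(self.server_id, self.handle)
        return "relayed"
```

A server that reports "relayed" is then reached with relay.forward(server_id, request) instead of a direct call.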

💬 Chatbot app. We've released a chatbot app working over Petals: http://chat.petals.ml (source code).

  • Disclaimer: This chatbot uses the regular BLOOM, which is not fine-tuned for question answering. Please do not expect it to behave like ChatGPT.

  • How does it work? Under the hood, this web app uses our HTTP endpoint for running inference using the public Petals swarm. You can use this endpoint for your own projects, or set up another endpoint yourself (no GPU needed). See API docs here: https://github.com/borzunov/chat.petals.ml#http-api-methods

🏃‍♀️ Faster CPU-only clients. If your CPU supports the AVX512 instruction set, a CPU-only client now runs almost as fast as a GPU-enabled one. This way, you can rent cheap CPU instances to run the client or an HTTP endpoint, like the one we use for the chatbot app.

  • How to use it? AVX512 is mostly present on recent Intel Xeon CPUs. You can rent one by choosing a "dedicated CPU" instance with 16+ GB RAM on DigitalOcean.
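
To check whether a rented Linux instance exposes AVX512, you can look for the avx512f (AVX-512 Foundation) flag in /proc/cpuinfo. The sketch below is one possible check under that assumption; the function names are hypothetical, and this is not necessarily how Petals itself detects AVX512.

```python
from pathlib import Path


def has_avx512(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx512f."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx512f" in value.split():
            return True
    return False


def local_cpu_has_avx512(path: str = "/proc/cpuinfo") -> bool:
    """Check the local machine; returns False where the file does not exist."""
    cpuinfo = Path(path)
    return cpuinfo.exists() and has_avx512(cpuinfo.read_text())
```

Calling local_cpu_has_avx512() on the instance gives a quick yes/no answer; on systems without /proc/cpuinfo (for example, macOS) it simply returns False.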

🏥 Swarm health monitor. We've updated the swarm health monitor: http://health.petals.ml (source code). It provides an overview of the servers that have joined the public swarm and reports any connection issues.

What's Changed

  • Add PyPI badge, update instructions and links in readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/172
  • Add link to PyPI by @borzunov in https://github.com/bigscience-workshop/petals/pull/173
  • Add local tensor-parallel fwd/bwd by @justheuristic in https://github.com/bigscience-workshop/petals/pull/143
  • Make Docker command more visible by @borzunov in https://github.com/bigscience-workshop/petals/pull/175
  • Allow to disable chunked forward by @borzunov in https://github.com/bigscience-workshop/petals/pull/176
  • Disable chunked_forward() on AVX512 CPUs by @borzunov in https://github.com/bigscience-workshop/petals/pull/179
  • Use slightly less memory in .generate() by @borzunov in https://github.com/bigscience-workshop/petals/pull/177
  • Import bitsandbytes only if it's going to be used by @borzunov in https://github.com/bigscience-workshop/petals/pull/180
  • hotfix: add initial peer that did not crash :) by @justheuristic in https://github.com/bigscience-workshop/petals/pull/181
  • Remove protobuf from requirements by @borzunov in https://github.com/bigscience-workshop/petals/pull/182
  • Add more links to BLOOM to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/183
  • Add link to health.petals.ml to readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/184
  • Add readme subsections by @borzunov in https://github.com/bigscience-workshop/petals/pull/185
  • Fix GiBs in the "insufficient disk space" message by @borzunov in https://github.com/bigscience-workshop/petals/pull/187
  • Support libp2p relays for NAT traversal by @Vahe1994 in https://github.com/bigscience-workshop/petals/pull/186
  • Fix psutil-related AccessDenied crash, disable --load_in_8bit by default in case of TP by @borzunov in https://github.com/bigscience-workshop/petals/pull/188
  • Bump version to 1.1.0 by @borzunov in https://github.com/bigscience-workshop/petals/pull/190

New Contributors

  • @Vahe1994 made their first contribution in https://github.com/bigscience-workshop/petals/pull/186

Full Changelog: https://github.com/bigscience-workshop/petals/compare/v1.0.0...v1.1.0

- Python
Published by borzunov over 3 years ago

https://github.com/bigscience-workshop/petals - v1.0.0: The first stable release

General

This release contains the core functionality of the Petals platform described in our paper.

What's Changed

  • Rudimentary decentralization by @justheuristic in https://github.com/bigscience-workshop/petals/pull/9
  • Update model by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/17
  • Chained rpc_forward & rpc_backward by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/18
  • Implement block selection on servers by @borzunov in https://github.com/bigscience-workshop/petals/pull/20
  • LM head module by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/19
  • Measure and cache network & compute throughput by @borzunov in https://github.com/bigscience-workshop/petals/pull/21
  • Shallow prompt tuning with run example on SST-2 by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/22
  • minimalistic automated tests by @justheuristic in https://github.com/bigscience-workshop/petals/pull/23
  • Clean up readme by @justheuristic in https://github.com/bigscience-workshop/petals/pull/24
  • [Test CI] add instructions to test the full model by @justheuristic in https://github.com/bigscience-workshop/petals/pull/25
  • Fix default branch in CI by @justheuristic in https://github.com/bigscience-workshop/petals/pull/26
  • Fix CI runs in master by @justheuristic in https://github.com/bigscience-workshop/petals/pull/27
  • CI: use GIT_REF_NAME instead of GIT_HEAD_REF by @justheuristic in https://github.com/bigscience-workshop/petals/pull/28
  • Add GenerationMixin class by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/29
  • Decouple make_sequence and move to RemoteSequenceManager by @justheuristic in https://github.com/bigscience-workshop/petals/pull/30
  • fix is_subsequence by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/32
  • Miscellaneous fixes to automatic tests by @justheuristic in https://github.com/bigscience-workshop/petals/pull/35
  • Efficient forward & backward by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/36
  • Pack of Inference Changes by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/37
  • Support various backend dtypes & async serialization by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/38
  • Use "PETALS" as the readme title by @borzunov in https://github.com/bigscience-workshop/petals/pull/40
  • integrate mixed-8bit model by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/39
  • Rename 350m -> 560m by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/43
  • make pytest outputs more verbose by @justheuristic in https://github.com/bigscience-workshop/petals/pull/44
  • Distributed prompt tuning by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/42
  • Reduce vocabulary size in test model, fix bug in routing when overlapped by @justheuristic in https://github.com/bigscience-workshop/petals/pull/45
  • Convert actual model weights by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/46
  • [quickfix 1/n] remove expensive assertions in inference code by @justheuristic in https://github.com/bigscience-workshop/petals/pull/48
  • [Fix] make distributed seq cls to not create the full bloom model by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/49
  • Fix recovering for sequential_backward by @dbaranchuk in https://github.com/bigscience-workshop/petals/pull/50
  • Inference: require max sequence length instead of assuming 2048 by @justheuristic in https://github.com/bigscience-workshop/petals/pull/52
  • Add shallow prefix-tuned inference by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/55
  • remove transformer block, implement as sequence size 1 by @GreenFatGuy in https://github.com/bigscience-workshop/petals/pull/54
  • Update readme for the 1st public release by @borzunov in https://github.com/bigscience-workshop/petals/pull/57
  • Use latest version of Petals scheme, shrink Petals logo by @borzunov in https://github.com/bigscience-workshop/petals/pull/59
  • Update bullet points with feedback from Tim and other people by @borzunov in https://github.com/bigscience-workshop/petals/pull/61
  • Update readme with arxiv link and more discussions by @borzunov in https://github.com/bigscience-workshop/petals/pull/62
  • Warn that current instructions involve 6B model but we will replace them soon by @borzunov in https://github.com/bigscience-workshop/petals/pull/63
  • Add deep prompt inference by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/66
  • Fix calling rpc_info multiple times by @justheuristic in https://github.com/bigscience-workshop/petals/pull/60
  • Make attention cache wait until memory is freed by @justheuristic in https://github.com/bigscience-workshop/petals/pull/53
  • Build cpuonly from bitsandbytes main by @justheuristic in https://github.com/bigscience-workshop/petals/pull/70
  • Priority tasks by @GreenFatGuy in https://github.com/bigscience-workshop/petals/pull/47
  • Update dependency versions by @justheuristic in https://github.com/bigscience-workshop/petals/pull/71
  • fix protobuf version by @justheuristic in https://github.com/bigscience-workshop/petals/pull/74
  • Add prompt tuning example on Personachat dataset by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/69
  • Quality of life changes: update readme, simplify run_server interface by @justheuristic in https://github.com/bigscience-workshop/petals/pull/75
  • Use bitsandbytes==0.34.0, update readme by @justheuristic in https://github.com/bigscience-workshop/petals/pull/76
  • Make small readability & style changes to the instructions by @borzunov in https://github.com/bigscience-workshop/petals/pull/77
  • Rebalance swarm when necessary by @borzunov in https://github.com/bigscience-workshop/petals/pull/34
  • Update hivemind to 1.1.2, mark model argument as required by @borzunov in https://github.com/bigscience-workshop/petals/pull/81
  • Fix "Too many open files" during rebalancing by @borzunov in https://github.com/bigscience-workshop/petals/pull/83
  • Add colab-related changes by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/80
  • Enable rebalancing by default by @borzunov in https://github.com/bigscience-workshop/petals/pull/84
  • Implement exponential backoff for forward & backward by @borzunov in https://github.com/bigscience-workshop/petals/pull/85
  • Add sst-2 ipynb example by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/86
  • Fix floating point issues in block_selection.py by @borzunov in https://github.com/bigscience-workshop/petals/pull/89
  • Implement timeouts in forward/backward by @borzunov in https://github.com/bigscience-workshop/petals/pull/90
  • Force reinstall of hivemind in example notebooks by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/88
  • Make inference, forward, and backward fully fault-tolerant by @borzunov in https://github.com/bigscience-workshop/petals/pull/91
  • Use public swarm by default by @borzunov in https://github.com/bigscience-workshop/petals/pull/92
  • Make ServerState announcements work better by @borzunov in https://github.com/bigscience-workshop/petals/pull/93
  • Require hivemind with fixed compression and protobuf working on Colab by @borzunov in https://github.com/bigscience-workshop/petals/pull/94
  • Try to fix protobuf versions once again by @borzunov in https://github.com/bigscience-workshop/petals/pull/95
  • Add Beam Search decoding algorithm by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/87
  • Improve server's logging by @borzunov in https://github.com/bigscience-workshop/petals/pull/96
  • Add various server timeouts, lower --max_batch_size and --inference_max_length defaults by @borzunov in https://github.com/bigscience-workshop/petals/pull/97
  • Fix dtype- and device-related client issues by @borzunov in https://github.com/bigscience-workshop/petals/pull/98
  • Make Petals a pip-installable package (attempt 2) by @borzunov in https://github.com/bigscience-workshop/petals/pull/102
  • Fix dtypes in backend schemas by @borzunov in https://github.com/bigscience-workshop/petals/pull/99
  • Fix ptune with low_cpu_mem_usage=True (as in Colab) by @borzunov in https://github.com/bigscience-workshop/petals/pull/103
  • Add Dockerfile by @mryab in https://github.com/bigscience-workshop/petals/pull/82
  • Remove unused imports, add missing arguments to docstrings by @mryab in https://github.com/bigscience-workshop/petals/pull/108
  • Expose request_timeout to DistributedBloomConfig by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/105
  • Optimize RemoteSequenceManager by @justheuristic in https://github.com/bigscience-workshop/petals/pull/106
  • Hotfix span selection by @justheuristic in https://github.com/bigscience-workshop/petals/pull/110
  • Patch Linear8bit to enable CxB backward by @justheuristic in https://github.com/bigscience-workshop/petals/pull/111
  • Fix Linear8bitlt state config, update tests by @justheuristic in https://github.com/bigscience-workshop/petals/pull/112
  • Measure throughput for different configs, devices, and dtypes separately by @borzunov in https://github.com/bigscience-workshop/petals/pull/114
  • Support --load_in_8bit on pre-Turing GPUs by @justheuristic in https://github.com/bigscience-workshop/petals/pull/113
  • Fix tile size on ampere by @justheuristic in https://github.com/bigscience-workshop/petals/pull/116
  • Make server use smart defaults by @borzunov in https://github.com/bigscience-workshop/petals/pull/115
  • Suppress quantization warning and fix dtype defaults in compute benchmark by @borzunov in https://github.com/bigscience-workshop/petals/pull/117
  • Choose --num_blocks for bigscience/bloom-petals automatically by @borzunov in https://github.com/bigscience-workshop/petals/pull/119
  • Require hivemind==1.1.4 with p2pd v0.3.13 by @borzunov in https://github.com/bigscience-workshop/petals/pull/121
  • Rework readme, move code example to the top, link draft of Colab by @borzunov in https://github.com/bigscience-workshop/petals/pull/118
  • Remove "-r" when installing Petals in examples by @mryab in https://github.com/bigscience-workshop/petals/pull/122
  • Update notebooks to use full BLOOM-176B by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/104
  • Call block.load_state_dict only once by @mryab in https://github.com/bigscience-workshop/petals/pull/124
  • Add checks for forward() inputs on the client side by @justheuristic in https://github.com/bigscience-workshop/petals/pull/123
  • Fix typos with codespell by @mryab in https://github.com/bigscience-workshop/petals/pull/126
  • Set dht.num_workers = n_layer, update_period = 150, expiration = 300 by @borzunov in https://github.com/bigscience-workshop/petals/pull/125
  • Avoid synchronous updates, ban peers based on request outcome by @justheuristic in https://github.com/bigscience-workshop/petals/pull/127
  • Revert to hivemind==1.1.3 for stability by @borzunov in https://github.com/bigscience-workshop/petals/pull/129
  • Clear trigger before engaging in update by @justheuristic in https://github.com/bigscience-workshop/petals/pull/130
  • Fix inference and rpc_info() fault tolerance by @borzunov in https://github.com/bigscience-workshop/petals/pull/131
  • Set default --step_timeout to 5 min by @borzunov in https://github.com/bigscience-workshop/petals/pull/133
  • Don't ban servers in case of client-caused handler errors by @borzunov in https://github.com/bigscience-workshop/petals/pull/134
  • Allow .generate() to reuse existing inference session by @borzunov in https://github.com/bigscience-workshop/petals/pull/132
  • Fix waiting until free memory is available by @borzunov in https://github.com/bigscience-workshop/petals/pull/136
  • Fix "could not unlink the shared memory file" during rebalancing by @borzunov in https://github.com/bigscience-workshop/petals/pull/135
  • Add Docker commands, use permanent Discord links by @borzunov in https://github.com/bigscience-workshop/petals/pull/137
  • Update texts in "Terms of use" and "Privacy and security" sections by @borzunov in https://github.com/bigscience-workshop/petals/pull/138
  • Show route on client by @borzunov in https://github.com/bigscience-workshop/petals/pull/139
  • Update Anaconda instructions by @borzunov in https://github.com/bigscience-workshop/petals/pull/140
  • Use common folder for all caches, make it a volume in Dockerfile by @borzunov in https://github.com/bigscience-workshop/petals/pull/141
  • Suppress asyncio error logs by default by @borzunov in https://github.com/bigscience-workshop/petals/pull/142
  • Add link to privacy & security Wiki by @borzunov in https://github.com/bigscience-workshop/petals/pull/144
  • Improve block size calculations by @borzunov in https://github.com/bigscience-workshop/petals/pull/149
  • Fix OOMs during server rebalancing by @borzunov in https://github.com/bigscience-workshop/petals/pull/150
  • Bump transformers to 4.25.1 by @justheuristic in https://github.com/bigscience-workshop/petals/pull/151
  • Clean up disk space by @borzunov in https://github.com/bigscience-workshop/petals/pull/152
  • Fix arguments in remove_old_models.py by @mryab in https://github.com/bigscience-workshop/petals/pull/153
  • Add missing methods for SamplingAlgorithm, fix docstrings by @mryab in https://github.com/bigscience-workshop/petals/pull/107
  • Reset MemoryCache during rebalancings by @borzunov in https://github.com/bigscience-workshop/petals/pull/154
  • Check reachability automatically and give advice how to fix it by @borzunov in https://github.com/bigscience-workshop/petals/pull/155
  • Fix logging: do not duplicate lines, enable colors in Colab by @borzunov in https://github.com/bigscience-workshop/petals/pull/156
  • Update advanced notebooks by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/148
  • Downgrade CUDA in Docker image to 11.0.3 by @mryab in https://github.com/bigscience-workshop/petals/pull/145
  • Switch to speedtest-cli by @justheuristic in https://github.com/bigscience-workshop/petals/pull/157
  • Fix issues related to petals as a module by @borzunov in https://github.com/bigscience-workshop/petals/pull/159
  • Alloc inference cache as one contiguous buffer by @borzunov in https://github.com/bigscience-workshop/petals/pull/160
  • Fix misstypos in the example notebooks. by @artek0chumak in https://github.com/bigscience-workshop/petals/pull/161
  • Hot fix: Increase hivemind.P2P's startup_timeout for Colab, remove absent initial peer by @borzunov in https://github.com/bigscience-workshop/petals/pull/162
  • Shield alloc & free from cancellation by @borzunov in https://github.com/bigscience-workshop/petals/pull/163
  • Update wording in readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/165
  • Correct grammar in readme by @vadi2 in https://github.com/bigscience-workshop/petals/pull/166
  • Add link to chat.petals.ml by @borzunov in https://github.com/bigscience-workshop/petals/pull/168
  • Fix code example in readme by @borzunov in https://github.com/bigscience-workshop/petals/pull/169
  • Fix instruction for developers by @justheuristic in https://github.com/bigscience-workshop/petals/pull/170

New Contributors

  • @dbaranchuk made their first contribution in https://github.com/bigscience-workshop/petals/pull/17
  • @borzunov made their first contribution in https://github.com/bigscience-workshop/petals/pull/20
  • @artek0chumak made their first contribution in https://github.com/bigscience-workshop/petals/pull/29
  • @GreenFatGuy made their first contribution in https://github.com/bigscience-workshop/petals/pull/54
  • @mryab made their first contribution in https://github.com/bigscience-workshop/petals/pull/82
  • @vadi2 made their first contribution in https://github.com/bigscience-workshop/petals/pull/166

Full Changelog: https://github.com/bigscience-workshop/petals/commits/v1.0.0

- Python
Published by borzunov over 3 years ago