https://github.com/alpa-projects/alpa

Training and serving large-scale neural networks with auto parallelization.


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    8 of 52 committers (15.4%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.0%) to scientific vocabulary

Keywords

alpa auto-parallelization compiler deep-learning distributed-computing distributed-training high-performance-computing jax llm machine-learning

Keywords from Contributors

distributed hyperparameter-optimization llm-inference llm-serving parallel rllib serving reinforcement-learning deployment large-language-models
Last synced: 5 months ago

Repository

Training and serving large-scale neural networks with auto parallelization.

Basic Info
  • Host: GitHub
  • Owner: alpa-projects
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://alpa.ai
  • Size: 7.11 MB
Statistics
  • Stars: 3,153
  • Watchers: 46
  • Forks: 355
  • Open Issues: 77
  • Releases: 13
Archived
Topics
alpa auto-parallelization compiler deep-learning distributed-computing distributed-training high-performance-computing jax llm machine-learning
Created almost 5 years ago · Last pushed about 2 years ago
Metadata Files
Readme License

README.md

Note: Alpa is no longer actively maintained and is available as a research artifact. The core algorithm in Alpa has been merged into XLA, which is still maintained: https://github.com/openxla/xla/tree/main/xla/hlo/experimental/auto_sharding


Documentation | Slack

Alpa is a system for training and serving large-scale neural networks.

Scaling neural networks to hundreds of billions of parameters has enabled dramatic breakthroughs such as GPT-3, but training and serving these large-scale neural networks require complicated distributed system techniques. Alpa aims to automate large-scale distributed training and serving with just a few lines of code.

The key features of Alpa include:

💻 Automatic Parallelization. Alpa automatically parallelizes users' single-device code on distributed clusters with data, operator, and pipeline parallelism.

🚀 Excellent Performance. Alpa achieves linear scaling when training models with billions of parameters on distributed clusters.

Tight Integration with Machine Learning Ecosystem. Alpa is backed by open-source, high-performance, and production-ready libraries such as Jax, XLA, and Ray.
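
To make the Ray integration concrete, here is a minimal sketch of connecting Alpa to a cluster. It assumes the `alpa.init(cluster="ray")` entry point described in Alpa's documentation; details may have changed since the project was archived.

```python
import alpa

# Connect Alpa to an existing Ray cluster (started elsewhere, e.g. via `ray start`).
# Assumption: alpa.init(cluster="ray") is the documented entry point; afterwards,
# functions decorated with @alpa.parallelize run across the cluster's GPUs.
alpa.init(cluster="ray")
```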

Serving

The code below shows how to use the huggingface/transformers interface together with the Alpa distributed backend for large-model inference. Detailed documentation is in Serving OPT-175B using Alpa.

```python
from transformers import AutoTokenizer
from llm_serving.model.wrapper import get_model

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-2.7b")
tokenizer.add_bos_token = False

# Load the model. Alpa automatically downloads the weights to the specified path
model = get_model(model_name="alpa/opt-2.7b", path="~/opt_weights/")

# Generate
prompt = "Paris is the capital city of"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids=input_ids, max_length=256, do_sample=True)
generated_string = tokenizer.batch_decode(output, skip_special_tokens=True)

print(generated_string)
```
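
Per the linked Serving OPT-175B guide, larger checkpoints are meant to use the same interface. The variation below is a hedged sketch: the model name is illustrative, and a Ray cluster with enough aggregate GPU memory is assumed.

```python
# Illustrative only: swap in a larger Alpa-converted OPT checkpoint.
# This requires a multi-GPU (or multi-node) cluster that can hold the weights.
model = get_model(model_name="alpa/opt-30b", path="~/opt_weights/")
output = model.generate(input_ids=input_ids, max_length=256, do_sample=True)
print(tokenizer.batch_decode(output, skip_special_tokens=True))
```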

Training

Use Alpa's decorator @parallelize to scale your single-device training code to distributed clusters. Check out the documentation site and examples folder for installation instructions, tutorials, examples, and more.

```python
import alpa
from jax import grad
import jax.numpy as jnp

# Parallelize the training step in Jax by simply using a decorator
@alpa.parallelize
def train_step(model_state, batch):
    def loss_func(params):
        out = model_state.forward(params, batch["x"])
        return jnp.mean((out - batch["y"]) ** 2)

    grads = grad(loss_func)(model_state.params)
    new_model_state = model_state.apply_gradient(grads)
    return new_model_state

# The training loop now automatically runs on your designated cluster
# (create_train_state and data_loader are user-defined placeholders)
model_state = create_train_state()
for batch in data_loader:
    model_state = train_step(model_state, batch)
```
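
Beyond the default strategy, Alpa's documentation describes passing a parallelization method to the decorator. The sketch below assumes that `method` argument and the `ShardParallel`/`PipeshardParallel` option names from the docs; they are not taken from this page.

```python
import alpa

# Assumed per Alpa's docs: ShardParallel searches intra-operator (data/operator)
# parallelism only, while PipeshardParallel also splits the model into pipeline
# stages and requires the batch to be divided into micro-batches.
method = alpa.PipeshardParallel(num_micro_batches=16)

@alpa.parallelize(method=method)
def pipelined_train_step(model_state, batch):
    ...  # same single-device loss/gradient code as in the example above
```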

Learning more

Getting Involved

License

Alpa is licensed under the Apache-2.0 license.

Owner

  • Name: Alpa
  • Login: alpa-projects
  • Kind: organization
  • Location: United States of America
  • Description: Distributed training of large-scale deep learning models

GitHub Events

Total
  • Issues event: 1
  • Watch event: 104
  • Fork event: 19
Last Year
  • Issues event: 1
  • Watch event: 104
  • Fork event: 19

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 666
  • Total Committers: 52
  • Avg Commits per committer: 12.808
  • Development Distribution Score (DDS): 0.553 (see the sketch after the committer table)
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Lianmin Zheng l****g@g****m 298
Hao Zhang z****g 103
Yonghao Zhuang z****h@s****n 77
Zhuohan Li z****3@g****m 56
Zhuohan Li z****3@v****m 22
flyingpig 6****g 13
Yonghao Zhuang y****g@c****u 13
yf225 w****g@f****m 9
Jimmy Yao j****h@g****m 7
ddxxdd-code 6****e 5
Zhanyuan Zhang 3****b 4
Cody Yu c****2@g****m 4
Hexu Zhao 4****o 3
Jun Gong j****g@a****m 3
Jiao s****s@g****m 3
Woosuk Kwon w****n@b****u 2
Vincent Liu l****v@s****u 2
KoyamaSohei k****9@g****m 2
Jiao j****g@a****m 2
Jerry Ding j****y@g****m 2
Blair Johnson 4****n 2
Christopher Chou 4****r 2
Ikko Ashimine e****r@g****m 2
dumpmemory 6****y 2
Gerald Shen 1****m 1
Lufang Chen 6****c 1
RichardScottOZ 7****Z 1
Ziwen 6****n 1
sammeralomair 6****r 1
wgimperial 3****l 1
and 22 more...
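
The Development Distribution Score reported above is consistent with the common definition, one minus the top committer's share of all commits. The sketch below assumes that definition and checks it against the figures on this page.

```python
def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 minus the busiest committer's share of all commits."""
    return 1 - top_committer_commits / total_commits

# All time: 298 of 666 commits by the top committer -> ~0.553, matching the value above
print(round(development_distribution_score(298, 666), 3))
# Past year: a score of 0.333 is consistent with 2 of the 3 commits coming from one person
print(round(development_distribution_score(2, 3), 3))
```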
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 94
  • Total pull requests: 51
  • Average time to close issues: 25 days
  • Average time to close pull requests: 10 days
  • Total issue authors: 63
  • Total pull request authors: 30
  • Average comments per issue: 2.62
  • Average comments per pull request: 0.61
  • Merged pull requests: 39
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ZYHowell (4)
  • jaywonchung (4)
  • caixiiaoyang (3)
  • zhanyuanucb (3)
  • samblouir (3)
  • lambda7xx (3)
  • merrymercy (3)
  • chaokunyang (3)
  • frankxyy (2)
  • zigzagcai (2)
  • LeiWang1999 (2)
  • EntilZha (2)
  • GHGmc2 (2)
  • gjoliver (2)
  • JingfengYang (2)
Pull Request Authors
  • ZYHowell (6)
  • merrymercy (5)
  • gjoliver (4)
  • zhisbug (3)
  • yhtang (3)
  • dlzou (2)
  • JubilantJerry (2)
  • KoyamaSohei (2)
  • Vatshank (1)
  • Spina7demon (1)
  • jiaodong (1)
  • FedericoCampe8 (1)
  • eltociear (1)
  • AetherPrior (1)
  • haifaksh (1)
Top Labels
Issue Labels
good first issue (10) · enhancement (5) · known bug (3) · unknown error (3) · help wanted (2)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi: 214 last month
  • Total docker downloads: 407
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 26
  • Total maintainers: 1
pypi.org: alpa

Alpa automatically parallelizes large tensor computation graphs and runs them on a distributed cluster.

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 214 last month
  • Docker Downloads: 407
Rankings
Stargazers count: 1.4%
Docker downloads count: 2.8%
Forks count: 2.8%
Average: 8.6%
Dependent packages count: 10.1%
Downloads: 12.7%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/alpa-projects/alpa
  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 9.0%
Average: 9.6%
Dependent repos count: 10.2%
Last synced: 6 months ago

Dependencies

alpa/collective/requirements.txt pypi
  • cupy-cuda111 *
examples/mnist/requirements.txt pypi
  • absl-py ==1.0.0
  • clu ==0.0.6
  • flax ==0.3.6
  • jax ==0.2.21
  • jaxlib ==0.1.70
  • ml-collections ==0.1.0
  • numpy ==1.21.4
  • optax ==0.1.0
  • tensorflow ==2.7.0
  • tensorflow-datasets ==4.4.0
.github/workflows/build_jaxlib.yml actions
  • actions/checkout v3 composite
  • styfle/cancel-workflow-action 0.9.1 composite
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
  • styfle/cancel-workflow-action 0.9.1 composite
.github/workflows/docs.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/release_alpa.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/release_jaxlib.yml actions
  • WyriHaximus/github-action-get-previous-tag v1 composite
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
examples/setup.py pypi
setup.py pypi