gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

https://github.com/eleutherai/gpt-neox

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    8 of 130 committers (6.2%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.8%) to scientific vocabulary

Keywords

deepspeed-library gpt-3 language-model transformers

Keywords from Contributors

transformer evaluation-framework cryptocurrency jax cryptography agents multi-agent distributed optim deep-neural-networks
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: EleutherAI
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://www.eleuther.ai/
  • Size: 114 MB
Statistics
  • Stars: 7,266
  • Watchers: 127
  • Forks: 1,072
  • Open Issues: 85
  • Releases: 3
Topics
deepspeed-library gpt-3 language-model transformers
Created about 5 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Citation Codeowners

README-MUP.md

How to use Mup (https://github.com/microsoft/mup)

Add mup neox args to your config

```
# mup

"use-mup": true,
"save-base-shapes": false, # this only needs to be enabled once in order to generate the base-shapes-file on each rank
"base-shapes-file": "base-shapes", # load base shapes from this file
"coord-check": false, # generate coord check plots to verify mup's implementation in neox

# mup hp search

"mup-init-scale": 1.0,
"mup-attn-temp": 1.0,
"mup-output-temp": 1.0,
"mup-embedding-mult": 1.0,
"mup-rp-embedding-mult": 1.0,
```

Generate base shapes

  1. Set use-mup to true
  2. Set save-base-shapes to true
  3. Run once. gpt-neox will instantiate a base model and a delta model, save one base-shapes file per rank, and then exit immediately.
  4. Set save-base-shapes to false
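
The steps above amount to switching one flag on for a single run and then switching it off again. A minimal Python sketch of that toggle, assuming the mup settings have already been loaded into a plain dict. The helper `with_flags` and the in-memory dict are hypothetical and not part of gpt-neox; only the config keys come from the block above.

```python
import copy

# Keys and defaults taken from the mup section of the config shown earlier.
MUP_DEFAULTS = {
    "use-mup": True,
    "save-base-shapes": False,
    "base-shapes-file": "base-shapes",
    "coord-check": False,
}


def with_flags(config, **flags):
    """Return a copy of `config` with some keys overridden.

    Keyword names use underscores (save_base_shapes) and are mapped to the
    hyphenated config keys (save-base-shapes). Hypothetical helper.
    """
    updated = copy.deepcopy(config)
    for name, value in flags.items():
        key = name.replace("_", "-")
        if key not in updated:
            raise KeyError(f"unknown config key: {key}")
        updated[key] = value
    return updated


# Steps 1-3: config for the one-off run that writes the base shapes.
base_shapes_run = with_flags(MUP_DEFAULTS, use_mup=True, save_base_shapes=True)

# Step 4: config for every later run, which only loads the saved shapes.
training_run = with_flags(base_shapes_run, save_base_shapes=False)
```

Returning a copy rather than mutating the dict keeps the one-off settings from leaking into the regular training config.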

Generate coord check plots (optional)

  1. Keep use-mup true
  2. Set coord-check to true
  3. Run once. gpt-neox will output jpg images similar to those at https://github.com/microsoft/mutransformers/blob/main/README.md#coord-check, then exit immediately.
  4. Set coord-check to false

Tune mup hyperparameters and LR

The values under mup hp search correspond to Appendix F.4 of https://arxiv.org/pdf/2203.03466.pdf. Tune them together with the LR by random search, using the scaled-up config (tested with 6-7B.yml) but with hidden-size set to the value from the scaled-down config (125M.yml).
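
One way to draw the random-search trials is sketched below in Python. This is an illustration under stated assumptions, not the procedure gpt-neox ships: the sampling ranges are placeholders, the flat "lr" key stands in for wherever the learning rate lives in your config, and 768 is assumed as the scaled-down width (read the real value from 125M.yml).

```python
import math
import random

# Keys from the "mup hp search" block of the config.
MUP_HP_KEYS = [
    "mup-init-scale",
    "mup-attn-temp",
    "mup-output-temp",
    "mup-embedding-mult",
    "mup-rp-embedding-mult",
]


def log_uniform(rng, low, high):
    """Sample a value between low and high, uniformly in log space."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_trial(rng, small_hidden_size, lr_range=(1e-4, 1e-2), hp_range=(0.25, 4.0)):
    """Draw one random-search trial as a dict of config overrides.

    The default ranges are placeholders, not recommended values.
    """
    trial = {key: log_uniform(rng, *hp_range) for key in MUP_HP_KEYS}
    trial["lr"] = log_uniform(rng, *lr_range)  # placeholder key for the LR
    # Scaled-up config, but with the width of the scaled-down config.
    trial["hidden-size"] = small_hidden_size
    return trial


rng = random.Random(0)  # fixed seed so the trial list is reproducible
trials = [sample_trial(rng, small_hidden_size=768) for _ in range(8)]
```

Each dict in `trials` would be merged into the scaled-up config for one short run. Sampling in log space spreads the trials evenly across orders of magnitude, which is a common choice for multipliers and learning rates.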

Transfer

With the best LR and mup HPs set, restore hidden-size in the scaled-up config to its original value and run again.
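
The transfer step can be sketched in Python as picking the best trial and changing only the width. Everything here is illustrative: the loss values and hyperparameters are made-up numbers, the flat "lr" key is a placeholder, and 768 and 4096 are assumed widths that should be read from 125M.yml and the scaled-up config.

```python
def transfer_overrides(best_trial, full_hidden_size):
    """Keep the tuned LR and mup HPs, restore the scaled-up hidden-size."""
    overrides = dict(best_trial)
    overrides["hidden-size"] = full_hidden_size
    return overrides


# Made-up search results for illustration: (validation loss, trial overrides).
results = [
    (3.41, {"lr": 0.003, "mup-init-scale": 0.5, "hidden-size": 768}),
    (3.27, {"lr": 0.006, "mup-init-scale": 1.0, "hidden-size": 768}),
    (3.35, {"lr": 0.001, "mup-init-scale": 2.0, "hidden-size": 768}),
]

# Lowest loss wins; the tuned values carry over unchanged.
best_loss, best_trial = min(results, key=lambda item: item[0])
final_overrides = transfer_overrides(best_trial, full_hidden_size=4096)
```

Only hidden-size changes between the search runs and the final run, which is the point of tuning at the small width.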

Owner

  • Name: EleutherAI
  • Login: EleutherAI
  • Kind: organization
  • Email: contact@eleuther.ai
  • Location: The Internet

Citation (CITATION.cff)

# YAML 1.2
---
authors:
  - affiliation: EleutherAI
    family-names: Andonian
    given-names: Alex
  - affiliation: EleutherAI
    family-names: Anthony
    given-names: Quentin
  - affiliation: EleutherAI
    family-names: Biderman
    given-names: Stella
  - affiliation: EleutherAI
    family-names: Black
    given-names: Sid
  - affiliation: EleutherAI
    family-names: Gali
    given-names: Preetham
  - affiliation: EleutherAI
    family-names: Gao
    given-names: Leo
  - affiliation: EleutherAI
    family-names: Hallahan
    given-names: Eric
  - affiliation: EleutherAI
    family-names: Levy-Kramer
    given-names: Josh
  - affiliation: EleutherAI
    family-names: Leahy
    given-names: Connor
  - affiliation: EleutherAI
    family-names: Nestler
    given-names: Lucas
  - affiliation: EleutherAI
    family-names: Parker
    given-names: Kip
  - affiliation: EleutherAI
    family-names: Pieler
    given-names: Michael
  - affiliation: EleutherAI
    family-names: Phang
    given-names: Jason
  - affiliation: EleutherAI
    family-names: Purohit
    given-names: Shivanshu
  - affiliation: EleutherAI
    family-names: Schoelkopf
    given-names: Hailey
  - affiliation: EleutherAI
    family-names: Stander
    given-names: Dashiell
  - affiliation: EleutherAI
    family-names: Songz
    given-names: Tri
  - affiliation: EleutherAI
    family-names: Tigges
    given-names: Curt
  - affiliation: EleutherAI
    family-names: Thérien
    given-names: Benjamin
  - affiliation: EleutherAI
    family-names: Wang
    given-names: Phil
  - affiliation: EleutherAI
    family-names: Weinbach
    given-names: Samuel
cff-version: "1.1.0"
keywords:
  - "Transformers"
  - "Massive language model"
  - "Autoregressive language model"
license: "Apache-2.0"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://www.github.com/eleutherai/gpt-neox"
title: "GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch"
version: "2.0.0"
doi: "10.5281/zenodo.5879544"
date-released: 2021-08-23
...

GitHub Events

Total
  • Create event: 27
  • Commit comment event: 1
  • Issues event: 45
  • Watch event: 407
  • Delete event: 15
  • Issue comment event: 75
  • Push event: 86
  • Pull request review comment event: 25
  • Pull request review event: 46
  • Pull request event: 62
  • Fork event: 102
Last Year
  • Create event: 27
  • Commit comment event: 1
  • Issues event: 45
  • Watch event: 407
  • Delete event: 15
  • Issue comment event: 75
  • Push event: 86
  • Pull request review comment event: 25
  • Pull request review event: 46
  • Pull request event: 62
  • Fork event: 102

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 1,885
  • Total Committers: 130
  • Avg Commits per committer: 14.5
  • Development Distribution Score (DDS): 0.79
Past Year
  • Commits: 73
  • Committers: 23
  • Avg Commits per committer: 3.174
  • Development Distribution Score (DDS): 0.795
Top Committers
Name Email Commits
Stella Biderman s****n@g****m 395
sdtblck 4****k 312
Josh Levy-Kramer j****h@l****k 222
Samuel Weinbach s****h@g****m 206
sid s****k@a****e 120
github-actions g****s@g****m 73
Quentin Anthony q****y@y****m 52
Hailey Schoelkopf 6****f 45
trisongz t****i@s****m 32
Dashiell Stander d****r@p****m 32
Leo Gao 5****2 29
Shivanshu Purohit 4****t 27
dmahan93 4****3 15
Jacob Hatef 7****f 15
Phil Wang l****s@g****m 14
jack j****r@a****e 14
Xu Song x****p@g****m 11
yang 7****g 10
Aurelion 3****e 9
Kyle1668 k****1@g****m 9
Samuel Weinbach s****h@g****m 9
Eric Hallahan e****c@h****e 9
AI-WAIFU 6****U 9
haileyschoelkopf h****f@y****u 9
Michael Pieler M****r@G****m 8
curt-tigges ct@c****m 8
connor c****5@g****m 7
Jason Phang j****g@n****u 7
jaimemcc 9****l 7
Satpal Singh Rathore s****e@g****m 7
and 100 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 79
  • Total pull requests: 208
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 38
  • Total pull request authors: 52
  • Average comments per issue: 2.14
  • Average comments per pull request: 0.91
  • Merged pull requests: 146
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 26
  • Pull requests: 82
  • Average time to close issues: 22 days
  • Average time to close pull requests: 14 days
  • Issue authors: 17
  • Pull request authors: 20
  • Average comments per issue: 0.96
  • Average comments per pull request: 0.67
  • Merged pull requests: 54
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • StellaAthena (18)
  • Quentin-Anthony (14)
  • sdtblck (8)
  • mackmake (4)
  • fxnie (4)
  • exnx (4)
  • iPRET (3)
  • jahatef (3)
  • anthony-dipofi (2)
  • tf-nv (2)
  • Kyle1668 (2)
  • lieh1203 (2)
  • tijmen (2)
  • srivassid (2)
  • Carolingliang (2)
Pull Request Authors
  • Quentin-Anthony (42)
  • dmahan93 (34)
  • jahatef (31)
  • StellaAthena (26)
  • sdtblck (17)
  • AI-WAIFU (17)
  • aurelion-source (15)
  • haileyschoelkopf (14)
  • yang (13)
  • lucidrains (11)
  • R0n12 (11)
  • segyges (11)
  • bclyang (10)
  • jaimemcc-intel (8)
  • DayOfThePenguin (8)
Top Labels
Issue Labels
bug (49) feature request (46) good first issue (8) help wanted (4) documentation (2)
Pull Request Labels
dependencies (5) merge-queue (5)

Dependencies

requirements/requirements-dev.txt pypi
  • autopep8 ==1.5.6 development
  • clang-format ==13.0.1 development
  • pre-commit * development
  • pytest ==6.2.3 development
  • pytest-cov ==2.11.1 development
  • pytest-forked ==1.3.0 development
  • pytest-xdist * development
  • transformers * development
requirements/requirements-onebitadam.txt pypi
  • cupy-cuda111 ==8.6.0
requirements/requirements-sparseattention.txt pypi
  • triton ==0.4.2
requirements/requirements-tensorboard.txt pypi
  • tensorboard ==2.5.0
requirements/requirements.txt pypi
  • deepspeed eb7f5cff36678625d23db8a8fe78b4a93e5d2c75
  • einops ==0.3.0
  • ftfy ==6.0.1
  • lm_dataformat ==0.0.20
  • lm_eval ==0.2.0
  • mpi4py ==3.0.3
  • numpy ==1.22.0
  • pybind11 ==2.6.2
  • regex *
  • sentencepiece *
  • six *
  • tokenizers ==0.10.2
  • transformers *
  • wandb ==0.10.28
.github/workflows/cpu_ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/docker_build.yml actions
  • actions/checkout v2 composite
  • crazy-max/ghaction-docker-meta v1 composite
  • docker/build-push-action v2 composite
  • docker/login-action v1 composite
  • docker/setup-buildx-action v1 composite
  • docker/setup-qemu-action v1 composite
.github/workflows/pull_request.yml actions
  • actions/checkout v2 composite
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
  • pre-commit/action v2.0.3 composite
Dockerfile docker
  • nvidia/cuda 11.1.1-devel-ubuntu20.04 build
requirements/requirements-flashattention.txt pypi
  • flash-attn ==0.2.2
megatron/fused_kernels/setup.py pypi
requirements/requirements-s3.txt pypi
  • boto3 *
  • hf-transfer >=0.1.3
requirements/requirements-wandb.txt pypi
  • wandb >=0.10.28
.github/workflows/coverity_scan.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v3 composite