https://github.com/bigscience-workshop/bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Science Score: 13.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low similarity: 9.4%)

Keywords

machine-learning models nlp training

Keywords from Contributors

transformer vlm speech dataset-hub speech-recognition qwen pytorch-transformers pretrained-models model-hub glm
Last synced: 5 months ago

Repository

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

Basic Info
  • Host: GitHub
  • Owner: bigscience-workshop
  • License: other
  • Language: Shell
  • Default Branch: master
  • Size: 3.26 MB
Statistics
  • Stars: 996
  • Watchers: 37
  • Forks: 99
  • Open Issues: 20
  • Releases: 0
Topics
machine-learning models nlp training
Created almost 5 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Codeowners

README.md

bigscience

Research workshop on large language models - The Summer of Language Models 21

At the moment we have 2 code repos:

  1. https://github.com/bigscience-workshop/Megatron-DeepSpeed - this is our flagship code base
  2. https://github.com/bigscience-workshop/bigscience - (this repo) for everything else - docs, experiments, etc.

Currently, the most active segments of this repo are:

  • JZ - Lots of information about our work environment which helps evaluate, plan and get things done
  • Experiments - many experiments are being done. Documentation, result tables, scripts and logs are all there
  • Datasets info
  • Train - all the information about the current trainings (see below for the most important ones)

We have READMEs for specific aspects, such as:

  • hub integration

Trainings

While we keep detailed chronicles of experiments and findings for some of the main trainings, here is a doc that contains a summary of the most important findings: Lessons learned

Train 1 - 13B - unmodified Megatron gpt2 - baseline

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

```
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
```
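For readers who find the Perl one-liner opaque, here is a rough bash equivalent — a minimal sketch assuming curl is available and the server reports Content-Length. It polls the remote file size and fetches only the newly appended byte range, just like the script above:

```bash
#!/usr/bin/env bash
# Sketch of the same polling loop: print new bytes of a remote log file
# every 5 minutes. The URL is the tr1-13B log file referenced above.
url=https://huggingface.co/bigscience/tr1-13B-logs/resolve/main/main_log.txt
start=0
while true; do
    # HEAD request (following redirects); keep the last Content-Length seen.
    end=$(curl -sIL "$url" | grep -i '^content-length:' | tail -1 | tr -dc '0-9')
    if [ -n "$end" ] && [ "$end" -gt "$start" ]; then
        # Range request for only the bytes appended since the last poll.
        curl -sL -r "$start-$end" "$url"
        start=$end
    fi
    sleep 300
done
```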

Train 3

Architecture and scaling baseline runs: no fancy tricks, just GPT2. Here are links to the respective tensorboards:

| Size                | 1B3                  | 760M | 350M | 125M |
|---------------------|----------------------|------|------|------|
| C4 + low warmup     | a                    | b    | c    |      |
| OSCAR + low warmup  | f                    |      |      |      |
| C4 + high warmup    | e                    |      |      |      |
| OSCAR + high warmup | d (current baseline) | g    | h    | i    |
| Pile + high warmup  | m                    | j    | k    | l    |

Train 8

104B - unmodified Megatron gpt2 - with extra-wide hidden size to learn how to deal with training instabilities

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

```
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -sI $u]=~/content-length: (\d+)/; \
print qx[curl -sr $b-$e -L $u] if $e>$b; $b=$e; sleep 300}' \
https://cdn-lfs.huggingface.co/bigscience/tr8-104B-logs/b2cc478d5ae7c9ec937ea2db1d2fe09de593fa2ec38c171d6cc5dca094cd79f9
```

Train 11

This is the current main training.

tr11-176B-ml

You can watch the training logs live by running this tail -f-like script over the remote log file, which gets synced to the hub once an hour:

```
perl -e '$u=shift; $b=0; while(1){($e)=qx[curl -LsI $u]=~/2 200.*?content-length: (\d+)/s; \
print qx[curl -Lsr $b-$e $u] if $e>$b; $b=$e; sleep 300}' \
https://huggingface.co/bigscience/tr11-176B-ml-logs/resolve/main/logs/main/main_log.txt
```

Owner

  • Name: BigScience Workshop
  • Login: bigscience-workshop
  • Kind: organization
  • Email: bigscience-contact@googlegroups.com

Research workshop on large language models - The Summer of Language Models 21

GitHub Events

Total
  • Issues event: 1
  • Watch event: 23
  • Fork event: 2
Last Year
  • Issues event: 1
  • Watch event: 23
  • Fork event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 1,100
  • Total Committers: 15
  • Avg Commits per committer: 73.333
  • Development Distribution Score (DDS): 0.382
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Stas Bekman s****s@s****g 680
Muennighoff n****f@g****m 171
TevenLeScao t****o@g****m 110
thomasw21 2****1 74
SaulLu l****m@g****m 27
ontocord o****d@g****m 14
Iz Beltagy b****y@a****g 9
Victor SANH v****h@g****m 4
HugoLaurencon h****n@g****m 3
Max Ryabinin m****0@g****m 2
Younes Belkada 4****a 2
Christopher Akiki c****i@p****m 1
Nouamane Tazi n****8@g****m 1
Pierre Colombo P****o 1
Thomas Wolf t****f 1
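The Development Distribution Score is not defined on this page; assuming the common definition DDS = 1 − (top committer's commits / total commits), the all-time figure is consistent with the table above: 1 − 680/1100 ≈ 0.382.

```bash
# Hedged check of the assumed DDS definition against the committer table:
echo "scale=3; 1 - 680/1100" | bc   # prints .382, matching the all-time DDS
```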

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 19
  • Total pull requests: 53
  • Average time to close issues: 5 months
  • Average time to close pull requests: 19 days
  • Total issue authors: 18
  • Total pull request authors: 12
  • Average comments per issue: 0.79
  • Average comments per pull request: 0.94
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • robertLiuLinFeng (2)
  • RomanCast (1)
  • OhadRubin (1)
  • misska1 (1)
  • robinfang7 (1)
  • BlinkDL (1)
  • henan991201 (1)
  • cmsflash (1)
  • sashavor (1)
  • ViktorThink (1)
  • yu202147657 (1)
  • zhangyipin (1)
  • stas00 (1)
  • kamalkraj (1)
  • celsofranssa (1)
Pull Request Authors
  • thomasw21 (21)
  • SaulLu (11)
  • Muennighoff (5)
  • younesbelkada (3)
  • TevenLeScao (3)
  • cakiki (2)
  • ibeltagy (2)
  • adammoody (1)
  • NouamaneTazi (1)
  • EIFY (1)
  • eltociear (1)
  • thomwolf (1)

Dependencies

requirements_dev.txt pypi
  • Sphinx ==1.8.5
  • bump2version ==0.5.11
  • coverage ==4.5.4
  • flake8 ==3.7.8
  • pip ==19.2.3
  • pytest ==4.6.5
  • pytest-runner ==5.1
  • tox ==3.14.0
  • twine ==1.14.0
  • watchdog ==0.9.0
  • wheel ==0.33.6
setup.py pypi
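As a usage note, the pinned dev tooling above would typically be installed from a local checkout — a sketch, assuming requirements_dev.txt sits at the repository root:

```bash
# Clone the repository and install the pinned development dependencies.
git clone https://github.com/bigscience-workshop/bigscience
cd bigscience
pip install -r requirements_dev.txt
```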