relora

Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates"

https://github.com/guitaricet/relora
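
The core idea behind ReLoRA is that a sequence of low-rank (LoRA-style) updates, each merged into the full-rank weights before the factors are reinitialized, can add up to a high-rank total update. Below is a minimal PyTorch sketch of one such merge-and-restart step; the class and method names (`ReLoRALinear`, `merge_and_reinit`) are hypothetical illustrations, not the repository's actual API:

```
import math
import torch
import torch.nn as nn

class ReLoRALinear(nn.Module):
    """Sketch of a LoRA-wrapped linear layer with a ReLoRA-style restart."""

    def __init__(self, in_features, out_features, r=128, lora_alpha=32):
        super().__init__()
        # The full-rank weight is frozen between restarts; only the factors train.
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features), requires_grad=False
        )
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        self.scaling = lora_alpha / r
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x):
        # W x plus the scaled low-rank correction B A x.
        return x @ self.weight.T + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    @torch.no_grad()
    def merge_and_reinit(self):
        # Fold the current low-rank update into the full-rank weight, then
        # restart from fresh factors. Each B @ A has rank <= r, but the sum
        # of many merged updates can be high-rank.
        self.weight += (self.lora_B @ self.lora_A) * self.scaling
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.lora_B.zero_()
```

In the training commands further down, the `--relora` and `--cycle_length` flags appear to control when these restarts happen.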

Science Score: 26.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low: 1.5%)

Keywords

deep-learning distributed-training llama nlp peft transformer
Last synced: 6 months ago

Repository

Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates"

Statistics
  • Stars: 462
  • Watchers: 9
  • Forks: 40
  • Open Issues: 5
  • Releases: 0
Topics
deep-learning distributed-training llama nlp peft transformer
Created almost 3 years ago · Last pushed almost 2 years ago
Metadata Files
  • Readme
  • License
  • Citation

README.dev.md

Some scripts to check that the most common training regimes work.

```
torchrun --nproc-per-node 2 torchrun_main.py \
    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_EleutherAI_pythia-1.4b_512 \
    --model_name_or_path EleutherAI/pythia-1.4b \
    --use_peft \
    --relora 10 \
    --model_revision step1000 \
    --batch_size 4 \
    --total_batch_size 96 \
    --lr 5e-4 \
    --max_length 512 \
    --eval_every 20 \
    --save_every 20 \
    --num_training_steps 40 \
    --distributed_type ddp \
    --optimizer adam_zero \
    --tags debug

torchrun --nproc-per-node 2 torchrun_main.py \
    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_EleutherAI_pythia-1.4b_512 \
    --model_name_or_path EleutherAI/pythia-1.4b \
    --model_revision step1000 \
    --batch_size 6 \
    --total_batch_size 96 \
    --lr 5e-4 \
    --max_length 512 \
    --eval_every 2 \
    --save_every 10 \
    --num_training_steps 20 \
    --distributed_type ddp \
    --tags debug,fsdp_debug

torchrun --nproc-per-node 2 torchrun_main.py \
    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_t5-base_512 \
    --model_config configs/llama_250m.json \
    --batch_size 24 \
    --total_batch_size 96 \
    --lr 5e-4 \
    --max_length 512 \
    --eval_every 2 \
    --save_every 10 \
    --num_training_steps 20 \
    --distributed_type ddp \
    --tags debug,fsdp_debug

torchrun --nproc-per-node 2 torchrun_main.py \
    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_t5-base_512 \
    --model_config configs/llama_250m.json \
    --batch_size 24 \
    --total_batch_size 96 \
    --lr 5e-4 \
    --max_length 512 \
    --eval_every 2 \
    --save_every 10 \
    --num_training_steps 20 \
    --distributed_type fsdp \
    --tags debug,fsdp_debug

torchrun --nproc-per-node 2 torchrun_main.py \
    --dataset_path preprocessed_data/wikitext_wikitext-2-v1_gpt2_512 \
    --model_config configs/llama_250m_50K.json \
    --batch_size 24 \
    --total_batch_size 96 \
    --lr 5e-4 \
    --max_length 512 \
    --eval_every 2 \
    --save_every 10 \
    --num_training_steps 20 \
    --distributed_type ddp \
    --dtype float32 \
    --tags debug,fsdp_debug

torchrun --nproc-per-node 2 torchrun_main.py \
    --model_config configs/llama_250m.json \
    --batch_size 24 \
    --total_batch_size 96 \
    --lr 5e-4 \
    --max_length 512 \
    --eval_every 2 \
    --save_every 10 \
    --num_training_steps 20000 \
    --distributed_type fsdp \
    --tags debug,fsdp_debug

torchrun --nproc-per-node 2 torchrun_main.py \
    --model_config configs/llama_250m.json \
    --batch_size 24 \
    --total_batch_size 96 \
    --lr 1e-3 \
    --max_length 512 \
    --use_peft \
    --relora 10 \
    --cycle_length 10 \
    --restart_warmup_steps 5 \
    --scheduler cosine_restarts \
    --warmup_steps 5 \
    --reset_optimizer_on_relora False \
    --optimizer_magnitude_pruning 0.9 \
    --num_training_steps 20000 \
    --save_every 5000 \
    --eval_every 5000 \
    --warmed_up_model checkpoints/llama_250m-2023-06-09-11-29-56/model_5000 \
    --distributed_type fsdp \
    --tags debug,fsdp_debug

```
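
The last command disables the full optimizer reset at a restart (`--reset_optimizer_on_relora False`) and instead prunes the optimizer state by magnitude (`--optimizer_magnitude_pruning 0.9`), while `--scheduler cosine_restarts` and `--restart_warmup_steps 5` re-warm the learning rate after each restart. Below is a minimal sketch of what magnitude pruning of Adam state could look like; this is a hypothetical helper assuming standard `exp_avg`/`exp_avg_sq` state keys, and the repository's implementation may differ in details:

```
import torch

@torch.no_grad()
def magnitude_prune_optimizer_state(optimizer, prune_ratio=0.9):
    """Zero the smallest-magnitude entries of Adam's moment buffers.

    Hypothetical sketch of a partial optimizer reset around a ReLoRA
    restart; keeps only the largest (1 - prune_ratio) fraction of each
    moment buffer instead of wiping all state.
    """
    for group in optimizer.param_groups:
        for param in group["params"]:
            state = optimizer.state.get(param)
            if not state:
                continue
            for key in ("exp_avg", "exp_avg_sq"):  # Adam first/second moments
                buf = state.get(key)
                if buf is None or buf.numel() < 2:
                    continue
                k = int(buf.numel() * prune_ratio)
                if k == 0:
                    continue
                # kthvalue gives the k-th smallest |value|; zero everything
                # at or below that threshold.
                threshold = buf.abs().flatten().kthvalue(k).values
                buf.masked_fill_(buf.abs() <= threshold, 0.0)
```

At each restart, a trainer would merge and reinitialize the low-rank factors, prune the moments as above, and let the scheduler re-warm the learning rate.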

Owner

  • Name: Vlad Lialin
  • Login: Guitaricet
  • Kind: user
  • Location: San Francisco, CA
  • Company: @1x-technologies
  • Bio: Deep Learning for Robotics @ 1X Technologies

GitHub Events

Total
  • Issues event: 1
  • Watch event: 29
  • Issue comment event: 6
  • Fork event: 4
Last Year
  • Issues event: 1
  • Watch event: 29
  • Issue comment event: 6
  • Fork event: 4

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 214
  • Total Committers: 1
  • Avg Commits per committer: 214.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Vladislav Lialin (g****t@g****m): 214 commits

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 17
  • Total pull requests: 1
  • Average time to close issues: 12 days
  • Average time to close pull requests: N/A
  • Total issue authors: 17
  • Total pull request authors: 1
  • Average comments per issue: 1.59
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 4.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ScottishFold007 (1)
  • omri123 (1)
  • JinYujie99 (1)
  • tiendung (1)
  • wanghao-007 (1)
  • thistleknot (1)
  • vorobyov01 (1)
  • ElleLeonne (1)
  • itongggg (1)
  • DaehanKim (1)
  • haofanwang (1)
  • henbucuoshanghai (1)
  • datalee (1)
  • mooncui (1)
  • skykiseki (1)
Pull Request Authors
  • Guitaricet (2)

Dependencies

requirements.txt (pypi)
  • datasets *
  • lion-pytorch *
  • loguru *
  • matplotlib *
  • nvitop *
  • peft *
  • tokenizers *
  • torch *
  • transformers *
  • wandb *
setup.py (pypi)