https://github.com/beomi/bitnet-transformers
0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch with Llama(2) Architecture
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 7.9%, to scientific vocabulary)
Repository
Statistics
- Stars: 302
- Watchers: 9
- Forks: 32
- Open Issues: 8
- Releases: 0
Metadata Files
README.md
0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch with Llama(2) Architecture


- Paper Link: https://arxiv.org/pdf/2310.11453.pdf
Prepare Dev env
```bash
# Clone this repo
git clone https://github.com/beomi/bitnet-transformers
cd bitnet-transformers

# Install requirements
pip install -r clm_requirements.txt

# Clone transformers repo
git clone https://github.com/huggingface/transformers
pip install -e transformers

# Update Llama(2) model
rm ./transformers/src/transformers/models/llama/modeling_llama.py
ln -s $(pwd)/bitnet_llama/modeling_llama.py ./transformers/src/transformers/models/llama/modeling_llama.py
```
This replaces the stock modeling_llama.py with a symlink to bitnet_llama/modeling_llama.py. Since the file is linked, any changes made to it are reflected in the transformers repo.
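A quick way to confirm the patch took effect is to import the module and check for the layer. This is a sketch, assuming bitnet_llama/modeling_llama.py defines a class named `BitLinear`:

```python
# A quick sanity check (a sketch, not from the repo): confirm the editable
# transformers install is the one being imported and that the symlinked
# modeling_llama.py is in place. `BitLinear` is assumed to be defined in
# bitnet_llama/modeling_llama.py.
import transformers
from transformers.models.llama import modeling_llama

print(transformers.__file__)                  # should point into ./transformers
print(hasattr(modeling_llama, "BitLinear"))   # True once the symlink is set up
```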
Train Wikitext-103

You can track metrics via wandb.

```bash
./train_wikitext.sh
```
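Before launching a full run, a single forward pass through a tiny, randomly initialized model built from the patched modeling_llama.py can catch wiring errors early. This is a minimal sketch; the config values are illustrative, not the repo's actual 47.5M-parameter training config.

```python
# A minimal sketch: instantiate a tiny randomly initialized Llama from the
# patched modeling_llama.py and run one forward pass. The config values below
# are illustrative, not the repo's actual training config.
import torch
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,
    hidden_size=256,
    intermediate_size=688,
    num_hidden_layers=4,
    num_attention_heads=4,
)
model = LlamaForCausalLM(config)

input_ids = torch.randint(0, config.vocab_size, (1, 16))
out = model(input_ids=input_ids, labels=input_ids)
print(out.loss)  # a finite loss means the forward/backward wiring works
```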
GPU Mem Usage Comparison
Train Config
- Batch size: 1
- Gradient accumulation: 1
- Seq length: 2048
- Model: `LLamaForCausalLM` with `BitLinear` layer
- Model size: 47,452,672 (47.5M)
Original LLAMA - 16bit
- Uses 250MB GPU memory for model weights
BitLLAMA - Mixed 16bit
- Uses 200MB GPU memory for model weights
- Uses bf16 (or fp16) to store model weights
- Uses int8 to store the -1/1 1-bit weights
- Uses more memory during training than the original LLAMA: it keeps the 1-bit weights and the 16-bit weights together
BitLLAMA - 8bit
- Uses 100MB GPU memory for model weights
- Materializes bf16 (or fp16) weights on the fly when needed
- Uses 8-bit storage for the 1-bit BitLinear weights and the other weights
BitLLAMA - 1bit
- Materializes bf16 (or fp16) weights on the fly when needed
- Uses 1-bit storage for the 1-bit weights (see the packing sketch after this section)
```bash
TBD
```
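As a rough illustration of why true 1-bit storage is compact: 47,452,672 sign weights packed eight per byte come to about 5.9MB. The sketch below is not the repo's implementation; it just shows one plausible uint8 packing scheme for {-1, +1} weights (the names `pack_signs` and `unpack_signs` are hypothetical).

```python
# A minimal sketch (not the repo's implementation): pack {-1, +1} weights into
# uint8, 8 weights per byte. 47,452,672 weights packed this way take ~5.9MB,
# versus ~95MB as raw bf16 tensors.
import torch

def pack_signs(w: torch.Tensor) -> torch.Tensor:
    bits = (w.flatten() > 0).to(torch.uint8)      # map -1 -> 0, +1 -> 1
    pad = (-bits.numel()) % 8                     # pad to a multiple of 8
    bits = torch.cat([bits, bits.new_zeros(pad)]).view(-1, 8)
    shifts = torch.arange(8, dtype=torch.uint8)
    return (bits << shifts).sum(dim=1, dtype=torch.uint8)  # 8 signs per byte

def unpack_signs(packed: torch.Tensor, numel: int) -> torch.Tensor:
    shifts = torch.arange(8, dtype=torch.uint8)
    bits = (packed.unsqueeze(1) >> shifts) & 1    # recover individual bits
    return bits.flatten()[:numel].to(torch.float32) * 2 - 1  # 0/1 -> -1/+1
```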
Todo
- [x] Add `BitLinear` layer
- [x] Add `LLamaForCausalLM` model with `BitLinear` layer
- [x] Update `.save_pretrained` method (for 1-bit weight saving)
- [x] Update
- [x] Add sample code for LM training
- [ ] Update `BitLinear` layer to use 1-bit weight
- [ ] Use uint8 instead of bfloat16
- [ ] Use custom cuda kernel for 1-bit weight
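For reference, here is a minimal sketch of what a BitNet-style `BitLinear` forward pass can look like, following the paper (arXiv:2310.11453) rather than this repo's exact code: weights are binarized to ±1 around their mean with a straight-through estimator, and the output is rescaled. The paper's 8-bit activation quantization and SubLN are omitted for brevity.

```python
# A minimal BitNet-style BitLinear sketch (simplified from arXiv:2310.11453;
# not this repo's exact implementation). Weights are binarized to {-1, +1}
# around their mean; the straight-through estimator lets gradients reach the
# latent full-precision weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        alpha = w.mean()                                    # centering term
        w_bin = w + (torch.sign(w - alpha) - w).detach()    # +-1 forward, identity grad
        beta = (w - alpha).abs().mean()                     # output scaling factor
        out = F.linear(x, w_bin) * beta
        if self.bias is not None:
            out = out + self.bias
        return out
```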
GitHub Events
Total
- Watch event: 45
- Issue comment event: 1
- Fork event: 4
Last Year
- Watch event: 45
- Issue comment event: 1
- Fork event: 4
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| junbum lee | j****n@b****t | 18 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 11
- Total pull requests: 2
- Average time to close issues: 3 minutes
- Average time to close pull requests: 1 minute
- Total issue authors: 9
- Total pull request authors: 2
- Average comments per issue: 0.82
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- DewEfresh (1)
- thwannbe (1)
- ttl10101 (1)
- nevakrien (1)
- chuxiliyixiaosa (1)
- klei22 (1)
- Ywandung-Lyou (1)
- darkman111a (1)
Pull Request Authors
- eltociear (2)
- gtpk (2)