rwkv-lite
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: wonkyoc
- License: apache-2.0
- Language: Python
- Default Branch: clean-public
- Size: 1000 KB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices
RWKV-Lite is a suite of compression techniques that reduce system memory usage at runtime.
- Paper: arxiv
Our demo, built on this repo: RWKV on $30 hardware, under 3 watts. (Demo code)
Training
Set up environments for training
```
# Create a conda environment
conda create -n rwkv python=3.10
conda activate rwkv
# Install python requirements
pip install -r requirements.txt
```
SVD Training
Step 1. create a workspace directory
mkdir -p RWKV-v5/out/04b-x58
Step 2. copy over all training scripts
cd RWKV-v5
cp template/*.sh out/04b-x58
Files to change:
- `model-config.sh` to set the model variant, number of layers, etc.
- `run-train.sh` to change training hyperparameters, e.g. learning rate
- `run-eval.sh` to change evaluation hyperparameters
Step 3. prepare a dataset: See Dataset.
Step 4. initialize a model
cd {your-path}/RWKV-v5/out/04b-x58
bash prep.sh
Step 5. run a training script
```
# check the current dir
pwd
{your-path}/RWKV-Lite/RWKV-v5/out/04b-x58
# run a script
bash run-train.sh
```
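As a rough illustration of what SVD-based compression does to a single weight matrix, the sketch below factors a matrix into two low-rank factors with NumPy and compares parameter counts. The function name `low_rank_factors`, the rank, and the matrix shape are assumptions chosen for illustration; this is not the repo's training code, and the README does not state which matrices are factored.

```python
import numpy as np

def low_rank_factors(weight, rank):
    """Return (a, b) with a @ b approximating weight, via a truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (rows, rank), singular values folded in
    b = vt[:rank, :]             # shape (rank, cols)
    return a, b

rng = np.random.default_rng(0)
# Toy matrix that is exactly rank 8, so truncating to rank 8 loses nothing.
weight = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))
a, b = low_rank_factors(weight, rank=8)

dense_params = weight.size            # parameters in the original matrix
factored_params = a.size + b.size     # parameters in the two factors
max_error = float(np.abs(weight - a @ b).max())
print(dense_params, factored_params, max_error)
```

Real weight matrices are not exactly low rank, so the truncation introduces error that further training would have to recover.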
Sparsity training
Step 1. Collect FFN data. See Sparsity dataset.
Step 2. Change file paths
```bash
pwd
~/RWKV-LM/RWKV-v5
vim src/train-ffn.py
# Change the following paths
inmodelfile=f"{RWKV_HOME}/out/04b-x58/04b-x58.pth"
outpath=f"{RWKV_HOME}/out/04b-x58/"
outmodelfile=f"{RWKV_HOME}/out/04b-x58/04b-x58-ffn.pth"
```
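The three paths above share one pattern, so they can be derived from `RWKV_HOME` and the workspace name instead of being edited by hand. The helper below is a hypothetical sketch (`ffn_paths` is not part of the repo); the dictionary keys simply mirror the variable names as they appear in this README.

```python
import os
from pathlib import PurePosixPath

def ffn_paths(rwkv_home, name="04b-x58"):
    """Build the input model, output directory, and output model paths."""
    out_dir = PurePosixPath(rwkv_home) / "out" / name
    return {
        "inmodelfile": str(out_dir / f"{name}.pth"),
        "outpath": f"{out_dir}/",
        "outmodelfile": str(out_dir / f"{name}-ffn.pth"),
    }

# Fall back to the current directory when RWKV_HOME is not exported.
rwkv_home = os.environ.get("RWKV_HOME", os.getcwd())
for key, value in ffn_paths(rwkv_home).items():
    print(key, value)
```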
Step 3. Run a script
python src/train-ffn.py
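To show why FFN sparsity saves work, here is a minimal NumPy sketch of a squared-ReLU feed-forward layer computed densely and then using only the hidden units that are active. It assumes the FFN has the usual RWKV channel-mix shape (key projection, squared ReLU, value projection) and leaves out the receptance gate. The mask here is an oracle computed from the true pre-activations; how the repo actually predicts the active units is not described in this README, and the function names are illustrative.

```python
import numpy as np

def ffn_dense(x, w_key, w_value):
    hidden = np.maximum(x @ w_key, 0.0) ** 2      # squared ReLU
    return hidden @ w_value

def ffn_sparse(x, w_key, w_value, active):
    """Compute the FFN using only the hidden units listed in `active`."""
    hidden = np.maximum(x @ w_key[:, active], 0.0) ** 2
    return hidden @ w_value[active, :]

rng = np.random.default_rng(0)
dim, hidden_dim = 16, 64
x = rng.standard_normal(dim)
w_key = rng.standard_normal((dim, hidden_dim))
w_value = rng.standard_normal((hidden_dim, dim))

# Oracle mask: hidden units whose pre-activation is positive.
active = np.flatnonzero(x @ w_key > 0.0)
dense_out = ffn_dense(x, w_key, w_value)
sparse_out = ffn_sparse(x, w_key, w_value, active)
print(len(active), "of", hidden_dim, "hidden units used")
```

With an exact mask the two outputs match; a learned predictor trades a small error for skipping the inactive rows and columns.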
Hierarchical head training
Step 1. Cluster the vocabulary. In `svd.py`, the `decompose_emb()` function has a variable $K$ that sets the number of clusters. The default is 200; change it if needed.
```bash
pwd
~/RWKV-LM/RWKV-v5
python src/svd.py --decompose 2 --orig_model out/04b-x58/04b-x58
[ 642 5161 ... 203 503 128 177 40]
# Check your output file
ls out/04b-x58/
...
04b-x58-cls.npy
...
```
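One common way to group token embeddings into $K$ clusters is k-means. The sketch below runs a plain NumPy k-means on random toy vectors and prints the size of each cluster. Whether `decompose_emb()` uses this exact algorithm is not confirmed by the README; the function `kmeans_labels`, the toy data, and the small `k` are assumptions for illustration.

```python
import numpy as np

def kmeans_labels(vectors, k, iters=20, seed=0):
    """Assign each row of `vectors` to one of k clusters with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every vector to every center.
        dists = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = vectors[labels == c]
            if len(members):          # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels

rng = np.random.default_rng(1)
toy_emb = rng.standard_normal((500, 16))   # stand-in for an embedding table
labels = kmeans_labels(toy_emb, k=8)
cluster_sizes = np.bincount(labels, minlength=8)
print(cluster_sizes)
```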
Step 2. Edit a training script and change environment variables
```bash
# out/04b-x58/model-config.sh
HEAD_K=200 # Same value as $K$ in svd.py

# out/04b-x58/run-train.sh
# add the following flags at the last line of the script
--headK $HEAD_K \
--loadtokencls "$PROJDIR/rwkv-cls.npy" \
--loadpartial 1
```
Step 3. Make symbolic links for the cluster and model
```bash
pwd
~/RWKV-LM/RWKV-v5/out/04b-x58/
ln -s 04b-x58-cls.npy rwkv-cls.npy
ln -s 04b-x58-mlp.pth rwkv-init.pth
```
Step 4. Run a script
```bash
pwd
~/RWKV-LM/RWKV-v5/out/04b-x58/
bash run-train.sh
# You will see such outputs
will train: headl1.weight
will train: headl1fc1.weight
will train: head_l1fc2.weight
...
  | Name      | Type          | Params
0 | emb       | Embedding     | 67.1 M
1 | blocks    | ModuleList    | 233 M
2 | lnout     | LayerNorm     | 2.0 K
3 | head      | Linear        | 67.1 M
4 | headl1    | Linear        | 204 K
5 | headl1fc1 | Linear        | 1.0 M
6 | headl1fc2 | Linear        | 204 K
7 | head_l2   | ParameterList | 67.1 M
1.5 M     Trainable params # Notice that these are the trainable parameters for the hierarchical head
434 M     Non-trainable params
435 M     Total params
```
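The parameter counts in this summary can be checked with simple arithmetic. The sketch below assumes bias-free linear layers, an embedding dimension of 1024 (the 0.4B row of the model table), a vocabulary of 65,536 tokens (inferred from 67.1 M / 1024, not stated in the README), and $K$ = 200. The layer shapes are guesses that happen to reproduce the reported numbers.

```python
vocab_size = 65536   # assumption: inferred from 67.1 M / 1024
emb_dim = 1024       # 0.4B model
head_k = 200         # number of clusters K

params = {
    "head": emb_dim * vocab_size,     # reported as 67.1 M
    "headl1": emb_dim * head_k,       # reported as 204 K
    "headl1fc1": emb_dim * emb_dim,   # reported as 1.0 M
    "headl1fc2": emb_dim * head_k,    # reported as 204 K
}
trainable = params["headl1"] + params["headl1fc1"] + params["headl1fc2"]

for name, count in params.items():
    print(f"{name:10s} {count:>12,d}")
print(f"{'trainable':10s} {trainable:>12,d}")
```

Under these assumptions the three trainable layers add up to about 1.46 M parameters, which rounds to the 1.5 M shown in the summary.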
Inference
Evaluation
Step 0. Create a symbolic link in src
pwd
~/RWKV-Lite
ln -s $(pwd)/rwkv RWKV-v5/src/
Step 1. Install lm_eval
cd {your-path}/RWKV-Lite
bash scripts/install-lm-eval.sh
Step 2. Run a script
cd out/04b-x58
bash run-eval.sh
Example: a simple ChatBot
```bash
pwd
~/RWKV-LM/RWKV-v5
export RWKV_HOME=$(pwd)
python src/test-rwkv-chat.py
Elon Musk has made a real case for the possibility of owning a Tesla, the company he founded in 2002 and co-founded with Elon Musk’s son, Elon Musk Sr. Tesla’s shares soared from $100 on April 4 to over $120 in the first three days of trading on Friday, a significant climb from its low point.
```
Inference (Raspberry Pi 5)
Turn on our features: sparsity / hierarchical head / lazy embedding
```python
# Ensemble sparsity FFN variables
quantbit = 1
quantmap = [0.95] * 24
mlp_map = [0.7] * 24

# Hierarchical head path
hhon = True
hhpath = f"{RWKV_HOME}/out/04b-x58/04b-x58-cls.npy"

# Lazy embedding
emb_on = True

t0 = time.time()
model = RWKV(model=modelpath, strategy=strategy,
             quantbit=quantbit,      # Sparse FFN
             quantmap=quantmap,      # Sparse FFN
             mlpmap=mlp_map,         # Sparse FFN
             loadtokencls=hhpath,    # Hierarchical head
             onclusterhead=hhon,     # Hierarchical head
             lazyemb=emb_on,         # Lazy embedding
             verbose=True)
```
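One way to avoid holding a full embedding table in RAM is to memory-map the matrix on disk and read only the rows for the tokens in use. The sketch below assumes that "lazy embedding" means this kind of on-demand row loading; the class name `LazyEmbedding`, the raw binary file layout, and the toy sizes are hypothetical and not taken from the repo.

```python
import os
import tempfile
import numpy as np

class LazyEmbedding:
    """Look up embedding rows from a file without loading the whole table."""

    def __init__(self, path, vocab_size, dim, dtype=np.float32):
        self.table = np.memmap(path, dtype=dtype, mode="r",
                               shape=(vocab_size, dim))

    def __call__(self, token_id):
        # Copy a single row into RAM; the rest of the table stays on disk.
        return np.array(self.table[token_id])

# Write a toy table to disk so the example is self-contained.
vocab_size, dim = 1000, 32
full = np.arange(vocab_size * dim, dtype=np.float32).reshape(vocab_size, dim)
emb_path = os.path.join(tempfile.mkdtemp(), "emb.bin")
full.tofile(emb_path)

emb = LazyEmbedding(emb_path, vocab_size, dim)
row = emb(42)
print(row.shape)
```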
Dataset
A toy example
Step 1. Get minipile dataset
bash RWKV-v5/getdata.sh
ls RWKV-v5/data
minipile.bin minipile.idx
Sparsity dataset
Step 1. Set an environment variable
```bash
pwd
~/RWKV-LM/RWKV-v5
export RWKV_HOME=$(pwd)
```
Step 2. Run a collecting script
Make sure you have the `rwkv` symbolic link in `src/`.
```bash
pwd
~/RWKV-LM/RWKV-v5
# You need to set your input model and output path.
python src/collect-sparse-data.py
```
Step 3. Check your data
```bash
# Your output path
ls
```
Model information
| Parameter size | # of layers | Embedding dim |
| -------------- | ----------- | ------------- |
| 0.1B           | 12          | 768           |
| 0.4B           | 24          | 1024          |
| 1.5B           | 24          | 2048          |
| 3B             | 32          | 2560          |
| 7B             | 32          | 4096          |
Troubleshooting
Issue: GPU
zation -std=c++17 -c /data/home/bfr4xr/RWKV-LM/RWKV-v5/src/rwkv/cuda/operators.cu -o operators.cuda.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
ninja: build stopped: subcommand failed.
This error comes from the CUDA setup. Run `source env.sh` to set `CUDA_HOME`.
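Before rebuilding the CUDA operators, it can help to confirm what the current shell actually exposes. The helper below is a hypothetical sketch (`cuda_env_report` is not part of the repo) that reports whether `CUDA_HOME` is set, whether it points to an existing directory, and whether `nvcc` is found on `PATH`.

```python
import os
import shutil

def cuda_env_report(env=None):
    """Summarize the CUDA-related environment seen by this process."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    return {
        "CUDA_HOME": cuda_home,
        "cuda_home_exists": bool(cuda_home) and os.path.isdir(cuda_home),
        "nvcc_on_path": shutil.which("nvcc", path=env.get("PATH")) is not None,
    }

for key, value in cuda_env_report().items():
    print(key, value)
```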
TODO
Training
- [x] SVD training
- [x] Sparsity FFN training
- [x] Hierarchical head training
Inference
- [x] A toy example: inference e.g., chat
- [x] Hierarchical head example
- [x] Sparsity FFN example
- [x] Embedding example
- [ ] NEON instruction
- [ ] RPI inference
Evaluation
- [x] run `lm-evaluation-harness` example
Data collection
- [x] A toy example: minipile
- [x] Sparsity data collection
- [ ] General dataset preparation e.g., pile
ETC
- [ ] Clean up unused code and private comments
Owner
- Name: Wonkyo Choe
- Login: wonkyoc
- Kind: user
- Location: root
- Website: wonkyoc.github.io
- Repositories: 3
- Profile: https://github.com/wonkyoc
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "PENG"
    given-names: "Bo"
    orcid: "https://orcid.org/0000-0002-0865-547X"
title: "RWKV-LM"
version: 1.0.0
doi: 10.5281/zenodo.5196577
date-released: 2021-08-13
url: "https://github.com/BlinkDL/RWKV-LM"
GitHub Events
Total
- Issues event: 2
- Watch event: 3
- Issue comment event: 2
- Member event: 1
- Push event: 4
- Create event: 2
Last Year
- Issues event: 2
- Watch event: 3
- Issue comment event: 2
- Member event: 1
- Push event: 4
- Create event: 2