https://github.com/artificialzeng/chatglm-finetuning

Fine-tuning of the ChatGLM-6B model for specific downstream tasks, covering Freeze, Lora, P-tuning, and more.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (2.1%) to scientific vocabulary
Last synced: 10 months ago

Repository

Fine-tuning of the ChatGLM-6B model for specific downstream tasks, covering Freeze, Lora, P-tuning, and more.

Basic Info
  • Host: GitHub
  • Owner: ArtificialZeng
  • Default Branch: master
  • Homepage:
  • Size: 1.2 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of liucongg/ChatGLM-Finetuning
Created almost 3 years ago · Last pushed about 3 years ago

https://github.com/ArtificialZeng/ChatGLM-Finetuning/blob/master/

## ChatGLM
ChatGLM

ChatGLM820-821

****

[Baidu Netdisk](https://pan.baidu.com/s/1-UrZWnqw6Ciyo5K2NLraDg) (extraction code: jh0l)

- update-2023.06.12 [**Pipeline**](https://zhuanlan.zhihu.com/p/636488690)
- update-2023.04.18 ****
- update-2023.04.05 ****

## Fine-tuning methods
### Freeze
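Freeze means parameter freezing: most of the original model's parameters are frozen and only a subset is trained, so the model can be trained without TP or PP (tensor or pipeline parallelism).

The fine-tuning code is in finetuning_freeze.py; the core part is: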
```python3
for name, param in model.named_parameters():
    if not any(nd in name for nd in ["layers.27", "layers.26", "layers.25", "layers.24", "layers.23"]):
        param.requires_grad = False
```
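
To see which parameters the name filter above leaves trainable, here is a minimal sketch that applies the same substring test to a list of names. The parameter names are hypothetical stand-ins (the real names come from `model.named_parameters()`), and the 28-layer count is an assumption about ChatGLM-6B.

```python
# Same keep-list as in the snippet above.
KEEP = ["layers.27", "layers.26", "layers.25", "layers.24", "layers.23"]


def is_trainable(name, keep=KEEP):
    """Return True if any keep-pattern occurs as a substring of the name."""
    return any(nd in name for nd in keep)


# Hypothetical parameter names, one per layer plus two non-layer weights.
names = [f"transformer.layers.{i}.attention.query_key_value.weight" for i in range(28)]
names += ["transformer.word_embeddings.weight", "lm_head.weight"]

trainable = [n for n in names if is_trainable(n)]
frozen = [n for n in names if not is_trainable(n)]

print(len(trainable), len(frozen))  # 5 25
```

Because the test is a plain substring match, a shorter pattern such as "layers.2" would also match layers 20 to 27, so the patterns need to be specific.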

Training uses DeepSpeed. Configurable parameters include train_path, model_dir, num_train_epochs, train_batch_size, gradient_accumulation_steps, output_dir, and prompt_text.

```
CUDA_VISIBLE_DEVICES=0 deepspeed finetuning_freeze.py --num_train_epochs 5 --train_batch_size 2
```
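
As a rough illustration of how flags like these map to values, the sketch below parses the two flags from the command above with `argparse` and derives an effective batch size. The default values and the single-GPU count are assumptions for illustration only; the actual defaults live in finetuning_freeze.py.

```python
import argparse


def build_parser():
    # Flag names come from the README; every default here is hypothetical.
    parser = argparse.ArgumentParser(description="Sketch of the fine-tuning flags")
    parser.add_argument("--num_train_epochs", type=int, default=1)
    parser.add_argument("--train_batch_size", type=int, default=1)
    parser.add_argument("--gradient_accumulation_steps", type=int, default=1)
    parser.add_argument("--output_dir", type=str, default="output_dir/")
    return parser


# Mirror the flags passed in the command above.
args = build_parser().parse_args(["--num_train_epochs", "5", "--train_batch_size", "2"])

# CUDA_VISIBLE_DEVICES=0 exposes one GPU, so one data-parallel worker is assumed.
num_gpus = 1
effective_batch = args.train_batch_size * args.gradient_accumulation_steps * num_gpus

print(args.num_train_epochs, effective_batch)  # 5 2
```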
For inference, see predict_freeze.py.

### PT
PT refers to P-Tuning, following the [official ChatGLM code](https://github.com/THUDM/ChatGLM-6B/blob/main/ptuning/README.md). It is a soft-prompt method for large models.

![](images/PT.png)
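- P-Tuning adds new parameters only to the model's Embedding. [paper](https://arxiv.org/abs/2103.10385)
- P-Tuning-V2 adds new parameters to the Embedding and in front of every layer. [paper](https://arxiv.org/abs/2110.07602)

The fine-tuning code is in finetuning_pt.py; the core part is: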
```python3
config = ChatGLMConfig.from_pretrained(args.model_dir)
config.pre_seq_len = args.pre_seq_len
config.prefix_projection = args.prefix_projection

model = ChatGLMForConditionalGeneration.from_pretrained(args.model_dir, config=config)

for name, param in model.named_parameters():
    if not any(nd in name for nd in ["prefix_encoder"]):
        param.requires_grad = False
```
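When prefix_projection is True, the method is P-Tuning-V2, which adds new parameters to the Embedding and in front of every layer; when it is False, the method is P-Tuning, which adds new parameters only to the Embedding.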
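
To make the Embedding-only soft-prompt idea concrete, here is a toy sketch in plain Python: `pre_seq_len` trainable vectors are prepended to the token embeddings, so the model sees a longer input while its own weights stay frozen. The helper names, the tiny hidden size, and the initialization are all assumptions for illustration; this is not the repository's `prefix_encoder`.

```python
import random


def make_prefix(pre_seq_len, hidden_size, seed=0):
    """Create pre_seq_len soft-prompt vectors (the only trainable values)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(pre_seq_len)]


def prepend_soft_prompt(prefix, token_embeddings):
    """Place the soft-prompt vectors in front of the token embeddings."""
    return prefix + token_embeddings


hidden_size = 8  # toy value, far smaller than a real model
prefix = make_prefix(pre_seq_len=16, hidden_size=hidden_size)
tokens = [[0.0] * hidden_size for _ in range(5)]  # 5 stand-in token embeddings

inputs = prepend_soft_prompt(prefix, tokens)
print(len(inputs))  # 21
```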

Training uses DeepSpeed. Configurable parameters include train_path, model_dir, num_train_epochs, train_batch_size, gradient_accumulation_steps, output_dir, prompt_text, and pre_seq_len.

```
CUDA_VISIBLE_DEVICES=0 deepspeed finetuning_pt.py --num_train_epochs 5 --train_batch_size 2 --pre_seq_len 16
```
For inference, see predict_pt.py.

### Lora
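Lora adds extra low-rank matrices to specified parameters of a large language model and, during tuning, trains only those added parameters.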

![](images/Lora.png)
- [paper](https://arxiv.org/abs/2106.09685)
- [Github](https://github.com/microsoft/LoRA)
- HuggingFace peft library: [Github](https://github.com/huggingface/peft)

The fine-tuning code is in finetuning_lora.py; the core part is:
```python3
model = ChatGLMForConditionalGeneration.from_pretrained(args.model_dir)
config = LoraConfig(r=args.lora_r,
                    lora_alpha=32,
                    target_modules=["query_key_value"],
                    lora_dropout=0.1,
                    bias="none",
                    task_type="CAUSAL_LM",
                    inference_mode=False,
                    )

model = get_peft_model(model, config)
```
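
As a back-of-the-envelope check on how few parameters this configuration trains, the sketch below counts the low-rank weights added for `r=8` on `query_key_value`. It assumes ChatGLM-6B has 28 layers, a hidden size of 4096, and a `query_key_value` projection from 4096 to 3 x 4096; the 6.259B total is the figure from the results table in this README.

```python
def lora_param_count(r, in_features, out_features, num_layers):
    """Count LoRA weights: each adapted matrix gets A (r x in) and B (out x r)."""
    return num_layers * r * (in_features + out_features)


HIDDEN = 4096    # assumed hidden size of ChatGLM-6B
NUM_LAYERS = 28  # assumed layer count (layers.0 to layers.27)

lora_params = lora_param_count(
    r=8,
    in_features=HIDDEN,
    out_features=3 * HIDDEN,  # fused query/key/value projection
    num_layers=NUM_LAYERS,
)

total_params = 6.259e9  # total reported for Lora in the results table
ratio_percent = 100 * lora_params / total_params

print(lora_params, round(ratio_percent, 4))  # 3670016 0.0586
```

Under these assumptions the ratio lines up with the 0.0586% reported for Lora in the results table.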
Training uses DeepSpeed. Configurable parameters include train_path, model_dir, num_train_epochs, train_batch_size, gradient_accumulation_steps, output_dir, prompt_text, and lora_r.

```
CUDA_VISIBLE_DEVICES=0 deepspeed finetuning_lora.py --num_train_epochs 5 --train_batch_size 2 --lora_r 8
```
For inference, see predict_lora.py.

Note: in the saved adapter_config.json, set inference_mode to false, and call model.eval() on the model.
chatglmConv1D

### Environment
See requirements.txt.

## 
### 
- -[](https://www.datafountain.cn/competitions/584)50
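- Training settings: max length 768, batch size 2, 5 epochs, fp16, DeepSpeed Zero-1.
- PT is the P-Tuning V2 method; PT-Only-Embedding applies the soft-prompt to the Embedding only; Freeze trains only the last five layers; Lora uses rank 8.
- PT ran out of memory (OOM) on a 48G A40, so gradient_checkpointing_enable() was turned on for the PT experiments.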
- 
```
prompt_text\"\", \"\", \"\" \"\"\"_\"\\n

__\n__
```


|  | PT-Only-Embedding | PT | Freeze | Lora |
| ------- | ------ | ------ | ------ | ------ |
| GPU memory | 37G | 30G | 24G | 39G |
| Total parameters | 6.259B | 7.211B | 6.255B | 6.259B |
| Trainable parameters | 0.0586% | 13.26% | 16.10% | 0.0586% |
| Training time | 53min | 135min | 112min | 65min |
| F1 | 0.0 | 0.6283 | 0.5675 | 0.5359 |
| Inference time | 191s | 198s | 180s | 278s |


- F1 ranking: PT > Freeze > Lora > PT-Only-Embedding.
- PT-Only-Embeddingloss2.0.Embedding
- PT
- Freeze
- 
- -
- instruction
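
For readers who want to reproduce an F1 number like those in the table above, here is a minimal sketch of a micro-averaged F1 over extracted triples. It assumes each sample's prediction and gold label are collections of (subject, predicate, object) tuples compared by exact match; the repository's own scoring may differ, and the example triples are made up.

```python
def micro_f1(predictions, golds):
    """Micro-averaged F1 over per-sample collections of triples (exact match)."""
    tp = sum(len(set(p) & set(g)) for p, g in zip(predictions, golds))
    n_pred = sum(len(set(p)) for p in predictions)
    n_gold = sum(len(set(g)) for g in golds)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: two predicted triples, one of which is correct.
predictions = [[("engine", "causes", "noise"), ("pump", "causes", "leak")]]
golds = [[("engine", "causes", "noise")]]

score = micro_f1(predictions, golds)
print(round(score, 4))  # 0.6667
```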

The freeze model can be tested with test_forgetting.py; results for translation, code, and question answering:

![](images/ft_fanyi.png)
![](images/ft_code.png)
![](images/ft_qa.png)
### 
- -[](https://tianchi.aliyun.com/competition/entrance/531826/introduction)20
- PT is the P-Tuning V2 method; PT-Only-Embedding applies the soft-prompt to the Embedding only; Freeze trains only the last five layers; Lora uses rank 8.
- 
```
prompt_text 5-820
```

freeze
```
CUDA_VISIBLE_DEVICES=0 nohup deepspeed --master_port 5555 finetuning_freeze.py --train_path "data/d2q_0.json" --output_dir "output_dir_freeze/" --prompt_text "" > log_fz.log 2>&1 &
```

BLUERouge D2Q520
- - 0.25
- 0.25
- - 0.25
- 0.25
- 0.25

d2q_result_data/predict_d2q.py

| | | PT-Only-Embedding | PT | Freeze | Lora |
| ------- | ------ | ------ | ------ | ------ | ------ |
| | 51.75 | 73.75 | 87.75 | 79.25 | 86.75 |

### 
## [Pipeline](https://zhuanlan.zhihu.com/p/636488690)
Training code on Github: train_pipeline.py
```
CUDA_VISIBLE_DEVICES=0,1,2,3 deepspeed --master_port 5524 train_pipeline.py --train_path data/spo_0.json --model_name_or_path ./ChatGLM-6B/ --per_device_train_batch_size 14 --max_len 1024 --max_src_len 512 --num_train_epochs 5 --gradient_accumulation_steps 1 --seed 1234 --show_loss_step 20 --num_stages 4 --save_model_step 100 --output_dir ./output-glm-pp
```
Model conversion code on Github: convert_model_to_hf.py
```
python3 convert_model_to_hf.py --ori_model_dir ./ChatGLM-6B/ --pipeline_model_dir output-glm-pp/global_step300/ --save_model_dir output-glm-pp/gs300/
```

| Step | 100 | 200 | 300 | 400 | 500 |
| ------- | ------ | ------ | ------ | ------ | ------ |
| F1 | 0.4931 | 0.5132 | 0.5882 | 0.5793 | 0.5874 |

PTFreezeLora
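
The pipeline command above passes `--num_stages 4` across four GPUs. As an illustration of what splitting a model into stages means, the sketch below divides 28 transformer layers (an assumed count for ChatGLM-6B) evenly across stages. This is a uniform split written for illustration; DeepSpeed's own partitioning may balance stages differently and also has to place the embedding and output layers.

```python
def partition_uniform(num_layers, num_stages):
    """Split layer indices into contiguous, near-equal stages.

    Returns a list of (start, end) pairs, end exclusive.
    """
    base, extra = divmod(num_layers, num_stages)
    bounds = []
    start = 0
    for stage in range(num_stages):
        # The first `extra` stages take one additional layer each.
        size = base + (1 if stage < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


stages = partition_uniform(num_layers=28, num_stages=4)
for stage_id, (start, end) in enumerate(stages):
    print(f"stage {stage_id}: layers {start}..{end - 1}")
# stage 0: layers 0..6
# stage 1: layers 7..13
# stage 2: layers 14..20
# stage 3: layers 21..27
```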

Owner

  • Name: Dr. Artificial曾小健
  • Login: ArtificialZeng
  • Kind: user
  • Location: Beijing

LLM practitioner/engineer, AI/ML/DL Quant
