https://github.com/1587causalai/medicalgpt-training-pipeline

Trains custom large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: 1587causalai
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 13 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of shibing624/MedicalGPT
Created almost 2 years ago · Last pushed almost 2 years ago

https://github.com/1587causalai/MedicalGPT-Training-Pipeline/blob/main/

# MedicalGPT: Training Medical GPT Model


## Zero2All 

- [Quick start in 5 hours](quick-start-with-5-hours.md)






##  Introduction

**MedicalGPT** trains a medical GPT model with the ChatGPT training pipeline. It implements Pretraining,
Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning), and DPO (Direct Preference Optimization).



- The RLHF training pipeline follows Andrej Karpathy's talk: PDF [State of GPT](https://karpathy.ai/stateofgpt.pdf), [Video](https://build.microsoft.com/en-US/sessions/db3f4859-cd30-4445-a0cd-553c3304f8e2)
- The DPO method comes from the paper [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/pdf/2305.18290.pdf)
- The ORPO method comes from the paper [ORPO: Monolithic Preference Optimization without Reference Model](https://arxiv.org/abs/2403.07691)

##  News

- [2024/07/18] Disco Reward Modeling 
- [2024/07/03]  LLM  [MedicalGPT](https://github.com/shibing624/MedicalGPT)


##  Features


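Based on the ChatGPT Training Pipeline, this project trains a domain model, a large language model for the medical field, in three stages:

- Stage 1: PT (Continue PreTraining). Continued pretraining of a GPT model on domain documents so that it adapts to the domain data distribution (optional).
- Stage 2: SFT (Supervised Fine-tuning). Instruction fine-tuning on top of the pretrained model, to align it with instruction intent and inject domain knowledge.
- Stage 3: preference alignment, using one of the following:
  - RLHF (Reinforcement Learning from Human Feedback), in two steps:
    - RM (Reward Model): train a reward model on human preference ranking data. The preferences mainly follow the "HHH" principle: "helpful, honest, harmless".
    - RL (Reinforcement Learning): use the reward model to train the SFT model, updating its policy toward text that better matches human preferences.
  - [DPO (Direct Preference Optimization)](https://arxiv.org/pdf/2305.18290.pdf): optimizes the language model directly on preference data, without a separate reinforcement learning step. DPO is easier to implement and train than RLHF.
  - [ORPO](https://arxiv.org/abs/2403.07691): a preference optimization method that needs no reference model. With ORPO, the LLM learns instruction following and human preferences at the same time.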
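
The three preference objectives named above differ mainly in their loss functions. The following is a minimal sketch of the per-example losses as they appear in the cited papers, written in plain Python for readability. It is not the code in `reward_modeling.py`, `dpo_training.py`, or `orpo_training.py`; the function names and the `beta=0.1` default are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise ranking loss for reward modeling: -log sigmoid(r_chosen - r_rejected)."""
    return -log_sigmoid(reward_chosen - reward_rejected)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default, chosen for illustration only
) -> float:
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def orpo_odds_ratio_loss(chosen_avg_logp: float, rejected_avg_logp: float) -> float:
    """ORPO odds-ratio term; inputs are length-normalized log-probabilities (< 0)."""

    def log_odds(logp: float) -> float:
        # log(p / (1 - p)) with p = exp(logp)
        return logp - math.log1p(-math.exp(logp))

    return -log_sigmoid(log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp))
```

When the policy equals the reference model, `dpo_loss` evaluates to log 2 (about 0.693), and it falls below that as the policy favors the chosen answer more than the reference does. In the ORPO paper the odds-ratio term is added to the usual SFT loss with a weighting factor, which is why no reference model is required.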


### Release Models


| Model                                                                                                             | Base Model                                                                              | Introduction                                                                                                                                                                 |
|:------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | SFT fine-tune of Ziya-LLaMA-13B on the 2.4-million-sample medical dataset [shibing624/medical](https://huggingface.co/datasets/shibing624/medical); releases the fine-tuned LoRA weights (single-turn dialogue) |
| [shibing624/ziya-llama-13b-medical-merged](https://huggingface.co/shibing624/ziya-llama-13b-medical-merged) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | SFT fine-tune of Ziya-LLaMA-13B on the 2.4-million-sample medical dataset [shibing624/medical](https://huggingface.co/datasets/shibing624/medical); releases the fine-tuned full model weights (single-turn dialogue) |
| [shibing624/vicuna-baichuan-13b-chat-lora](https://huggingface.co/shibing624/vicuna-baichuan-13b-chat-lora) | [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) | SFT fine-tune of baichuan-13b-chat on the 100k-sample ShareGPT GPT4 dialogue dataset [shibing624/sharegpt_gpt4](https://huggingface.co/datasets/shibing624/sharegpt_gpt4); releases the fine-tuned LoRA weights |
| [shibing624/vicuna-baichuan-13b-chat](https://huggingface.co/shibing624/vicuna-baichuan-13b-chat) | [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) | SFT fine-tune of baichuan-13b-chat on the 100k-sample ShareGPT GPT4 dialogue dataset [shibing624/sharegpt_gpt4](https://huggingface.co/datasets/shibing624/sharegpt_gpt4); releases the fine-tuned full model weights |
| [shibing624/llama-3-8b-instruct-262k-chinese](https://huggingface.co/shibing624/llama-3-8b-instruct-262k-chinese) | [Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k) | ORPO fine-tune on the 20k-sample preference dataset [shibing624/DPO-En-Zh-20k-Preference](https://huggingface.co/datasets/shibing624/DPO-En-Zh-20k-Preference); suited to RAG |

Demo model: [shibing624/vicuna-baichuan-13b-chat](https://huggingface.co/shibing624/vicuna-baichuan-13b-chat)

For specific cases, see [Inference Examples](#inference-examples).

##  Demo


A simple gradio-based web interface is provided. Start the service with:


```shell
CUDA_VISIBLE_DEVICES=0 python gradio_demo.py --model_type base_model_type --base_model path_to_llama_hf_dir --lora_model path_to_lora_dir
```



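Parameters:

- `--model_type {base_model_type}`: type of the pretrained model, such as llama, bloom, or chatglm
- `--base_model {base_model}`: directory holding the LLaMA model weights and config files in HF format, or a model name on the HF Model Hub
- `--lora_model {lora_model}`: directory holding the LoRA files, or a model name on the HF Model Hub. If the lora weights are already merged into the pretrained model, omit `--lora_model`
- `--tokenizer_path {tokenizer_path}`: directory holding the matching tokenizer. Defaults to the value of `--base_model`
- `--template_name`: prompt template name, such as `vicuna` or `alpaca`. Defaults to vicuna
- `--only_cpu`: run inference on CPU only
- `--resize_emb`: whether to resize the embedding. If not set, the embedding size of the pretrained model is used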
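
As an illustration of how these options combine, here is a small sketch that assembles the `gradio_demo.py` command line from Python. The helper `build_demo_command` is hypothetical and not part of the repository. It assumes that `--only_cpu` and `--resize_emb` are boolean switches that take no value.

```python
import shlex
from typing import List, Optional


def build_demo_command(
    model_type: str,
    base_model: str,
    lora_model: Optional[str] = None,
    tokenizer_path: Optional[str] = None,
    template_name: Optional[str] = None,
    only_cpu: bool = False,
    resize_emb: bool = False,
) -> List[str]:
    """Build the argv list for gradio_demo.py (hypothetical helper)."""
    cmd = ["python", "gradio_demo.py", "--model_type", model_type, "--base_model", base_model]
    # Leave --lora_model out when the LoRA weights are already merged into the base model.
    if lora_model:
        cmd += ["--lora_model", lora_model]
    # Optional values are passed only when set, so the script's own defaults apply otherwise.
    if tokenizer_path:
        cmd += ["--tokenizer_path", tokenizer_path]
    if template_name:
        cmd += ["--template_name", template_name]
    # Assumption: both switches are plain flags without an argument.
    if only_cpu:
        cmd.append("--only_cpu")
    if resize_emb:
        cmd.append("--resize_emb")
    return cmd


if __name__ == "__main__":
    argv = build_demo_command("llama", "path_to_llama_hf_dir", lora_model="path_to_lora_dir")
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run` together with an environment that sets `CUDA_VISIBLE_DEVICES`, which mirrors the shell command shown above.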

##  Training Pipeline

Training Stage:

| Stage                          | Introduction                         | Python script                                                                                           | Shell script                                                                  |
|:-------------------------------|:-------------------------------------|:--------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------|
| Continue Pretraining           | Incremental pretraining              | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py)                     | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh)     |
| Supervised Fine-tuning         | Supervised fine-tuning               | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh)   |
| Direct Preference Optimization | Direct preference optimization       | [dpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/dpo_training.py)                   | [run_dpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_dpo.sh)   |
| Reward Modeling                | Reward model training                | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py)             | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh)     |
| Reinforcement Learning         | Reinforcement learning training      | [ppo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/ppo_training.py)                   | [run_ppo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_ppo.sh)   |
| ORPO                           | Odds ratio preference optimization   | [orpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/orpo_training.py)                 | [run_orpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_orpo.sh) |

- Full PT+SFT+DPO pipeline with all stages chained: [run_training_dpo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb), also on Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb). A full run takes about 15 minutes.
- Full PT+SFT+RLHF pipeline with all stages chained: [run_training_ppo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb), also on Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb). A full run takes about 20 minutes.
- LLM question answering over knowledge-base files (RAG): [chatpdf.py](https://github.com/shibing624/MedicalGPT/blob/main/chatpdf.py)
- [Training parameters](https://github.com/shibing624/MedicalGPT/blob/main/docs/training_params.md) | [wiki](https://github.com/shibing624/MedicalGPT/wiki/%E8%AE%AD%E7%BB%83%E5%8F%82%E6%95%B0%E8%AF%B4%E6%98%8E)
- [Datasets](https://github.com/shibing624/MedicalGPT/blob/main/docs/datasets.md) | [wiki](https://github.com/shibing624/MedicalGPT/wiki/%E6%95%B0%E6%8D%AE%E9%9B%86)
- [Extending the Chinese vocabulary](https://github.com/shibing624/MedicalGPT/blob/main/docs/extend_vocab.md) | [wiki](https://github.com/shibing624/MedicalGPT/wiki/%E6%89%A9%E5%85%85%E4%B8%AD%E6%96%87%E8%AF%8D%E8%A1%A8)
- [FAQ](https://github.com/shibing624/MedicalGPT/blob/main/docs/FAQ.md) | [FAQ_wiki](https://github.com/shibing624/MedicalGPT/wiki/FAQ)
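
The stage table and the two notebook pipelines imply a fixed order in which the shell scripts run. The sketch below encodes that order for the PT+SFT+DPO and PT+SFT+RLHF pipelines. The helper `plan_pipeline` and the `skip_pretraining` switch are hypothetical, and treating PT as skippable is an assumption; the script names come from the table.

```python
from typing import Dict, List

# Shell script for each stage, as listed in the Training Stage table.
STAGE_SCRIPTS: Dict[str, str] = {
    "pt": "run_pt.sh",
    "sft": "run_sft.sh",
    "rm": "run_rm.sh",
    "ppo": "run_ppo.sh",
    "dpo": "run_dpo.sh",
}

# Stage order for the two notebook pipelines mentioned above.
PIPELINES: Dict[str, List[str]] = {
    "dpo": ["pt", "sft", "dpo"],
    "rlhf": ["pt", "sft", "rm", "ppo"],
}


def plan_pipeline(method: str, skip_pretraining: bool = False) -> List[str]:
    """Return the shell scripts to run, in order, for the chosen alignment method."""
    if method not in PIPELINES:
        raise ValueError(f"unknown method {method!r}; expected one of {sorted(PIPELINES)}")
    stages = PIPELINES[method]
    if skip_pretraining:
        # Assumption: continued pretraining can be skipped when starting from a suitable base model.
        stages = [stage for stage in stages if stage != "pt"]
    return [STAGE_SCRIPTS[stage] for stage in stages]


if __name__ == "__main__":
    for script in plan_pipeline("rlhf"):
        print(f"bash {script}")
```

Each stage consumes the model produced by the previous one, so the output directory of one script normally becomes the input model of the next. The exact arguments are set inside each shell script.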


##  Inference


```shell
CUDA_VISIBLE_DEVICES=0 python inference.py \
    --model_type base_model_type \
    --base_model path_to_model_hf_dir \
    --tokenizer_path path_to_model_hf_dir \
    --lora_model path_to_lora \
    --interactive
```



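Parameters:

- `--model_type {base_model_type}`: type of the pretrained model, such as llama, bloom, or chatglm
- `--base_model {base_model}`: directory holding the LLaMA model weights and config files in HF format
- `--lora_model {lora_model}`: directory holding the LoRA files, or a model name on the HF Model Hub. If the LoRA weights are already merged into the pretrained model, this parameter can be omitted
- `--tokenizer_path {tokenizer_path}`: directory holding the matching tokenizer. Defaults to the value of `--base_model`
- `--template_name`: prompt template name, such as `vicuna` or `alpaca`. Defaults to vicuna
- `--interactive`: start multi-turn question answering in interactive mode
- `--data_file {file_name}`: in non-interactive mode, read the contents of file_name for batch prediction
- `--output_file {file_name}`: in non-interactive mode, write the predictions to file_name in jsonl format
- `--resize_emb`: whether to resize the embedding. If not set, the embedding size of the pretrained model is used
- `--only_cpu`: run inference on CPU only
- `--gpus {gpu_ids}`: GPU device ids to use, 0 by default. For multiple GPUs, separate the ids with commas, e.g. 0,1,2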
#### Multi-GPU inference
Data-parallel batch inference across multiple GPUs:
```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 inference_multigpu_demo.py --model_type baichuan --base_model shibing624/vicuna-baichuan-13b-chat
```
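
Data-parallel inference means that each process launched by `torchrun` handles its own slice of the prompts. The sketch below shows one common way to split the work by rank. It is an illustration of the idea and not necessarily how `inference_multigpu_demo.py` does it; the function names are hypothetical. `torchrun` does export the `RANK` and `WORLD_SIZE` environment variables to every process.

```python
import os
from typing import List, Sequence, Tuple


def rank_and_world_size() -> Tuple[int, int]:
    """Read the process rank and process count exported by torchrun."""
    # Outside torchrun the variables are unset, so fall back to a single process.
    return int(os.environ.get("RANK", "0")), int(os.environ.get("WORLD_SIZE", "1"))


def shard_for_rank(items: Sequence[str], rank: int, world_size: int) -> List[str]:
    """Return the strided slice of items that the given rank should process."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    # Rank r takes items r, r + world_size, r + 2 * world_size, ...
    return list(items[rank::world_size])


if __name__ == "__main__":
    prompts = ["q1", "q2", "q3", "q4", "q5"]
    rank, world_size = rank_and_world_size()
    print(rank, shard_for_rank(prompts, rank, world_size))
```

Strided slicing keeps the shards within one item of each other in size, so with `--nproc_per_node 2` the two GPUs receive nearly equal batches.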

Owner

  • Name: Heyang Gong
  • Login: 1587causalai
  • Kind: user
