https://github.com/1587causalai/medicalgpt-training-pipeline
Train custom large language models. Implements incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of shibing624/MedicalGPT
Created almost 2 years ago
· Last pushed almost 2 years ago
https://github.com/1587causalai/MedicalGPT-Training-Pipeline/blob/main/
# MedicalGPT: Training Medical GPT Model
## Zero2All
- [link](quick-start-with-5-hours.md)
## Introduction
**MedicalGPT** trains a medical GPT model with the ChatGPT training pipeline. It implements Pretraining, Supervised Fine-tuning, RLHF (Reward Modeling and Reinforcement Learning), and DPO (Direct Preference Optimization).
Owner
- Name: Heyang Gong
- Login: 1587causalai
- Kind: user
- Repositories: 1
- Profile: https://github.com/1587causalai
- The RLHF training pipeline follows Andrej Karpathy's talk: slides (PDF) [State of GPT](https://karpathy.ai/stateofgpt.pdf), [Video](https://build.microsoft.com/en-US/sessions/db3f4859-cd30-4445-a0cd-553c3304f8e2)
- DPO comes from the paper [Direct Preference Optimization: Your Language Model is Secretly a Reward Model](https://arxiv.org/pdf/2305.18290.pdf)
- ORPO comes from the paper [ORPO: Monolithic Preference Optimization without Reference Model](https://arxiv.org/abs/2403.07691)
## News
- [2024/07/18] Disco Reward Modeling
- [2024/07/03] LLM [MedicalGPT](https://github.com/shibing624/MedicalGPT)
## Features
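Following the ChatGPT training pipeline, this project trains a domain model, a medical large language model, in these stages:
- Stage 1: PT (Continued Pretraining). Incrementally pretrain the GPT model on domain data.
- Stage 2: SFT (Supervised Fine-tuning). Fine-tune the pretrained model on instruction data.
- Stage 3: preference alignment, using one of:
  - RLHF (Reinforcement Learning from Human Feedback), which has two steps:
    - RM (Reward Model): train a reward model on human preference data, guided by the "HHH" principle: "helpful, honest, harmless".
    - RL (Reinforcement Learning): use the reward model to further train the SFT model.
  - [DPO (Direct Preference Optimization)](https://arxiv.org/pdf/2305.18290.pdf): optimize the language model directly on preference data, without the reinforcement learning step that RLHF requires.
  - [ORPO](https://arxiv.org/abs/2403.07691): preference optimization for LLMs that needs no reference model.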
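The preference stages above can be made concrete with a small numeric sketch. The code below is not taken from this repository's training scripts. It only illustrates, in plain Python, the pairwise ranking loss commonly used for reward modeling and the DPO loss from the linked paper. The function names and the default `beta=0.1` are assumptions chosen for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected)."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, for illustration only
) -> float:
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy prefers the chosen response more than the reference does.
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


if __name__ == "__main__":
    # A policy identical to the reference gives a margin of zero, so the loss is log 2.
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
```

Note that DPO needs log-probabilities from a frozen reference model, which is the dependency ORPO removes.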
### Release Models
| Model | Base Model | Introduction |
|:------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | SFT of Ziya-LLaMA-13B on the 2.4-million-sample medical dataset [shibing624/medical](https://huggingface.co/datasets/shibing624/medical); LoRA weights |
| [shibing624/ziya-llama-13b-medical-merged](https://huggingface.co/shibing624/ziya-llama-13b-medical-merged) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | SFT of Ziya-LLaMA-13B on the 2.4-million-sample medical dataset [shibing624/medical](https://huggingface.co/datasets/shibing624/medical); merged model weights |
| [shibing624/vicuna-baichuan-13b-chat-lora](https://huggingface.co/shibing624/vicuna-baichuan-13b-chat-lora) | [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) | SFT of baichuan-13b-chat on the 100k-sample ShareGPT GPT4 dataset [shibing624/sharegpt_gpt4](https://huggingface.co/datasets/shibing624/sharegpt_gpt4); LoRA weights |
| [shibing624/vicuna-baichuan-13b-chat](https://huggingface.co/shibing624/vicuna-baichuan-13b-chat) | [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) | SFT of baichuan-13b-chat on the 100k-sample ShareGPT GPT4 dataset [shibing624/sharegpt_gpt4](https://huggingface.co/datasets/shibing624/sharegpt_gpt4); full model weights |
| [shibing624/llama-3-8b-instruct-262k-chinese](https://huggingface.co/shibing624/llama-3-8b-instruct-262k-chinese) | [Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k) | ORPO fine-tuning on the 20k-sample preference dataset [shibing624/DPO-En-Zh-20k-Preference](https://huggingface.co/datasets/shibing624/DPO-En-Zh-20k-Preference); suited to RAG |
For sample outputs from [shibing624/vicuna-baichuan-13b-chat](https://huggingface.co/shibing624/vicuna-baichuan-13b-chat), see [Inference Examples](#inference-examples).
## Demo
A simple Gradio-based web interface is provided. Start the service with:
```shell
CUDA_VISIBLE_DEVICES=0 python gradio_demo.py --model_type base_model_type --base_model path_to_llama_hf_dir --lora_model path_to_lora_dir
```
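- `--model_type {base_model_type}`: base model type, e.g. `llama`, `bloom`, `chatglm`.
- `--base_model {base_model}`: directory holding the LLaMA weights and config files in HF format, or a model name on the HF Model Hub.
- `--lora_model {lora_model}`: directory holding the LoRA files, or a model name on the HF Model Hub. If the LoRA weights are already merged into the base model, omit `--lora_model`.
- `--tokenizer_path {tokenizer_path}`: directory holding the tokenizer. Defaults to the value of `--base_model`.
- `--template_name`: prompt template name, e.g. `vicuna` or `alpaca`. Defaults to `vicuna`.
- `--only_cpu`: run on CPU only.
- `--resize_emb`: whether to resize the embedding. If not set, the pretrained model's embedding size is kept.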
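To show what `--template_name` selects, here is a minimal sketch of prompt templating. The template wording is an assumption based on the public Vicuna and Alpaca projects, and the project's own templates may differ. `TEMPLATES` and `build_prompt` are hypothetical names and are not part of `gradio_demo.py`.

```python
# Assumed template strings, modeled on the public Vicuna and Alpaca prompts.
TEMPLATES = {
    "vicuna": (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
        "USER: {query} ASSISTANT:"
    ),
    "alpaca": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{query}\n\n### Response:\n"
    ),
}


def build_prompt(query: str, template_name: str = "vicuna") -> str:
    """Wrap a user query in the named prompt template (hypothetical helper)."""
    if template_name not in TEMPLATES:
        raise ValueError(f"unknown template: {template_name!r}")
    return TEMPLATES[template_name].format(query=query)


if __name__ == "__main__":
    print(build_prompt("What causes anemia?"))
```

The template used at inference should match the one used during fine-tuning, which is why it is exposed as a flag.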
## Training Pipeline
Training Stage:
| Stage | Introduction | Python script | Shell script |
|:-------------------------------|:-------------|:--------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------|
| Continue Pretraining | | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py) | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh) |
| Supervised Fine-tuning | | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) |
| Direct Preference Optimization | | [dpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/dpo_training.py) | [run_dpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_dpo.sh) |
| Reward Modeling | | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py) | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh) |
| Reinforcement Learning | | [ppo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/ppo_training.py) | [run_ppo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_ppo.sh) |
| ORPO | | [orpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/orpo_training.py) | [run_orpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_orpo.sh) |
- Full PT+SFT+DPO pipeline: [run_training_dpo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb), also available on [Colab](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb). It takes about 15 minutes to run.
- Full PT+SFT+RLHF pipeline: [run_training_ppo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb), also available on [Colab](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb). It takes about 20 minutes to run.
- Knowledge-base question answering with an LLM (RAG): [chatpdf.py](https://github.com/shibing624/MedicalGPT/blob/main/chatpdf.py)
- [Training parameters](https://github.com/shibing624/MedicalGPT/blob/main/docs/training_params.md) | [wiki](https://github.com/shibing624/MedicalGPT/wiki/%E8%AE%AD%E7%BB%83%E5%8F%82%E6%95%B0%E8%AF%B4%E6%98%8E)
- [Datasets](https://github.com/shibing624/MedicalGPT/blob/main/docs/datasets.md) | [wiki](https://github.com/shibing624/MedicalGPT/wiki/%E6%95%B0%E6%8D%AE%E9%9B%86)
- [Extending the Chinese vocabulary](https://github.com/shibing624/MedicalGPT/blob/main/docs/extend_vocab.md) | [wiki](https://github.com/shibing624/MedicalGPT/wiki/%E6%89%A9%E5%85%85%E4%B8%AD%E6%96%87%E8%AF%8D%E8%A1%A8)
- [FAQ](https://github.com/shibing624/MedicalGPT/blob/main/docs/FAQ.md) | [FAQ_wiki](https://github.com/shibing624/MedicalGPT/wiki/FAQ)
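The stage table and the two pipeline notebooks imply an order in which the shell scripts run. The sketch below encodes that order for the PT+SFT+DPO and PT+SFT+RLHF routes. It is an illustration only: `ROUTES`, `plan`, and `run` are hypothetical names, the script names come from the table above, and the sketch assumes the scripts sit in the working directory and need no extra arguments.

```python
import subprocess

# Script order per route, following the stage table and pipeline notebooks.
ROUTES = {
    "dpo": ["run_pt.sh", "run_sft.sh", "run_dpo.sh"],
    "rlhf": ["run_pt.sh", "run_sft.sh", "run_rm.sh", "run_ppo.sh"],
}


def plan(route: str) -> list[list[str]]:
    """Return the commands for a route, in execution order."""
    if route not in ROUTES:
        raise ValueError(f"unknown route: {route!r}")
    return [["bash", script] for script in ROUTES[route]]


def run(route: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan(route):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run("dpo")  # dry run: prints the three commands without executing them
```

Each stage normally consumes the previous stage's output model, so in practice the scripts' model paths must be edited to chain them.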
## Inference
```shell
CUDA_VISIBLE_DEVICES=0 python inference.py \
--model_type base_model_type \
--base_model path_to_model_hf_dir \
--tokenizer_path path_to_model_hf_dir \
--lora_model path_to_lora \
--interactive
```
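- `--model_type {base_model_type}`: base model type, e.g. `llama`, `bloom`, `chatglm`.
- `--base_model {base_model}`: directory holding the LLaMA weights and config files in HF format.
- `--lora_model {lora_model}`: directory holding the LoRA files, or a model name on the HF Model Hub. Omit it if the LoRA weights are already merged into the base model.
- `--tokenizer_path {tokenizer_path}`: directory holding the tokenizer. Defaults to the value of `--base_model`.
- `--template_name`: prompt template name, e.g. `vicuna` or `alpaca`. Defaults to `vicuna`.
- `--interactive`: start an interactive multi-turn session.
- `--data_file {file_name}`: in non-interactive mode, read the contents of `file_name` and run batch prediction.
- `--output_file {file_name}`: in non-interactive mode, write the predictions to `file_name` in jsonl format.
- `--resize_emb`: whether to resize the embedding. If not set, the pretrained model's embedding size is kept.
- `--only_cpu`: run inference on CPU only.
- `--gpus {gpu_ids}`: GPU device IDs to use. Defaults to 0; for multiple GPUs, separate IDs with commas, e.g. 0,1,2.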
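The non-interactive mode described by `--data_file` and `--output_file` can be sketched as a read, generate, and write loop. This is not the code in `inference.py`. The one-prompt-per-line input format, the `input`/`output` record keys, and the `generate` callback are assumptions made for illustration.

```python
import json
from typing import Callable


def batch_predict(data_file: str, output_file: str, generate: Callable[[str], str]) -> int:
    """Read one prompt per line, write one JSON object per line, return the count."""
    count = 0
    with open(data_file, encoding="utf-8") as fin, \
            open(output_file, "w", encoding="utf-8") as fout:
        for line in fin:
            prompt = line.strip()
            if not prompt:
                continue  # skip blank lines
            # Hypothetical record layout; the real script may use other keys.
            record = {"input": prompt, "output": generate(prompt)}
            # ensure_ascii=False keeps Chinese text readable in the output file.
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

In real use, `generate` would wrap the loaded model's generation call. A stub such as `lambda p: p.upper()` is enough to check the file handling.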
#### Multi-GPU inference
Data-parallel batch inference across multiple GPUs:
```shell
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 inference_multigpu_demo.py --model_type baichuan --base_model shibing624/vicuna-baichuan-13b-chat
```
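With data parallelism, each process launched by `torchrun` handles a disjoint slice of the inputs. The sketch below shows one common way to split the work using the `RANK` and `WORLD_SIZE` environment variables that `torchrun` sets. It is not the code in `inference_multigpu_demo.py`, and `shard` and `current_shard` are hypothetical helpers.

```python
import os
from typing import Sequence


def shard(items: Sequence[str], rank: int, world_size: int) -> list[str]:
    """Return the items assigned to one process using a strided split."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(items[rank::world_size])


def current_shard(items: Sequence[str]) -> list[str]:
    """Pick this process's slice from the torchrun environment variables."""
    # Falls back to a single process when not launched by torchrun.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    return shard(items, rank, world_size)


if __name__ == "__main__":
    prompts = ["q0", "q1", "q2", "q3", "q4"]
    print(current_shard(prompts))
```

A strided split keeps shard sizes within one item of each other, so no process sits idle for long at the end of the batch.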