https://github.com/artificialzeng/baichuan2

A series of large language models developed by Baichuan Intelligent Technology
Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:
○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
○
DOI references
✓
Academic publication links
Links to: arxiv.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (5.8%) to scientific vocabulary
Last synced: 10 months ago · JSON representation
Repository

A series of large language models developed by Baichuan Intelligent Technology
Basic Info

Host: GitHub
Owner: ArtificialZeng
License: apache-2.0
Language: Python
Default Branch: main
Homepage: https://huggingface.co/baichuan-inc
Size: 4.5 MB
Statistics

Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0
Fork of baichuan-inc/Baichuan2
Created almost 3 years ago · Last pushed almost 3 years ago
https://github.com/ArtificialZeng/Baichuan2/blob/main/






  Baichuan 2




 Hugging Face   ModelScope   WeChat




 [53B](https://www.baichuan-ai.com/) 

[![license](https://img.shields.io/github/license/modelscope/modelscope.svg)](https://github.com/baichuan-inc/Baichuan2/blob/main/LICENSE)



    
         |
        English
    





# 

- [ ](#)
- [ Benchmark  ](#Benchmark-)
- [ ](#)
- [ ](#)
- [  Checkpoints ](#-Checkpoints)
- [ ](#)
- [ ](#)

# 

- Baichuan 2 **** **2.6 **  Tokens 
- Baichuan 2  benchmark ****
-  **7B****13B**  **Base**  **Chat**  Chat  **4bits **
- ****[](#)
-  [Baichuan 2: Open Large-scale Language Models](https://cdn.baichuan-ai.com/paper/Baichuan2-technical-report.pdf) 


|         |   |  |  4bits  |
|:-------:|:-------:|:-------:|:-----------------:|
| 7B      |  [Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base) |  [Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) |  [Baichuan2-7B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat-4bits) |
| 13B     |  [Baichuan2-13B-Base](https://huggingface.co/baichuan-inc/Baichuan2-13B-Base) |  [Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) |  [Baichuan2-13B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat-4bits) |

# Benchmark 

[](#)[](#)[](#)[](#)[](#)[](#)

## 

 5-shot 
- [C-Eval](https://cevalbenchmark.com/index.html#home)  52  dev  few-shot  test  [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B/tree/main) 
- [MMLU](https://arxiv.org/abs/2009.03300)  57  LLM [](https://github.com/hendrycks/test)
- [CMMLU](https://github.com/haonan-li/CMMLU)  67 [](https://github.com/haonan-li/CMMLU)
- [Gaokao](https://github.com/OpenLMLab/GAOKAO-Bench)   C-Eval 
- [AGIEval](https://github.com/microsoft/AGIEval)   C-Eval 
- [BBH](https://huggingface.co/datasets/lukaemon/bbh)  Big-Bench Big-Bench  204 BBH  204  Big-Bench 

### 7B 

|                       | **C-Eval** | **MMLU** | **CMMLU** | **Gaokao** | **AGIEval** | **BBH** |
|:---------------------:|:----------:|:--------:|:---------:|:----------:|:-----------:|:-------:|
|                       |  5-shot    |  5-shot  |  5-shot   | 5-shot     | 5-shot      | 3-shot  |
| **GPT-4**             | 68.40      | 83.93    | 70.33     | 66.15      | 63.27       | 75.12   |
| **GPT-3.5 Turbo**     | 51.10      | 68.54    | 54.06     | 47.07      | 46.13       | 61.59   |
| **LLaMA-7B**          | 27.10      | 35.10    | 26.75     | 27.81      | 28.17       | 32.38   |
| **LLaMA2-7B**         | 28.90      | 45.73    | 31.38     | 25.97      | 26.53       | 39.16   |
| **MPT-7B**            | 27.15      | 27.93    | 26.00     | 26.54      | 24.83       | 35.20   |
| **Falcon-7B**         | 24.23      | 26.03    | 25.66     | 24.24      | 24.10       | 28.77   |
| **ChatGLM2-6B**       | 50.20      | 45.90    | 49.00     | 49.44      | 45.28       | 31.65   |
| **Baichuan-7B**       | 42.80      | 42.30    | 44.02     | 36.34      | 34.44       | 32.48   |
| **Baichuan2-7B-Base** | 54.00      | 54.16    | 57.07     | 47.47      | 42.73       | 41.56   |

### 13B 

|                             | **C-Eval** | **MMLU** | **CMMLU** | **Gaokao** | **AGIEval** | **BBH** |
|:---------------------------:|:----------:|:--------:|:---------:|:----------:|:-----------:|:-------:|
|                             |  5-shot    |  5-shot  |  5-shot   | 5-shot     | 5-shot      | 3-shot  |
| **GPT-4**                   | 68.40      | 83.93    | 70.33     | 66.15      | 63.27       | 75.12   |
| **GPT-3.5 Turbo**           | 51.10      | 68.54    | 54.06     | 47.07      | 46.13       | 61.59   |
| **LLaMA-13B**               | 28.50      | 46.30    | 31.15     | 28.23      | 28.22       | 37.89   |
| **LLaMA2-13B**              | 35.80      | 55.09    | 37.99     | 30.83      | 32.29       | 46.98   |
| **Vicuna-13B**              | 32.80      | 52.00    | 36.28     | 30.11      | 31.55       | 43.04   |
| **Chinese-Alpaca-Plus-13B** | 38.80      | 43.90    | 33.43     | 34.78      | 35.46       | 28.94   |
| **XVERSE-13B**              | 53.70      | 55.21    | 58.44     | 44.69      | 42.54       | 38.06   |
| **Baichuan-13B-Base**       | 52.40      | 51.60    | 55.30     | 49.69      | 43.20       | 43.01   |
| **Baichuan2-13B-Base**      | 58.10      | 59.17    | 61.97     | 54.33      | 48.17       | 48.78   |

## 

 [JEC-QA](https://jecqa.thunlp.org/) JEC-QA  C-Eval 

C-EvalMMLUCMMLU[MedQA](https://arxiv.org/abs/2009.13081)  [MedMCQA](https://medmcqa.github.io/) C-Eval 
-  C-Eval  val 
- MedQA  [MedQA](https://huggingface.co/datasets/bigbio/med_qa)  USMLE  MCMLE 
- MedMCQA  test  dev 
- 
    - C-Eval: clinical_medicine, basic_medicine
    - MMLU: clinical_knowledge, anatomy, college_medicine, college_biology, nutrition, virology, medical_genetics, professional_medicine
    - CMMLU: anatomy, clinical_knowledge, college_medicine, genetics, nutrition, traditional_chinese_medicine, virology 

 5-shot 

### 7B 

|                       | **JEC-QA** | **CEval-MMLU-CMMLU** | **MedQA-USMLE** | **MedQA-MCMLE** | **MedMCQA** |
|:---------------------:|:----------:|:--------------------:|:---------------:|:---------------:|:-----------:|
|                       | 5-shot     | 5-shot               | 5-shot          | 5-shot          | 5-shot      |
| **GPT-4**             | 59.32      | 77.16                | 80.28           | 74.58           | 72.51       |
| **GPT-3.5 Turbo**     | 42.31      | 61.17                | 53.81           | 52.92           | 56.25       |
| **LLaMA-7B**          | 27.45      | 33.34                | 24.12           | 21.72           | 27.45       |
| **LLaMA2-7B**         | 29.20      | 36.75                | 27.49           | 24.78           | 37.93       |
| **MPT-7B**            | 27.45      | 26.67                | 16.97           | 19.79           | 31.96       |
| **Falcon-7B**         | 23.66      | 25.33                | 21.29           | 18.07           | 33.88       |
| **ChatGLM2-6B**       | 40.76      | 44.54                | 26.24           | 45.53           | 30.22       |
| **Baichuan-7B**       | 34.64      | 42.37                | 27.42           | 39.46           | 31.39       |
| **Baichuan2-7B-Base** | 44.46      | 56.39                | 32.68           | 54.93           | 41.73       |

### 13B 

|                             | **JEC-QA** | **CEval-MMLU-CMMLU** | **MedQA-USMLE** | **MedQA-MCMLE** | **MedMCQA** |
|:---------------------------:|:----------:|:--------------------:|:---------------:|:---------------:|:-----------:|
|                             | 5-shot     |  5-shot              |  5-shot         |  5-shot         | 5-shot      |
| **GPT-4**                   | 59.32      | 77.16                | 80.28           | 74.58           | 72.51       |
| **GPT-3.5 Turbo**           | 42.31      | 61.17                | 53.81           | 52.92           | 56.25       |
| **LLaMA-13B**               | 27.54      | 35.14                | 28.83           | 23.38           | 39.52       |
| **LLaMA2-13B**              | 34.08      | 47.42                | 35.04           | 29.74           | 42.12       |
| **Vicuna-13B**              | 28.38      | 40.99                | 34.80           | 27.67           | 40.66       |
| **Chinese-Alpaca-Plus-13B** | 35.32      | 46.31                | 27.49           | 32.66           | 35.87       |
| **XVERSE-13B**              | 46.42      | 58.08                | 32.99           | 58.76           | 41.34       |
| **Baichuan-13B-Base**       | 41.34      | 51.77                | 29.07           | 43.67           | 39.60       |
| **Baichuan2-13B-Base**      | 47.40      | 59.33                | 40.38           | 61.62           | 42.86       |

## 

 [OpenCompass](https://opencompass.org.cn/)  [GSM8K](https://huggingface.co/datasets/gsm8k)  [MATH](https://huggingface.co/datasets/competition_math)  4-shot 

- GSM8K  OpenAI  8.5K 
- MATH  12,500  7500 5000  AMC 10AMC 12AIME 

 [HumanEval](https://huggingface.co/datasets/openai_humaneval)  [MBPP](https://huggingface.co/datasets/mbpp)  OpenCompass HumanEval  0-shot MBPP  3-shot 
- HumanEval 
- MBPP  974  Python 

### 7B 

|                       | **GSM8K** | **MATH** | **HumanEval** | **MBPP** |
|:---------------------:|:---------:|:--------:|:-------------:|:--------:|
|                       |  4-shot   | 4-shot   |  0-shot       |  3-shot  |
| **GPT-4**             |   89.99   | 40.20    | 69.51         |  63.60   |
| **GPT-3.5 Turbo**     |   57.77   | 13.96    | 52.44         |  61.40   |
| **LLaMA-7B**          |   9.78    | 3.02     | 11.59         |  14.00   |
| **LLaMA2-7B**         |   16.22   | 3.24     | 12.80         |  14.80   |
| **MPT-7B**            |   8.64    | 2.90     | 14.02         |  23.40   |
| **Falcon-7B**         |   5.46    | 1.68     | -             |  10.20   |
| **ChatGLM2-6B**       |   28.89   | 6.40     | 9.15          |   9.00   |
| **Baichuan-7B**       |   9.17    | 2.54     | 9.20          |   6.60   |
| **Baichuan2-7B-Base** |   24.49   | 5.58     | 18.29         |  24.20   |

### 13B 

|                             | **GSM8K** | **MATH** | **HumanEval** | **MBPP** |
|:---------------------------:|:---------:|:--------:|:-------------:|:--------:|
|                             |  4-shot   | 4-shot   |  0-shot       |  3-shot  |
| **GPT-4**                   |   89.99   | 40.20    | 69.51         |  63.60   |
| **GPT-3.5 Turbo**           |   57.77   | 13.96    | 52.44         |  61.40   |
| **LLaMA-13B**               |   20.55   | 3.68     | 15.24         |  21.40   |
| **LLaMA2-13B**              |   28.89   | 4.96     | 15.24         |  27.00   |
| **Vicuna-13B**              |   28.13   | 4.36     | 16.46         |  15.00   |
| **Chinese-Alpaca-Plus-13B** |   11.98   | 2.50     | 16.46         |  20.00   |
| **XVERSE-13B**              |   18.20   | 2.18     | 15.85         |  16.80   |
| **Baichuan-13B-Base**       |   26.76   | 4.84     | 11.59         |  22.80   |
| **Baichuan2-13B-Base**      |   52.77   | 10.08    | 17.07         |  30.20   |

## 

 [Flores-101](https://huggingface.co/datasets/facebook/flores) Flores-101  101  OpenCompass  Flores-101 ------- 8-shot 

### 7B 

|             | **CN-EN** | **CN-FR** | **CN-ES** | **CN-AR** | **CN-RU** | **CN-JP** | **CN-DE** | Average |
|:---------------------:|:-------:|:-------:|:---------:|:---------:|:-------:|:-------:|:-------:|:-------:|
| **GPT-4**             | 29.94   | 29.56   | 20.01     | 10.76     | 18.62   | 13.26   | 20.83   | 20.43   |
| **GPT-3.5 Turbo**     | 27.67   | 26.15   | 19.58     | 10.73     | 17.45   | 1.82    | 19.70   | 17.59   |
| **LLaMA-7B**          | 17.27   | 12.02   | 9.54      | 0.00      | 4.47    | 1.41    | 8.73    | 7.63    |
| **LLaMA2-7B**         | 25.76   | 15.14   | 11.92     | 0.79      | 4.99    | 2.20    | 10.15   | 10.14   |
| **MPT-7B**            | 20.77   | 9.53    | 8.96      | 0.10      | 3.54    | 2.91    | 6.54    | 7.48    |
| **Falcon-7B**         | 22.13   | 15.67   | 9.28      | 0.11      | 1.35    | 0.41    | 6.41    | 7.91    |
| **ChatGLM2-6B**       | 22.28   | 9.42    | 7.77      | 0.64      | 1.78    | 0.26    | 4.61    | 6.68    |
| **Baichuan-7B**       | 25.07   | 16.51   | 12.72     | 0.41      | 6.66    | 2.24    | 9.86    | 10.50   |
| **Baichuan2-7B-Base** | 27.27   | 20.87   | 16.17     | 1.39      | 11.21   | 3.11    | 12.76   | 13.25   |

### 13B 

|                   | **CN-EN** | **CN-FR** | **CN-ES** | **CN-AR** | **CN-RU** | **CN-JP** | **CN-DE** | Average |
|:---------------------------:|:-------:|:-------:|:---------:|:---------:|:-------:|:-------:|:-------:|:-------:|
|          **GPT-4**          | 29.94   | 29.56   | 20.01     | 10.76     | 18.62   | 13.26   | 20.83   | 20.43   |
|      **GPT-3.5 Turbo**      | 27.67   | 26.15   | 19.58     | 10.73     | 17.45   | 1.82    | 19.70   | 17.59   |
|        **LLaMA-13B**        | 21.75   | 16.16   | 13.29     | 0.58      | 7.61    | 0.41    | 10.66   | 10.07   |
|       **LLaMA2-13B**        | 25.44   | 19.25   | 17.49     | 1.38      | 10.34   | 0.13    | 11.13   | 12.17   |
|       **Vicuna-13B**        | 22.63   | 18.04   | 14.67     | 0.70      | 9.27    | 3.59    | 10.25   | 11.31   |
| **Chinese-Alpaca-Plus-13B** | 22.53   | 13.82   | 11.29     | 0.28      | 1.52    | 0.31    | 8.13    | 8.27    |
|       **XVERSE-13B**        | 29.26   | 24.03   | 16.67     | 2.78      | 11.61   | 3.08    | 14.26   | 14.53   |
|    **Baichuan-13B-Base**    | 30.24   | 20.90   | 15.92     | 0.98      | 9.65    | 2.64    | 12.00   | 13.19   |
|    **Baichuan2-13B-Base**   | 30.61   | 22.11   | 17.27     | 2.39      | 14.17   | 11.58   | 14.53   | 16.09   |

# 

 Hugging Face Hugging Face 

## 
```shell
pip install -r requirements.txt
```

## Python 

### Chat 
```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers.generation.utils import GenerationConfig
>>> tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-13B-Chat", use_fast=False, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-13B-Chat", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
>>> model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan2-13B-Chat")
>>> messages = []
>>> messages.append({"role": "user", "content": ""})
>>> response = model.chat(tokenizer, messages)
>>> print(response)
""


```

### Base 
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-13B-Base", trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-13B-Base", device_map="auto", trust_remote_code=True)
>>> inputs = tokenizer('->\n->', return_tensors='pt')
>>> inputs = inputs.to('cuda:0')
>>> pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
>>> print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
->
->
```

>  `device_map='auto'` `export CUDA_VISIBLE_DEVICES=0,1`01

## 

```shell
python cli_demo.py
```
 Chat  Base 

##  demo 

 streamlit  web  demo  Chat  Base 

```shell
streamlit run web_demo.py
```

## Baichuan2-13B-Chat 



`````
[3.1,6.2,1.3,8.4,10.5,11.6,2.1]python

Baichuan 2def find_median(numbers):
    numbers.sort()
    length = len(numbers)
    if length % 2 == 0:
        return (numbers[length // 2 - 1] + numbers[length // 2]) / 2
    else:
        return numbers[length // 2]
numbers = [3.1, 6.2, 1.3, 8.4, 10.5, 11.6, 2.1]
print(find_median(numbers))
`````





`````
2668

Baichuan 2xy

1. x + y = 26 ()
2. 2x + 4y = 68 ()

x

x = 26 - y

x

2(26 - y) + 4y = 68
52 - 2y + 4y = 68
2y = 16

y

y = 16 / 2
y = 8

8yx

x = 26 - 8
x = 18

188
`````





`````


Baichuan 2I'm ready to translate every word you say next into English.



Baichuan 2Hello

Baichuan 2

Baichuan 2I am Baichuan 2, and I am very happy to meet everyone.

AGI

Baichuan 2I hope through the joint efforts of all of us, we can welcome the arrival of the AGI era as soon as possible.
`````




`````
?A.B.:C.:D.1000m/s

Baichuan 2
AA BB CC D340m/s1000m/sD  D 
`````
`````
I'm Mike I am going to have a busy weekendOn SaturdayI am going to learn how to swimI will go with my fatherThen we are going to have lunch in the restaurantIn the afternoonwe are going to the zooWe are going to see the pandasOn SundayI will finish my homework in the morningIn the afternoonmy parents and I are going to visit my grandparentsWe will have dinner together
TF:
1.Mike is going to have a busy weekend()
2.Mike is going to learn how to swim with his father()
3.They are going to have lunch at home()

Baichuan 21. T
2. T
3. F
`````



## 

 Baichuan 2  Baichuan 2  Baichuan2-7B-Chat  Baichuan2-13B-Chat Baichuan 2 

### 

Baichuan 2 [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes) transformers BitsAndBytes  8bits  4bits  4bits  FP4  NF4 Baichuan 2  NF4  4bits   
  
Baichuan 2 

### 

 8bits  4bits  [Baichuan-13B](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)  CPU `quantize()` `cuda()` GPU  Baichuan2-7B-Chat 

8bits :
```python
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-7B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda() 
```
4bits :
```python
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-7B-Chat", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda() 
```
 `from_pretrained`  `device_map="auto"`

### 

 4bits  [Baichuan2-7B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat-4bits/tree/main)
 Baichuan2-7B-Chat-4bits :
```python
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-7B-Chat-4bits", device_map="auto", trust_remote_code=True)
```
 8bits  Hugging Face transformers  API  8bits  8bits 
```python
# Model saving: model_id is the original model directory, and quant8_saved_dir is the directory where the 8bits quantized model is saved.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto", trust_remote_code=True)
model.save_pretrained(quant8_saved_dir)
model = AutoModelForCausalLM.from_pretrained(quant8_saved_dir, device_map="auto", trust_remote_code=True)
```

### 

 (GPU Mem in GB)
| Precision   | Baichuan2-7B |Baichuan2-13B |
|-------------|:------------:|:------------:|
| bf16 / fp16 | 15.3         | 27.5         |
| 8bits       | 8.0          | 16.1         |
| 4bits       | 5.1          | 8.6          |

 benchmark 

| Model 5-shot           | C-Eval | MMLU | CMMLU |
|------------------------|:------:|:----:|:-----:|
| Baichuan2-13B-Chat      | 56.74  | 57.32| 59.68  |
| Baichuan2-13B-Chat-4bits | 56.05   | 56.24 | 58.82  |
| Baichuan2-7B-Chat       | 54.35   | 52.93 | 54.99  |
| Baichuan2-7B-Chat-4bits | 53.04   | 51.72 | 52.84  |
> C-Eval  val set 

4bits  bfloat16  1 - 2 

## CPU 

Baichuan 2  CPU CPU 
```python
# Taking Baichuan2-7B-Chat as an example
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan2-7B-Chat", torch_dtype=torch.float32, trust_remote_code=True)
```
##  Baichuan 1  Baichuan 2

 Baichuan 1 (Baichuan-7B, Baichuan-13B) Baichuan 2 Baichuan 2  Baichuan 1  Baichuan 2  lm_head `lm_head.weight` Baichuan 1 
```python
import torch
import os
ori_model_dir = 'your Baichuan 2 model directory'
# To avoid overwriting the original model, it's best to save the converted model to another directory before replacing it
new_model_dir = 'your normalized lm_head weight Baichuan 2 model directory'
model = torch.load(os.path.join(ori_model_dir, 'pytorch_model.bin'))
lm_head_w = model['lm_head.weight']
lm_head_w = torch.nn.functional.normalize(lm_head_w)
model['lm_head.weight'] = lm_head_w
torch.save(model, os.path.join(new_model_dir, 'pytorch_model.bin'))
```

# 

## 

```shell
git clone https://github.com/baichuan-inc/Baichuan2.git
cd Baichuan2/fine-tune
pip install -r requirements.txt
```
-  LoRA  [peft](https://github.com/huggingface/peft)
-  xFormers  [xFormers](https://github.com/facebookresearch/xformers)

## 

 Baichuan2-7B-Base 

`data/belle_chat_ramdon_10k.json` [multiturn_chat_0.8M](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M)  1 

```shell
hostfile=""
deepspeed --hostfile=$hostfile fine-tune.py  \
    --report_to "none" \
    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "baichuan-inc/Baichuan2-7B-Base" \
    --output_dir "output" \
    --model_max_length 512 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --tf32 True
```

## 

 hostfile 
```
ip1 slots=8
ip2 slots=8
ip3 slots=8
ip4 slots=8
....
```
 hosftfile 
```shell
hostfile="/path/to/hostfile"
deepspeed --hostfile=$hostfile fine-tune.py  \
    --report_to "none" \
    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "baichuan-inc/Baichuan2-7B-Base" \
    --output_dir "output" \
    --model_max_length 512 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --tf32 True
```

## 

 LoRA
```shell
--use_lora True
```
LoRA  `fine-tune.py` 

 LoRA 
```python
from peft import AutoPeftModelForCausalLM
model = AutoPeftModelForCausalLM.from_pretrained("output", trust_remote_code=True)
```

#  Checkpoints

 2.6  Tokens  Baichuan2-7B-Base  11  checkpoints 0.2 ~ 2.4  Tokens[](https://huggingface.co/baichuan-inc/Baichuan2-7B-Intermediate-Checkpoints) checkpoints  C-EvalMMLUCMMLU  benchmark 





# 

**  Baichuan 2  **

## 

### Pytorch 

Baichuan 2  NPU  PyTorch + DeepSpeed  modelingREADME[Baichuan2-7B](https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/PyTorch/built-in/foundation/Baichuan2/7B)Baichuan2-13B 

Baichuan 2  NPU  modelingREADME[Baichuan2-7B](https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/foundation_models/baichuan2/7b)[Baichuan2-13B](https://gitee.com/ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/foundation_models/baichuan2/13b)

### MindSpore 

[MindFormers]( https://gitee.com/mindspore/mindformers) MindSpore[Baichuan2-7B / 13B]( https://gitee.com/mindspore/mindformers/tree/dev/research/baichuan2)  [README]( https://gitee.com/mindspore/mindformers/tree/dev/research/baichuan2/baichuan2.md)

### 

[](https://xihe.mindspore.cn)  MindSpore AI MindFormers  [Baichuan2-7B](https://xihe.mindspore.cn/modelzoo/baichuan2_7b_chat) 

# 

## 

 Baichuan 2  iOSAndroid Baichuan 2  Baichuan 2 

 Baichuan 2 

## 

 [Apache 2.0](https://github.com/baichuan-inc/Baichuan2/blob/main/LICENSE) Baichuan 2 [Baichuan 2 ](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base/resolve/main/Baichuan%202%E6%A8%A1%E5%9E%8B%E7%A4%BE%E5%8C%BA%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE.pdf)Baichuan 2  Baichuan 2
Owner

Name: Dr. Artificial曾小健
Login: ArtificialZeng
Kind: user
Location: Beijing
Website: https://blog.csdn.net/sinat_37574187?type=blog
Repositories: 171
Profile: https://github.com/ArtificialZeng
LLM practitioner/engineer, AI/ML/DL Quant
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

https://github.com/artificialzeng/baichuan2

Science Score: 10.0%

Repository

Basic Info

Statistics

https://github.com/ArtificialZeng/Baichuan2/blob/main/

Baichuan 2

| English

Owner

GitHub Events

Total

Last Year