https://github.com/artificialzeng/chatglm2-6b

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型


Repository

Basic Info

Host: GitHub
Owner: ArtificialZeng
License: other
Language: Python
Default Branch: main
Homepage:
Size: 4.91 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of THUDM/ChatGLM2-6B

Created almost 3 years ago · Last pushed almost 3 years ago

https://github.com/ArtificialZeng/ChatGLM2-6B/blob/main/

# ChatGLM2-6B




*Read this in [English](README_EN.md)*

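## Introduction

ChatGLM**2**-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B). It keeps the smooth conversation flow and low deployment threshold of the first-generation model, and adds the following new features:

1. **Stronger performance**: Building on the development experience of the first-generation ChatGLM model, we fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses the hybrid objective function of [GLM](https://github.com/THUDM/GLM) and was pre-trained on 1.4T Chinese and English tokens, followed by human preference alignment training. The evaluation results show that, compared with the first-generation model, ChatGLM2-6B improves substantially on MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%).
2. **Longer context**: Based on [FlashAttention](https://github.com/HazyResearch/flash-attention), we extended the context length of the base model from 2K in ChatGLM-6B to 32K, and used a context length of 8K during dialogue training, which allows more rounds of dialogue. The current version of ChatGLM2-6B still has limited ability to understand a single very long document.
3. **More efficient inference**: Based on [Multi-Query Attention](http://arxiv.org/abs/1911.02150), ChatGLM2-6B has faster inference and lower GPU memory usage. Inference is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6G of GPU memory grows from 1K to 8K.
4. **More open license**: ChatGLM2-6B weights are **fully open** for academic research, and **commercial use is also permitted** with official written permission. If you find our open-source model useful for your business, we welcome donations toward the development of the next-generation model ChatGLM3.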

-----

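The ChatGLM2-6B open-source model is intended to advance large-model technology together with the open-source community. Developers and users are asked to comply with the [open-source license](MODEL_LICENSE) and not to use the model, the code, or derivatives of this project for any purpose that may cause harm. **At present, the project team has not developed any application based on ChatGLM2-6B, including web, Android, iOS, or Windows App applications.**

Although we try to ensure the compliance and accuracy of the data at every stage of training, ChatGLM2-6B is a small model whose output is affected by randomness, so the accuracy of its output cannot be guaranteed and the model is easily misled. **This project assumes no responsibility for risks arising from the open-source model and code, including data security risks or risks caused by the model being misled, abused, or improperly used.**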

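## Evaluation Results
We evaluated ChatGLM2-6B on several typical Chinese and English datasets. Below are the results on [MMLU](https://github.com/hendrycks/test) (English), [C-Eval](https://cevalbenchmark.com/static/leaderboard.html) (Chinese), [GSM8K](https://github.com/openai/grade-school-math) (math), and [BBH](https://github.com/suzgunmirac/BIG-Bench-Hard) (English). The script used to evaluate on C-Eval is provided in [evaluation](./evaluation/README.md).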

### MMLU

| Model | Average | STEM | Social Sciences | Humanities | Others |
| ----- | ----- | ---- | ----- | ----- | ----- |
| ChatGLM-6B | 40.63 | 33.89 | 44.84 | 39.02 | 45.71 |
| ChatGLM2-6B (base) | 47.86 | 41.20 | 54.44 | 43.66 | 54.46 |
| ChatGLM2-6B | 45.46 | 40.06 | 51.61 | 41.23 | 51.24 |

> The Chat model is evaluated with zero-shot CoT (Chain-of-Thought) prompting, and the Base model with few-shot answer-only prompting.

### C-Eval

| Model | Average | STEM | Social Sciences | Humanities | Others |
| ----- | ---- | ---- | ----- | ----- | ----- |
| ChatGLM-6B | 38.9 | 33.3 | 48.3 | 41.3 | 38.0 |
| ChatGLM2-6B (base) | 51.7 | 48.6 | 60.5 | 51.3 | 49.8 |
| ChatGLM2-6B | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |

> The Chat model is evaluated with zero-shot CoT prompting, and the Base model with few-shot answer-only prompting.

### GSM8K

| Model | Accuracy | Accuracy (Chinese)* |
| ----- | ----- | ----- |
| ChatGLM-6B | 4.82 | 5.85 |
| ChatGLM2-6B (base) | 32.37 | 28.95 |
| ChatGLM2-6B | 28.05 | 20.45 |

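> All models are evaluated with few-shot CoT prompting. The CoT prompts are taken from http://arxiv.org/abs/2201.11903
> 
> \* We used a translation API to translate 500 GSM8K questions and the CoT prompts into Chinese, then proofread them manually.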


### BBH

| Model | Accuracy |
| ----- | ----- |
| ChatGLM-6B | 18.73 |
| ChatGLM2-6B (base) | 33.68 |
| ChatGLM2-6B | 30.00 |

> All models are evaluated with few-shot CoT prompting. The CoT prompts are taken from https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/cot-prompts
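
The relative gains quoted in the introduction are percentage changes over ChatGLM-6B. A minimal sketch of that arithmetic, using the scores from the tables above; `relative_gain` and `scores` are illustrative names, not part of this repository.

```python
def relative_gain(old: float, new: float) -> float:
    """Return the relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# Scores copied from the tables above:
# (ChatGLM-6B, ChatGLM2-6B (base), ChatGLM2-6B)
scores = {
    "MMLU": (40.63, 47.86, 45.46),
    "C-Eval": (38.9, 51.7, 50.1),
    "GSM8K": (4.82, 32.37, 28.05),
    "BBH": (18.73, 33.68, 30.00),
}

for name, (first_gen, base, chat) in scores.items():
    print(f"{name}: base {relative_gain(first_gen, base):+.1f}%, "
          f"chat {relative_gain(first_gen, chat):+.1f}%")
```

For example, the GSM8K base score gives (32.37 - 4.82) / 4.82, roughly +571.6%, and the BBH chat score gives roughly +60.2%.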

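## Inference Performance
ChatGLM2-6B uses [Multi-Query Attention](http://arxiv.org/abs/1911.02150) to speed up generation. The average speed when generating 2000 characters is compared below.

| Model | Inference speed (characters/s) |
| ----  | -----  |
| ChatGLM-6B  | 31.49 |
| ChatGLM2-6B | 44.62 |

> Measured with batch size = 1, max length = 2048, and bf16 precision on an A100-SXM4-80G GPU with PyTorch 2.0.1.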

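Multi-Query Attention also reduces the GPU memory used by the KV Cache during generation. In addition, ChatGLM2-6B is trained on dialogue with a Causal Mask, so the KV Cache from earlier turns can be reused in multi-turn dialogue, which lowers memory usage further. As a result, on a GPU with 6GB of memory and INT4 quantization, ChatGLM-6B can generate at most 1119 characters before running out of memory, while ChatGLM2-6B can generate at least 8192 characters.

| **Quantization level** | **Minimum GPU memory to encode length 2048** | **Minimum GPU memory to generate length 8192** |
| -------------- |---------------------|---------------------|
| FP16 / BF16    | 13.1 GB             | 12.8 GB             |
| INT8           | 8.2 GB              | 8.1 GB              |
| INT4           | 5.5 GB              | 5.1 GB              |

> ChatGLM2-6B uses `torch.nn.functional.scaled_dot_product_attention`, introduced in PyTorch 2.0, for efficient Attention computation. With an older PyTorch version it falls back to a naive Attention implementation, and GPU memory usage may be higher than shown in the table above.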
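
To see why Multi-Query Attention shrinks the KV Cache, note that the cache holds one key vector and one value vector per layer, per key/value head, per token. The sketch below works through that arithmetic. The layer count, head count, head dimension, and number of key/value groups are illustrative assumptions, not values stated in this README; check the model's `config.json` for the real ones.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of the KV cache for one sequence.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_value=2 corresponds to FP16/BF16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value

# Assumed, illustrative configuration (not taken from this README).
LAYERS, HEADS, HEAD_DIM, KV_GROUPS = 28, 32, 128, 2
SEQ_LEN = 8192

# Multi-head attention: every query head has its own key/value head.
multi_head = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, SEQ_LEN)
# Multi-query attention: a few key/value heads are shared by all query heads.
multi_query = kv_cache_bytes(LAYERS, KV_GROUPS, HEAD_DIM, SEQ_LEN)

print(f"multi-head : {multi_head / 2**30:.2f} GiB")
print(f"multi-query: {multi_query / 2**30:.2f} GiB")
print(f"reduction  : {multi_head // multi_query}x")
```

Under these assumptions the cache shrinks by a factor of 16, the ratio of query heads to key/value groups.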



Model accuracy under quantization:

| Quantization level | Accuracy (MMLU) | Accuracy (C-Eval dev) |
| ----- | ----- |-----------------------|
| BF16 | 45.47 | 53.57                 |
| INT4 | 43.13 | 50.30                 |



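## ChatGLM2-6B Examples

Compared with the first-generation model, ChatGLM2-6B improves in several dimensions. Some comparison examples are shown below; more of what ChatGLM2-6B can do is left for you to explore.

![Math and logic examples](resources/math.png)

![Knowledge reasoning examples](resources/knowledge.png)

![Long-context understanding examples](resources/long-context.png)
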

## Getting Started
### Environment Setup

```shell
git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B
```

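Then install the dependencies with pip: `pip install -r requirements.txt`. The recommended `transformers` version is `4.30.2`, and `torch` 2.0 or later is recommended for the best inference performance.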
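
After installing, it can be useful to confirm which versions ended up in the environment. A minimal sketch using only the standard library; `version_tuple` and `check` are hypothetical helpers, and treating the recommended versions as minimums is an assumption made for illustration.

```python
from importlib import metadata

def version_tuple(version: str) -> tuple:
    """Turn '2.0.1+cu118' into (2, 0, 1), ignoring local or pre-release suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def check(package: str, minimum: str) -> str:
    """Report whether an installed package meets a minimum version."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    ok = version_tuple(installed) >= version_tuple(minimum)
    return f"{package} {installed}: {'ok' if ok else 'older than ' + minimum}"

print(check("transformers", "4.30.2"))
print(check("torch", "2.0"))
```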

### Calling the Model from Code

The following code calls ChatGLM2-6B to generate a dialogue:

```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='cuda')
>>> model = model.eval()
>>> # First turn: "你好" means "Hello"; start with an empty history
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋!我是人工智能助手 ChatGLM2-6B,很高兴见到你,欢迎问我任何问题。
>>> # Second turn: "What should I do if I can't sleep at night?"; pass the history back in
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)  # prints a six-item numbered list of suggestions, followed by a closing remark
```

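#### Loading the Model Locally
With the code above, `transformers` downloads the model implementation and weights automatically. The full model implementation is on the [Hugging Face Hub](https://huggingface.co/THUDM/chatglm2-6b). If your network connection is poor, the download may be slow or may fail. In that case, you can download the model to a local directory first and load it from there.

To download the model from the Hugging Face Hub, first [install Git LFS](https://docs.github.com/zh/repositories/working-with-files/managing-large-files/installing-git-large-file-storage), then run: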
```Shell
git clone https://huggingface.co/THUDM/chatglm2-6b
```

If downloading the checkpoint from the Hugging Face Hub is slow, you can download only the model implementation:
```Shell
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm2-6b
```
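Then download the model weight files manually from [here](https://cloud.tsinghua.edu.cn/d/674208019e314311ab5c/) and place them in the local `chatglm2-6b` directory.

Once the model is downloaded, replace `THUDM/chatglm2-6b` in the code above with the path to your local `chatglm2-6b` folder to load the model from disk.

The model implementation is still changing. To pin the implementation for compatibility, pass `revision="v1.0"` to `from_pretrained`. `v1.0` is the current latest version number; the full list of versions is in the [Change Log](https://huggingface.co/THUDM/chatglm2-6b#change-log).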
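
One way to switch between the Hub name and a local copy without editing each call is to resolve the source once. This is a sketch, not part of the repository: `resolve_model_source` is a hypothetical helper, and using the presence of `config.json` to detect a downloaded checkpoint is an assumption.

```python
from pathlib import Path

HUB_ID = "THUDM/chatglm2-6b"

def resolve_model_source(local_dir: str, hub_id: str = HUB_ID) -> str:
    """Return the local directory if it looks like a downloaded checkpoint, else the Hub ID.

    A directory is treated as a checkpoint when it contains config.json;
    this heuristic is an assumption made for illustration.
    """
    path = Path(local_dir).expanduser()
    if (path / "config.json").is_file():
        return str(path)
    return hub_id

source = resolve_model_source("./chatglm2-6b")
print(f"loading from: {source}")
# The result can be passed to from_pretrained in place of the model name, e.g.
# AutoTokenizer.from_pretrained(source, trust_remote_code=True)
```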

### Web Demo

![web-demo](resources/web-demo.gif)

First install Gradio with `pip install gradio`, then run [web_demo.py](web_demo.py) from the repository:

```shell
python web_demo.py
```

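The program starts a Web Server and prints its address. Open the address in a browser to use the demo.
> By default the demo starts with `share=False`, so no public link is generated. If you need public access, change it to `share=True`.
> 

Thanks to [@AdamBear](https://github.com/AdamBear) for implementing a Streamlit-based web demo, `web_demo2.py`. To use it, first install the following additional dependencies:
```shell
pip install streamlit streamlit-chat
```
Then run:
```shell
streamlit run web_demo2.py
```
In our tests, the Streamlit-based web demo runs more smoothly when the input prompt is long.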

### Command-Line Demo

![cli-demo](resources/cli-demo.png)

Run [cli_demo.py](cli_demo.py) from the repository:

```shell
python cli_demo.py
```

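The program runs an interactive conversation in the command line. Type an instruction and press Enter to generate a reply, type `clear` to clear the conversation history, or type `stop` to terminate the program.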
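
The interaction loop behind such a demo can be sketched as follows. This is not the code of `cli_demo.py`: `run_cli` and the `chat_fn` callback are hypothetical names, and `chat_fn` stands in for a call such as `model.chat(tokenizer, query, history=history)`.

```python
from typing import Callable, Iterable, List, Tuple

History = List[Tuple[str, str]]

def run_cli(chat_fn: Callable[[str, History], Tuple[str, History]],
            inputs: Iterable[str]) -> List[str]:
    """Feed user inputs to chat_fn, handling the `clear` and `stop` commands."""
    history: History = []
    replies: List[str] = []
    for query in inputs:
        command = query.strip()
        if command == "stop":      # terminate the program
            break
        if command == "clear":     # forget the conversation so far
            history = []
            continue
        response, history = chat_fn(query, history)
        replies.append(response)
    return replies

def echo_chat(query: str, history: History) -> Tuple[str, History]:
    """Stand-in for the model: replies with the turn number and the query."""
    response = f"[turn {len(history) + 1}] {query}"
    return response, history + [(query, response)]

print(run_cli(echo_chat, ["hi", "again", "clear", "fresh start", "stop", "ignored"]))
```

With the stand-in model, the turn counter restarts at 1 after `clear`, and nothing after `stop` is processed.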

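### API Deployment
First install the additional dependencies with `pip install fastapi uvicorn`, then run [api.py](api.py) from the repository:
```shell
python api.py
```
By default the service listens on port 8000 of the local machine and is called with a POST request. The example below sends the prompt "你好" ("Hello"):
```shell
curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'
```
The returned value is:
```json
{
  "response":"你好👋！我是人工智能助手 ChatGLM2-6B，很高兴见到你，欢迎问我任何问题。",
  "history":[["你好","你好👋！我是人工智能助手 ChatGLM2-6B，很高兴见到你，欢迎问我任何问题。"]],
  "status":200,
  "time":"2023-03-23 21:38:40"
}
```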
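
The same request can be built from Python with the standard library. A minimal sketch, assuming the service from `api.py` is reachable at the default address; `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

def build_request(prompt: str, history: list,
                  url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the same POST request as the curl command above, without sending it."""
    body = json.dumps({"prompt": prompt, "history": history},
                      ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(prompt: str, history: list) -> dict:
    """Send the request and decode the JSON reply (requires api.py to be running)."""
    with urllib.request.urlopen(build_request(prompt, history)) as reply:
        return json.loads(reply.read().decode("utf-8"))

request = build_request("你好", [])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# With the service running, ask("你好", []) returns the decoded reply;
# pass its "history" field back in to continue the conversation.
```
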
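Thanks to @hiyouga for implementing streaming API deployment in the OpenAI format. It can serve as the backend for any ChatGPT-based application, such as [ChatGPT-Next-Web](https://github.com/Yidadaa/ChatGPT-Next-Web). Deploy it by running [openai_api.py](openai_api.py) from the repository:
```shell
python openai_api.py
```
Example code for calling the API:
```python
import openai  # uses the pre-1.0 `openai` package interface

if __name__ == "__main__":
    # Point the client at the local server started by openai_api.py
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "none"  # placeholder value
    # stream=True yields the reply chunk by chunk
    for chunk in openai.ChatCompletion.create(
        model="chatglm2-6b",
        messages=[
            {"role": "user", "content": "你好"}
        ],
        stream=True
    ):
        # Some chunks carry no text, so check before printing
        if hasattr(chunk.choices[0].delta, "content"):
            print(chunk.choices[0].delta.content, end="", flush=True)
```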


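## Low-Cost Deployment

### Model Quantization

By default the model is loaded in FP16 precision, and running the code above needs about 13GB of GPU memory. If your GPU memory is limited, you can try loading the model with quantization:

```python
# Adjust as needed; only 4-bit and 8-bit quantization are currently supported
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(8).cuda()
```

Quantization costs some model performance. In our tests, ChatGLM2-6B can still generate natural, fluent text under 4-bit quantization.

If you do not have enough CPU memory to load the full model before quantizing it, you can load the pre-quantized model directly:

```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4",trust_remote_code=True).cuda()
```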
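
Choosing among these options mostly comes down to available GPU memory. A small sketch that uses the 2048-length column of the memory table above as thresholds; `choose_precision` is a hypothetical helper, and real headroom depends on sequence length and on other processes using the GPU.

```python
# Minimum GPU memory (GB) for a 2048-length input, copied from the table above.
MEMORY_REQUIRED_GB = [("fp16", 13.1), ("int8", 8.2), ("int4", 5.5)]

def choose_precision(free_gpu_gb: float) -> str:
    """Pick the highest precision whose memory requirement fits, else fall back to CPU."""
    for precision, required_gb in MEMORY_REQUIRED_GB:
        if free_gpu_gb >= required_gb:
            return precision
    return "cpu"

for gb in (24, 12, 6, 4):
    print(f"{gb:>2} GB free -> {choose_precision(gb)}")
```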



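### CPU Deployment

If you do not have GPU hardware, you can also run inference on the CPU, though it will be slower. Load the model as follows (this needs about 32GB of memory):
```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).float()
```
If you do not have enough memory, you can load the quantized model directly:
```python
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4",trust_remote_code=True).float()
```
Running the quantized model on the CPU requires `gcc` and `openmp`. Most Linux distributions install them by default. On Windows, select `openmp` when installing [TDM-GCC](https://jmeubank.github.io/tdm-gcc/). The `gcc` version in our Windows test environment is `TDM-GCC 10.3.0`, and on Linux it is `gcc 11.3.0`. For MacOS, see [Q1](FAQ.md#q1).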
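
A quick way to confirm that a compiler is available before loading the quantized model on the CPU. This sketch only checks that `gcc` is on `PATH`; it does not verify OpenMP support, and `gcc_version` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional

def gcc_version() -> Optional[str]:
    """Return the first line of `gcc --version`, or None if gcc is not on PATH."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    result = subprocess.run([gcc, "--version"], capture_output=True,
                            text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None

version = gcc_version()
print(version if version else "gcc not found; install it before running the quantized model on CPU")
```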

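### Mac Deployment

On Macs with Apple Silicon or an AMD GPU, the MPS backend can run ChatGLM2-6B on the GPU. Follow Apple's [official instructions](https://developer.apple.com/metal/pytorch) to install PyTorch-Nightly (the correct version number looks like 2.x.x.dev2023xxxx, not 2.x.x).

MacOS currently supports only [loading the model locally](README.md#). Change the model loading in the code to load from a local path and use the mps backend:
```python
model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')
```

Loading the half-precision ChatGLM2-6B model needs about 13GB of memory. Machines with less memory (such as a 16GB MacBook Pro) fall back to virtual memory on disk when free memory runs out, which slows inference severely.
In that case you can use the quantized model chatglm2-6b-int4. Because the GPU quantization kernels are written in CUDA, they cannot be used on MacOS, and inference runs entirely on the CPU.
To make full use of CPU parallelism, you also need to [install OpenMP separately](FAQ.md#q1).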
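
The backend choice across the deployment options above can be expressed as a small helper. A minimal sketch with a hypothetical function name, `pick_device`; it returns `'cpu'` when PyTorch is not installed so that it runs anywhere.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what the local PyTorch build supports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is missing, so no accelerator can be used
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(device)
# The loaded model can then be moved with .to(device);
# for CPU inference, convert it with .float() as shown in the CPU section.
```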

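### Multi-GPU Deployment
If you have several GPUs but no single one has enough memory for the full model, you can split the model across them. First install accelerate with `pip install accelerate`, then load the model as follows:
```python
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2)
```
This deploys the model on two GPUs for inference. Change `num_gpus` to the number of GPUs you want to use. The model is split evenly by default; you can also pass a `device_map` argument to specify the split yourself.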
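
To illustrate what an even split looks like, the sketch below assigns transformer layers to GPUs in contiguous chunks. It is not the logic of `load_model_on_gpus`: the layer count and the module-name prefix are assumptions, and a real `device_map` must also place the embedding, final normalization, and output layers.

```python
from typing import Dict

def even_device_map(num_gpus: int, num_layers: int = 28,
                    layer_prefix: str = "transformer.encoder.layers") -> Dict[str, int]:
    """Assign transformer layers to GPUs in contiguous, near-equal chunks.

    num_layers and layer_prefix are assumptions for illustration; inspect
    the loaded model's module names before building a real device_map.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    device_map = {}
    for layer in range(num_layers):
        # Integer arithmetic maps layer i to GPU floor(i * num_gpus / num_layers).
        device_map[f"{layer_prefix}.{layer}"] = layer * num_gpus // num_layers
    return device_map

mapping = even_device_map(num_gpus=2)
print(mapping["transformer.encoder.layers.0"], mapping["transformer.encoder.layers.27"])
```

With two GPUs and 28 layers, layers 0 to 13 land on GPU 0 and layers 14 to 27 on GPU 1.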

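## License

The code in this repository is open-sourced under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license. Use of the ChatGLM2-6B model weights must follow the [Model License](MODEL_LICENSE). ChatGLM2-6B weights are **fully open** for academic research, and **commercial use is also permitted** with official written permission. If you find our open-source model useful for your business, we welcome donations toward the development of the next-generation model ChatGLM3. To apply for a commercial license or to donate, contact [yiwen.xu@zhipuai.cn](mailto:yiwen.xu@zhipuai.cn).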


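## Citation

If you find our work helpful, please consider citing the following papers. The ChatGLM2-6B paper will be released in the near future.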

```
@article{zeng2022glm,
  title={Glm-130b: An open bilingual pre-trained model},
  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
  journal={arXiv preprint arXiv:2210.02414},
  year={2022}
}
```
```
@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}
```

Owner

Name: Dr. Artificial曾小健
Login: ArtificialZeng
Kind: user
Location: Beijing

Website: https://blog.csdn.net/sinat_37574187?type=blog
Repositories: 171
Profile: https://github.com/ArtificialZeng

LLM practitioner/engineer, AI/ML/DL Quant

Science Score: 10.0%
