https://github.com/chapzq77/chinese-bert-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (2.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)

Basic Info
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of ymcui/Chinese-BERT-wwm
Created over 6 years ago · Last pushed over 6 years ago

https://github.com/chapzq77/Chinese-BERT-wwm/blob/master/

[**Chinese**](https://github.com/ymcui/Chinese-BERT-wwm/) | [**English**](https://github.com/ymcui/Chinese-BERT-wwm/blob/master/README_EN.md)

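## Pre-Trained Chinese BERT with Whole Word Masking (BERT-wwm)
Pre-trained models have become a fundamental technique in natural language processing. To further advance research in Chinese information processing, we release BERT-wwm, a Chinese pre-trained model based on Whole Word Masking, together with closely related models: BERT-wwm-ext, RoBERTa-wwm-ext, and RoBERTa-wwm-large-ext.
Our technical report also compares popular Chinese pre-trained models in detail: [BERT](https://github.com/google-research/bert), [ERNIE](https://github.com/PaddlePaddle/ERNIE/blob/develop/README.zh.md), and [BERT-wwm](https://github.com/ymcui/Chinese-BERT-wwm).
**For more details, see the technical report: https://arxiv.org/abs/1906.08101**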

![./pics/header.png](https://github.com/ymcui/Chinese-BERT-wwm/raw/master/pics/header.png)

**Related WeChat articles**

- https://mp.weixin.qq.com/s/EE6dEhvpKxqnVW_bBAKrnA
- https://mp.weixin.qq.com/s/88OwaHqnrVMQ7vH98INA3w

This project is based on Google's official BERT: https://github.com/google-research/bert


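## News
**2019/10/14: Released RoBERTa-wwm-large-ext; see [Download](#)**

2019/9/10: Released RoBERTa-wwm-ext; see [Download](#)

2019/7/30: Released `BERT-wwm-ext`, trained on a larger general corpus (5.4B tokens); see [Download](#)

2019/6/20: Initial release; see [Download](#)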


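## Guide
| Section | Description |
|-|-|
| [Introduction](#) | Basic principles of BERT-wwm |
| [Download](#) | Download links for BERT-wwm |
| [Model Comparison](#) | Parameter comparison of the models in this repository |
| [Baselines](#) | Baseline results on several Chinese datasets |
| [Tips](#) | Suggestions for using Chinese pre-trained models |
| [English Models](#) | Download links for Google's official English BERT-wwm |
| [FAQ](#FAQ) | Frequently asked questions |
| [Citation](#) | Technical report for this repository |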


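## Introduction
**Whole Word Masking (wwm)** is an upgrade to BERT that Google released on May 31, 2019. It mainly changes how training samples are generated during pre-training.
In the original approach, WordPiece tokenization splits a whole word into several subwords, and these subwords are masked independently at random when training samples are generated.
With whole word masking, if some WordPiece subwords of a word are masked, the remaining subwords of the same word are masked as well.

**Note that "mask" is used here in a broad sense: replacing a token with [MASK], keeping the original token, or replacing it with a random token. It is not limited to replacement with the `[MASK]` tag.
For a more detailed explanation and examples, see [#4](https://github.com/ymcui/Chinese-BERT-wwm/issues/4).**

Google's official `BERT-base, Chinese` splits Chinese text at the **character** level and does not take into account the Chinese word segmentation (CWS) used in traditional NLP.
We apply whole word masking to Chinese, using [LTP](http://ltp.ai) as the segmentation tool, so that all characters belonging to the same **word** are masked together.

The table below shows a sample generated with whole word masking.
**Note: for ease of understanding, the example only considers replacement with the [MASK] tag.**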

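| Description | Example |
| :------- | :--------- |
| Original text | 使用语言模型来预测下一个词的probability。 |
| Segmented text | 使用 语言 模型 来 预测 下 一个 词 的 probability 。 |
| Original masking input | 使 用 语 言 [MASK] 型 来 [MASK] 测 下 一 个 词 的 pro [MASK] ##lity 。 |
| Whole word masking input | 使 用 语 言 [MASK] [MASK] 来 [MASK] [MASK] 下 一 个 词 的 [MASK] [MASK] [MASK] 。 |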
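The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pre-training code of this project: the function name `whole_word_mask`, the 15% default rate, and the sample sentence are assumptions, the input is assumed to be already segmented into words, and only the replace-with-`[MASK]` case is handled.

```python
import random

MASK = "[MASK]"


def whole_word_mask(words, mask_rate=0.15, seed=None):
    """Return character-level tokens in which selected words are masked whole.

    `words` is an already segmented sentence (a list of strings). The
    segmenter itself is outside the scope of this sketch.
    """
    rng = random.Random(seed)
    total_chars = sum(len(w) for w in words)
    budget = max(1, round(total_chars * mask_rate))  # characters to mask

    # Visit words in random order and pick those that still fit the budget.
    order = list(range(len(words)))
    rng.shuffle(order)
    chosen, used = set(), 0
    for i in order:
        if used + len(words[i]) > budget:
            continue  # masking this word whole would exceed the budget
        chosen.add(i)
        used += len(words[i])

    # Emit one token per character; a chosen word is masked in full.
    tokens = []
    for i, word in enumerate(words):
        for ch in word:
            tokens.append(MASK if i in chosen else ch)
    return tokens


# Hypothetical pre-segmented sentence ("We like natural language processing.")
words = ["我们", "喜欢", "自然", "语言", "处理", "。"]
print(" ".join(whole_word_mask(words, mask_rate=0.3, seed=0)))
```

Because selection happens at the word level, a word's characters are either all masked or all left intact, which is the property that distinguishes this from character-level masking.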


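## Download
This repository mainly contains base models, so the `base` tag is omitted from model names. Models of other sizes carry the corresponding tag (e.g. large).

* **`BERT-large` models**: 24-layer, 1024-hidden, 16-heads, 330M parameters  
* **`BERT-base` models**: 12-layer, 768-hidden, 12-heads, 110M parameters  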
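
To relate the layer and hidden sizes above to the parameter counts, the following sketch adds up the weights of a standard BERT encoder. It assumes the usual layout (feed-forward size of 4x hidden, 512 positions, two segment types, a pooler, and no pre-training heads); the function name `bert_param_count` is hypothetical. The exact total depends on the vocabulary size, so the figures listed above should be read as rounded values.

```python
def bert_param_count(layers, hidden, vocab_size, max_positions=512, type_vocab=2):
    """Approximate parameter count of a BERT encoder (assumed standard layout)."""
    intermediate = 4 * hidden   # feed-forward inner size
    layer_norm = 2 * hidden     # scale and bias

    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Query, key, value and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers of the feed-forward block, then LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )

    # Dense layer applied to the [CLS] representation.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


for name, layers, hidden in [("base", 12, 768), ("large", 24, 1024)]:
    count = bert_param_count(layers, hidden, vocab_size=21128)
    print(f"{name}: {count / 1e6:.0f}M parameters")
```

Under these assumptions, a 21,128-entry vocabulary gives roughly 102M parameters for the base configuration and roughly 326M for large, while a 30,522-entry vocabulary (as in English BERT) gives roughly 109M for base.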

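| Model | Data | Google Download | iFLYTEK Cloud Download |
| :------- | :--------- | :---------: | :---------: |
| **`RoBERTa-wwm-large-ext, Chinese`** | **Chinese Wikipedia + extended data [1]** | **[TensorFlow](https://drive.google.com/open?id=1dtad0FFzG11CBsawu8hvwwzU2R0FDI94)**<br/>**[PyTorch](https://drive.google.com/open?id=1-2vEZfIFCdM1-vJ3GD6DlSyKT4eVXMKq)** | **[TensorFlow (password: u6gC)](https://pan.iflytek.com:443/link/AC056611607108F33A744A0F56D0F6BE)**<br/>**[PyTorch (password: 43eH)](https://pan.iflytek.com:443/link/9B46A0ABA70C568AAAFCD004B9A2C773)** |
| **`RoBERTa-wwm-ext, Chinese`** | **Chinese Wikipedia + extended data [1]** | **[TensorFlow](https://drive.google.com/open?id=1jMAKIJmPn7kADgD3yQZhpsqM-IRM1qZt)**<br/>**[PyTorch](https://drive.google.com/open?id=1eHM3l4fMo6DsQYGmey7UZGiTmQquHw25)** | **[TensorFlow (password: peMe)](https://pan.iflytek.com:443/link/A136858D5F529E7C385C73EEE336F27B)**<br/>**[PyTorch (password: 6kpJ)](https://pan.iflytek.com:443/link/2F25AD577CC47EA9CCFC3A038AF29429)** |
| **`BERT-wwm-ext, Chinese`** | **Chinese Wikipedia + extended data [1]** | **[TensorFlow](https://drive.google.com/open?id=1buMLEjdtrXE2c4G1rpsNGWEx7lUQ0RHi)**<br/>**[PyTorch](https://drive.google.com/open?id=1iNeYFhCBJWeUsIlnW_2K6SMwXkM4gLb_)** | **[TensorFlow (password: thGd)](https://pan.iflytek.com:443/link/8AA4B23D9BCBCBA0187EE58234332B46)**<br/>**[PyTorch (password: bJns)](https://pan.iflytek.com:443/link/4AB35DEBECB79C578BEC9952F78FB6F2)** |
| **`BERT-wwm, Chinese`** | **Chinese Wikipedia** | **[TensorFlow](https://drive.google.com/open?id=1RoTQsXp2hkQ1gSRVylRIJfQxJUgkfJMW)**<br/>**[PyTorch](https://drive.google.com/open?id=1AQitrjbvCWc51SYiLN-cJq4e0WiNN4KY)** | **[TensorFlow (password: mva8)](https://pan.iflytek.com:443/link/4B172939D5748FB1A3881772BC97A898)**<br/>**[PyTorch (password: 8fX5)](https://pan.iflytek.com:443/link/8D4E8680433E6AD0F33D521EA920348E)** |
| `BERT-base, Chinese` (Google) | Chinese Wikipedia | [Google Cloud](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip) | - |
| `BERT-base, Multilingual Cased` (Google) | Multilingual Wikipedia | [Google Cloud](https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip) | - |
| `BERT-base, Multilingual Uncased` (Google) | Multilingual Wikipedia | [Google Cloud](https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip) | - |

> [1] The extended data includes encyclopedia, news, and question-answering text, 5.4B tokens in total; the processed text is about 10G.

The TensorFlow weights are the reference version of these models. The PyTorch versions were converted with the script provided by Huggingface's [PyTorch-Transformers 1.0](https://github.com/huggingface/pytorch-transformers); if you use another version, convert the weights yourself.
Users in mainland China are advised to use the iFLYTEK Cloud links, and users elsewhere the Google links. A base model is about **400M** in size. Taking the TensorFlow version of `BERT-wwm, Chinese` as an example, unzipping the downloaded file gives:

```
chinese_wwm_L-12_H-768_A-12.zip
    |- bert_model.ckpt      # model weights
    |- bert_model.meta      # model meta information
    |- bert_model.index     # model index information
    |- bert_config.json     # model configuration
    |- vocab.txt            # vocabulary
```

`bert_config.json` and `vocab.txt` are identical to those of Google's original `BERT-base, Chinese`.
The PyTorch version contains `pytorch_model.bin`, `bert_config.json`, and `vocab.txt`.

### Task Data
Data for some of the tasks is provided in the `data` directory. The `README.md` in the same directory states the source of each dataset.

## Model Comparison
Key details of the models are summarized below.

| - | BERT (Google) | BERT-wwm | BERT-wwm-ext | RoBERTa-wwm-ext | RoBERTa-wwm-large-ext |
| :------- | :---------: | :---------: | :---------: | :---------: | :---------: |
| Masking | WordPiece | WWM[1] | WWM | WWM | WWM |
| Type | BERT-base | BERT-base | BERT-base | BERT-base | **BERT-large** |
| Data Source | wiki | wiki | wiki+ext[2] | wiki+ext | wiki+ext |
| Training Tokens # | 0.4B | 0.4B | 5.4B | 5.4B | 5.4B |
| Device | TPU Pod v2 | TPU v3 | TPU v3 | TPU v3 | **TPU Pod v3-32[3]** |
| Training Steps | ? | 100K (MAX128)<br/>+100K (MAX512) | 1M (MAX128)<br/>+400K (MAX512) | 1M (MAX512) | 2M (MAX512) |
| Batch Size | ? | 2,560 / 384 | 2,560 / 384 | 384 | 512 |
| Optimizer | AdamW | LAMB | LAMB | AdamW | AdamW |
| Vocabulary | 21,128 | ~BERT[4] vocab | ~BERT vocab | ~BERT vocab | ~BERT vocab |
| Init Checkpoint | Random Init | ~BERT weight | ~BERT weight | ~BERT weight | Random Init |

> [1] WWM = Whole Word Masking
> [2] ext = extended data
> [3] TPU Pod v3-32 (512G HBM) is equivalent to 4 TPU v3 (128G HBM)
> [4] `~BERT` means the attribute is **inherited** from Google's original Chinese BERT

## Baselines
To compare baseline performance, we tested the models on the Chinese datasets below, covering both `sentence-level` and `document-level` tasks.
For `BERT-wwm-ext`, `RoBERTa-wwm-ext`, and `RoBERTa-wwm-large-ext`, we **did not tune the learning rate further** and directly used the best learning rate found for `BERT-wwm`.

**Only part of the results are listed here; see the [technical report](https://arxiv.org/abs/1906.08101) for the full results.**

- [**CMRC 2018**: span-extraction machine reading comprehension (Simplified Chinese)](https://github.com/ymcui/cmrc2018)
- [**DRCD**: span-extraction machine reading comprehension (Traditional Chinese)](https://github.com/DRCSolutionService/DRCD)
- [**CJRC**: legal reading comprehension (Simplified Chinese)](http://cail.cipsc.org.cn)
- [**XNLI**: natural language inference](https://github.com/google-research/bert/blob/master/multilingual.md)
- [**LCQMC**: sentence pair matching](http://icrc.hitsz.edu.cn/info/1037/1146.htm)
- [**BQ Corpus**: sentence pair matching](http://icrc.hitsz.edu.cn/Article/show/175.html)
- [**NER**: Chinese named entity recognition](http://sighan.cs.uchicago.edu/bakeoff2006/)
- [**THUCNews**: document-level text classification](http://thuctc.thunlp.org)

**Note: to ensure reliable results, each model is run 10 times with different random seeds. We report the maximum and the average score (the average is in parentheses).**

### CMRC 2018
[**CMRC 2018**](https://github.com/ymcui/cmrc2018) is a Chinese machine reading comprehension dataset released by the Joint Laboratory of HIT and iFLYTEK Research (HFL).
Given a question, the system must extract a span from the passage as the answer, in the same format as SQuAD.
Evaluation metrics: EM / F1

| Model | Dev | Test | Challenge |
| :------- | :---------: | :---------: | :---------: |
| BERT | 65.5 (64.4) / 84.5 (84.0) | 70.0 (68.7) / 87.0 (86.3) | 18.6 (17.0) / 43.3 (41.3) |
| ERNIE | 65.4 (64.3) / 84.7 (84.2) | 69.4 (68.2) / 86.6 (86.1) | 19.6 (17.0) / 44.3 (42.8) |
| **BERT-wwm** | 66.3 (65.0) / 85.6 (84.7) | 70.5 (69.1) / 87.4 (86.7) | 21.0 (19.3) / 47.0 (43.9) |
| **BERT-wwm-ext** | 67.1 (65.6) / 85.7 (85.0) | 71.4 (70.0) / 87.7 (87.0) | 24.0 (20.0) / 47.3 (44.6) |
| **RoBERTa-wwm-ext** | 67.4 (66.5) / 87.2 (86.5) | 72.6 (71.4) / 89.4 (88.8) | 26.2 (24.6) / 51.0 (49.1) |
| **RoBERTa-wwm-large-ext** | **68.5 (67.6) / 88.4 (87.9)** | **74.2 (72.4) / 90.6 (90.0)** | **31.5 (30.1) / 60.1 (57.5)** |

### DRCD
[**DRCD**](https://github.com/DRCKnowledgeTeam/DRCD) is a span-extraction reading comprehension dataset in Traditional Chinese, released by Delta Research Center, in the same format as SQuAD.
**ERNIE's vocabulary drops Traditional Chinese characters, so we do not recommend using ERNIE on Traditional Chinese data (or convert the text to Simplified Chinese first).**
Evaluation metrics: EM / F1

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 83.1 (82.7) / 89.9 (89.6) | 82.2 (81.6) / 89.2 (88.8) |
| ERNIE | 73.2 (73.0) / 83.9 (83.8) | 71.9 (71.4) / 82.5 (82.3) |
| **BERT-wwm** | 84.3 (83.4) / 90.5 (90.2) | 82.8 (81.8) / 89.7 (89.0) |
| **BERT-wwm-ext** | 85.0 (84.5) / 91.2 (90.9) | 83.6 (83.0) / 90.4 (89.9) |
| **RoBERTa-wwm-ext** | 86.6 (85.9) / 92.5 (92.2) | 85.6 (85.2) / 92.0 (91.7) |
| **RoBERTa-wwm-large-ext** | **89.6 (89.1) / 94.8 (94.4)** | **89.6 (88.9) / 94.5 (94.1)** |

### CJRC
[**CJRC**](http://cail.cipsc.org.cn) is a Chinese machine reading comprehension dataset for the **judicial domain**, released by the Joint Laboratory of HIT and iFLYTEK Research (HFL).
Note that the data used in these experiments is not the final official release, so the results are for reference only.
Evaluation metrics: EM / F1

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 54.6 (54.0) / 75.4 (74.5) | 55.1 (54.1) / 75.2 (74.3) |
| ERNIE | 54.3 (53.9) / 75.3 (74.6) | 55.0 (53.9) / 75.0 (73.9) |
| **BERT-wwm** | 54.7 (54.0) / 75.2 (74.8) | 55.1 (54.1) / 75.4 (74.4) |
| **BERT-wwm-ext** | 55.6 (54.8) / 76.0 (75.3) | 55.6 (54.9) / 75.8 (75.0) |
| **RoBERTa-wwm-ext** | 58.7 (57.6) / 79.1 (78.3) | 59.0 (57.8) / 79.0 (78.0) |
| **RoBERTa-wwm-large-ext** | **62.1 (61.1) / 82.4 (81.6)** | **62.4 (61.4) / 82.2 (81.0)** |

### XNLI
For natural language inference we use the [**XNLI**](https://github.com/google-research/bert/blob/master/multilingual.md) data, where each text pair must be classified into one of three categories: `entailment`, `neutral`, `contradictory`.
Evaluation metric: Accuracy

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 77.8 (77.4) | 77.8 (77.5) |
| ERNIE | 79.7 (79.4) | 78.6 (78.2) |
| **BERT-wwm** | 79.0 (78.4) | 78.2 (78.0) |
| **BERT-wwm-ext** | 79.4 (78.6) | 78.7 (78.3) |
| **RoBERTa-wwm-ext** | 80.0 (79.2) | 78.8 (78.3) |
| **RoBERTa-wwm-large-ext** | **82.1 (81.3)** | **81.2 (80.6)** |

### Sentence Pair Matching: LCQMC, BQ Corpus
Both datasets require classifying a sentence pair according to whether the two sentences have the same meaning (binary classification).

#### LCQMC
[LCQMC](http://icrc.hitsz.edu.cn/info/1037/1146.htm) is released by the Intelligent Computing Research Center of Harbin Institute of Technology, Shenzhen Graduate School.
Evaluation metric: Accuracy

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 89.4 (88.4) | 86.9 (86.4) |
| ERNIE | 89.8 (89.6) | **87.2 (87.0)** |
| **BERT-wwm** | 89.4 (89.2) | 87.0 (86.8) |
| **BERT-wwm-ext** | 89.6 (89.2) | 87.1 (86.6) |
| **RoBERTa-wwm-ext** | 89.0 (88.7) | 86.4 (86.1) |
| **RoBERTa-wwm-large-ext** | **90.4 (90.0)** | 87.0 (86.8) |

#### BQ Corpus
[BQ Corpus](http://icrc.hitsz.edu.cn/Article/show/175.html) is a banking-domain dataset released by the Intelligent Computing Research Center of Harbin Institute of Technology, Shenzhen Graduate School.
Evaluation metric: Accuracy

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 86.0 (85.5) | 84.8 (84.6) |
| ERNIE | 86.3 (85.5) | 85.0 (84.6) |
| **BERT-wwm** | 86.1 (85.6) | 85.2 **(84.9)** |
| **BERT-wwm-ext** | **86.4** (85.5) | 85.3 (84.8) |
| **RoBERTa-wwm-ext** | 86.0 (85.4) | 85.0 (84.6) |
| **RoBERTa-wwm-large-ext** | 86.3 **(85.7)** | **85.8 (84.9)** |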
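
Each cell in the baseline tables has the form `best (average)` over repeated runs of the same model. A minimal sketch of producing such a cell from per-run scores is shown here; the function name `summarize_runs` and the scores are made up for illustration and are not results from this project.

```python
from statistics import mean


def summarize_runs(scores):
    """Format per-run scores as 'best (average)', one decimal place each."""
    if not scores:
        raise ValueError("at least one run is required")
    return f"{max(scores):.1f} ({mean(scores):.1f})"


# Hypothetical accuracies from three runs with different random seeds.
print(summarize_runs([80.0, 79.0, 81.0]))
```

Reporting the average next to the best score makes it easier to judge how much of a gap between two models is due to seed variance.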
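### Named Entity Recognition: People's Daily, MSRA-NER
For Chinese named entity recognition (NER), we use the classic **People's Daily data** and the **NER data released by Microsoft Research Asia**.
Evaluation metric: F-score *(only part of the results are included here)*

| Model | People's Daily | MSRA-NER |
| :------- | :---------: | :---------: |
| BERT | 95.2 (94.9) | 95.3 (94.9) |
| ERNIE | **95.7 (94.5)** | **95.4 (95.1)** |
| **BERT-wwm** | 95.3 (95.1) | **95.4 (95.1)** |

### THUCNews
**THUCNews** is a news dataset released by the Natural Language Processing Lab at Tsinghua University. Each news article must be classified into one of 10 categories.
Evaluation metric: Accuracy

| Model | Dev | Test |
| :------- | :---------: | :---------: |
| BERT | 97.7 (97.4) | **97.8 (97.6)** |
| ERNIE | 97.6 (97.3) | 97.5 (97.3) |
| **BERT-wwm** | **98.0 (97.6)** | **97.8 (97.6)** |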
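## Tips
* The initial learning rate is a very important hyperparameter (for `BERT` and for other models alike) and must be tuned for the target task.
* The best learning rate for `ERNIE` differs considerably from that of `BERT`/`BERT-wwm`, so be sure to tune it when using `ERNIE` (in the experiments above, `ERNIE` needed a higher initial learning rate).
* `BERT`/`BERT-wwm` were trained on Wikipedia data and therefore model formal text well, while `ERNIE` was trained on additional web data and has an advantage on informal text.
* On long-text tasks such as reading comprehension and document classification, `BERT` and `BERT-wwm` perform better.
* If the target task's data differs greatly from the domain of the pre-trained model, continue pre-training on your own dataset.
* For Traditional Chinese data, use `BERT` or `BERT-wwm`; we found that the `ERNIE` vocabulary contains almost no Traditional Chinese characters.

## English Models
For convenience, here are the English `BERT-large (wwm)` models **officially released by Google**:

* **[`BERT-Large, Uncased (Whole Word Masking)`](https://storage.googleapis.com/bert_models/2019_05_30/wwm_uncased_L-24_H-1024_A-16.zip)**: 24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Large, Cased (Whole Word Masking)`](https://storage.googleapis.com/bert_models/2019_05_30/wwm_cased_L-24_H-1024_A-16.zip)**: 24-layer, 1024-hidden, 16-heads, 340M parameters

## FAQ
**Q: How do I use this model?**
A: Use it exactly as you would use the Chinese BERT released by Google. **The text does not need to be word-segmented: wwm only affects pre-training, not the input of downstream tasks.**

**Q: Is the pre-training code available?**
A: Unfortunately, we cannot provide the code. For implementation hints, see [#10](https://github.com/ymcui/Chinese-BERT-wwm/issues/10) and [#13](https://github.com/ymcui/Chinese-BERT-wwm/issues/13).

**Q: Where can I download a particular dataset?**
A: Check the `data` directory. For copyrighted content, please search for it yourself or contact the original authors.

**Q: Are there plans to release larger models, such as a BERT-large-wwm version?**
A: If we obtain better results in our experiments, we will consider releasing a larger version.

**Q: I cannot reproduce the results.**
A: We used the simplest possible models for the downstream tasks. For classification, for example, we used Google's `run_classifier.py` directly. If you cannot reach the average score, the experiment itself probably has a bug, so please check it carefully. The maximum score depends on many random factors, and we cannot guarantee that you will reach it. Another well-known factor is that reducing the batch size significantly lowers performance; see the related Issues in the BERT and XLNet repositories.

**Q: I got better results than yours!**
A: Congratulations.

**Q: How long did training take, and on what hardware?**
A: Training was done on Google TPU v3 (128G HBM). BERT-wwm took about 1.5 days, while BERT-wwm-ext took several weeks because it uses more data and needs more iterations. For pre-training we used the `LAMB Optimizer` ([TensorFlow implementation](https://github.com/ymcui/LAMB_Optimizer_TF)), which handles large batches well. For fine-tuning on downstream tasks we used BERT's default `AdamWeightDecayOptimizer`.

**Q: Which ERNIE is this?**
A: ERNIE in this project refers specifically to the [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) proposed by Baidu, not the [ERNIE](https://github.com/thunlp/ERNIE) published by Tsinghua University at ACL 2019.

**Q: BERT-wwm does not perform well on every task.**
A: The goal of this project is to give researchers a wider choice of pre-trained models: BERT, ERNIE, or BERT-wwm. We only provide experimental data; how well a model works on your task can only be determined by trying it.

**Q: Why were some datasets not tested?**
A: Frankly: 1) we did not have the energy to find more data; 2) it was not necessary; 3) we did not have the money.

**Q: How would you briefly evaluate these models?**
A: Each has its own focus and its own strengths. Progress in Chinese natural language processing needs joint effort from many parties.

**Q: What do you predict the next pre-trained model will be called?**
A: Maybe ZOE: Zero-shOt Embeddings from language model.

**Q: Can you give more details on `RoBERTa-wwm-ext`?**
A: It combines the strengths of RoBERTa and BERT-wwm. The differences from the earlier models in this repository are:
1) pre-training uses the wwm strategy for masking (but not dynamic masking);
2) the Next Sentence Prediction (NSP) loss is simply removed;
3) instead of training with max_len=128 first and max_len=512 afterwards, the model is trained directly with max_len=512;
4) the number of training steps is increased.

Note that this is not the original RoBERTa model. It is a BERT model trained in a RoBERTa-like way, i.e. a RoBERTa-like BERT. For downstream use and model conversion, treat it as BERT, not as RoBERTa.

## Citation
If the content of this repository helps your research, please cite the following technical report: https://arxiv.org/abs/1906.08101
```
@article{chinese-bert-wwm,
  title={Pre-Training with Whole Word Masking for Chinese BERT},
  author={Cui, Yiming and Che, Wanxiang and Liu, Ting and Qin, Bing and Yang, Ziqing and Wang, Shijin and Hu, Guoping},
  journal={arXiv preprint arXiv:1906.08101},
  year={2019}
}
```

## Acknowledgment
The first author was partially supported by the [**TensorFlow Research Cloud**](https://www.tensorflow.org/tfrc) program.

## Disclaimer
**This project is not an official Chinese BERT-wwm release by Google.**
The experimental results only reflect performance on specific datasets and hyperparameter settings and may vary with the random seed and computing device.
**The content of this project is for technical research reference only and should not be taken as a conclusive basis.**

## Follow Us
![qrcode.png](https://github.com/ymcui/cmrc2019/raw/master/qrcode.jpg)

## Feedback
If you have questions, please submit them as a GitHub Issue.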

Owner

  • Name: 周奇
  • Login: chapzq77
  • Kind: user
