https://github.com/artificialzeng/clue

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Fork of CLUEbenchmark/CLUE
Created almost 6 years ago · Last pushed about 6 years ago

https://github.com/ArtificialZeng/CLUE/blob/master/

# CLUE benchmark 
datasets, baselines, pre-trained models, corpus and leaderboard




CLUE benchmark - Leaderboard
---------------------------------------------------------------------
##### Source: www.CLUEbenchmarks.com

#### Classification tasks (v1)

| Model | Score | Parameters | AFQMC | TNEWS' | IFLYTEK' | CMNLI | WSC | CSL |
| :----:| :----: | :----: | :----: |:----: |:----: |:----: |:----: |:----: |
| BERT-base        | 68.77% | 108M |  73.70% | 56.58%  | 60.29% | 79.69% |  62.0% | 80.36% |
| BERT-wwm-ext      | 70.47% | 108M  | 74.07% | 56.84%  | 59.43% | 80.42% | 61.1%  | 80.63% |
| ERNIE-base         | 70.55% | 108M  | 73.83% | 58.33% | 58.96% | 80.29% | 60.8%  | 79.1%      |
| RoBERTa-large      | 72.63% | 334M  | 74.02% | 57.86%  | 62.55% | 81.70% | 72.7%   | 81.36%       |
| XLNet-mid  | 68.65% | 200M | 70.50% | 56.24% | 57.85% | 81.25% |  64.4%   | 81.26%     |
| ALBERT-xxlarge      | 71.04% | 235M   | 75.6%  | **59.46%** | 62.89% | **83.14%** |  61.54%   | **83.63%**  |
| ALBERT-xlarge      | 68.91% | 60M   | 69.96%  | 57.36% | 59.50% | 81.13% |  64.34%   | 81.20%  |
| ALBERT-large      | 67.91% | 18M   | 74%  | 55.16% | 57.00% | 78.77% |  62.24%   | 80.30%  |
| ALBERT-base      | 67.44% | 12M   | 72.55%  | 55.06% | 56.58% | 77.58% |  64.34%   | 78.5%  |
| ALBERT-tiny        | 61.92% | **4M** | 69.92% | 53.35% | 48.71% | 70.61% |  58.5%   | 74.56% |
| RoBERTa-wwm-ext   | 71.72% | 108M  | 74.04% | 56.94% | 60.31% | 80.51% | 67.8% | 81.0% |
| RoBERTa-wwm-large | **73.45%** | 330M | **76.55%** | 58.61% | **62.98%** | 82.12% |  **74.6%** | 82.13% |


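    AFQMC: semantic similarity (Acc); TNEWS: news classification (Acc); IFLYTEK: long text classification (Acc); CMNLI: natural language inference;
       COPA: causal reasoning; WSC: Winograd Schema Challenge, Chinese version; CSL: keyword recognition; Score is the average over the 6 datasets.
       A trailing ' marks a dataset that was filtered with albert_tiny.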

#### Reading comprehension tasks

| Model | Score | Parameters | CMRC2018 | CHID | C3 |
| :----:| :----: | :----: | :----: |:----: |:----: |
| BERT-base	| 72.71 | 108M | 71.60 | 82.04 | 64.50 |
| BERT-wwm-ext | 75.12 | 108M | 73.95 | 82.90 | 68.50 |
| ERNIE-base	| 73.69 | 108M | 74.7 | 82.28 | 64.10 |
| RoBERTa-large | 76.85 | 334M | ***78.50*** | 84.50 | 67.55 |
| XLNet-mid	| 72.70 | 209M | 66.95 | 83.47 | 67.68 |
| ALBERT-base | 68.08 | 10M | 72.90 | 71.77 | 59.58 |
| ALBERT-large | 71.51 | 16.5M | 75.95 | 74.18 | 64.41 |
| ALBERT-xlarge | 75.73 | 57.5M | 76.30 | 80.57 | 70.32 |
| ALBERT-xxlarge | 77.19 | 221M | 75.15 | 83.15 | 73.28 |
| ALBERT-tiny | 49.05 | 1.8M | 53.35 | 43.53 | 50.26 |
| RoBERTa-wwm-ext  | 75.11 | 108M | 75.20 | 83.62 | 66.50 |
| RoBERTa-wwm-large | ***79.05*** | 330M | 77.95 | ***85.37*** | ***73.82*** |

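DRCD and CMRC2018: reading comprehension (F1, EM); CHID: idiom cloze test (Acc); C3: multiple-choice reading comprehension (Acc). Score is the average over the 3 datasets.

Where both F1 and EM are reported, EM is used as the final metric. CMRC2018 results are on the CLUE test set.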
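
The Score column of the reading-comprehension table can be reproduced as a plain unweighted mean of the three task columns. The sketch below uses two rows copied from that table; the function name `overall_score` is made up for this illustration and is not part of the CLUE code.

```python
# (CMRC2018, CHID, C3) scores copied from the reading-comprehension table.
TASK_SCORES = {
    "BERT-base": (71.60, 82.04, 64.50),
    "RoBERTa-wwm-large": (77.95, 85.37, 73.82),
}


def overall_score(task_scores):
    """Unweighted mean of the per-task scores, rounded to two decimals."""
    return round(sum(task_scores) / len(task_scores), 2)


for model, scores in TASK_SCORES.items():
    print(model, overall_score(scores))
```

For these two rows the result agrees with the Score column (72.71 and 79.05).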

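Baseline with codes
---------------------------------------------------------------------
    
    1. Clone the project
       git clone https://github.com/CLUEbenchmark/CLUE.git
    2. Enter the directory for the task type
           Classification tasks:
           cd CLUE/baselines/models/bert
           cd CLUE/baselines/models_pytorch/classifier_pytorch
           Reading comprehension tasks:
           cd CLUE/baselines/models_pytorch/mrc_pytorch
    3. Run the script for the task (GPU):
       bash run_classifier_xxx.sh
        For example, bash run_classifier_iflytek.sh runs the iflytek task
    4. Run on TPU (optional)
        cd CLUE/baselines/models/bert/tpu
        bash run_classifier_tnews.sh runs the tnews task (set the gs path and tpu ip in the script first)
        
        cd CLUE/baselines/models/roberta/tpu
        bash run_classifier_tiny.sh (set the tpu ip in the script first)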
        
### Environment
tensorflow 1.12 / cuda 9.0 / cudnn 7.0
### Toolkit



    pip install PyCLUE 
    cd PyCLUE/examples/classifications
    python3 run_clue_task.py

Supports 10 tasks and 9 models; see the PyCLUE toolkit

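### Generating submission files

    Classification tasks: 
        In CLUE/baselines/models/bert, run
        bash run_classifier_xxx.sh predict 
        The submission file xxx_prdict.json is written to output_dir in JSON format.

   

    Reading comprehension tasks:
         In CLUE/baselines/models_pytorch/mrc_pytorch, predict with test_mrc.py.
         See the matching run_mrc_xxx.sh for the arguments and usage.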
        
 

 Leaderboard
---------------------------------------------------------------------



Corpus (CLUECorpus2020)
---------------------------------------------------------------------
Corpus for Language Modelling, Pre-training, Generating tasks

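More than 14G of data in nearly 4,000 txt files, about 5 billion characters; most of it comes from the nlp_chinese_corpus project.

Each file is at most 4M.

Sub-corpora (14G in total):

1. news2016zh_corpus: 8G, 2,000 files; access code: mzlk

2. webText2019zh_corpus: 3G, 900 files; access code: qvlq

3. wiki2019zh_corpus: about 1.1G, 300 files; access code: rja4

4. comments2019zh_corpus: about 2.3G, 784 files (547 and 227 from two review sources), merged from ChineseNLPCorpus data; access code: 5kwk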



To request the corpus, email chineseGLUE#163.com; access to all ChineseGLUE corpora is tied to joining ChineseGLUE.


CLUE benchmark Vision
---------------------------------------------------------------------



 Introduction of datasets 
--------------------------------------------------------------------

 

##### 1. AFQMC  Ant Financial  Question Matching Corpus
```
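     Train (34,334), dev (4,316), test (3,861)
     
     {"sentence1": "", "sentence2": "", "label": "0"}
      Each record has sentence 1, sentence 2, and a label: 1 means sentence1 and sentence2 have the same meaning, 0 means they do not.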
```
   AFQMC'
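
A minimal sketch of reading AFQMC-style records in Python. It assumes the files store one JSON object per line (the source only shows a single record, so this layout is an assumption), and the two sample sentence pairs are hypothetical placeholders rather than dataset content.

```python
import json

# Hypothetical records in the AFQMC layout shown above.
SAMPLE_LINES = [
    '{"sentence1": "how do I reset my password", "sentence2": "password reset steps", "label": "1"}',
    '{"sentence1": "how do I reset my password", "sentence2": "what is my credit limit", "label": "0"}',
]


def parse_pair(line):
    """Return (sentence1, sentence2, label) with the label as an int."""
    record = json.loads(line)
    return record["sentence1"], record["sentence2"], int(record["label"])


pairs = [parse_pair(line) for line in SAMPLE_LINES]
# Labels are 0/1, so their sum is the number of matching pairs.
num_matching = sum(label for _, _, label in pairs)
print(num_matching, "of", len(pairs), "pairs are labelled as matching")
```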

##### 2.TNEWS'  Short Text Classification for News
News items in 15 categories.
```
     Train (53,360), dev (10,000), test (10,000)
     
     {"label": "102", "label_des": "news_entertainment", "sentence": ""}
      Each record has three fields: category ID, category name, and the news text.
```
    TNEWS'
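
Because every TNEWS' record carries both `label` and `label_des`, the ID-to-name mapping and the class distribution can be recovered from the data itself. The sketch below assumes one JSON object per line; the sample headlines and the second category (`103` / `news_sports`) are hypothetical values used only for illustration.

```python
import json
from collections import Counter

# Hypothetical records in the TNEWS' layout shown above.
SAMPLE_LINES = [
    '{"label": "102", "label_des": "news_entertainment", "sentence": "placeholder headline A"}',
    '{"label": "102", "label_des": "news_entertainment", "sentence": "placeholder headline B"}',
    '{"label": "103", "label_des": "news_sports", "sentence": "placeholder headline C"}',
]


def summarize(lines):
    """Return ({label: label_des}, Counter of examples per label)."""
    label_names = {}
    label_counts = Counter()
    for line in lines:
        record = json.loads(line)
        label_names[record["label"]] = record["label_des"]
        label_counts[record["label"]] += 1
    return label_names, label_counts


label_names, label_counts = summarize(SAMPLE_LINES)
for label, count in label_counts.most_common():
    print(label, label_names[label], count)
```

The same function works unchanged on IFLYTEK' records, which use the same three field names.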

##### 3.IFLYTEK'  Long Text classification
More than 17,000 long app-description texts labeled with 119 categories (label IDs 0-118).
```
    Train (12,133), dev (2,599), test (2,600)
    
    {"label": "110", "label_des": "", "sentence": "201630,,1.2.3.4.bug"}
     Each record has three fields: category ID, category name, and the text.
```
    IFLYTEK'

##### 4.CMNLI  Chinese Multi-Genre NLI

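CMNLI combines XNLI and MNLI. The data covers genres such as fiction, telephone, travel, government, and slate. The original MNLI and XNLI training data is kept as the training set; the XNLI dev set and MNLI matched set are merged into the CMNLI dev set, and the XNLI test set and MNLI mismatched set are merged into the CMNLI test set.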

```
    train (391,782), dev (12,426), test (13,880)
    
    {"sentence1": "", "sentence2": "", "label": "neutral"}
     Each record has sentence 1, sentence 2, and a label; the label is one of neutral, entailment, contradiction.
```
 CMNLI
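
CMNLI labels are strings, so a classifier usually needs them mapped to integer ids first. The sketch below is one way to do that; the id order in `LABEL_TO_ID` is an arbitrary choice for this example (the source does not define one), and the sample sentences are placeholders.

```python
import json

# Integer ids are an arbitrary choice for this sketch, not defined by CMNLI.
LABEL_TO_ID = {"contradiction": 0, "entailment": 1, "neutral": 2}

# Hypothetical record in the CMNLI layout shown above.
SAMPLE_LINE = (
    '{"sentence1": "a placeholder premise", '
    '"sentence2": "a placeholder hypothesis", "label": "neutral"}'
)


def encode_example(line):
    """Return (premise, hypothesis, label_id) for one JSON record."""
    record = json.loads(line)
    return record["sentence1"], record["sentence2"], LABEL_TO_ID[record["label"]]


premise, hypothesis, label_id = encode_example(SAMPLE_LINE)
print(premise, "|", hypothesis, "|", label_id)
```

A record with an unexpected label raises `KeyError`, which surfaces malformed data early instead of silently mislabeling it.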



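##### 5. CLUEWSC2020: WSC Winograd (2020-03-25 version)

CLUEWSC is the Chinese version of the Winograd Scheme Challenge (WSC), a pronoun disambiguation task: decide which noun a pronoun in the sentence refers to.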





CLUE benchmark



     {"target": 
         {"span2_index": 37, 
         "span1_index": 5, 
         "span1_text": "", 
         "span2_text": ""}, 
     "idx": 261, 
     "label": "false", 
     "text": ""}
     The label is "true" when the pronoun refers to the noun in span1_text, and "false" otherwise.


- Training set: 1244
- Dev set: 304

  CLUEWSC2020
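
A quick sanity check for CLUEWSC2020-style records is to confirm that each span text really sits at its stated index. The sketch below assumes `span1_index` and `span2_index` are character offsets into `text` (an assumption; the source does not say so explicitly) and uses a hypothetical English record in place of the Chinese original.

```python
def span_matches(text, index, span_text):
    """True when span_text occurs in text at character offset index."""
    return text[index:index + len(span_text)] == span_text


# Hypothetical record in the layout shown above (English placeholder text).
record = {
    "target": {
        "span1_index": 4,
        "span1_text": "trophy",
        "span2_index": 31,
        "span2_text": "it",
    },
    "idx": 0,
    "label": "true",
    "text": "The trophy did not fit because it was too big.",
}

target = record["target"]
noun_ok = span_matches(record["text"], target["span1_index"], target["span1_text"])
pronoun_ok = span_matches(record["text"], target["span2_index"], target["span2_text"])
print(noun_ok, pronoun_ok)
```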


##### 6. CSL  Keyword Recognition
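[Chinese Scientific Literature dataset (CSL)](https://github.com/P01son6415/chinese-scientific-literature-dataset)
Fake keywords generated with tf-idf are mixed with a paper's real keywords to form abstract-keyword pairs; the task is to decide from the abstract whether all keywords are real.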
```
    Train (20,000), dev (3,000), test (3,000)
     
    {"id": 1, "abst": "FFT3,FFT.,,,.,,FFT.FFT,,3.,FFTFFT,.", "keyword": ["", "FFT", "", "3"], "label": "1"}
     Each record has four fields: ID, abstract, keywords, and label.
    
```
    CSL

##### 7.CMRC2018  Reading Comprehension for Simplified Chinese
https://hfl-rc.github.io/cmrc2018/
```
Training set (2,403 passages, 10,142 questions), trial set (256 passages, 1,002 questions), dev set (848 passages, 3,219 questions)

{
  "version": "1.0",
  "data": [
    {
        "title": "",
        "context_id": "TRIAL_0",
        "context_text": "1278.1117220131995",
        "qas":[
                {
                "query_id": "TRIAL_0_QUERY_0",
                "query_text": "",
                "answers": [
                     "",
                     "",
                     ""
                    ]
                },
                {
                "query_id": "TRIAL_0_QUERY_1",
                "query_text": "12",
                "answers": [
                    "78.1",
                    "78.1",
                    "78.1"
                    ]
                },
                {
                "query_id": "TRIAL_0_QUERY_2",
                "query_text": "",
                "answers": [
                    "",
                    "",
                    ""
                    ]
                }
            ]
        }
    ]
}
```
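
The nesting in the CMRC2018 sample (data, then passages, then `qas`) can be flattened into one row per question. This sketch uses only the field names visible in the sample above; the passage and question text are hypothetical placeholders, and `iter_questions` is a name made up for this illustration.

```python
import json

# Hypothetical document in the CMRC2018 layout shown above.
SAMPLE = json.loads("""
{
  "version": "1.0",
  "data": [
    {
      "title": "placeholder title",
      "context_id": "TRIAL_0",
      "context_text": "The placeholder river is 78.1 km long.",
      "qas": [
        {
          "query_id": "TRIAL_0_QUERY_0",
          "query_text": "How long is the river?",
          "answers": ["78.1 km", "78.1 km", "78.1 km"]
        }
      ]
    }
  ]
}
""")


def iter_questions(dataset):
    """Yield one dict per question, paired with its passage text."""
    for passage in dataset["data"]:
        for qa in passage["qas"]:
            yield {
                "context_id": passage["context_id"],
                "context": passage["context_text"],
                "query_id": qa["query_id"],
                "question": qa["query_text"],
                "answers": qa["answers"],
            }


questions = list(iter_questions(SAMPLE))
print(len(questions), questions[0]["query_id"])
```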

    CMRC2018


##### 8.DRCD  Reading Comprehension for Traditional Chinese
[Delta Reading Comprehension Dataset (DRCD)](https://github.com/DRCKnowledgeTeam/DRCD)
```
Train (8,016 passages, 26,936 questions), dev (1,000 passages, 3,524 questions), test (1,000 passages, 3,493 questions)

{
  "version": "1.3",
  "data": [
    {
      "title": "",
      "id": "2128",
      "paragraphs": [
        {
          "context": " ",
          "id": "2128-2",
          "qas": [
            {
              "id": "2128-2-1",
              "question": "?",
              "answers": [
                {
                  "id": "1",
                  "text": "",
                  "answer_start": 92
                }
              ]
            },
            {
              "id": "2128-2-2",
              "question": "?",
              "answers": [
                {
                  "id": "1",
                  "text": "",
                  "answer_start": 105
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
```
The data format is the same as SQuAD.
    DRCD2018
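
Each DRCD answer stores an `answer_start` offset next to its `text`, so misaligned answers can be detected before training. The sketch below assumes `answer_start` is a character offset into `context`, as in SQuAD; the sample passage is a hypothetical English placeholder, and the second question has a deliberately wrong offset.

```python
CONTEXT = "The library opened in 1971 near the river."

# Hypothetical document in the DRCD layout shown above.
SAMPLE = {
    "version": "1.3",
    "data": [{
        "title": "placeholder",
        "id": "1",
        "paragraphs": [{
            "context": CONTEXT,
            "id": "1-1",
            "qas": [
                {"id": "1-1-1", "question": "When did the library open?",
                 "answers": [{"id": "1", "text": "1971",
                              "answer_start": CONTEXT.index("1971")}]},
                # Deliberately wrong offset, to show what the check reports.
                {"id": "1-1-2", "question": "Where is the library?",
                 "answers": [{"id": "1", "text": "near the river",
                              "answer_start": 0}]},
            ],
        }],
    }],
}


def misaligned_answers(dataset):
    """Return ids of questions whose answer text is not at answer_start."""
    bad_ids = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        bad_ids.append(qa["id"])
    return bad_ids


bad = misaligned_answers(SAMPLE)
print(bad)
```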

##### 9.ChID  Chinese IDiom Dataset for Cloze Test
https://arxiv.org/abs/1906.01265  
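Idiom cloze test: several idioms in each passage are masked, and each blank is filled from a list of candidate idioms.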
```
    Train (84,709), dev (3,218), test (3,231)
    
    {
      "content": [
        # 0
        "2210080100#idiom000378#", 
        # 1
        "#idiom000379##idiom000380#", 
        # 2
        "#idiom000381#2050", 
        # 3
        "#idiom000382#60", 
        # 4
        "#idiom000383#", 
        # 5
        "2009#idiom000384#2010"],
      "candidates": [
        "", 
        "", 
        "", 
        "", 
        "", 
        "", 
        "", 
        "", 
        "", 
        ""
      ]
    }
```
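
The blanks in a ChID record are marked with `#idiom` followed by digits and a closing `#`, so they can be located with a regular expression. The sketch below is based only on the placeholder pattern visible in the sample above; the passage text and candidate strings are hypothetical.

```python
import re

# Placeholder pattern as it appears in the sample, e.g. "#idiom000378#".
PLACEHOLDER = re.compile(r"#idiom\d+#")


def find_blanks(content):
    """Map each placeholder to the index of the passage it appears in."""
    blanks = {}
    for passage_index, passage in enumerate(content):
        for match in PLACEHOLDER.finditer(passage):
            blanks[match.group()] = passage_index
    return blanks


# Hypothetical record: two passages, three blanks.
sample = {
    "content": [
        "placeholder text #idiom000378# more text",
        "#idiom000379##idiom000380#",
    ],
    "candidates": ["idiom A", "idiom B", "idiom C"],
}

blanks = find_blanks(sample["content"])
print(blanks)
```

Adjacent placeholders, as in the second passage, are matched separately because each match ends at its own closing `#`.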

    CHID
   
##### 10.C3  Multiple-Choice Chinese Machine Reading Comprehension  
https://arxiv.org/abs/1904.09679  
In the training and validation files, d and m stand for dialogue and mixed-genre text.  
```
    Train (11,869), dev (3,816), test (3,892)
    
    [
      [
        "??",
        ""
      ],
      [
       {
        "question": "?",
        "choice": [
         "",
         "",
         "",
         ""
        ],
        "answer": ""
       }
      ],
    "25-35"
    ],
    [
      [
       "?",
       ""
      ],
      [
       {
        "question": "?",
        "choice": [
         "",
         "",
         ""
        ],
        "answer": ""
       }
      ],
    "31-109"
    ]
```
    C3
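
Each C3 item in the sample is a three-element list: the passage sentences, a list of question objects, and an item id. The sketch below flattens that structure into one row per question. It assumes the `answer` string appears verbatim among the `choice` entries (an assumption not stated in the source), and all text in the sample is a hypothetical placeholder.

```python
# Hypothetical item in the C3 layout shown above.
ITEMS = [
    [
        ["Speaker A: placeholder line one", "Speaker B: placeholder line two"],
        [
            {
                "question": "placeholder question?",
                "choice": ["option 1", "option 2", "option 3", "option 4"],
                "answer": "option 2",
            }
        ],
        "25-35",
    ],
]


def flatten(items):
    """Return one dict per question with the answer as a choice index."""
    rows = []
    for sentences, questions, item_id in items:
        context = "\n".join(sentences)
        for question in questions:
            rows.append({
                "id": item_id,
                "context": context,
                "question": question["question"],
                "choices": question["choice"],
                # Raises ValueError if the answer is not one of the choices.
                "answer_index": question["choice"].index(question["answer"]),
            })
    return rows


rows = flatten(ITEMS)
print(rows[0]["id"], rows[0]["answer_index"])
```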

##### 11.  CLUE_diagnostics test_set

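A diagnostic set for evaluating models on 9 linguistic phenomena.

Use a model trained on CMNLI to predict on this set; the submission format is the same as for CMNLI.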

diagnostics

##### Coming soon!
 ChineseGLUE#163.com

#####  

 Coming soon

wget 

Data filter method

## 

**k**v0v1

```

1.AlbertTiny
2.k1
3.k
4.k
5.2-4
```

Notes

```
1.k4-6
2.
```

 Contents
--------------------------------------------------------------------
Language Understanding Evaluation benchmark for Chinese (ChineseGLUE) takes its ideas from GLUE, a collection of resources for training, evaluating, and analyzing natural language understanding systems. ChineseGLUE consists of:

##### 1 Benchmark tasks

A benchmark of several sentence and sentence-pair language understanding tasks. 
The datasets currently used in these tasks are public. We will include datasets with private test sets before the end of 2019.

##### 2 Leaderboard 

A public leaderboard for tracking performance. You will be able to submit prediction files for these tasks; each task is evaluated and scored, and a final score is also reported.

##### 3  Baselines with code

Baselines for ChineseGLUE tasks. Baselines will be available in TensorFlow, PyTorch, Keras, and PaddlePaddle.

##### 4  Corpus

A large raw corpus for pre-training and language modeling research. It will contain around 10G of raw corpus in 2019.

In the first half of 2020, it will include at least 30G of raw corpus. By the end of 2020, we will include enough raw corpus, such as 100G, that you will need no more raw corpus for general-purpose language modeling.
You can use it for general purposes, for domain adaptation, or for text generation. For domain adaptation, you will be able to select the corpus you are interested in.

##### 5 toolkit

An easy-to-use toolkit that can run a specific task or model with one line of code. You can easily change the configuration, task, or model.

##### 6 Technical report

Technical report with details

Why do we need a benchmark for Chinese language understanding evaluation?

 
---------------------------------------------------------------------


    14
    ()



     



     (state of the art)
     



     



---------------------------------------------------------------------
 Evaluation of Dataset for Different Models

#### AFQMC  Ant Semantic Similarity (Accuracy)
| Model | Dev | Test | Training parameters |
| :-------------------: | :----------: | :-----------: | :--------------------------------: |
|     ALBERT-xxlarge     |    -     |     -   |  -  |
|      ALBERT-tiny      |    69.13%     |    69.92%    | batch_size=16, length=128, epoch=3 lr=2e-5|
|       BERT-base       |    74.16%     |     73.70%   | batch_size=16, length=128, epoch=3 lr=2e-5|
|   BERT-wwm-ext-base   |    73.74%     |     74.07%   | batch_size=16, length=128, epoch=3 lr=2e-5|
|      ERNIE-base       |        74.88% |    73.83%    | batch_size=16, length=128, epoch=3 lr=2e-5|
|     RoBERTa-large     |     73.32%    |     74.02%   | batch_size=16, length=128, epoch=3 lr=2e-5|
|       XLNet-mid       |     70.73%    |   70.50%     | batch_size=16, length=128, epoch=3 lr=2e-5|
|    RoBERTa-wwm-ext    |   74.30%      |    74.04%    | batch_size=16, length=128, epoch=3 lr=2e-5|
| RoBERTa-wwm-large-ext |     74.92%    |    76.55%    | batch_size=16, length=128, epoch=3 lr=2e-5|
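
The training-parameter cells in these tables are short `key=value` strings with inconsistent separators (some pairs are split by commas, some only by spaces). The sketch below shows one way to turn such a cell into a dictionary for comparison across models; `parse_hparams` is a helper name made up for this illustration and is not part of the CLUE code.

```python
import re

# A key, an equals sign, then a plain or scientific-notation number.
PAIR = re.compile(r"(\w+)=([0-9.eE+-]+)")


def parse_hparams(cell):
    """Parse 'batch_size=16, length=128, epoch=3 lr=2e-5' into a dict."""
    params = {}
    for key, raw in PAIR.findall(cell):
        # Keep whole numbers as int, everything else as float.
        params[key] = int(raw) if raw.isdigit() else float(raw)
    return params


params = parse_hparams("batch_size=16, length=128, epoch=3 lr=2e-5")
print(params)
```

Cells that omit the equals sign (for example `epoch2`) are skipped by this pattern, so such entries need to be normalized first.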

#### TNEWS'  Toutiao News Classification (Accuracy)
| Model | Dev | Test | Training parameters |
| :-------------------: | :----------: | :-----------: | :--------------------------------: |
|     ALBERT-xxlarge     |    -     |     -    |     -  |
|      ALBERT-tiny      |    53.55%     |       53.35%   | batch_size=16, length=128, epoch=3 lr=2e-5|
|       BERT-base       |    56.09%     |     56.58%    | batch_size=16, length=128, epoch=3 lr=2e-5|
|   BERT-wwm-ext-base   |     56.77%    |    56.86%      | batch_size=16, length=128, epoch=3 lr=2e-5|
|      ERNIE-base       |     58.24%    |     58.33%     | batch_size=16, length=128, epoch=3 lr=2e-5|
|     RoBERTa-large     |     57.95%    |      57.84%    | batch_size=16, length=128, epoch=3 lr=2e-5|
|       XLNet-mid       |    56.09%     |      56.24%    | batch_size=16, length=128, epoch=3 lr=2e-5|
|    RoBERTa-wwm-ext    |   57.51%      |      56.94%       | batch_size=16, length=128, epoch=3 lr=2e-5|
| RoBERTa-wwm-large-ext |  58.32% | 58.61%  | batch_size=16, length=128, epoch=3 lr=2e-5|

#### IFLYTEK'  Long Text Classification (Accuracy)
| Model | Dev | Test | Training parameters |
| :-------------------: | :----------: | :-----------: | :--------------------------------: |
|     ALBERT-xlarge     |    -     |     -     | batch=32, length=128, epoch=3 lr=2e-5 |
|      ALBERT-tiny      |    48.76    |     48.71     | batch=32, length=128, epoch=10 lr=2e-5 |
|       BERT-base       |    60.37    |     60.29     | batch=32, length=128, epoch=3 lr=2e-5 |
|   BERT-wwm-ext-base   |    59.88    |     59.43     | batch=32, length=128, epoch=3 lr=2e-5 |
|      ERNIE-base       |    59.52    |     58.96     | batch=32, length=128, epoch=3 lr=2e-5  |
|     RoBERTa-large     |    62.6    |     62.55     | batch=24, length=128, epoch=3 lr=2e-5  |
|       XLNet-mid       |    57.72    |     57.85     | batch=32, length=128, epoch=3 lr=2e-5  |
|    RoBERTa-wwm-ext    |    60.8    |       60.31       | batch=32, length=128, epoch=3 lr=2e-5  |
| RoBERTa-wwm-large-ext | **62.75** |  **62.98**  | batch=24, length=128, epoch=3 lr=2e-5 |

#### CMNLI  Chinese Multi-Genre NLI (Accuracy)
| Model | Dev (%) | Test (%) | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base	| 79.47 | 79.69 | batch=64, length=128, epoch=2 lr=3e-5 |
| BERT-wwm-ext-base	| 80.92 |80.42|	batch=64, length=128, epoch=2 lr=3e-5 |
| ERNIE-base	| 80.37 | 80.29 | batch=64, length=128, epoch=2 lr=3e-5 |
| ALBERT-xxlarge	|- | - | - |
| ALBERT-tiny	| 70.26 | 70.61 | batch=64, length=128, epoch=2 lr=3e-5 |
| RoBERTa-large	| 82.40 | 81.70 | batch=64, length=128, epoch=2 lr=3e-5 |
| xlnet-mid	| 82.21 | 81.25 | batch=64, length=128, epoch=2 lr=3e-5 |
| RoBERTa-wwm-ext	| 80.70 | 80.51 | batch=64, length=128, epoch=2 lr=3e-5  |
| RoBERTa-wwm-large-ext	|***83.20*** | ***82.12*** | batch=64, length=128, epoch=2 lr=3e-5  |

ALBERT-xlargeXNLI

#### WSC Winograd  The Winograd Schema Challenge, Chinese Version
| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| ALBERT-xxlarge |  -  |  -  |  -    |
| ALBERT-tiny |  57.7(52.9)  |  58.5(52.1)  | lr=1e-4, batch_size=8, length=128, epoch=50   |
| BERT-base | 59.6(56.7)  | 62.0(57.9)  |  lr=2e-5, batch_size=8, length=128, epoch=50 |
| BERT-wwm-ext-base | 59.4(56.7) |  61.1(56.2) | lr=2e-5, batch_size=8, length=128, epoch=50   |
| ERNIE-base  | 58.1(54.9)| 60.8(55.9) | lr=2e-5, batch_size=8, length=128, epoch=50   |
| RoBERTa-large | 68.6(58.7)  | 72.7(63.6)  | lr=2e-5, batch_size=8, length=128, epoch=50   |
| XLNet-mid | 60.9(56.8)  |  64.4(57.3) | lr=2e-5, batch_size=8, length=128, epoch=50   |
| RoBERTa-wwm-ext | 67.2(57.7)  | 67.8(63.5)  | lr=2e-5, batch_size=8, length=128, epoch=50   |
| RoBERTa-wwm-large-ext |69.7(64.5) |  74.6(69.4) | lr=2e-5, batch_size=8, length=128, epoch=50   |

#### CSL   Keyword Recognition (Accuracy)

| Model | Dev | Test | Training parameters |
| :-------------------: | :----------: | :-----------: | :--------------------------------: |
|     ALBERT-xlarge     |    80.23     |     80.29     | batch_size=16, length=128, epoch=2, lr=5e-6  |
|     ALBERT-tiny       |    74.36     |     74.56     | batch_size=4, length=256, epoch=5, lr=1e-5 |
|       BERT-base       |    79.63     |     80.23     | batch_size=4, length=256, epoch=5, lr=1e-5 |
|   BERT-wwm-ext-base   |    80.60     |     81.00     | batch_size=4, length=256, epoch=5, lr=1e-5 |
|      ERNIE-base       |    79.43     |     79.10     | batch_size=4, length=256, epoch=5, lr=1e-5 |
|     RoBERTa-large     |    81.87     |     81.36     | batch_size=4, length=256, epoch=5, lr=5e-6 |
|       XLNet-mid       |    82.06     |     81.26     | batch_size=4, length=256, epoch=3, lr=1e-5 |
|    RoBERTa-wwm-ext    |    80.67     |     80.63     | batch_size=4, length=256, epoch=5, lr=1e-5 |
| RoBERTa-wwm-large-ext |    82.17     |     82.13     | batch_size=4, length=256, epoch=5, lr=1e-5 |

#### DRCD  Reading Comprehension for Traditional Chinese (F1, EM)
| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base |F1:92.30 EM:86.60 | F1:91.46 EM:85.49 |  batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| BERT-wwm-ext-base |F1:93.27 EM:88.00 | F1:92.63 EM:87.15 |  batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| ERNIE-base  |F1:92.78 EM:86.85 | F1:92.01 EM:86.03 |  batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| ALBERT-large  |F1:93.90 EM:88.88 | F1:93.06 EM:87.52 |  batch=32, length=512, epoch=3, lr=2e-5, warmup=0.05 |
| ALBERT-xlarge |F1:94.63 EM:89.68 | F1:94.70 EM:89.78 |  batch_size=32, length=512, epoch=3, lr=2.5e-5, warmup=0.06 |
| ALBERT-xxlarge |F1:93.69 EM:89.97 | F1:94.62 EM:89.67 |  batch_size=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| ALBERT-tiny |F1:81.51 EM:71.61 | F1:80.67 EM:70.08 |  batch=32, length=512, epoch=3, lr=2e-4, warmup=0.1 |
| RoBERTa-large |F1:94.93 EM:90.11 | F1:94.25 EM:89.35 |  batch=32, length=256, epoch=2, lr=3e-5, warmup=0.1|
| xlnet-mid |F1:92.08 EM:84.40 | F1:91.44 EM:83.28 | batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| RoBERTa-wwm-ext |F1:94.26 EM:89.29 | F1:93.53 EM:88.12 |  batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1|
| RoBERTa-wwm-large-ext |***F1:95.32 EM:90.54*** | ***F1:95.06 EM:90.70*** | batch=32, length=512, epoch=2, lr=2.5e-5, warmup=0.1 |
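
To make the two reading-comprehension metrics concrete, here is a simplified sketch of EM and F1. It is not the official DRCD or CMRC2018 evaluation script: EM is a plain string comparison after trimming whitespace, and F1 is computed over bags of characters, which is a common simplification for Chinese text. The function names are made up for this illustration.

```python
from collections import Counter


def exact_match(prediction, references):
    """1.0 if the prediction equals any reference answer, else 0.0."""
    return float(any(prediction.strip() == ref.strip() for ref in references))


def char_f1(prediction, reference):
    """Character-overlap F1 between one prediction and one reference."""
    pred_chars = list(prediction.strip())
    ref_chars = list(reference.strip())
    # Multiset intersection counts each shared character at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction, references):
    """Score against every reference and keep the best F1."""
    return max(char_f1(prediction, ref) for ref in references)


print(exact_match("78.1", ["78.1", "78.1", "78.1"]))
print(best_f1("abcd", ["abcf", "xyz"]))
```

EM only rewards a fully correct span, while F1 gives partial credit for overlap, which is why F1 is consistently higher than EM in the tables.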

#### CMRC2018  Reading Comprehension for Simplified Chinese (F1, EM)
| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base	|F1:85.48 EM:64.77 | F1:88.10 EM:71.60 | batch=32, length=512, epoch=2 lr=3e-5 warmup=0.1 |
| BERT-wwm-ext-base	|F1:86.68 EM:66.96 |F1:89.62 EM:73.95|	batch=32, length=512, epoch=2 lr=3e-5 warmup=0.1 |
| ERNIE-base	|F1:87.30 EM:66.89 | F1:90.57 EM:74.70 | batch=32, length=512, epoch=2 lr=3e-5 warmup=0.1 |
| ALBERT-base | F1:85.86 EM:64.76 | F1:89.66 EM:72.90 | batch=32, epoch=2, length=512, lr=3e-5, warmup=0.1 |
| ALBERT-large | F1:87.36 EM:67.31 | F1:90.81 EM:75.95 | batch=32, epoch=2, length=512, lr=3e-5, warmup=0.1 |
| ALBERT-xlarge | F1:88.99 EM:69.08 | F1:92.09 EM:76.30 | batch=32, epoch=2, length=512, lr=3e-5, warmup=0.1 |
| ALBERT-xxlarge | F1:87.47 EM:66.43 | F1:90.77 EM:75.15 | batch=32, epoch=2, length=512, lr=3e-5, warmup=0.1 |
| ALBERT-tiny | F1:73.95 EM:48.31 | F1:76.21 EM:53.35 | batch=32, epoch=3, length=512, lr=2e-4, warmup=0.1 |
| RoBERTa-large | F1:88.61 EM:69.94 | ***F1:92.04 EM:78.50*** | batch=32, epoch=2, length=256, lr=3e-5, warmup=0.1 |
| xlnet-mid | F1:85.63 EM:65.31 | F1:86.11 EM:66.95 | batch=32, epoch=2, length=512, lr=3e-5, warmup=0.1 |
| RoBERTa-wwm-ext | F1:87.28 EM:67.89 | F1:90.41 EM:75.20 | batch=32, epoch=2, length=512, lr=3e-5, warmup=0.1 |
| RoBERTa-wwm-large-ext | ***F1:89.42 EM:70.59*** | F1:92.11 EM:77.95 | batch=32, epoch=2, length=512, lr=2.5e-5, warmup=0.1 |

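Note: the test results above use a 2k subset of the cmrc2018 test set, not the full official cmrc2018 test set. To evaluate on the full cmrc2018 test set, submit through the cmrc2018 platform (https://worksheets.codalab.org/worksheets/0x96f61ee5e9914aee8b54bd11e66ec647)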

#### CHID  Chinese IDiom Dataset for Cloze Test (Accuracy)
| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base	|82.20 | 82.04 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| BERT-wwm-ext-base	|83.36 |82.9 |	batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| ERNIE-base	|82.46 | 82.28 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| ALBERT-base	| 70.99 |71.77 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| ALBERT-large	| 75.10 |74.18 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| ALBERT-xlarge	| 81.20 | 80.57 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| ALBERT-xxlarge | 83.61 | 83.15 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| ALBERT-tiny	| 43.47 |43.53 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| RoBERTa-large	| 85.31 |84.50 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| xlnet-mid	|83.76 | 83.47 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| RoBERTa-wwm-ext	|83.78 | 83.62 | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |
| RoBERTa-wwm-large-ext	|***85.81*** | ***85.37*** | batch=24, length=64, epoch=3, lr=2e-5, warmup=0.06 |

#### C3   Multiple-Choice Chinese Machine Reading Comprehension (Accuracy)
| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base	| 65.70 | 64.50 | batch=24, length=512, epoch=8, lr=2e-5, warmup=0.1 |
| BERT-wwm-ext-base	| 67.80 | 68.50 | batch=24, length=512, epoch=8, lr=2e-5, warmup=0.1 |
| ERNIE-base	| 65.50 | 64.10 | batch=24, length=512, epoch=8, lr=2e-5, warmup=0.1 |
| ALBERT-base | 60.43 | 59.58 | batch=24, length=512, epoch=8, lr=2e-5, warmup=0.1 |
| ALBERT-large | 64.07 | 64.41 | batch=24, length=512, epoch=8, lr=2e-5, warmup=0.1 |
| ALBERT-xlarge | 69.75 | 70.32 | batch=24, length=512, epoch=8, lr=2e-5, warmup=0.1 |
| ALBERT-xxlarge | 73.66 | 73.28 | batch=16, length=512, epoch=8, lr=2e-5, warmup=0.1 |
| ALBERT-tiny	| 50.58 | 50.26 | batch=32, length=512, epoch=8, lr=5e-5, warmup=0.1 |
| RoBERTa-large	| 67.79 | 67.55 | batch=24, length=256, epoch=8, lr=2e-5, warmup=0.1 |
| xlnet-mid	| 66.17 | 67.68 | batch=24, length=512, epoch=8, lr=2e-5, warmup=0.1 |
| RoBERTa-wwm-ext	| 67.06 | 66.50 | batch=24, length=512, epoch=8, lr=2e-5, warmup=0.1 |
| RoBERTa-wwm-large-ext	|***74.48*** | ***73.82*** | batch=16, length=512, epoch=8, lr=2e-5, warmup=0.1 |


ChineseGLUE Members
---------------------------------------------------------------------
#####  Benefits

1

2

3wiki & bookCorpus

4state of the art

#####  How to join with us

 chineseGLUE#163.com

 TODO LIST
---------------------------------------------------------------------
11 (5)

2

3baselises(PyTorchKeras)

4bert/bert_wwm_ext/roberta/albert/ernie/ernie2.0ChineseGLUE

    XLNet-midLCQMC

5

##### 
6landing

7(ChineseGLUE)

8

Timeline :
---------------------------------------------------------------------
2019-10-20 to 2019-12-31: beta version of ChineseGLUE

2020-01-01 to 2020-12-31: official version of ChineseGLUE

2021-01-01 to 2021-12-31: super version of ChineseGLUE

Contribution 
---------------------------------------------------------------------

Share your dataset with the community or make a contribution today! Just send an email to chineseGLUE#163.com, or join the QQ group: 836811304


#### Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC)

Cite Us:
---------------------------------------------------------------------

    @article{CLUEbenchmark,
      title={CLUE: A Chinese Language Understanding Evaluation Benchmark},
      author={Liang Xu and Xuanwei Zhang and Lu Li and Hai Hu and Chenjie Cao and Weitang Liu and Junyi Li and Yudong Li and Kai Sun and Yechen Xu and Yiming Cui and Cong Yu and Qianqian Dong and Yin Tian and Dian Yu and Bo Shi and Jun Zeng and Rongzhao Wang and Weijian Xie and Yanting Li and Yina Patterson and Zuoyu Tian and Yiwen Zhang and He Zhou and Shaoweihua Liu and Qipeng Zhao and Cong Yue and Xinrui Zhang and Zhengliang Yang and Zhenzhong Lan},
      journal={arXiv preprint arXiv:2004.05986},
      year={2020}
     }

Reference:
---------------------------------------------------------------------
1. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

2. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

4. XNLI: Evaluating Cross-lingual Sentence Representations

5. TNEWS: toutiao-text-classfication-dataset

6. nlp_chinese_corpus: Large Scale Chinese Corpus for NLP

7. ChineseNLPCorpus

8. ALBERT: A Lite BERT For Self-Supervised Learning Of Language Representations

9. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

10. RoBERTa: A Robustly Optimized BERT Pretraining Approach

Owner

  • Name: Dr. Artificial曾小健
  • Login: ArtificialZeng
  • Kind: user
  • Location: Beijing

LLM practitioner/engineer, AI/ML/DL Quant
