https://github.com/alixunxing/chineseglue
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
Repository
Basic Info
- Host: GitHub
- Owner: alixunxing
- Default Branch: master
- Homepage: http://www.CLUEbenchmark.com
- Size: 2.6 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of ChineseGLUE/ChineseGLUE
Created almost 6 years ago
· Last pushed over 6 years ago
https://github.com/alixunxing/ChineseGLUE/blob/master/
# ChineseGLUE

Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard

2019-11-22:

1. https://github.com/CLUEbenchmark/CLUE

(ChineseGLUE) - Leaderboard
---------------------------------------------------------------------

##### www.CLUEbenchmark.com

#### (vO)

| Model | Score | Parameters | TNEWS | LCQMC | XNLI | INEWS | BQ | MSRANER | THUCNEWS | iFLYTEKData |
| :----:| :----: | :----: | :----: |:----: |:----: |:----: |:----: |:----: |:----: |:----: |
| BERT-base | 84.57 | 108M | 89.78 | 86.9 | 77.8 | 82.7 | 85.08 | 95.38 | 95.35 | 63.57 |
| BERT-wwm-ext | 84.89 | 108M | 89.81 | ***87.3*** | 78.7 | 83.46 | ***85.21*** | 95.26 | 95.57 | 63.83 |
| ERNIE-base | 84.63 | 108M | 89.83 | 87.2 | 78.6 | ***85.14*** | 84.47 | 95.17 | 94.9 | 61.75 |
| RoBERTa-large | 85.08 | 334M | 89.91 | 87.2 | 79.9 | 84 | 85.2 | ***96.07*** | 94.56 | 63.8 |
| XLNet-mid | 81.07 | 209M | 86.26 | 85.98 | 78.7 | 84 | 77.85 | 92.11 | 94.54 | 60.16 |
| ALBERT-xlarge | 84.08 | 59M | 88.3 | 86.76 | 74.0? | 82.4 | 84.21 | 89.51 | 95.45 | 61.94 |
| ALBERT-tiny | 78.22 | 1.8M | 87.1 | 85.4 | 68 | 81.4 | 80.76 | 84.77 | 93.54 | 44.83 |
| RoBERTa-wwm-ext | 84.55 | 108M | 89.79 | 86.33 | 79.28 | 82.28 | 84.02 | 95.06 | 95.52 | 64.18 |
| RoBERTa-wwm-large | ***85.13*** | 330M | ***90.11*** | 86.82 | ***80.04*** | 82.78 | 84.9 | 95.32 | ***95.93*** | ***65.19*** |

Metrics: DRCD & CMRC2018 (F1, EM); CHID (Acc); BQ (Acc); MSRANER (F1); iFLYTEK (Acc). Score is the average of the task scores (datasets 1-9).

| Model | Score | Parameters | DRCD | CMRC2018 | CHID |
| :----:| :----: | :----: | :----: |:----: |:----: |
| BERT-base | 79.08 | 108M | 85.49 | 69.72 | 82.04 |
| BERT-wwm-ext | - | 108M | 87.15 | 73.23 | - |
| ERNIE-base | - | 108M | 86.03 | 73.32 | - |
| RoBERTa-large | 83.32 | 334M | 89.35 | 76.11 | 84.5 |
| XLNet-mid | - | 209M | 83.28 | 66.51 | - |
| ALBERT-xlarge | - | 59M | 89.78 | 75.22 | - |
| ALBERT-xxlarge | - | - | - | - | - |
| ALBERT-tiny | - | 1.8M | 70.08 | 53.68 | - |
| RoBERTa-wwm-ext | 81.88 | 108M | 88.12 | 73.89 | 83.62 |
| RoBERTa-wwm-large | ***84.22*** | 330M | ***90.70*** | ***76.58*** | ***85.37*** |

Note: where both F1 and EM are reported for a task, the EM value is the one shown in this table.

ChineseGLUE Vision
---------------------------------------------------------------------

*** 2019-10-13: ; INEWS ***

Why do we need a benchmark for Chinese language understanding evaluation?
---------------------------------------------------------------------

Contents
--------------------------------------------------------------------

The Language Understanding Evaluation benchmark for Chinese (ChineseGLUE) is inspired by GLUE, a collection of resources for training, evaluating, and analyzing natural language understanding systems. ChineseGLUE consists of:

##### 1. A benchmark of several sentence or sentence-pair language understanding tasks

The datasets currently used in these tasks come from public sources. Datasets with a private test set will be included before the end of 2019.

##### 2. A public leaderboard for tracking performance

You will be able to submit your prediction files for these tasks. Each task will be evaluated and scored, and a final score will also be available.

##### 3. Baselines for ChineseGLUE tasks

Baselines will be available in TensorFlow, PyTorch, Keras and PaddlePaddle.

##### 4. A large raw corpus for pre-training and language modeling research

It will contain around 10G of raw corpus in 2019 and at least 30G in the first half of 2020. By the end of 2020 it will include enough raw corpus, such as 100G, that you will need no additional raw corpus for general-purpose language modeling. You can use it for general-purpose modeling, domain adaptation, or even text generation. When you use it for domain adaptation, you will be able to select the corpus you are interested in.

Introduction of datasets
--------------------------------------------------------------------

##### 1. LCQMC Semantic Similarity Task

Labels are 0 or 1.

Data size: train (238,766), dev (8,802), test (12,500)

Examples:

1. [] [] 1
2. [] [] 0

##### 2. XNLI Natural Language Inference

Data size: train (392,703), dev (2,491), test (5,011)

Examples:

1. , . [] . [] neutral
2. [] [] entailment

The original XNLI covers 15 languages.

##### 3. TNEWS Short Text Classification for News

Data size: train (266,000), dev (57,000), test (57,000)

Example:

`6552431613437805063_!_102_!_news_entertainment_!__!_,,,,,`

Fields are separated by `_!_` and include an ID and a category code.

##### 4. INEWS Sentiment Analysis for Internet News

Data size: train (5,356), dev (1,000), test (1,000)

Example:

`1_!_00005a3efe934a19adc0b69b05faeae7_!__!_370 ......`

Fields are separated by `_!_` and include an id.

##### 5. DRCD Reading Comprehension for Traditional Chinese

Delta Reading Comprehension Dataset (DRCD): https://github.com/DRCKnowledgeTeam/DRCD

Data size (paragraphs, questions): train (8,016; 26,936), dev (1,000; 3,524), test (1,000; 3,493)

```
{
  "version": "1.3",
  "data": [
    {
      "title": "",
      "id": "2128",
      "paragraphs": [
        {
          "context": " ",
          "id": "2128-2",
          "qas": [
            {
              "id": "2128-2-1",
              "question": "?",
              "answers": [
                { "id": "1", "text": "", "answer_start": 92 }
              ]
            },
            {
              "id": "2128-2-2",
              "question": "?",
              "answers": [
                { "id": "1", "text": "", "answer_start": 105 }
              ]
            }
          ]
        }
      ]
    }
  ]
}
```

Data format: SQuAD.

##### 6. CMRC2018 Reading Comprehension for Simplified Chinese

https://hfl-rc.github.io/cmrc2018/

Data size (paragraphs, questions): train (2,403; 10,142), dev (256; 1,002), test (848; 3,219)

```
{
  "version": "1.0",
  "data": [
    {
      "title": "",
      "context_id": "TRIAL_0",
      "context_text": "1278.1117220131995",
      "qas": [
        {
          "query_id": "TRIAL_0_QUERY_0",
          "query_text": "",
          "answers": [ "", "", "" ]
        },
        {
          "query_id": "TRIAL_0_QUERY_1",
          "query_text": "12",
          "answers": [ "78.1", "78.1", "78.1" ]
        },
        {
          "query_id": "TRIAL_0_QUERY_2",
          "query_text": "",
          "answers": [ "", "", "" ]
        }
      ]
    }
  ]
}
```

Data format: SQuAD.

##### 7. BQ Question Matching for Customer Service

120,000 sentence pairs, labeled 0 or 1.

Data size: train (100,000), dev (10,000), test (10,000)

Examples:

1. [] [] 0
2. [] [] 1

##### 8. MSRANER Name Entity Recognition

Tags: nr, ns, nt, o.

Data size: train (46,364), test (4,365)

Examples:

1. /o /o /o /o /o /o /nr /o /o /o /o /o /o /o /o /ns /o /o
2. /o /o /o /o /nt /o /o /o /o /o /o

##### 9. THUCNEWS Long Text Classification

14 categories, labeled 0 to 13.

Data size: train (33,437), dev (4,180), test (4,180)

Example:

`11_!__!_493337.txt_!_A-Touch MK3533MP5:"">1993......`

Fields are separated by `_!_` and include IDs.

##### 10. iFLYTEK Long Text Classification

Long-text data about apps in 119 categories, labeled 0-118 (for example `"WIFI": 2`).

Data size: train (12,133), dev (2,599), test (2,600)

```
17_!__!_......
```

Fields are separated by `_!_` and include an ID.

##### 11. CHID Chinese IDiom Dataset for Cloze Test

https://arxiv.org/abs/1906.01265

Data size: train (84,709), dev (3,218), test (3,231)

```
{
  "content": [
    # 0
    "2210080100#idiom000378#",
    # 1
    "#idiom000379##idiom000380#",
    # 2
    "#idiom000381#2050",
    # 3
    "#idiom000382#60",
    # 4
    "#idiom000383#",
    # 5
    "2009#idiom000384#2010"],
  "candidates": [ "", "", "", "", "", "", "", "", "", "" ]
}
```

##### 12. CMNLI Chinese Multi-Genre NLI

Chinese MNLI. MNLI genres: fiction, telephone, travel, government, slate.

Data size: train (391,783), matched (9,336), mismatched (8,870)

```
{"sentence1": "", "sentence2": "", "gold_label": "neutral"}
```

##### 13. Coming soon!

##### Download

`wget https://storage.googleapis.com/chineseglue/chineseGLUEdatasets.v0.0.1.zip`

(ChineseGLUE) - Evaluation of Dataset for Different Models
---------------------------------------------------------------------

#### TNEWS Short Text Classification for News (Accuracy)

| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| ALBERT-xlarge | 88.30 | 88.30 | batch_size=32, length=128, epoch=3 |
| BERT-base | 89.80 | 89.78 | batch_size=32, length=128, epoch=3 |
| BERT-wwm-ext-base | 89.88 | 89.81 | batch_size=32, length=128, epoch=3 |
| ERNIE-base | 89.77 | 89.83 | batch_size=32, length=128, epoch=3 |
| RoBERTa-large | 90.00 | 89.91 | batch_size=16, length=128, epoch=3 |
| XLNet-mid | 86.14 | 86.26 | batch_size=32, length=128, epoch=3 |
| RoBERTa-wwm-ext | 89.82 | 89.79 | batch_size=32, length=128, epoch=3 |
| RoBERTa-wwm-large-ext | ***90.05*** | ***90.11*** | batch_size=16, length=128, epoch=3 |

#### XNLI Natural Language Inference (Accuracy)

| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| ALBERT-xlarge | 74.0? | 74.0? | batch_size=64, length=128, epoch=2 |
| BERT-base | 77.80 | 77.80 | batch_size=64, length=128, epoch=2 |
| BERT-wwm-ext-base | 79.4 | 78.7 | batch_size=64, length=128, epoch=2 |
| ERNIE-base | 79.7 | 78.6 | batch_size=64, length=128, epoch=2 |
| RoBERTa-large | ***80.2*** | 79.9 | batch_size=64, length=128, epoch=2 |
| XLNet-mid | 79.2 | 78.7 | batch_size=64, length=128, epoch=2 |
| RoBERTa-wwm-ext | 79.56 | 79.28 | batch_size=64, length=128, epoch=2 |
| RoBERTa-wwm-large-ext | ***80.20*** | ***80.04*** | batch_size=16, length=128, epoch=2 |

Note: the ALBERT-xlarge figures on XNLI are marked `?` in the source.

#### LCQMC Semantic Similarity Task (Accuracy)

| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| ALBERT-xlarge | 89.00 | 86.76 | batch_size=64, length=128, epoch=3 |
| BERT-base | 89.4 | 86.9 | batch_size=64, length=128, epoch=3 |
| BERT-wwm-ext-base | 89.1 | ***87.3*** | batch_size=64, length=128, epoch=3 |
| ERNIE-base | 89.8 | 87.2 | batch_size=64, length=128, epoch=3 |
| RoBERTa-large | ***89.9*** | 87.2 | batch_size=64, length=128, epoch=3 |
| XLNet-mid | 86.14 | 85.98 | batch_size=64, length=128, epoch=3 |
| RoBERTa-wwm-ext | 89.08 | 86.33 | batch_size=64, length=128, epoch=3 |
| RoBERTa-wwm-large-ext | 89.79 | 86.82 | batch_size=16, length=128, epoch=3 |

#### INEWS Sentiment Analysis for Internet News (Accuracy)

| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| ALBERT-xlarge | 81.80 | 82.40 | batch_size=32, length=512, epoch=8 |
| BERT-base | 81.29 | 82.70 | batch_size=16, length=512, epoch=3 |
| BERT-wwm-ext-base | 81.93 | 83.46 | batch_size=16, length=512, epoch=3 |
| ERNIE-base | ***84.50*** | ***85.14*** | batch_size=16, length=512, epoch=3 |
| RoBERTa-large | 81.90 | 84.00 | batch_size=4, length=512, epoch=3 |
| XLNet-mid | 82.00 | 84.00 | batch_size=8, length=512, epoch=3 |
| RoBERTa-wwm-ext | 82.98 | 82.28 | batch_size=16, length=512, epoch=3 |
| RoBERTa-wwm-large-ext | 83.73 | 82.78 | batch_size=4, length=512, epoch=3 |

#### DRCD Reading Comprehension for Traditional Chinese (F1, EM)

| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base | F1:92.30 EM:86.60 | F1:91.46 EM:85.49 | batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| BERT-wwm-ext-base | F1:93.27 EM:88.00 | F1:92.63 EM:87.15 | batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| ERNIE-base | F1:92.78 EM:86.85 | F1:92.01 EM:86.03 | batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| ALBERT-large | F1:93.90 EM:88.88 | F1:93.06 EM:87.52 | batch=32, length=512, epoch=3, lr=2e-5, warmup=0.05 |
| ALBERT-xlarge | F1:94.63 EM:89.68 | F1:94.70 EM:89.78 | batch_size=32, length=512, epoch=3, lr=2.5e-5, warmup=0.06 |
| ALBERT-tiny | F1:81.51 EM:71.61 | F1:80.67 EM:70.08 | batch=32, length=512, epoch=3, lr=2e-4, warmup=0.1 |
| RoBERTa-large | F1:94.93 EM:90.11 | F1:94.25 EM:89.35 | batch=32, length=256, epoch=2, lr=3e-5, warmup=0.1 |
| xlnet-mid | F1:92.08 EM:84.40 | F1:91.44 EM:83.28 | batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| RoBERTa-wwm-ext | F1:94.26 EM:89.29 | F1:93.53 EM:88.12 | batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| RoBERTa-wwm-large-ext | ***F1:95.32 EM:90.54*** | ***F1:95.06 EM:90.70*** | batch=32, length=512, epoch=2, lr=2.5e-5, warmup=0.1 |

#### CMRC2018 Reading Comprehension for Simplified Chinese (F1, EM)

| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base | F1:85.48 EM:64.77 | F1:87.17 EM:69.72 | batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| BERT-wwm-ext-base | F1:86.68 EM:66.96 | F1:88.78 EM:73.23 | batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| ERNIE-base | F1:87.30 EM:66.89 | F1:89.62 EM:73.32 | batch=32, length=512, epoch=2, lr=3e-5, warmup=0.1 |
| ALBERT-large | F1:87.86 EM:67.75 | F1:90.17 EM:73.66 | epoch=3, batch=32, length=512, lr=2e-5, warmup=0.05 |
| ALBERT-xlarge | F1:88.66 EM:68.90 | F1:90.92 EM:75.22 | epoch=3, batch=32, length=512, lr=2e-5, warmup=0.1 |
| ALBERT-tiny | F1:73.95 EM:48.31 | F1:75.73 EM:53.68 | epoch=3, batch=32, length=512, lr=2e-4, warmup=0.1 |
| RoBERTa-large | F1:88.61 EM:69.94 | F1:90.94 EM:76.11 | epoch=2, batch=32, length=256, lr=3e-5, warmup=0.1 |
| xlnet-mid | F1:85.63 EM:65.31 | F1:86.09 EM:66.51 | epoch=2, batch=32, length=512, lr=3e-5, warmup=0.1 |
| RoBERTa-wwm-ext | F1:87.28 EM:67.89 | F1:89.74 EM:73.89 | epoch=2, batch=32, length=512, lr=3e-5, warmup=0.1 |
| RoBERTa-wwm-large-ext | ***F1:89.42 EM:70.59*** | ***F1:91.56 EM:76.58*** | epoch=2, batch=32, length=512, lr=2.5e-5, warmup=0.1 |

#### CHID Chinese IDiom Dataset for Cloze Test (Accuracy)

| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base | 82.2 | 82.04 | batch=24, length=64, epoch=3, lr=2e-5 |
| BERT-wwm-ext-base | - | - | - |
| ERNIE-base | - | - | - |
| ALBERT-large | - | - | - |
| ALBERT-xlarge | - | - | - |
| ALBERT-tiny | - | - | - |
| RoBERTa-large | 85.31 | 84.5 | batch=24, length=64, epoch=3, lr=2e-5 |
| xlnet-mid | - | - | - |
| RoBERTa-wwm-ext | 83.78 | 83.62 | batch=24, length=64, epoch=3, lr=2e-5 |
| RoBERTa-wwm-large-ext | ***85.81*** | ***85.37*** | batch=24, length=64, epoch=3, lr=2e-5 |

#### CMNLI Chinese Multi-Genre NLI (Accuracy)

| Model | Matched | Mismatched | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base | 79.39 | 79.76 | batch=32, length=128, epoch=3, lr=2e-5 |
| BERT-wwm-ext-base | 81.41 | 80.67 | batch=32, length=128, epoch=3, lr=2e-5 |
| ERNIE-base | 79.65 | 80.70 | batch=32, length=128, epoch=3, lr=2e-5 |
| ALBERT-xxlarge | - | - | - |
| ALBERT-tiny | 72.71 | 72.72 | batch=32, length=128, epoch=3, lr=2e-5 |
| RoBERTa-large | - | - | - |
| xlnet-mid | 78.15 | 76.93 | batch=16, length=128, epoch=3, lr=2e-5 |
| RoBERTa-wwm-ext | 81.09 | 81.38 | batch=32, length=128, epoch=3, lr=2e-5 |
| RoBERTa-wwm-large-ext | ***83.4*** | ***83.42*** | batch=32, length=128, epoch=3, lr=2e-5 |

#### BQ Question Matching for Customer Service (Accuracy)

| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| BERT-base | 85.86 | 85.08 | batch_size=64, length=128, epoch=3 |
| BERT-wwm-ext-base | 86.05 | ***85.21*** | batch_size=64, length=128, epoch=3 |
| ERNIE-base | 85.92 | 84.47 | batch_size=64, length=128, epoch=3 |
| RoBERTa-large | 85.68 | 85.20 | batch_size=8, length=128, epoch=3 |
| XLNet-mid | 79.81 | 77.85 | batch_size=32, length=128, epoch=3 |
| ALBERT-xlarge | 85.21 | 84.21 | batch_size=16, length=128, epoch=3 |
| ALBERT-tiny | 82.04 | 80.76 | batch_size=64, length=128, epoch=5 |
| RoBERTa-wwm-ext | 85.31 | 84.02 | batch_size=64, length=128, epoch=3 |
| RoBERTa-wwm-large-ext | ***86.34*** | 84.90 | batch_size=16, length=128, epoch=3 |

#### MSRANER Name Entity Recognition (F1)

| Model | Test | Training parameters |
| :----: | :----: | :----: |
| BERT-base | 95.38 | batch_size=16, length=256, epoch=5, lr=2e-5 |
| BERT-wwm-ext-base | 95.26 | batch_size=16, length=256, epoch=5, lr=2e-5 |
| ERNIE-base | 95.17 | batch_size=16, length=256, epoch=5, lr=2e-5 |
| RoBERTa-large | ***96.07*** | batch_size=8, length=256, epoch=5, lr=2e-5 |
| XLNet-mid | 92.11 | batch_size=8, length=256, epoch=5, lr=2e-5 |
| ALBERT-xlarge | 89.51 | batch_size=16, length=256, epoch=8, lr=7e-5 |
| ALBERT-base | 92.47 | batch_size=32, length=256, epoch=8, lr=5e-5 |
| ALBERT-tiny | 84.77 | batch_size=32, length=256, epoch=8, lr=5e-5 |
| RoBERTa-wwm-ext | 95.06 | batch_size=16, length=256, epoch=5, lr=2e-5 |
| RoBERTa-wwm-large-ext | 95.32 | batch_size=8, length=256, epoch=5, lr=2e-5 |

#### THUCNEWS Long Text Classification (Accuracy)

| Model | Dev | Test | Training parameters |
| :----:| :----: | :----: | :----: |
| ALBERT-xlarge | 95.74 | 95.45 | batch_size=32, length=512, epoch=8 |
| ALBERT-tiny | 92.63 | 93.54 | batch_size=64, length=128, epoch=5 |
| BERT-base | 95.28 | 95.35 | batch_size=8, length=128, epoch=3 |
| BERT-wwm-ext-base | 95.38 | 95.57 | batch_size=8, length=128, epoch=3 |
| ERNIE-base | 94.35 | 94.90 | batch_size=16, length=256, epoch=3 |
| RoBERTa-large | 94.52 | 94.56 | batch_size=2, length=256, epoch=3 |
| XLNet-mid | 94.04 | 94.54 | batch_size=16, length=128, epoch=3 |
| RoBERTa-wwm-ext | 95.59 | 95.52 | batch_size=16, length=256, epoch=3 |
| RoBERTa-wwm-large-ext | ***96.10*** | ***95.93*** | batch_size=32, length=512, epoch=8 |
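
The leaderboard's Score column appears to be the unweighted mean of a model's per-task test scores. The following is a minimal sketch of that calculation, assuming this reading is correct; the function name `leaderboard_score` is hypothetical, and the numbers are the BERT-base values copied from the leaderboard row.

```python
from statistics import mean

# BERT-base test scores, copied from the leaderboard table.
bert_base = {
    "TNEWS": 89.78,
    "LCQMC": 86.9,
    "XNLI": 77.8,
    "INEWS": 82.7,
    "BQ": 85.08,
    "MSRANER": 95.38,
    "THUCNEWS": 95.35,
    "iFLYTEKData": 63.57,
}


def leaderboard_score(task_scores):
    """Unweighted mean of per-task scores, rounded to two decimals.

    Assumption: every task carries equal weight in the final score.
    """
    return round(mean(task_scores.values()), 2)


print(leaderboard_score(bert_base))  # 84.57, matching the leaderboard entry
```

Not every row rounds the same way: the RoBERTa-wwm-large task scores average to 85.13625, which the leaderboard lists as 85.13, so a reproduced value may differ from the published one in the last digit.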
#### iFLYTEKData Long Text Classification (Accuracy)

| Model | Dev | Test | Training parameters |
| :-------------------: | :----------: | :-----------: | :--------------------------------: |
| ALBERT-xlarge | 61.94 | 61.34 | batch_size=32, length=128, epoch=3 |
| ALBERT-tiny | 44.83 | 44.62 | batch_size=32, length=256, epoch=3 |
| BERT-base | 63.57 | 63.48 | batch_size=32, length=128, epoch=3 |
| BERT-wwm-ext-base | 63.83 | 63.75 | batch_size=32, length=128, epoch=3 |
| ERNIE-base | 61.75 | 61.80 | batch_size=24, length=256, epoch=3 |
| RoBERTa-large | 63.80 | 63.91 | batch_size=32, length=128, epoch=3 |
| XLNet-mid | 60.16 | 60.04 | batch_size=16, length=128, epoch=3 |
| RoBERTa-wwm-ext | 64.18 | - | batch_size=16, length=128, epoch=3 |
| RoBERTa-wwm-large-ext | ***65.19*** | ***65.10*** | batch_size=32, length=128, epoch=3 |

Start Codes for Baselines
---------------------------------------------------------------------

For example, to run the Bert model on the BQ task, run the run_classifier_**bq**.sh script under chineseGLUE/baselines/models/**bert**/:

```bash
cd chineseGLUE/baselines/models/bert/
sh run_classifier_bq.sh
```

- BQ dataset: chineseGLUE/baselines/glue/chineseGLUEdatasets/**bq**/
- Bert model: chineseGLUE/baselines/models/bert/prev_trained_model/

#### Corpus for Language Modelling, Pre-training, Generating tasks
---------------------------------------------------------------------

10G, nlp_chinese_corpus; 4M; 14G in total:

1. 8G, 2000
2. 3G, 900
3. 1.1G, 300
4. 2.3G, 811 (ChineseNLPCorpus)

Contact: chineseGLUE#163.com

ChineseGLUE Members
---------------------------------------------------------------------

##### Benefits

3. wiki & bookCorpus
4. state of the art

##### How to join us

chineseGLUE#163.com

TODO LIST
---------------------------------------------------------------------

1. 1 (5)
3. baselines (PyTorch, Keras)
4. bert/bert_wwm_ext/roberta/albert/ernie/ernie2.0 on ChineseGLUE; XLNet-mid, LCQMC
6. landing
7. (ChineseGLUE)

Timeline
---------------------------------------------------------------------

- 2019-10-20 to 2019-12-31: beta version of ChineseGLUE
- 2020-01-01 to 2020-12-31: official version of ChineseGLUE
- 2021-01-01 to 2021-12-31: super version of ChineseGLUE

Contribution
---------------------------------------------------------------------

Share your dataset with the community or make a contribution today! Just send an email to chineseGLUE#163.com, or join the QQ group: 836811304

#### Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC)

Reference
---------------------------------------------------------------------

1. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
2. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
3. LCQMC: A Large-scale Chinese Question Matching Corpus
4. XNLI: Evaluating Cross-lingual Sentence Representations
5. TNEWS: toutiao-text-classfication-dataset
6. nlp_chinese_corpus: Large Scale Chinese Corpus for NLP
7. ChineseNLPCorpus
8. ALBERT: A Lite BERT For Self-Supervised Learning Of Language Representations
9. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
10. RoBERTa: A Robustly Optimized BERT Pretraining Approach
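
Several of the datasets above (TNEWS, INEWS, THUCNEWS, iFLYTEK) store one record per line with fields joined by `_!_`. The sketch below shows one way to split such a line; `parse_record` is a hypothetical helper, not part of the ChineseGLUE baselines, and the sample is the TNEWS example line from the dataset introduction, in which the title text is empty.

```python
SEPARATOR = "_!_"


def parse_record(line, sep=SEPARATOR):
    """Split one `_!_`-delimited record into its raw string fields.

    Assumption: fields never contain the separator themselves.
    """
    return line.rstrip("\n").split(sep)


# TNEWS example line as it appears in the dataset introduction.
sample = "6552431613437805063_!_102_!_news_entertainment_!__!_,,,,,"
fields = parse_record(sample)

record_id, category_code = fields[0], fields[1]
print(record_id, category_code, len(fields))
```

Empty fields are kept as empty strings, so field positions stay stable even when a record is missing text.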
Owner
- Login: alixunxing
- Kind: user
- Repositories: 18
- Profile: https://github.com/alixunxing