https://github.com/chapzq77/chineseglue-1
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, corpus and leaderboard
Basic Info
- Host: GitHub
- Owner: chapzq77
- Default Branch: master
- Homepage: https://github.com/chineseGLUE
- Size: 10.7 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of fighting41love/ChineseGLUE
Created over 6 years ago · Last pushed over 6 years ago
https://github.com/chapzq77/ChineseGLUE-1/blob/master/
# ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, corpus and leaderboard
Why do we need a benchmark for Chinese language understanding evaluation?
---------------------------------------------------------------------
Chinese has close to 1.4 billion speakers, yet many state-of-the-art models lack an official Chinese version.
Contents
--------------------------------------------------------------------
The Language Understanding Evaluation benchmark for Chinese (ChineseGLUE) draws on GLUE, a collection of resources
for training, evaluating, and analyzing natural language understanding systems. ChineseGLUE consists of:
##### 1. Benchmark tasks
A benchmark of several single-sentence and sentence-pair language understanding tasks.
The datasets used in these tasks currently come from public sources. We plan to add datasets with private test sets
before the end of 2019.
##### 2. Public leaderboard
A public leaderboard for tracking performance. You will be able to submit prediction files for these tasks;
each task will be evaluated and scored, and an overall score will also be reported.
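The text above does not say how the overall score is computed. The following is a minimal sketch that assumes a GLUE-style unweighted average of per-task scores. The function name `overall_score` is hypothetical, and the scores are made-up numbers, not real results:

```python
from statistics import mean


def overall_score(task_scores: dict[str, float]) -> float:
    """Unweighted mean of per-task scores (an assumed, GLUE-style aggregation)."""
    if not task_scores:
        raise ValueError("need at least one task score")
    return mean(task_scores.values())


# Hypothetical per-task scores on a 0-100 scale; not real leaderboard results.
scores = {"LCQMC": 80.0, "XNLI": 70.0, "TNEWS": 60.0}
print(overall_score(scores))  # 70.0
```

An unweighted mean keeps every task equally important regardless of dataset size. A weighted scheme would favour the larger datasets.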
##### 3. Baselines
Baselines for the ChineseGLUE tasks, available in TensorFlow, PyTorch, Keras and PaddlePaddle.
##### 4. Corpus
A large raw corpus for pre-training and language modeling research. It will contain around 10G of raw text in 2019.
In the first half of 2020 it will grow to at least 30G, and by the end of 2020 to around 100G. That should be large
enough that you will need no additional raw corpus for general-purpose language modeling.
You can use it for general-purpose pre-training, domain adaptation, or even text generation. For domain adaptation,
you will be able to select the parts of the corpus you are interested in.
Datasets
--------------------------------------------------------------------
##### 1. LCQMC
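Sentence-pair semantic similarity (question matching): the input is two sentences and the output is 0 or 1, where 1 means
the two sentences have the same meaning and 0 means they do not.
- Size: train 238,766; dev 8,802; test 12,500
- Example format: `sentence_1 [separator] sentence_2 [separator] label`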
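The following is a minimal sketch of reading LCQMC-style examples. It assumes each line holds `sentence_1`, `sentence_2` and `label` separated by a tab; the real separator in the released files is an assumption. `SentencePair`, `parse_lcqmc_line` and `read_pairs` are hypothetical helpers:

```python
from typing import NamedTuple


class SentencePair(NamedTuple):
    sentence_1: str
    sentence_2: str
    label: int  # 1 = same meaning, 0 = different meaning


def parse_lcqmc_line(line: str, sep: str = "\t") -> SentencePair:
    """Parse one 'sentence_1<sep>sentence_2<sep>label' line (tab separator assumed)."""
    parts = line.rstrip("\r\n").split(sep)
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    sentence_1, sentence_2, label = parts
    if label not in ("0", "1"):
        raise ValueError(f"label must be '0' or '1', got {label!r}")
    return SentencePair(sentence_1, sentence_2, int(label))


def read_pairs(lines):
    """Parse every non-blank line from an iterable of lines, e.g. an open file."""
    return [parse_lcqmc_line(line) for line in lines if line.strip()]


pairs = read_pairs(["question A\tquestion B\t1\n", "question C\tquestion D\t0\n", "\n"])
print([p.label for p in pairs])  # [1, 0]
```

Validating the label at parse time catches a wrong separator early. If the separator guess is wrong, the line splits into the wrong number of fields and parsing fails immediately.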
##### 2. XNLI
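Natural language inference: given a premise and a hypothesis, decide whether the hypothesis is entailed by, contradicts,
or is neutral with respect to the premise.
- Size: 392,703 training examples, plus dev and test sets
- Example format: `premise [separator] hypothesis [separator] label`, where the label is `entailment`, `neutral` or `contradiction`
- The original XNLI covers 15 languages; ChineseGLUE uses its Chinese portion.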
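Scoring a model on XNLI comes down to comparing predicted labels with gold labels. The following sketch assumes plain accuracy as the metric and an arbitrary label-to-id order. Neither is fixed by the benchmark description above, and `accuracy` and `LABEL_TO_ID` are hypothetical names:

```python
# Assumed label order; the benchmark text does not define an id mapping.
LABELS = ("contradiction", "entailment", "neutral")
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABELS)}


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    # Reject labels outside the three NLI classes (e.g. typos in a prediction file).
    unknown = (set(gold) | set(predicted)) - set(LABEL_TO_ID)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


gold = ["entailment", "neutral", "contradiction", "neutral"]
pred = ["entailment", "contradiction", "contradiction", "neutral"]
print(accuracy(gold, pred))  # 0.75
print(LABEL_TO_ID["neutral"])  # 2
```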
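##### 3. TNEWS
Short-text classification of Toutiao news titles.
- Size: train 266,000; dev 57,000; test 57,000
- Example: `6552431613437805063_!_102_!_news_entertainment_!__!_,,,,,`
- Each line is one record whose fields are separated by `_!_`. From left to right they are the news ID, category code,
  category name, news title, and comma-separated keywords.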
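The `_!_`-separated format above can be parsed with a plain string split. This is a minimal sketch: `NewsRecord` and `parse_tnews_line` are hypothetical names, and storing the category code as an integer is a design choice rather than part of the dataset spec:

```python
from typing import NamedTuple

FIELD_SEP = "_!_"


class NewsRecord(NamedTuple):
    news_id: str
    category_code: int
    category_name: str
    title: str
    keywords: list[str]


def parse_tnews_line(line: str) -> NewsRecord:
    """Split one TNEWS line into its five `_!_`-separated fields."""
    fields = line.rstrip("\r\n").split(FIELD_SEP)
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    news_id, code, name, title, keywords = fields
    # Keywords are comma-separated; drop empty entries such as those in ",,,".
    keyword_list = [kw for kw in keywords.split(",") if kw]
    return NewsRecord(news_id, int(code), name, title, keyword_list)


record = parse_tnews_line("6552431613437805063_!_102_!_news_entertainment_!__!_,,,,,")
print(record.category_code, record.category_name)  # 102 news_entertainment
```

The news ID is kept as a string because it is an identifier, not a quantity.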
##### 4. More datasets coming soon!
##### Download
All datasets can be downloaded as a single archive:
`wget https://storage.googleapis.com/chineseglue/chineseGLUEdatasets.v0.0.1.zip`
Leaderboard
---------------------------------------------------------------------
TODO
Corpus
---------------------------------------------------------------------
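The corpus contains more than 10G of text, mainly from the nlp_chinese_corpus project.
It is split into many small files of at most 4M each.
It currently includes the following sub-corpora (14G in total):
1. 8G of text in 2,000 small files
2. 3G of text in 900 small files
3. 1.1G of text in 300 small files
4. 2.3G of text in 811 small files, merged from ChineseNLPCorpus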
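The following sketch packs documents into small files under a size cap. It rests on two assumptions not stated above: that "4M" means 4 MiB of UTF-8 text, and that the layout is one sentence per line with a blank line after each document (a common pre-training format). `shard_documents` is a hypothetical helper that never splits a document, so a single oversized document gets a shard of its own:

```python
def shard_documents(documents: list[list[str]], max_bytes: int = 4 * 1024 * 1024) -> list[str]:
    """Pack documents (lists of sentences) into shard texts of at most max_bytes.

    Each sentence goes on its own line and each document is followed by a blank
    line. Documents are never split, so one larger than max_bytes forms its own
    oversized shard.
    """
    shards: list[str] = []
    current: list[str] = []
    current_size = 0
    for sentences in documents:
        block = "\n".join(sentences) + "\n\n"
        size = len(block.encode("utf-8"))
        # Start a new shard if this document would push the current one over the cap.
        if current and current_size + size > max_bytes:
            shards.append("".join(current))
            current, current_size = [], 0
        current.append(block)
        current_size += size
    if current:
        shards.append("".join(current))
    return shards


docs = [["第一句。", "第二句。"], ["第三句。"], ["第四句。"]]
for i, text in enumerate(shard_documents(docs, max_bytes=32)):
    print(i, len(text.encode("utf-8")))
# 0 27
# 1 28
```

Sizes are measured in encoded bytes rather than characters, because each Chinese character takes 3 bytes in UTF-8.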
The corpus can be requested by email: chineseGLUE#163.com
Join ChineseGLUE
---------------------------------------------------------------------
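##### Benefits
1. …
2. …
3. A large-scale pre-training corpus comparable in size to English wiki & bookCorpus
4. Access to state-of-the-art models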
##### How to join
Send an email to chineseGLUE#163.com.
Task list
---------------------------------------------------------------------
1. …
2. Baselines (TensorFlow, PyTorch, Keras)
3. Accuracy tests of bert/bert_wwm_ext/roberta/albert/ernie/ernie2.0 on the ChineseGLUE datasets
4. Leaderboard landing page
5. … (ChineseGLUE)
6. …
Timeline
---------------------------------------------------------------------
2019-10-20 to 2019-12-31: beta version of ChineseGLUE
2020-01-01 to 2020-12-31: official version of ChineseGLUE
2021-01-01 to 2021-12-31: super version of ChineseGLUE
Contribution
---------------------------------------------------------------------
Share your dataset with the community or contribute today! Just send an email to chineseGLUE#163.com,
or join QQ group 836811304.
References
---------------------------------------------------------------------
1. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
2. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
3. LCQMC: A Large-scale Chinese Question Matching Corpus
4. XNLI: Evaluating Cross-lingual Sentence Representations
5. TNEWS: toutiao-text-classfication-dataset
6. nlp_chinese_corpus: Large Scale Chinese Corpus for NLP
7. ChineseNLPCorpus
Owner
- Name: 周奇
- Login: chapzq77
- Kind: user
- Repositories: 3
- Profile: https://github.com/chapzq77