https://github.com/alixunxing/cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, springer.com, mdpi.com, ieee.org, acm.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (2.2%) to scientific vocabulary
Last synced: 10 months ago
Repository
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Basic Info
- Host: GitHub
- Owner: alixunxing
- Default Branch: master
- Homepage: https://www.cluebenchmarks.com/dataSet_search.html
- Size: 8.6 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of CLUEbenchmark/CLUEDatasetSearch
Created almost 6 years ago
· Last pushed over 6 years ago
https://github.com/alixunxing/CLUEDatasetSearch/blob/master/
# CLUEDatasetSearch

Chinese and English NLP datasets. Search them [here](https://www.cluebenchmarks.com/dataSet_search.html).

- [NER](#ner)
- [QA](#qa)
- [Sentiment Analysis](#sentiment-analysis)
- [Text Classification](#text-classification)
- [Text Matching](#text-matching)
- [Text Summarization](#text-summarization)
- [Machine Translation](#machine-translation)
- [Knowledge Graph](#knowledge-graph)
- [Corpus](#corpus)
- [Reading Comprehension](#reading-comprehension)
- [Contributing](#contributing)

# NER

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [CCKS2017](https://biendata.com/competition/CCKS2017_2/data/) | 20175 | | | 800 | | | \ | |
| 2 | [CCKS2018](https://biendata.com/competition/CCKS2018_1/data/) | 2018 | | | CCKS2018600 | | | \ | |
| 3 | [MSRA](https://github.com/lemonhu/NER-BERT-pytorch/tree/master/data/msra) | \ | MSRA | | MSRABIO46365 | Msra | | \ | |
| 4 | [1998](https://github.com/ThunderingII/nlp_ner/tree/master/data) | 19981 | | | 98BIO23061 | 98 | | \ | |
| 5 | [Boson](https://github.com/TomatoTang/BILSTM-CRF) | \ | | | BosonBMEO,2000 | Boson | | \ | |
| 6 | [CLUE Fine-Grain NER](https://storage.googleapis.com/cluebenchmark/tasks/cluener_public.zip) | 2020 | CLUE | | CLUENER2020THUCTCSina News RSS10107481343 | CLUE | | \ | |
| 7 | [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/) | 2003 | CNTS - Language Technology Group | | CoNLL-2003PER, LOC, ORGMISC | CoNLL-2003 | | <https://www.aclweb.org/anthology/W03-0419.pdf> | |
| 8 | <https://github.com/hltcoe/golden-horse> | 2015 | https://github.com/hltcoe/golden-horse | | | EMNLP-2015 | | | |
| 9 | [SIGHAN Bakeoff 2005](http://sighan.cs.uchicago.edu/bakeoff2005/) | 2005 | MSR/PKU | | | bakeoff-2005 | | | |

# QA

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [NewsQA](https://github.com/Maluuba/newsqa) | 2019/9/13 | | | Maluuba NewsQA12000120,00061623 | | QA | <https://arxiv.org/abs/1611.09830> | |
| 2 | [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) | | | | SQuAD | | QA | <https://arxiv.org/abs/1606.05250> | |
| 3 | [SimpleQuestions](https://www.dropbox.com/s/tohrsllcfy7rch4/SimpleQuestions_v2.tgz) | | Facebook | | , 100K | | QA | <https://arxiv.org/pdf/1506.02075v1.pdf> | |
| 4 | [WikiQA](https://www.microsoft.com/en-us/download/details.aspx?id=52419&from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2F4495da01-db8c-4041-a7f6-7984a4f6a905%2Fdefault.aspx) | 2016/7/14 | | | WikiQABing3047292581473 | | QA | <https://www.microsoft.com/en-us/research/publication/wikiqa-a-challenge-dataset-for-open-domain-question-answering/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F252176%2Fyangyihmeek_emnlp-15_wikiqa.pdf> | |
| 5 | [cMedQA](https://github.com/zhangsheng93/cMedQA) | 2019/2/25 | Zhang Sheng | | 5.410 | | QA | <https://www.mdpi.com/2076-3417/7/8/767> | |
| 6 | [cMedQA2](https://github.com/zhangsheng93/cMedQA2) | 2019/1/9 | Zhang Sheng | | cMedQA1020 | | QA | <https://ieeexplore.ieee.org/abstract/document/8548603> | |
| 7 | [webMedQA](https://github.com/hejunqing/webMedQA) | 2019/3/10 | He Junqing | | 631 | | QA | <https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-019-0761-8> | |
| 8 | [XQA](https://github.com/thunlp/XQA) | 2019/7/29 | | | 9 | | QA | <https://www.aclweb.org/anthology/P19-1227> | |
| 9 | [AmazonQA](https://github.com/amazonqa/amazonqa) | 2019/9/29 | | | QAQA | | QA | <https://arxiv.org/pdf/1908.04364v1.pdf> | |

# Sentiment Analysis

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [NLPCC2013](http://tcci.ccf.org.cn/conference/2013/pages/page04_tdata.html) | 2013 | CCF | \ | 7 emotions: like, disgust, happiness, sadness, anger, surprise, fear; 14 000, 45 431 | NLPCC2013, Emotion | | <http://jcip.cipsc.org.cn/CN/article/downloadArticleFile.do?attachType=PDF&id=143> | |
| 2 | [NLPCC2014 Task1](http://tcci.ccf.org.cn/conference/2014/pages/page04_ans.html) | 2014 | CCF | \ | 7 emotions: like, disgust, happiness, sadness, anger, surprise, fear; 20000 | NLPCC2014, Emotion | | \ | |
| 3 | [NLPCC2014 Task2](http://tcci.ccf.org.cn/conference/2014/pages/page04_tdata.html) | 2014 | CCF | \ | | NLPCC2014, Sentiment | | \ | |
| 4 | [Weibo Emotion Corpus](https://github.com/MingleiLI/emotion_corpus_weibo) | 2016 | The Hong Kong Polytechnic University | \ | 7 emotions: like, disgust, happiness, sadness, anger, surprise, fear | weibo emotion corpus | | [Emotion Corpus Construction Based on Selection from Noisy Natural Labels](http://www.lrec-conf.org/proceedings/lrec2016/pdf/515_Paper.pdf) | |
| 5 | RenCECPs (contact Fuji Ren at ren@is.tokushima-u.ac.jp for a license agreement) | 2009 | Fuji Ren | \ | emotionsentiment15001100035000 | RenCECPs, emotion, sentiment | | [Construction of a blog emotion corpus for Chinese emotional expression analysis](https://dl.acm.org/doi/10.5555/1699648.1699691) | |
| 6 | [weibo_senti_100k](https://github.com/SophonPlus/ChineseNlpCorpus/blob/master/datasets/weibo_senti_100k/intro.ipynb) | | | \ | 5 | weibo senti, sentiment | | \ | |
| 7 | [BDCI2018](https://www.datafountain.cn/competitions/310/datasets) | 2018 | CCF | | 301-1 | | | \ | |
| 8 | [AI Challenger](https://blog.csdn.net/linxid/article/details/82764682) | 2018 | | \ | 620 | | | \ | |
| 9 | [BDCI2019](https://www.datafountain.cn/competitions/353) | 2019 | | \ | | | | \ | |
| 10 | <https://zhejianglab.aliyun.com/entrance/231731/introduction?spm=5176.12281949.1003.3.2b58c341YnOFck> | 2019 | | \ | {}4 | | | \ | |
| 11 | [2019](https://biendata.com/competition/sohu2019/) | 2019 | | \ | | | | \ | |

# Text Classification

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [2018](https://www.pkbigdata.com/common/cmpt/ _.html) | 20187 | | | idarticleword_segclass19102275 | | | \ | |
| 2 | <https://github.com/skdjfla/toutiao-text-classfication-dataset> | 20185 | | | 15382688 | | | \ | |
| 3 | [THUCNews](http://thuctc.thunlp.org/#%E4%B8%AD%E6%96%87%E6%96%87%E6%9C%AC%E5%88%86%E7%B1%BB%E6%95%B0%E6%8D%AE%E9%9B%86THUCNews) | 2016 | | | THUCNewsRSS2005~2011UTF-814742.19 GB | | | \ | |
| 4 | <https://www.kesci.com/home/dataset/5d3a9c86cf76a600360edd04> | \ | | | 209804 | | | \ | |
| 5 | <https://www.kesci.com/home/dataset/5dd645fca0cb22002c94e65d/files> | 201912 | chenfengshf | CC0 | Kesci(length<50)1538w | | | \ | |
| 6 | [2017](https://biendata.com/competition/zhihu/) | 20176 | ; | | 1 1999 300 | | | \ | |
| 7 | [2019](https://zhejianglab.aliyun.com/entrance/231731/information) | 20198 | | | {} | | | \ | |
| 8 | [IFLYTEK](https://storage.googleapis.com/cluebenchmark/tasks/iflytek_public.zip) | \ | | | 1.7app119 | | | \ | |
| 9 | [SogouCA](http://www.sogou.com/labs/resource/ca.php) | 2012816 | | | 20126718 | | | \ | |
| 10 | [SogouCS](http://www.sogou.com/labs/resource/cs.php) | 20128 | | | 20126718 | | | \ | |
| 11 | <http://www.nlpir.org/?action-viewnews-itemid-145> | 201711 | | | | | | | |
| 12 | [ChnSentiCorp_htl_all](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets) | 20183 | https://github.com/SophonPlus/ChineseNlpCorpus | | 7000 5000 2000 | | | | |
| 13 | [waimai_10k](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets) | 20183 | https://github.com/SophonPlus/ChineseNlpCorpus | | 4000 8000 | | | | |
| 14 | [online_shopping_10_cats](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets) | 20183 | https://github.com/SophonPlus/ChineseNlpCorpus | | 10 6 3 | | | | |
| 15 | [weibo_senti_100k](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets) | 20183 | https://github.com/SophonPlus/ChineseNlpCorpus | | 10 5 | | | | |
| 16 | [simplifyweibo_4_moods](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets) | 20183 | https://github.com/SophonPlus/ChineseNlpCorpus | | 36 4 20 5 | | | | |
| 17 | [dmsc_v2](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets) | 20183 | https://github.com/SophonPlus/ChineseNlpCorpus | | 28 70 200 / | | | | |
| 18 | [yf_dianping](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets) | 20183 | https://github.com/SophonPlus/ChineseNlpCorpus | | 24 54 440 / | | | | |
| 19 | [yf_amazon](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets) | 20183 | https://github.com/SophonPlus/ChineseNlpCorpus | | 52 1100 142 720 / | | | | |

# Text Matching

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [LCQMC](http://icrc.hitsz.edu.cn/Article/show/171.html) | 2018/6/6 | () | Creative Commons Attribution 4.0 International License | 26006810238766880212500 | | | <https://www.aclweb.org/anthology/C18-1166> | |
| 2 | [The BQ Corpus](http://icrc.hitsz.edu.cn/Article/show/175.html) | 2018/9/4 | () | | 1200001:1 | | | <https://www.aclweb.org/anthology/D18-1536/> | |
| 3 | [AFQMC](https://dc.cloud.alipay.com/index?click_from=MAIL&_bdType=acafbbbiahdahhadhiih#/topic/intro?id=3) | 2018/4/25 | | | 10 | | | | |
| 4 | <https://ai.ppdai.com/mirror/goToMirrorDetail?mirrorId=1> | 2018/6/10 | | | train.csv3label12101q12q2question.csv | | | | |
| 5 | [CAIL2019](https://github.com/china-ai-law-challenge/CAIL2019/tree/master/scm) | 2019/6 | | | (A,B,C)A,B,CABABsim(A,B)>sim(A,C) | | | | |
| 6 | [CCKS 2018](https://biendata.com/competition/CCKS2018_3/data/) | 2018/4/5 | () | | | | | | |
| 7 | [ChineseTextualInference](https://github.com/liuhuanyong/ChineseTextualInference) | 2018/12/15 | | | ,88, | NLI | | | |
| 8 | [NLPCC-DBQA](https://biendata.com/ccf_tcci2018/datasets/tcci_tag/11) | 2016/2017/2018 | NLPCC | | -10 | DBQA | | | |
| 9 | <https://www.datafountain.cn/competitions/359> | 201/8/32 | CCF | | | | | | |
| 10 | [CNSD / CLUE-CMNLI](https://github.com/zengjunjun/CNSD) | 2019/12 | ZengJunjun | | | NLI | | <https://6a75-junzeng-uxxxm-1300734931.tcb.qcloud.la/CNSD.pdf?sign=401485f4d6f256393a264e68464ca4ae&t=1578114336> | |
| 11 | [cMedQA v1.0](https://github.com/zhangsheng93/cMedQA) | 2017/4/5 | | | 50,00094,134120212 2,0003774117212 2,0003835119211 54,000101,743119212 | | | <https://www.mdpi.com/2076-3417/7/8/767> | |
| 12 | [cMedQA2](https://github.com/zhangsheng93/cMedQA2) | 2018/11/8 | | | 100,000188,49048101 4,000752749101 4,000755249100 108,000203,56949101 | | | <https://www.mdpi.com/2076-3417/7/8/767> | |
| 13 | [ChineseSTS](https://github.com/IAdmireu/ChineseSTS) | 2017/9/21 | , , . | | 12747 | | | | |
| 14 | <https://biendata.com/competition/chip2018/> | 2018 | CHIP 2018-CHIP | | 20000 10000label> | | | | |
| 15 | [COS960: A Chinese Word Similarity Dataset of 960 Word Pairs](https://github.com/thunlp/COS960) | 2019/6/6 | | | 960 15 960 480240240 | | | <https://arxiv.org/abs/1906.00247> | |
| 16 | [OPPO query-title](https://pan.baidu.com/s/1Hg2Hubsn3GEuu4gubbHCzw) (code: 7p3n) | 2018/11/6 | OPPO | | OPPO, query-titlectr | ctr | | | |
| 17 | [SogouE](https://www.sogou.com/labs/resource/e.php) | 2012 | | | URL ]\tURL\t URL 12 | | | [Automatic Search Engine Performance Evaluation with Click-through Data Analysis](https://www.sogou.com/labs/paper/Automatic_Search_Engine_Performance_Evaluation_with_Click-through_Data_Analysis.pdf) | |

# Text Summarization

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [LCSTS](http://icrc.hitsz.edu.cn/Article/show/139.html) | 2015/8/6 | Qingcai Chen | | 10,6661-5 | | | <http://arxiv.org/abs/1506.05865> | |
| 2 | <https://www.jianshu.com/p/8f52352f0748?tdsourcetag=s_pcqq_aiomsg> | 2018/6/20 | He Zhengfang | | 679898 | | | \ | |
| 3 | <https://github.com/wonderfulsuccess/chinese_abstractive_corpus> | 2018/6/5 | | | 24500 | | | \ | |
| 4 | [NLPCC2017 Task3](http://tcci.ccf.org.cn/conference/2017/taskdata.php) | 2017/11/8 | NLPCC2017 | | NLPCC2017 | | | \ | |
| 5 | [2018](https://www.dcjingsai.com/common/cmpt/2018_.html) | 2018/10/11 | DC | | DC | | | \ | |
| 6 | [Byte Cup 2018](http://biendata.com/competition/bytecup2018/data/) | 2018/12/4 | | | TopBuzz 130 1000 800 | | | \ | |
| 7 | [NEWSROOM](https://summari.es/) | 2018/6/1 | Grusky | | 1998201738130 | | | <http://aclweb.org/anthology/N18-1065> | |
| 8 | [DUC](https://duc.nist.gov/) / [TAC](https://tac.nist.gov/) | 2014/9/9 | NIST | | Document Understanding Conferences/Text Analysis ConferenceTAC KBPTAC Knowledge Base Population | / | | \ | |
| 9 | [CNN/Daily Mail](https://cs.nyu.edu/~kcho/DMQA/) | 2017/7/31 | Stanford | GNU v3 | CNN(DailyMail) | | | <https://arxiv.org/pdf/1704.04368.pdf> | |
| 10 | [Amazon SNAP Review](https://snap.stanford.edu/data/web-Amazon.html) | 2013/3/1 | Stanford | | Amazon | | | \ | |
| 11 | [Gigaword](https://github.com/harvardnlp/sent-summary) | 2003/1/28 | David Graff, Christopher Cieri | | 950w | | | | |
| 12 | [RA-MDS](http://www1.se.cuhk.edu.hk/~textmine/dataset/ra-mds/) | 2017/9/11 | Piji Li | | Reader-Aware Multi-Document Summarization451042725 | | | <http://lipiji.com/docs/li2017ramds.pdf> | |
| 13 | [TIPSTER SUMMAC](https://www-nlpir.nist.gov/related_projects/tipster_summac/cmp_lg.html) | 2003/5/21 | The MITRE Corporation and the University of Edinburgh | | 183Computation and Language (cmp-lg) collectionACL | | | \ | |
| 14 | [WikiHow](http://www.wikihow.com/) | 2018/10/18 | Mahnaz Koupaee | | 200,000 | | | <https://arxiv.org/abs/1810.09305> | |
| 15 | [Multi-News](https://github.com/Alex-Fabbri/Multi-News) | 2019/12/4 | Alex Fabbri | | 1500newser.com56,216 | | | <http://arxiv.org/abs/1906.01749> | |
| 16 | [MED Summaries](http://lear.inrialpes.fr/people/potapov/med_summaries) | 2018/8/17 | D.Potapov | | 1606010010 | | | <http://hal.inria.fr/hal-01022967> | |
| 17 | [BIGPATENT](http://arxiv.org/abs/1906.03741) | 2019/7/27 | Sharma | | 130 | | | <http://arxiv.org/abs/1906.03741> | |
| 18 | [NYT](https://catalog.ldc.upenn.edu/LDC2008T19) | 2008/10/17 | Evan Sandhaus | | The New York Times,150,20091120101 | | | \ | |
| 19 | [The AQUAINT Corpus of English News Text](https://catalog.ldc.upenn.edu/LDC2002T31) | 2002/9/26 | David Graff | | ()3.75 | | | \ | |
| 20 | [Legal Case Reports Data Set](https://archive.ics.uci.edu/ml/datasets/Legal+Case+Reports) | 2012/10/19 | Filippo Galgani | | 2006-2009(FCA)4000 | | | \ | |
| 21 | [17 Timelines](http://www.l3s.de/~gtran/timeline/) | 2015/5/29 | G. B. Tran | | | | | <http://l3s.de/~gtran/publications/www2013.pdf> | |
| 22 | [PTS Corpus](https://github.com/FeiSun/ProductTitleSummarizationCorpus) | 2018/10/9 | Fei Sun | | Product Title Summarization Corpus | | | <https://arxiv.org/abs/1808.06885> | |
| 23 | [Scientific Summarization DataSets](https://github.com/Santosh-Gupta/ScientificSummarizationDataSets) | 2019/10/26 | Santosh Gupta | | Semantic Scholar CorpusArXivSemantic Scholar/580ArXiv1991201975/10k26k417k157CS221k | | | \ | |
| 24 | [Scientific Document Summarization Corpus and Annotations from the WING NUS group](https://github.com/WING-NUS/scisumm-corpus) | 2019/3/19 | Jaidka | | ACL:()()40 | | | <http://www.aclweb.org/anthology/W16-1511.pdf> | |

# Machine Translation

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [WMT2017](http://www.statmt.org/wmt17/translation-task.html) | 2017/2/1 | EMNLP 2017 Workshop on Machine Translation | | Europarl corpusUN corpus 2017News Commentary corpus EMNLP benchmark | Benchmark, WMT2017 | | <https://www.statmt.org/wmt17/pdf/WMT17.pdf> | |
| 2 | [WMT2018](http://statmt.org/wmt18/translation-task.html#download) | 2018/11/1 | EMNLP 2018 Workshop on Machine Translation | | Europarl corpusUN corpus 2018News Commentary corpus EMNLP benchmark | Benchmark, WMT2018 | | <http://www.statmt.org/wmt18/> | |
| 3 | [WMT2019](http://www.statmt.org/wmt19/translation-task.html) | 2019/1/31 | EMNLP 2019 Workshop on Machine Translation | | Europarl corpusUN corpus, news-commentary corpus and the ParaCrawl corpus | Benchmark, WMT2019 | | <http://www.statmt.org/wmt19/pdf/53/WMT01.pdf> | |
| 4 | [UM-Corpus: A Large English-Chinese Parallel Corpus](http://nlp2ct.cis.umac.mo/um-corpus/) | 2014/5/26 | Department of Computer and Information Science, University of Macau, Macau | | | UM-Corpus; English; Chinese; large | | <http://www.lrec-conf.org/proceedings/lrec2014/pdf/774_Paper.pdf> | |
| 5 | [Ai challenger translation 2017](https://pan.baidu.com/s/1E5gD5QnZvNxT3ZLtxe_boA) (code: stjf) | 2017/8/14 | AI | | 1000 10,000,000 934 8000 | AI challenger 2017 | | | |
| 6 | [MultiUN](http://opus.nlpl.eu/download.php?f=MultiUN/v1/tmx/en-zh.tmx.gz) | 2010 | Department of Linguistics and Philology, Uppsala University, Uppsala/Sweden | | | MultiUN | | [MultiUN: A Multilingual corpus from United Nation Documents, Andreas Eisele and Yu Chen, LREC 2010](http://www.dfki.de/lt/publication_show.php?id=4790) | |
| 7 | [NIST 2002 Open Machine Translation (OpenMT) Evaluation](https://catalog.ldc.upenn.edu/LDC2010T10) | 2010/5/14 | NIST Multimodal Information Group | LDC User Agreement for Non-Members | Xinhua 70 Zaobao30100 212707 Xinhua25247 Zaobao39256 | NIST | | <http://www.lrec-conf.org/proceedings/lrec2018/pdf/678.pdf> | |
| 8 | [The Multitarget TED Talks Task (MTTT)](http://cs.jhu.edu/~kevinduh/a/multitarget-tedtalks/) | 2018 | Kevin Duh, JHU | | TED20 | TED | | The Multitarget TED Talks Task | |
| 9 | [ASPEC Chinese-Japanese](http://lotus.kuee.kyoto-u.ac.jp/WAT/) | 2019 | Workshop on Asian Translation | | | Asian scientific patent Japanese | | http://lotus.kuee.kyoto-u.ac.jp/WAT/ | |
| 10 | [casia2015](http://nlp.nju.edu.cn/cwmt-wmt/) | 2015 | Research group in Institute of Automation, Chinese Academy of Sciences | | | casia CWMT 2015 | | | |
| 11 | [casict2011](http://nlp.nju.edu.cn/cwmt-wmt/) | 2011 | Research group in Institute of Computing Technology, Chinese Academy of Sciences | | 2 12 90 | casict CWMT 2011 | | | |
| 12 | [casict2015](http://nlp.nju.edu.cn/cwmt-wmt/) | 2015 | Research group in Institute of Computing Technology, Chinese Academy of Sciences | | 20060 20/20 99 | casict CWMT 2015 | | | |
| 13 | [datum2015](http://nlp.nju.edu.cn/cwmt-wmt/) | 2015 | Datum Data Co., Ltd. | | | datum CWMT 2015 | | | |
| 14 | [datum2017](http://nlp.nju.edu.cn/cwmt-wmt/) | 2017 | Datum Data Co., Ltd. | | 20 50,000 10Book1-Book10 | datum CWMT 2017 | | | |
| 15 | [neu2017](http://nlp.nju.edu.cn/cwmt-wmt/) | 2017 | NLP lab of Northeastern University, China | | 200 90 | neu CWMT 2017 | | | |
| 16 | [translation2019zh](https://github.com/brightmart/nlp_chinese_corpus) | 2019 | | | | | | | |

# Knowledge Graph

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [NLPIR100](http://www.nlpir.org/wordpress/download/weibo_relation_corpus.rar) | 2017/12/2 | | | NLPIR 1.NLPIR(127.0.0.1/wordpress)100010 2.urlEmailkevinzhang@bit.edu.cn 3.NLPIR(http://www.nlpir.org/) 4. person_id id guanzhu_id id | | | | |

# Corpus

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [NLPIR-23](http://www.nlpir.org/wordpress/2017/12/03/nlpir%e5%be%ae%e5%8d%9a%e5%86%85%e5%ae%b9%e8%af%ad%e6%96%99%e5%ba%93-23%e4%b8%87%e6%9d%a1/) | 201712 | | | NLPIR 1.NLPIR(127.0.0.1/wordpress)231000 2.urlEmailkevinzhang@bit.edu.cn 3.NLPIR(http://www.nlpir.org/) 4. id article discuss insertTime origin person_id id time transmit | | | | |
| 2 | [500](http://www.nlpir.org/wordpress/download/weibo.7z) | 20181 | | | 500@ICTCLAS 500sqlmysql500 | | | | |
| 3 | [NLPIR-2400](http://www.nlpir.org/wordpress/download/NLPIR-news-corpus.rar) | 20177 | [www.NLPIR.org](http://www.nlpir.org/) | | NLPIR 1.48MB2400 2.2009101220091214 3. 4. 5.www.NLPIR.org 6. NLPIR.org | | | | |
| 4 | [NLPIR100](http://www.nlpir.org/wordpress/download/weibo_relation_corpus.rar) | 201712 | | | NLPIR 1.NLPIR(127.0.0.1/wordpress)100010 2.urlEmailkevinzhang@bit.edu.cn 3.NLPIR(http://www.nlpir.org/) 4. person_id id guanzhu_id id | | | | |
| 5 | [NLPIR100](http://www.nlpir.org/wordpress/2017/09/02/nlpir%e5%be%ae%e5%8d%9a%e5%8d%9a%e4%b8%bb%e8%af%ad%e6%96%99%e5%ba%93100%e4%b8%87%e6%9d%a1/) | 20179 | | | NLPIR 1.NLPIR(127.0.0.1/wordpress)1001 2.urlEmailkevinzhang@bit.edu.cn 3.NLPIR(http://www.nlpir.org/) 4. id id sex address fansNum summary wbNum gzNum blog edu work renZh brithday | | | | |
| 6 | [NLPIR-40](http://www.nlpir.org/wordpress/2017/08/12/nlpir%e7%9f%ad%e6%96%87%e6%9c%ac%e8%af%ad%e6%96%99%e5%ba%93-40%e4%b8%87%e5%ad%97/) | 20178 | (SMS@BIT) | | NLPIR 1.488704 2.www.NLPIR.org 3. | | | | |
| 7 | <https://dumps.wikimedia.org/zhwiki/> | \ | | | | | | | |
| 8 | <https://github.com/chinese-poetry/chinese-poetry> | 2020 | github, http://shici.store | | | | | | |
| 9 | <https://github.com/chatopera/insuranceqa-corpus-zh> | 2017 | | | Insurance Library QA label"""" | | | | |
| 10 | <https://github.com/kfcd/chaizi> | 19057 | | | 17,803chaizi-ft.txtchaizi-jt.txt | | | | |
| 11 | <https://github.com/brightmart/nlp_chinese_corpus> | 2016 | | | | | | | |
| 12 | [baike2018qa (json)](https://github.com/brightmart/nlp_chinese_corpus) | 2018 | | | | | | | |
| 13 | [webtext2019zh (json)](https://github.com/brightmart/nlp_chinese_corpus) | 2019 | | | 1 2() 3(cQA) 4 5 | | | | |
| 14 | [wiki2019zh (json)](https://github.com/brightmart/nlp_chinese_corpus) | 2019 | | | wiki | | | | |

# Reading Comprehension

| ID | Title | Updated | Provider | License | Description | Keywords | Category | Paper | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | [DuReader](http://ai.baidu.com/broad/download?dataset=dureader) | 2018/3/1 | | Apache2.0 | 5 | | | <https://arxiv.org/abs/1711.05073> | |
| 2 | [CJRC](https://github.com/china-ai-law-challenge/CAIL2019) | 2019/8/17 | HFL | \ | 10,00050,000 | | | <https://link.springer.com/chapter/10.1007/978-3-030-32381-3_36> | |
| 3 | [2019CMRC](https://github.com/ymcui/cmrc2019) | 201910 | HFL | CC-BY-SA-4.0 | | | | \ | https://hfl-rc.github.io/cmrc2019/ |
| 4 | [2018CMRC](https://github.com/ymcui/cmrc2018) | 2018/10/19 | HFL | CC-BY-SA-4.0 | CMRC 201820,000 | | | <https://www.aclweb.org/anthology/D19-1600/> | https://hfl-rc.github.io/cmrc2018/ |
| 5 | [2017CMRC](https://github.com/ymcui/Chinese-Cloze-RC) | 2017/10/14 | HFL | CC-BY-SA-4.0 | PD&CFT | | | <https://arxiv.org/abs/1607.02250> | <https://hfl-rc.github.io/cmrc2017/> |
| 6 | <https://www.kesci.com/home/competition/5d142d8cbb14e6002c04e14a/content/5> | 2019/9/3 | | \ | | | | \ | <https://www.kesci.com/home/competition/5d142d8cbb14e6002c04e14a> |
| 7 | [CoQA](https://stanfordnlp.github.io/coqa/) | 2018/9 | | CC BY-SA 4.0, Apache | CoQA | | | <https://arxiv.org/abs/1808.07042> | <https://www.jiqizhixin.com/articles/2018-09-11-3> |
| 8 | [SQuAD2.0](https://github.com/rajpurkar/SQuAD-explorer/tree/master/dataset) | 2018/1/11 | | \ | 500 SQuAD 2.0 | | | <https://arxiv.org/abs/1806.03822> | |
| 9 | [SQuAD1.0](https://github.com/rajpurkar/SQuAD-explorer/tree/master/dataset) | 2016 | | \ | 2016107,785 536 | | | <https://arxiv.org/pdf/1606.05250.pdf> | |
| 10 | [MCTest](https://www.microsoft.com/en-us/research/publication/mctest-challenge-dataset-open-domain-machine-comprehension-text/) | 2013 | | \ | 100,000Bing1,000,000 | | | <https://microsoft.github.io/msmarco/> | |
| 11 | [CNN/Dailymail](https://cs.nyu.edu/~kcho/DMQA/) | 2015 | DeepMind | Apache-2.0 | CNN90k380k Dailymail197k879k | | | <https://arxiv.org/abs/1506.03340> | |
| 12 | [RACE](http://www.cs.cmu.edu/~glai1/data/race/) | 2017 | | / | 5 4 1 28000+ passages 100,000 | | | <https://arxiv.org/abs/1704.04683> | |
| 13 | [HEAD-QA](https://github.com/aghie/head-qa) | 2019 | aghie | MIT | | | | <https://arxiv.org/pdf/1906.04701.pdf> | |
| 14 | [Consensus Attention-based Neural Networks for Chinese Reading Comprehension](http://hfl.iflytek.com/chinese-rc/) | 2018 | | / | | | | <https://arxiv.org/pdf/1607.02250.pdf> | |
| 15 | [WikiQA](https://www.microsoft.com/en-us/research/publication/wikiqa-a-challenge-dataset-for-open-domain-question-answering/) | 2015 | | / | WikiQA | | | <https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/YangYihMeek_EMNLP-15_WikiQA.pdf> | |
| 16 | [Children's Book Test (CBT)](https://research.fb.com/downloads/babi/) | 2016 | Facebook | / | | | | <https://arxiv.org/pdf/1511.02301.pdf> | |
| 17 | [NewsQA](https://www.microsoft.com/en-us/research/project/newsqa-dataset/) | 2017 | Maluuba Research | / | 100000CNN10000 | | | <https://arxiv.org/pdf/1611.09830.pdf> | |
| 18 | [Frames dataset](https://www.microsoft.com/en-us/research/project/frames-dataset/#!download) | 2017 | | / | 136915 | | | <https://arxiv.org/pdf/1704.00057.pdf> | |
| 19 | [Quasar](https://github.com/bdhingra/quasar) | 2017 | | BSD-2-Clause | Quasar-S37000 Stack overflow Quasar-T43000 | | | <https://arxiv.org/pdf/1707.03904.pdf> | |
| 20 | [MS MARCO](http://www.msmarco.org/dataset.aspx) | 2018 | | / | BING 1020MARCO BING BING | | | <https://arxiv.org/pdf/1611.09268.pdf> | |
| 21 | <https://github.com/ymcui/Chinese-Cloze-RC> | 2016 | | | PD&CFT People Daily and Children's Fairy Tale | | | <http://aclanthology.info/papers/consensus-attention-based-neural-networks-for-chinese-reading-comprehension> | |
| 22 | [NLPCC ICCPOL2016](http://tcci.ccf.org.cn/conference/2016/) | 2016.12.2 | NLPCC | | 1465914K | | | \ | |

# Contributing

dukeenglish.github.io

Share your dataset with the community or make a contribution today! Just send an email to chineseGLUE#163.com, or join the QQ group: 836811304
Owner
- Login: alixunxing
- Kind: user
- Repositories: 18
- Profile: https://github.com/alixunxing