nlp-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

https://github.com/sebastianruder/nlp-progress

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 33 of 307 committers (10.7%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (4.4%) to scientific vocabulary

Keywords

dialogue machine-learning machine-translation named-entity-recognition natural-language-processing nlp-tasks

Keywords from Contributors

word2vec word-similarity word-embeddings topic-modeling information-retrieval gensim fasttext document-similarity data-mining retrieval

Last synced: 9 months ago

Repository

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Basic Info

Host: GitHub
Owner: sebastianruder
License: mit
Language: Python
Default Branch: master
Homepage: https://nlpprogress.com/
Size: 1.33 MB

Statistics

Stars: 22,871
Watchers: 1,261
Forks: 3,623
Open Issues: 38
Releases: 3

Topics

dialogue machine-learning machine-translation named-entity-recognition natural-language-processing nlp-tasks

Created almost 8 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License Citation

Tracking Progress in Natural Language Processing

English

Vietnamese

Hindi

Chinese

For more tasks, datasets and results in Chinese, check out the Chinese NLP website.

French

Russian

Spanish

Bengali

Persian

Turkish

Summarization

German

Arabic

Language modeling

This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.

It aims to cover both traditional and core NLP tasks such as dependency parsing and part-of-speech tagging as well as more recent ones such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.

If you want to find this document again in the future, just go to nlpprogress.com or nlpsota.com in your browser.

Contributing

Guidelines

Results: Results reported in published papers are preferred; an exception may be made for influential preprints.

Datasets: Datasets should have been used for evaluation in at least one published paper besides the one that introduced the dataset.

Code: We recommend adding a link to an implementation if one is available. You can add a Code column (see below) to the table if it does not exist. In the Code column, indicate an official implementation with Official. If an unofficial implementation is available, use Link (see below). If no implementation is available, you can leave the cell empty.
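The Code column rule can be sketched as a small helper. This is only an illustration: the function name `code_cell`, the assumption that the cell is written as a Markdown link, and the choice to prefer an official implementation when both exist are not stated in the guidelines.

```python
from typing import Optional


def code_cell(official_url: Optional[str] = None,
              unofficial_url: Optional[str] = None) -> str:
    """Return the content of the Code column for one result row.

    Hypothetical helper: the guidelines only fix the labels
    ("Official", "Link") and allow an empty cell.
    """
    if official_url:
        # Assumption: an official implementation takes precedence.
        return f"[Official]({official_url})"
    if unofficial_url:
        # An unofficial implementation is labeled "Link".
        return f"[Link]({unofficial_url})"
    # No implementation available: leave the cell empty.
    return ""


# The URLs below are placeholders, not real implementations.
print(code_cell(official_url="https://example.org/official-repo"))
print(code_cell(unofficial_url="https://example.org/reimplementation"))
```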

Adding a new result

If you would like to add a new result, you can just click on the small edit button in the top-right corner of the file for the respective task (see below).

Click on the edit button to add a file

This allows you to edit the file in Markdown. Simply add a row to the corresponding table in the same format. Make sure that the table stays sorted (with the best result on top). After you've made your change, check that the table still renders correctly by clicking on the "Preview changes" tab at the top of the page. If everything looks good, go to the bottom of the page, where you will find the form shown below.

Fill out the file change information

Add a name for your proposed change, an optional description, indicate that you would like to "Create a new branch for this commit and start a pull request", and click on "Propose file change".
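If you prefer to prepare the new row offline before pasting it into the editor, the "add a row and keep the table sorted" step can be sketched as follows. This is a minimal sketch, assuming that the score sits in the second column, that it is a plain number, and that higher is better; the function names and the model names are hypothetical.

```python
def parse_row(line: str) -> list[str]:
    """Split one Markdown table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def insert_result(table: str, new_row: list[str], score_col: int = 1) -> str:
    """Add new_row to a Markdown table and sort rows by score, best first."""
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header, separator, body = lines[0], lines[1], lines[2:]
    rows = [parse_row(ln) for ln in body] + [new_row]
    # Assumption: higher is better and the score cell is a plain number.
    rows.sort(key=lambda r: float(r[score_col]), reverse=True)
    rendered = ["| " + " | ".join(r) + " |" for r in rows]
    return "\n".join([header, separator, *rendered])


# Placeholder rows for illustration only.
table = """
| Model | Score | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| Model A | 91.2 | Paper A | Official |
| Model B | 89.5 | Paper B | Link |
"""
updated = insert_result(table, ["Model C", "90.1", "Paper C", "Link"])
print(updated)
```

For metrics where lower is better (for example perplexity), the sort direction would need to be reversed.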

Adding a new dataset or task

To add a new dataset or task, you can also follow the steps above. Alternatively, you can fork the repository. In both cases, follow the steps below:

1. If your task is completely new, create a new file and link to it in the table of contents above.
2. If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
3. Briefly describe the dataset/task and include relevant references.
4. Describe the evaluation setting and evaluation metric.
5. Show what an annotated example of the dataset/task looks like.
6. Add a download link if available.
7. Copy the table below and fill in at least two results (including the state-of-the-art) for your dataset/task (change Score to the metric of your dataset). If your dataset/task has multiple metrics, add them to the right of Score.
8. Submit your change as a pull request.

| Model | Score | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| | | | |
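As an illustration of filling in this template for a new dataset, the sketch below renders a table with one column per metric and refuses to run with fewer than two results. The function name `new_task_table`, the dictionary keys, and the example entries are hypothetical, and ranking by the first metric assumes that higher is better.

```python
def new_task_table(metrics: list[str], results: list[dict]) -> str:
    """Render a results table for a new dataset/task from the template columns."""
    if len(results) < 2:
        raise ValueError("fill in at least two results, including the state of the art")
    # Extra metrics go to the right of the first one, before Paper / Source.
    columns = ["Model", *metrics, "Paper / Source", "Code"]
    header = "| " + " | ".join(columns) + " |"
    # Centre the metric columns, as the template does for Score.
    aligns = ["---"] + [":---:"] * len(metrics) + ["---", "---"]
    separator = "| " + " | ".join(aligns) + " |"
    # Best result on top, ranked by the first metric (assumes higher is better).
    ranked = sorted(results, key=lambda r: r[metrics[0]], reverse=True)
    rows = []
    for r in ranked:
        cells = [r["Model"], *(str(r[m]) for m in metrics),
                 r["Paper / Source"], r.get("Code", "")]
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join([header, separator, *rows])


# Placeholder results for illustration only.
example = new_task_table(
    ["F1", "EM"],
    [
        {"Model": "Baseline", "F1": 80.0, "EM": 70.5, "Paper / Source": "Paper X"},
        {"Model": "Stronger model", "F1": 85.3, "EM": 76.1,
         "Paper / Source": "Paper Y", "Code": "Official"},
    ],
)
print(example)
```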

Wish list

These are tasks and datasets that are still missing:

Bilingual dictionary induction
Discourse parsing
Keyphrase extraction
Knowledge base population (KBP)
More dialogue tasks
Semi-supervised learning
Frame-semantic parsing (FrameNet full-sentence analysis)

Exporting into a structured format

You can extract all the data into a structured, machine-readable JSON format with parsed tasks, descriptions and SOTA tables.

The instructions are in structured/README.md.
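To give a rough idea of what "parsed SOTA tables" could look like, here is a minimal sketch that turns one Markdown results table into JSON records. It is not the repository's exporter and does not follow its schema (see structured/README.md for the actual tool); the function name and the output layout are assumptions made for illustration.

```python
import json


def split_row(line: str) -> list[str]:
    """Split one Markdown table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def table_to_records(table: str) -> list[dict]:
    """Turn a Markdown table into a list of {column: cell} dictionaries."""
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    columns = split_row(lines[0])
    # lines[1] is the separator row (| --- | :---: | ...), so it is skipped.
    return [dict(zip(columns, split_row(ln))) for ln in lines[2:]]


# Placeholder table for illustration only.
sota_table = """
| Model | Score | Paper / Source | Code |
| ------------- | :-----:| --- | --- |
| Model A | 91.2 | Paper A | Official |
| Model B | 89.5 | Paper B | Link |
"""
records = table_to_records(sota_table)
print(json.dumps(records, indent=2))
```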

Instructions for building the site locally

Instructions for building the website locally using Jekyll can be found here.

Owner

Name: Sebastian Ruder
Login: sebastianruder
Kind: user
Location: Berlin, Germany
Company: @google

Website: http://ruder.io
Twitter: seb_ruder
Repositories: 32
Profile: https://github.com/sebastianruder

Research Scientist @Google

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Ruder"
  given-names: "Sebastian"
title: "NLP-progress"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2022-02-06
url: "https://nlpprogress.com/"

GitHub Events

Total

Watch event: 515
Issue comment event: 1
Pull request event: 4
Fork event: 43

Last Year

Watch event: 515
Issue comment event: 1
Pull request event: 4
Fork event: 43

Committers

Last synced: about 1 year ago

All Time

Total Commits: 657
Total Committers: 307
Avg Commits per committer: 2.14
Development Distribution Score (DDS): 0.936

Past Year

Commits: 13
Committers: 11
Avg Commits per committer: 1.182
Development Distribution Score (DDS): 0.846
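The figures above can be reproduced with a few lines of arithmetic. The sketch assumes that the Development Distribution Score is defined as one minus the top committer's share of commits; this definition is not given on this page, but it reproduces the all-time value when combined with the 42 commits listed for the top committer. The past-year value of 0.846 would be consistent with a top committer holding 2 of the 13 commits, although that count is not listed here.

```python
def avg_commits(commits: int, committers: int) -> float:
    """Average number of commits per committer."""
    return commits / committers


def dds(top_committer_commits: int, total_commits: int) -> float:
    """Development Distribution Score.

    Assumed definition: 1 - (commits by the top committer / total commits).
    """
    return 1 - top_committer_commits / total_commits


# All time: 657 commits by 307 committers, top committer with 42 commits.
print(round(avg_commits(657, 307), 2))  # 2.14
print(round(dds(42, 657), 3))           # 0.936

# Past year: 13 commits by 11 committers.
print(round(avg_commits(13, 11), 3))    # 1.182
```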

Top Committers

Name	Email	Commits
Sebastian	r**n@g**m	42
Piotr Migdał	p**l@g**m	17
Jonathan Kummerfeld	j**k@b**u	15
Shikhar Vashishth	s**h@g**m	13
Vietnamese CoreNLP	3****p	10
Raihan Ramadistra	r**a@g**m	10
Sepehr Sameni	S**i@g**m	9
udnet96	u**6@n**m	9
Yuval Pinter	u**p@g**u	8
Leon Derczynski	l**i@g**m	8
Shamil Chollampatt	s****m	7
h-amirkhani	h**n@g**m	7
Reinald Kim	k**1@g**m	6
Kaiqiang Song	k**g@k**u	6
Cola	4****m	6
Cenny	c**r@g**m	6
Stephen Mayhew	s**w@g**m	6
Yuanhe Tian	y**n@u**u	6
Thomas Lisankie	t**e@g**m	5
Syed Shahbaz Ahmed	s****d	5
Nirant	N****K	5
Manuel	m**r@g**m	5
FredRodrigues	f**2@g**m	5
peterjliu	p**u@g**m	4
gangeshwark	g**k@g**m	4
avisil	4****l	4
Yijia Liu	o**u@g**m	4
YangHeng	5****5	4
Christian Hadiwinoto	c****d	4
Junru Zhou	5****Z	4
and 277 more...

Committer Domains (Top 20 + Academic)

qq.com: 8, 163.com: 6, illinois.edu: 3, naver.com: 2, yandex.ru: 2, uwaterloo.ca: 2, us.ibm.com: 2, di.uniroma1.it: 2, mailoo.org: 1, mail.bnu.edu.cn: 1, mymail.sutd.edu.sg: 1, georgetown.edu: 1, grammarly.com: 1, mail.ru: 1, europa.snu.ac.kr: 1, shanghaitech.edu.cn: 1, mail.ustc.edu.cn: 1, domain.com.au: 1, cam.ac.uk: 1, msn.com: 1, ttic.edu: 1, berkeley.edu: 1, gatech.edu: 1, knights.ucf.edu: 1, uw.edu: 1, cardiff.ac.uk: 1, student.unimelb.edu.au: 1, mails.tsinghua.edu.cn: 1, vesta1.let.rug.nl: 1, research.iiit.ac.in: 1, ualberta.ca: 1, dtu.dk: 1, inf.u-szeged.hu: 1, pku.edu.cn: 1, cse.iitb.ac.in: 1, cs.washington.edu: 1, utexas.edu: 1, umich.edu: 1, buaa.edu.cn: 1, princeton.edu: 1, mail.mcgill.ca: 1, ecei.tohoku.ac.jp: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 36
Total pull requests: 168
Average time to close issues: 24 days
Average time to close pull requests: 2 months
Total issue authors: 34
Total pull request authors: 123
Average comments per issue: 2.53
Average comments per pull request: 1.54
Merged pull requests: 141
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 5
Average time to close issues: N/A
Average time to close pull requests: 28 minutes
Issue authors: 1
Pull request authors: 4
Average comments per issue: 0.0
Average comments per pull request: 0.4
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

WilliamTambellini (2)
BradKML (2)
davidalbertonogueira (1)
Hrant-Khachatrian (1)
jplu (1)
nirmalsinghania2008 (1)
eece-23 (1)
ajayrfhp (1)
cbockman (1)
lrjocean (1)
LifeIsStrange (1)
soaxelbrooke (1)
pwichmann (1)
ZedZipDev (1)
gupta-alok (1)

Pull Request Authors

yuanheTian (7)
adrienpayong (4)
KhondokerIslam (4)
avisil (4)
rmanak (3)
ExplorerFreda (3)
sebastianruder (3)
ramadistra (3)
hossein-amirkhani (3)
octavian-ganea (3)
BogdanDidenko (2)
liufly (2)
longdct (2)
stompchicken (2)
rafiepour (2)
