https://github.com/alixunxing/pycorrector

pycorrector is a toolkit for text error correction. 文本纠错，Kenlm，ConvSeq2Seq，BERT，MacBERT，ELECTRA，ERNIE，Transformer，T5等模型实现，开箱即用。
Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:
○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
✓
DOI references
Found 1 DOI reference(s) in README
✓
Academic publication links
Links to: arxiv.org, springer.com
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (7.8%) to scientific vocabulary
Last synced: 6 months ago · JSON representation
Repository

pycorrector is a toolkit for text error correction. 文本纠错，Kenlm，ConvSeq2Seq，BERT，MacBERT，ELECTRA，ERNIE，Transformer，T5等模型实现，开箱即用。
Basic Info

Host: GitHub
Owner: alixunxing
License: apache-2.0
Default Branch: master
Homepage: https://www.mulanai.com/product/corrector/
Size: 50.1 MB
Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0
Fork of shibing624/pycorrector
Created about 3 years ago · Last pushed about 3 years ago
https://github.com/alixunxing/pycorrector/blob/master/

![alt text](docs/pycorrector.png)

[![PyPI version](https://badge.fury.io/py/pycorrector.svg)](https://badge.fury.io/py/pycorrector)
[![Downloads](https://pepy.tech/badge/pycorrector)](https://pepy.tech/project/pycorrector)
[![GitHub contributors](https://img.shields.io/github/contributors/shibing624/pycorrector.svg)](https://github.com/shibing624/pycorrector/graphs/contributors)
[![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![python_vesion](https://img.shields.io/badge/Python-3.6%2B-green.svg)](requirements.txt)
[![GitHub issues](https://img.shields.io/github/issues/shibing624/pycorrector.svg)](https://github.com/shibing624/pycorrector/issues)
[![Wechat Group](http://vlog.sfyc.ltd/wechat_everyday/wxgroup_logo.png?imageView2/0/w/60/h/20)](#wechat-group)

[English](README.en.md) | 

# pycorrector

python3

**pycorrector**KenlmConvSeq2SeqBERTMacBERTELECTRAERNIETransformerSigHAN

**Guide**

- [Question](#Question)
- [Solution](#Solution)
- [Evaluation](#Evaluation)
- [Install](#install)
- [Usage](#usage)
- [Deep Model Usage](#deep-model-usage)
- [Dataset](#Dataset)
- [Contact](#Contact)
- [Reference](#reference)

# Question





OCR
query

""

# Solution

### 


1. 
2.  
3. 

### 

1. RNNRNN Attn
2. CRF2016
3. Seq2SeqEncoder-Decoder
4. BERT/ELECTRA/ERNIE/MacBERTNLPMASKfine-tune

PS

- [](https://github.com/shibing624/pycorrector/wiki/pycorrector%E6%BA%90%E7%A0%81%E8%A7%A3%E8%AF%BB-%E7%9B%B4%E6%92%AD%E5%88%86%E4%BA%AB)
- [](https://zhuanlan.zhihu.com/p/138981644)


# Feature

* [Kenlm](pycorrector/corrector.py)KenlmNGram
* [MacBERT](pycorrector/macbert)PyTorchMacBERT4CSC
* [Seq2Seq](pycorrector/seq2seq)PyTorchSeq2SeqConvSeq2SeqConvSeq2SeqNLPCC-2018
* [T5](pycorrector/t5)PyTorchT5Langboat/mengzi-t5-basefine-tune
* [BERT](pycorrector/bert)PyTorchBERTfill-mask
* [ELECTRA](pycorrector/electra)PyTorchELECTRAfill-mask
* [ERNIE_CSC](pycorrector/ernie_csc)PaddlePaddleERNIE_CSCERNIE-1.0fine-tune
* [DeepContext](pycorrector/deepcontext)PyTorchDeepContextStanford UniversityNLC2014
* [Transformer](pycorrector/transformer)PyTorchfairseqTransformer

#### 

1. 
2. CGED, Chinese Grammar Error DiagnosisTODO

# Demo

Official Demo: https://www.mulanai.com/product/corrector/

HuggingFace Demo: https://huggingface.co/spaces/shibing624/pycorrector

![](docs/hf.png)

run example: [examples/gradio_demo.py](examples/gradio_demo.py) to see the demo:
```shell
python examples/gradio_demo.py
```

# Evaluation

[examples/evaluate_models.py](./examples/evaluate_models.py)

- sighan15SIGHAN2015[pycorrector/data/cn/sighan_2015/test.tsv](pycorrector/data/cn/sighan_2015/test.tsv)
  
- Sentence Level

### 
SIGHAN2015

GPUTesla V100 32 GB

|  | Backbone | GPU | Precision | Recall | F1 | QPS |
| :-- | :-- | :---  | :----- | :--| :--- | :--- |
| Rule(pycorrector.correct) | kenlm | CPU | 0.6860 | 0.1529 | 0.2500 | 9 |
| BERT | bert-base-chinese | GPU | 0.8029 | 0.4052 | 0.5386 | 2 |
| BART | fnlp/bart-base-chinese | GPU | 0.6984 | 0.6354 | 0.6654 | 58 |
| T5 | byt5-small | GPU | 0.5220 | 0.3941 | 0.4491 | 111 |
| Mengzi-T5 | mengzi-t5-base | GPU | 0.8321 | 0.6390 | 0.7229 | 214 |
| ConvSeq2Seq | ConvSeq2Seq | GPU | 0.2415 | 0.1436 | 0.1801 | 6 |
| **MacBert** | **macbert-base-chinese** | **GPU** | **0.8254** | **0.7311** | **0.7754** | **224** |

### 

- **MacBert***shibing624/macbert4csc-base-chinese*huggingface model[shibing624/macbert4csc-base-chinese](https://huggingface.co/shibing624/macbert4csc-base-chinese)
- **Seq2Seq***shibing624/bart4csc-base-chinese*huggingface model[shibing624/bart4csc-base-chinese](https://huggingface.co/shibing624/bart4csc-base-chinese)
- **T5***shibing624/mengzi-t5-base-chinese-correction*huggingface model[shibing624/mengzi-t5-base-chinese-correction](https://huggingface.co/shibing624/mengzi-t5-base-chinese-correction)fine-tune`SIGHAN 2015`SOTA

# Install

```shell
pip install -U pycorrector
```

or

```shell
pip install -r requirements.txt

git clone https://github.com/shibing624/pycorrector.git
cd pycorrector
pip install --no-deps .
```


docker

#### 

* docker

```shell
docker run -it -v ~/.pycorrector:/root/.pycorrector shibing624/pycorrector:0.0.2
```

pythonkenlmpycorrector[Dockerfile](Dockerfile)



![docker](docs/git_image/docker.png)

* kenlm

```
pip install https://github.com/kpu/kenlm/archive/master.zip
```

[kenlm-wiki](https://github.com/shibing624/pycorrector/wiki/Install-kenlm)

* 

```
pip install -r requirements.txt
```

# Usage

### 

example: [examples/base_demo.py](examples/base_demo.py)

```python
import pycorrector

corrected_sent, detail = pycorrector.correct('')
print(corrected_sent, detail)
```

output:

```
 [('', '', 4, 6), ('', '', 10, 11)]
```

> `~/.pycorrector/datasets/zh_giga.no_cna_cmn.prune01244.klm`kenlm
[(2.8G)](https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm)

### 

example: [examples/detect_demo.py](examples/detect_demo.py)

```python
import pycorrector

idx_errors = pycorrector.detect('')
print(idx_errors)
```

output:

```
[['', 4, 6, 'word'], ['', 10, 11, 'char']]
```

> `list`, `[error_word, begin_pos, end_pos, error_type]``pos`0

### 

example: [examples/proper_correct_demo.py](examples/proper_correct_demo.py)

```python
import sys

sys.path.append("..")
from pycorrector.proper_corrector import ProperCorrector

m = ProperCorrector()
x = [
    '',
    '',
]

for i in x:
    print(i, ' -> ', m.proper_correct(i))
```

output:

```
  ->  ('', [('', '', 2, 6)])
  ->  ('', [('', '', 3, 6)])
```


### 

12

example: [examples/use_custom_confusion.py](examples/use_custom_confusion.py)

```python
import pycorrector

error_sentences = [
    'iphonex',
    '',
]
for line in error_sentences:
    print(pycorrector.correct(line))

print('*' * 42)
pycorrector.set_custom_confusion_path_or_dict('./my_custom_confusion.txt')
for line in error_sentences:
    print(pycorrector.correct(line))
```

output:

```
('iphonex', [])   # "iphonex""iphoneX"
('', [['', '', 14, 17]]) # ""
*****************************************************
('iphonex', [['iphonex', 'iphoneX', 1, 8]])
('', [])
```

> `./my_custom_confusion.txt`

```
iPhone iPhoneX
 
```

> `correct`
> `set_custom_confusion_dict``path`(str)(dict)

### 

kenlm`zh_giga.no_cna_cmn.prune01244.klm`2.8G`pycorrector`

kenlm2014140M[people2014corpus_chars.klm(o5e9)](https://pan.baidu.com/s/1I2GElyHy_MAdek3YaziFYw)

example[examples/load_custom_language_model.py](examples/load_custom_language_model.py)

```python
from pycorrector import Corrector
import os

pwd_path = os.path.abspath(os.path.dirname(__file__))
lm_path = os.path.join(pwd_path, './people2014corpus_chars.klm')
model = Corrector(language_model_path=lm_path)

corrected_sent, detail = model.correct('')
print(corrected_sent, detail)
```

output:

```
 [('', '', 4, 6), ('', '', 10, 11)]
```

### 



example[examples/en_correct_demo.py](examples/en_correct_demo.py)

```python
import pycorrector

sent = "what happending? how to speling it, can you gorrect it?"
corrected_text, details = pycorrector.en_correct(sent)
print(sent, '=>', corrected_text)
print(details)
```

output:

```
what happending? how to speling it, can you gorrect it?
=> what happening? how to spelling it, can you correct it?
[('happending', 'happening', 5, 15), ('speling', 'spelling', 24, 31), ('gorrect', 'correct', 44, 51)]
```

### 



example[examples/traditional_simplified_chinese_demo.py](examples/traditional_simplified_chinese_demo.py)

```python
import pycorrector

traditional_sentence = ''
simplified_sentence = pycorrector.traditional2simplified(traditional_sentence)
print(traditional_sentence, '=>', simplified_sentence)

simplified_sentence = ''
traditional_sentence = pycorrector.simplified2traditional(simplified_sentence)
print(simplified_sentence, '=>', traditional_sentence)
```

output:

```
 => 
 => 
```

### 



```
python -m pycorrector -h
usage: __main__.py [-h] -o OUTPUT [-n] [-d] input

@description:

positional arguments:
  input                 the input file path, file encode need utf-8.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        the output file path.
  -n, --no_char         disable char detect mode.
  -d, --detail          print detail info
```

case

```
python -m pycorrector input.txt -o out.txt -n -d
```

> `input.txt``out.txt ``\t`

# Deep Model Usage



``[macbert](./pycorrector/macbert)[seq2seq](./pycorrector/seq2seq)
[bert](./pycorrector/bert)[electra](./pycorrector/electra)[transformer](./pycorrector/transformer)
[ernie-csc](./pycorrector/ernie_csc)[T5](./pycorrector/t5)`pycorrector``README.md`

- 

```
pip install -r requirements-dev.txt
```

## 



### **MacBert4csc[]**

MacBERTHuggingFace Models[https://huggingface.co/shibing624/macbert4csc-base-chinese](https://huggingface.co/shibing624/macbert4csc-base-chinese)


-  MacBERT  BERT  backbone
-  BERT  [detection](https://github.com/shibing624/pycorrector/blob/c0f31222b7849c452cc1ec207c71e9954bd6ca08/pycorrector/macbert/macbert4csc.py#L18) 
MacBERT4CSC  detection  correction  loss  loss BERT MLM  correction  

![macbert_network](https://github.com/shibing624/pycorrector/blob/master/docs/git_image/macbert_network.jpg)

[pycorrector/macbert/README.md](./pycorrector/macbert/README.md)

example[examples/macbert_demo.py](examples/macbert_demo.py)
#### pycorrector

```python
import sys

sys.path.append("..")
from pycorrector.macbert.macbert_corrector import MacBertCorrector

if __name__ == '__main__':
    error_sentences = [
        '',
        '',
        '',
        '',
        '',
    ]

    m = MacBertCorrector("shibing624/macbert4csc-base-chinese")
    for line in error_sentences:
        correct_sent, err = m.macbert_correct(line)
        print("query:{} => {}, err:{}".format(line, correct_sent, err))
```

output

```bash
query: => , err:[('', '', 14, 15)]
query: => , err:[('', '', 4, 5)]
query: => , err:[('', '', 1, 2), ('', '', 10, 11)]
query: => , err:[]
query: => , err:[('', '', 6, 7)]
```

#### transformers

```python
import operator
import torch
from transformers import BertTokenizerFast, BertForMaskedLM
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = BertTokenizerFast.from_pretrained("shibing624/macbert4csc-base-chinese")
model = BertForMaskedLM.from_pretrained("shibing624/macbert4csc-base-chinese")
model.to(device)

texts = ["", ""]

text_tokens = tokenizer(texts, padding=True, return_tensors='pt').to(device)
with torch.no_grad():
    outputs = model(**text_tokens)

def get_errors(corrected_text, origin_text):
    sub_details = []
    for i, ori_char in enumerate(origin_text):
        if ori_char in [' ', '', '', '', '', '\n', '', '', '']:
            # add unk word
            corrected_text = corrected_text[:i] + ori_char + corrected_text[i:]
            continue
        if i >= len(corrected_text):
            break
        if ori_char != corrected_text[i]:
            if ori_char.lower() == corrected_text[i]:
                # pass english upper char
                corrected_text = corrected_text[:i] + ori_char + corrected_text[i + 1:]
                continue
            sub_details.append((ori_char, corrected_text[i], i, i + 1))
    sub_details = sorted(sub_details, key=operator.itemgetter(2))
    return corrected_text, sub_details

result = []
for ids, (i, text) in zip(outputs.logits, enumerate(texts)):
    _text = tokenizer.decode((torch.argmax(ids, dim=-1) * text_tokens.attention_mask[i]),
                             skip_special_tokens=True).replace(' ', '')
    corrected_text, details = get_errors(_text, text)
    print(text, ' => ', corrected_text, details)
    result.append((corrected_text, details))
print(result)
```

output:

```shell
  =>   [('', '', 2, 3)]
  =>   [('', '', 15, 16)]
```



```
macbert4csc-base-chinese
     config.json
     added_tokens.json
     pytorch_model.bin
     special_tokens_map.json
     tokenizer_config.json
     vocab.txt
```

### ErnieCSC

ERNIE[PaddleNLP](https://bj.bcebos.com/paddlenlp/taskflow/text_correction/csc-ernie-1.0/csc-ernie-1.0.pdparams)
[https://bj.bcebos.com/paddlenlp/taskflow/text_correction/csc-ernie-1.0/csc-ernie-1.0.pdparams](https://bj.bcebos.com/paddlenlp/taskflow/text_correction/csc-ernie-1.0/csc-ernie-1.0.pdparams)





[pycorrector/ernie_csc/README.md](./pycorrector/ernie_csc/README.md)

example[examples/ernie_csc_demo.py](examples/ernie_csc_demo.py)

#### pycorrector

```python

from pycorrector.ernie_csc.ernie_csc_corrector import ErnieCSCCorrector

if __name__ == '__main__':
    error_sentences = [
        '',
        '',
        '',
        '',
        '',
    ]
    corrector = ErnieCSCCorrector("csc-ernie-1.0")
    for line in error_sentences:
        result = corrector.ernie_csc_correct(line)[0]
        print("query:{} => {}, err:{}".format(line, result['target'], result['errors']))
```

output:

```bash

query: => , err:[{'position': 14, 'correction': {'': ''}}]
query: => , err:[{'position': 4, 'correction': {'': ''}}, {'position': 10, 'correction': {'': ''}}]
query: => , err:[{'position': 1, 'correction': {'': ''}}, {'position': 10, 'correction': {'': ''}}]
query: => , err:[]
query: => , err:[{'position': 6, 'correction': {'': ''}}]

```

#### PaddleNLP

PaddleNLPTaskflow:

```python

from paddlenlp import Taskflow

text_correction = Taskflow("text_correction")
text_correction('')
text_correction('')

```

output:

```shell

[{'source': '',
    'target': '',
    'errors': [{'position': 3, 'correction': {'': ''}}]}]

[{'source': '',
    'target': '',
    'errors': [{'position': 18, 'correction': {'': ''}}]}]

```

### Bart

```python
from transformers import BertTokenizerFast
from textgen import BartSeq2SeqModel

tokenizer = BertTokenizerFast.from_pretrained('shibing624/bart4csc-base-chinese')
model = BartSeq2SeqModel(
    encoder_type='bart',
    encoder_decoder_type='bart',
    encoder_decoder_name='shibing624/bart4csc-base-chinese',
    tokenizer=tokenizer,
    args={"max_length": 128, "eval_batch_size": 128})
sentences = [""]
print(model.predict(sentences))
```


output:

```shell
['']
```

Bart https://github.com/shibing624/textgen/blob/main/examples/seq2seq/training_bartseq2seq_zh_demo.py

#### Release models

SIGHAN+Wang271KBartreleaseHuggingFace Models:

- BARTHuggingFace Models[https://huggingface.co/shibing624/bart4csc-base-chinese](https://huggingface.co/shibing624/bart4csc-base-chinese)

### ConvSeq2Seq
[pycorrector/seq2seq](pycorrector/seq2seq) :


#### 
data example:
```
# train.txt:
	
```

```shell
cd seq2seq
python train.py
```

`convseq2seq`sighan2104200epochP40GPU3

#### 

```shell
python infer.py
```

output

![result image](./docs/git_image/convseq2seq_ret.png)

1. `unk`(nlpcc2018+hsk130)
2. GPUGPU

#### Release models

SIGHAN2015convseq2seqreleasegithub:

- convseq2seq model url: https://github.com/shibing624/pycorrector/releases/download/0.4.5/convseq2seq_correction.tar.gz


# Dataset

|  |  |  |  |
| :------- | :--------- | :---------: | :---------: |
| **`SIGHAN+Wang271K`** | SIGHAN+Wang271K(27) | [01b9](https://pan.baidu.com/s/1BV5tr9eONZCI0wERFvr0gQ)| 106M |
| **`SIGHAN`** | SIGHAN13 14 15 | [csc.html](http://nlp.ee.ncu.edu.tw/resource/csc.html)| 339K |
| **`Wang271K`** | Wang271K | [Automatic-Corpus-Generation dimmywang](https://github.com/wdimmy/Automatic-Corpus-Generation/blob/master/corpus/train.sgml)| 93M |
| **`2014`** | 2014 | [cHcu](https://l6pmn3b1eo.feishu.cn/file/boxcnKpildqIseq1D4IrLwlir7c?from=from_qr_code)| 383M |
| **`NLPCC 2018 GEC`** | NLPCC2018-GEC | [trainingdata](http://tcci.ccf.org.cn/conference/2018/dldoc/trainingdata02.tar.gz) | 114M |
| **`NLPCC 2018+HSK`** | nlpcc2018+hsk+CGED | [m6fg](https://pan.baidu.com/s/1BkDru60nQXaDVLRSr7ktfA) 
 [gl9y](https://l6pmn3b1eo.feishu.cn/file/boxcnudJgRs5GEMhZwe77YGTQfc?from=from_qr_code) | 215M |
| **`NLPCC 2018+HSK`** | HSK+Lang8 | [n31j](https://pan.baidu.com/s/1DaOX89uL1JRaZclfrV9C0g) 
 [Q9LH](https://l6pmn3b1eo.feishu.cn/file/boxcntebW3NI6OAaqzDUXlZHoDb?from=from_qr_code) | 81M |
| **``** | Chinese Text CorrectionCTC | [](https://tianchi.aliyun.com/dataset/138195) | - |




- SIGHAN+Wang271K(27)SIGHAN131415Wang271KjsonSIGHANtest.json
  macbert4cscpaper[pycorrector/macbert/README.md](pycorrector/macbert/README.md)
- NLPCC 2018 GEC[NLPCC2018-GEC](http://tcci.ccf.org.cn/conference/2018/taskdata.php)
  [trainingdata](http://tcci.ccf.org.cn/conference/2018/dldoc/trainingdata02.tar.gz)[114.5MB]
- HSKlang8[HSK+Lang8][n31j](https://pan.baidu.com/s/1DaOX89uL1JRaZclfrV9C0g)
- NLPCC 2018 + HSK + CGED161718(nlpcc2018+hsk)
  [:m6fg](https://pan.baidu.com/s/1BkDru60nQXaDVLRSr7ktfA) [130215MB]

SIGHAN+Wang271K
```json
[
    {
        "id": "B2-4029-3",
        "original_text": "",
        "wrong_ids": [
            5,
            31
        ],
        "correct_text": ""
    }
]
```
#### 

json

1. wrong_ids(correct_text)
2. original_textwrong_ids
[](https://github.com/dongrixinyu/JioNLP/wiki/%E6%95%B0%E6%8D%AE%E5%A2%9E%E5%BC%BA-%E8%AF%B4%E6%98%8E%E6%96%87%E6%A1%A3#%E5%90%8C%E9%9F%B3%E8%AF%8D%E6%9B%BF%E6%8D%A2)


## Language Model

[-wiki](https://github.com/shibing624/pycorrector/wiki/%E7%BB%9F%E8%AE%A1%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E5%8E%9F%E7%90%86)

[zh_giga.no_cna_cmn.prune01244.klm(2.8G)](https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm)
2014[people2014corpus_chars.klm(o5e9)](https://pan.baidu.com/s/1I2GElyHy_MAdek3YaziFYw)

pycorrector.utils.text_utils

1. kenlmhttp://blog.csdn.net/mingzai624/article/details/79560063
2. <2014> 1people2014.tar.gz 2people2014_words.txt
   3kenlmpeople2014corpus_chars.arps/klm 4kenlmpeople2014corpus_words.arps/klm



# Todo

- [x] 
- [x] seq2seq
- [x] 
- [x] 
- [x] seq2seq_attention dropout
- [x] seq2seqPointer-generator networkBeam searchUnknown words replacementCoverage mechanism
- [x] bertfine-tunedwikitransformers 2.10.0
- [x] TensorFlow 2.0
- [x] bertmask
- [x] electra
- [x] bert/ernie

# Contact

- Github Issue()[![GitHub issues](https://img.shields.io/github/issues/shibing624/pycorrector.svg)](https://github.com/shibing624/pycorrector/issues)
- Github discussions[![GitHub discussions](https://img.shields.io/github/discussions/shibing624/pycorrector.svg)](https://github.com/shibing624/pycorrector/discussions)
- xuming: xuming624@qq.com
- *xuming624*, Python-NLP*--NLP*




# Citation

pycorrector

APA:
```latex
Xu, M. Pycorrector: Text error correction tool (Version 0.4.2) [Computer software]. https://github.com/shibing624/pycorrector
```

BibTeX:
```latex
@misc{Xu_Pycorrector_Text_error,
  title={Pycorrector: Text error correction tool},
  author={Xu Ming},
  year={2021},
  howpublished={\url{https://github.com/shibing624/pycorrector}},
}
```



# License

pycorrector  **Apache License 2.0**pycorrector

# Contribute



- `tests`
- `python -m pytest`

PR

# Reference

* [](https://blog.csdn.net/mingzai624/article/details/82390382)
* [Norvigs spelling corrector](http://norvig.com/spell-correct.html)
* [Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape[Yu, 2013]](http://www.aclweb.org/anthology/W/W14/W14-6835.pdf)
* [Chinese Spelling Checker Based on Statistical Machine Translation[Chiu, 2013]](http://www.aclweb.org/anthology/O/O13/O13-1005.pdf)
* [Chinese Word Spelling Correction Based on Rule Induction[yeh, 2014]](http://aclweb.org/anthology/W14-6822)
* [Neural Language Correction with Character-Based Attention[Ziang Xie, 2016]](https://arxiv.org/pdf/1603.09727.pdf)
* [Chinese Spelling Check System Based on Tri-gram Model[Qiang Huang, 2014]](http://www.anthology.aclweb.org/W/W14/W14-6827.pdf)
* [Neural Abstractive Text Summarization with Sequence-to-Sequence Models[Tian Shi, 2018]](https://arxiv.org/abs/1812.02303)
* [[, 2019]](https://github.com/shibing624/pycorrector/blob/master/docs/.pdf)
* [A Sequence to Sequence Learning for Chinese Grammatical Error Correction[Hongkai Ren, 2018]](https://link.springer.com/chapter/10.1007/978-3-319-99501-4_36)
* [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB)
* [Revisiting Pre-trained Models for Chinese Natural Language Processing](https://arxiv.org/abs/2004.13922)
* Ruiqing Zhang, Chao Pang et al. "Correcting Chinese Spelling Errors with Phonetic Pre-training", ACL, 2021
* DingminWang et al. "A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check", EMNLP, 2018
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

https://github.com/alixunxing/pycorrector

Science Score: 23.0%

Repository

Basic Info

Statistics

https://github.com/alixunxing/pycorrector/blob/master/

Owner

GitHub Events

Total

Last Year