https://github.com/chapzq77/latticelstm

Chinese NER using Lattice LSTM. Code for ACL 2018 paper.


Repository

Basic Info

Host: GitHub
Owner: chapzq77
Language: Python
Default Branch: master
Homepage:
Size: 330 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of jiesutd/LatticeLSTM

Created almost 7 years ago · Last pushed about 7 years ago

https://github.com/chapzq77/LatticeLSTM/blob/master/

Chinese NER Using Lattice LSTM
====

Lattice LSTM for Chinese NER: a character-based LSTM with lattice embeddings as input.

Models and results can be found in our ACL 2018 paper [Chinese NER Using Lattice LSTM](https://arxiv.org/pdf/1805.02023.pdf). The model achieves a 93.18% F1 score on the MSRA dataset, which the paper reports as the state-of-the-art result on the Chinese NER task.

Details will be updated soon.

Requirement:
======
	Python: 2.7   
	PyTorch: 0.3.0 
(For PyTorch 0.3.1, please refer to [issue #8](https://github.com/jiesutd/LatticeLSTM/issues/8) for a slight modification.)

Input format:
======
CoNLL format (the BIOES tag scheme is preferred), with one character and its label per line. Sentences are separated by a blank line.

	美	B-LOC
	国	E-LOC
	的	O
	华	B-PER
	莱	I-PER
	士	E-PER

	我	O
	跟	O
	他	O
	谈	O
	笑	O
	风	O
	生	O
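
As a minimal sketch of reading this format (not the repository's own loader), the Python functions below group input lines into sentences of `(character, label)` pairs and decode BIOES labels into entity spans. The names `read_conll` and `bioes_spans` are hypothetical, the sample characters are illustrative, and the sketch assumes whitespace-separated columns with the label in the last column.

```python
def read_conll(lines):
    """Group CoNLL-style lines into sentences of (character, label) pairs."""
    sentences, current = [], []
    for line in lines:
        parts = line.split()
        if not parts:                  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((parts[0], parts[-1]))
    if current:                        # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


def bioes_spans(labels):
    """Return (start, end, type) spans with inclusive ends from BIOES labels."""
    spans, open_span = [], None        # open_span is (start_index, type) or None
    for i, label in enumerate(labels):
        prefix, _, kind = label.partition("-")
        if prefix == "S":              # single-character entity
            spans.append((i, i, kind))
            open_span = None
        elif prefix == "B":            # entity begins
            open_span = (i, kind)
        elif prefix == "E" and open_span and open_span[1] == kind:
            spans.append((open_span[0], i, kind))
            open_span = None
        elif prefix == "I" and open_span and open_span[1] == kind:
            continue                   # still inside the same entity
        else:                          # "O" or an inconsistent tag closes any open span
            open_span = None
    return spans


# Illustrative input: one sentence with a location and a person name.
sample = ["美\tB-LOC", "国\tE-LOC", "的\tO", "华\tB-PER", "莱\tI-PER", "士\tE-PER", ""]
sentence = read_conll(sample)[0]
spans = bioes_spans([label for _, label in sentence])
```

Malformed sequences (for example an `E-` tag with no preceding `B-`) are silently dropped here; a stricter reader might raise an error instead.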

Pretrained Embeddings:
====
The pretrained character and word embeddings are the same as those used in the baseline of [RichWordSegmentor](https://github.com/jiesutd/RichWordSegmentor).

Character embeddings (gigaword_chn.all.a2b.uni.ite50.vec): [Google Drive](https://drive.google.com/file/d/1_Zlf0OAZKVdydk7loUpkzD2KPEotUE8u/view?usp=sharing) or [Baidu Pan](https://pan.baidu.com/s/1pLO6T9D)

Word (lattice) embeddings (ctb.50d.vec): [Google Drive](https://drive.google.com/file/d/1K_lG3FlXTgOOf8aQ4brR9g3R40qi1Chv/view?usp=sharing) or [Baidu Pan](https://pan.baidu.com/s/1pLO6T9D)
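
To inspect the downloaded files, a loader along the following lines may be enough. This is a sketch under assumptions, not the repository's code: it assumes each `.vec` file is plain text with one token per line followed by its vector components, and that the vectors are 50-dimensional (suggested by the file names, but not confirmed here). The function name `load_embeddings` is hypothetical.

```python
def load_embeddings(lines, dim=50):
    """Parse text-format embeddings: a token followed by `dim` floats per line.

    Assumption: whitespace-separated columns. Lines with any other number of
    columns (blank lines, a possible "count dim" header) are skipped.
    """
    table = {}
    for line in lines:
        parts = line.rstrip().split()
        if len(parts) != dim + 1:
            continue
        table[parts[0]] = [float(x) for x in parts[1:]]
    return table


# Possible usage once the files are in place (path is an assumption):
# with open("data/ctb.50d.vec", encoding="utf-8") as f:
#     word_vectors = load_embeddings(f)
```

Skipping lines with an unexpected column count keeps the sketch tolerant of header lines, at the cost of hiding truncated rows; counting the skipped lines would make such problems visible.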

How to run the code?
====
1. Download the character embeddings and word embeddings and put them in the `data` folder.
2. Edit `run_main.sh` or `run_demo.sh` to point to your train/dev/test files.
3. Run `sh run_main.sh` or `sh run_demo.sh`.
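
Before launching a run, it can help to confirm that step 1 is complete. The check below is a small sketch, not part of the repository: the helper name `missing_embeddings` is hypothetical, and the file names are the ones listed in the Pretrained Embeddings section.

```python
import os

# File names as listed in the "Pretrained Embeddings" section of this README.
REQUIRED = ("gigaword_chn.all.a2b.uni.ite50.vec", "ctb.50d.vec")


def missing_embeddings(data_dir="data", required=REQUIRED):
    """Return the required embedding files that are not present in data_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(data_dir, name))]


# Possible usage from the repository root:
# missing = missing_embeddings()
# if missing:
#     raise SystemExit("Missing embedding files in data/: " + ", ".join(missing))
```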


Resume NER data 
====
Crawled from Sina Finance, the dataset contains the resumes of senior executives from companies listed on the Chinese stock market. Details can be found in our paper.


Cite: 
========
Please cite our ACL 2018 paper:

    @inproceedings{zhang2018chinese,
     title={Chinese NER Using Lattice LSTM},
     author={Yue Zhang and Jie Yang},
     booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)},
     year={2018}
    }
