https://github.com/aehrc/laat

A Label Attention Model for ICD Coding from Clinical Text

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: found 3 DOI reference(s) in README
✓ Academic publication links: links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.4%) to scientific vocabulary

Keywords

clinical-text icd-coding laat label-attention-classification

Last synced: 10 months ago

Repository

A Label Attention Model for ICD Coding from Clinical Text

Basic Info

Host: GitHub
Owner: aehrc
License: other
Language: Python
Default Branch: master
Homepage:
Size: 107 MB

Statistics

Stars: 69
Watchers: 8
Forks: 22
Open Issues: 1
Releases: 0

Topics

clinical-text icd-coding laat label-attention-classification

Created almost 6 years ago · Last pushed almost 4 years ago

Metadata Files

Readme License

A Label Attention Model for ICD Coding from Clinical Text


This project provides the code for our IJCAI 2020 paper, "A Label Attention Model for ICD Coding from Clinical Text".

The general architecture and experimental results can be found in our paper:

@inproceedings{ijcai2020-461-vu,
  title     = {A Label Attention Model for ICD Coding from Clinical Text},
  author    = {Vu, Thanh and Nguyen, Dat Quoc and Nguyen, Anthony},
  booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20}},
  pages     = {3335--3341},
  year      = {2020},
  month     = {7},
  note      = {Main track},
  doi       = {10.24963/ijcai.2020/461},
  url       = {https://doi.org/10.24963/ijcai.2020/461},
}

Please cite our paper when this code is used to produce published results or is incorporated into other software.

Requirements

python>=3.6
torch==1.4.0
scikit-learn==0.23.1
numpy==1.16.3
scipy==1.2.1
pandas==0.24.2
tqdm==4.31.1
nltk>=3.4.5
psycopg2==2.7.7
gensim==3.6.0
transformers==2.11.0

Run pip install -r requirements.txt to install the required libraries.

Then start python3 and run import nltk followed by nltk.download('punkt') to download the data needed for tokenization.

Data preparation

MIMIC-III-full and MIMIC-III-50 experiments

data/mimicdata/mimic3

The id files are from caml-mimic
Install the MIMIC-III database with PostgreSQL following this instruction
Generate the train/valid/test sets using src/util/mimiciii_data_processing.py (configure the PostgreSQL connection at line 139 of that file).
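
The connection settings in the processing script are edited by hand. As a rough sketch of what such a connection might look like, the snippet below builds a libpq-style connection string from environment variables and passes it to psycopg2.connect. The environment variable names, their defaults, and the helper names build_dsn and connect are assumptions for illustration; they are not taken from the repository.

```python
import os


def build_dsn(env=None):
    """Build a libpq-style connection string.

    The variable names and default values are hypothetical; adapt them
    to your own PostgreSQL installation.
    """
    env = os.environ if env is None else env
    parts = {
        "dbname": env.get("MIMIC_DB", "mimic"),
        "user": env.get("MIMIC_USER", "postgres"),
        "host": env.get("MIMIC_HOST", "localhost"),
        "port": env.get("MIMIC_PORT", "5432"),
    }
    password = env.get("MIMIC_PASSWORD")
    if password:
        # Values that contain spaces would need quoting; not handled here.
        parts["password"] = password
    return " ".join(f"{key}={value}" for key, value in parts.items())


def connect(env=None):
    """Open a PostgreSQL connection using the string from build_dsn."""
    # Imported lazily so build_dsn can be used without the driver installed.
    import psycopg2

    return psycopg2.connect(build_dsn(env))
```

Keeping credentials in the environment rather than in the script avoids committing them by accident.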

MIMIC-II-full experiment

data/mimicdata/mimic2

Place the MIMIC-II file (MIMICRAWDSUMS) in data/mimicdata/mimic2
Generate the train/valid/test sets using src/util/mimicii_data_processing.py.

Note: the code generates three files (train.csv, valid.csv, and test.csv) for each experiment.
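
Before training, it can help to confirm that all three split files exist and are non-empty. The following is a minimal sketch using only the standard library. It assumes each file begins with a header row, which is not stated in this README; the helper name count_rows is hypothetical.

```python
import csv
from pathlib import Path

EXPECTED_SPLITS = ("train.csv", "valid.csv", "test.csv")

# Clinical notes can be long; raise the per-field size limit of the csv module.
csv.field_size_limit(10**9)


def count_rows(data_dir):
    """Return {file name: number of data rows} for the three split files.

    Assumes a header row in each file (an assumption, not confirmed by the
    README). Raises FileNotFoundError if a split file is missing.
    """
    counts = {}
    for name in EXPECTED_SPLITS:
        path = Path(data_dir) / name
        if not path.is_file():
            raise FileNotFoundError(f"missing split file: {path}")
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the assumed header row
            counts[name] = sum(1 for _ in reader)
    return counts
```

For example, count_rows("data/mimicdata/mimic3") would report the number of records in each MIMIC-III split.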

Pretrained word embeddings

data/embeddings

We used gensim to train the embeddings (word2vec model) using the entire MIMIC-III discharge summary data.
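
As an illustration of that step, here is a minimal sketch of training a word2vec model with gensim 3.x (the version pinned in the requirements). The settings sg=0 (CBOW) and 100 dimensions are inferred from the file name word2vec_sg0_100.model used later in this README. The whitespace tokenization, min_count, workers, and the helper names are assumptions; the authors' actual preprocessing and training settings may differ.

```python
def iter_token_lists(texts):
    """Yield one lowercased, whitespace-split token list per document.

    Whitespace splitting is a simplification used for this sketch.
    """
    for text in texts:
        tokens = text.lower().split()
        if tokens:
            yield tokens


def train_word2vec(texts, output_path, dim=100):
    """Train a CBOW word2vec model and save it in gensim's native format."""
    # Imported lazily; the `size` keyword below is the gensim 3.x name
    # (gensim 4 renamed it to `vector_size`).
    from gensim.models import Word2Vec

    # Materialise the corpus because Word2Vec iterates over it several times.
    sentences = list(iter_token_lists(texts))
    model = Word2Vec(sentences, size=dim, sg=0, min_count=1, workers=4)
    model.save(output_path)
    return model
```

A call such as train_word2vec(summaries, "data/embeddings/word2vec_sg0_100.model"), where summaries is a hypothetical iterable of discharge summary strings, would write a model file to the embeddings folder.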

Our code also supports subword embeddings (fastText), which help produce better performance (see src/args_parser.py).

How to run

Problems and their associated configurations are defined in configuration/config.json. Each data folder contains three files (train.csv, valid.csv, and test.csv).

Some hyperparameters are common to all models and others are model-specific. See src/args_parser.py for details.

Here is an example of using the framework on the MIMIC-III dataset (full codes) with hierarchical joint learning:

python -m src.run \
    --problem_name mimic-iii_2_full \
    --max_seq_length 4000 \
    --n_epoch 50 \
    --patience 5 \
    --batch_size 8 \
    --optimiser adamw \
    --lr 0.001 \
    --dropout 0.3 \
    --level_projection_size 128 \
    --main_metric micro_f1 \
    --embedding_mode word2vec \
    --embedding_file data/embeddings/word2vec_sg0_100.model \
    --attention_mode label \
    --d_a 512 \
    RNN \
    --rnn_model LSTM \
    --n_layers 1 \
    --bidirectional 1 \
    --hidden_size 512
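
In the command above, the common options come first, followed by the model name (RNN) and then that model's own options. One way to picture this layout is an argparse parser with one sub-command per model. The sketch below is an illustration only: it covers a handful of the flags, the defaults are assumptions, and the real definitions in src/args_parser.py may be organised differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="src.run")
    # Options shared by all models (a small subset, for illustration).
    parser.add_argument("--problem_name", required=True)
    parser.add_argument("--batch_size", type=int, default=8)
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--attention_mode", default="label")

    # One sub-command per model; each carries its own options.
    subparsers = parser.add_subparsers(dest="model")
    subparsers.required = True  # attribute form also works on Python 3.6
    rnn = subparsers.add_parser("RNN")
    rnn.add_argument("--rnn_model", default="LSTM")
    rnn.add_argument("--n_layers", type=int, default=1)
    rnn.add_argument("--bidirectional", type=int, default=1)
    rnn.add_argument("--hidden_size", type=int, default=512)
    return parser


args = build_parser().parse_args(
    ["--problem_name", "mimic-iii_2_full", "--batch_size", "8",
     "RNN", "--rnn_model", "LSTM", "--hidden_size", "512"]
)
print(args.model, args.rnn_model, args.hidden_size)  # prints: RNN LSTM 512
```

With this layout, options placed after the model name are parsed by that model's sub-parser, which is why the order of flags in the command matters.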

Owner

Name: The Australian e-Health Research Centre
Login: aehrc
Kind: organization

Website: https://aehrc.com
Twitter: ehealthresearch
Repositories: 101
Profile: https://github.com/aehrc

The Australian e-Health Research Centre (AEHRC) is CSIRO’s digital health research program.

GitHub Events

Total

Watch event: 3
Fork event: 1

Last Year

Watch event: 3
Fork event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 9
Total pull requests: 1
Average time to close issues: 15 days
Average time to close pull requests: less than a minute
Total issue authors: 6
Total pull request authors: 1
Average comments per issue: 2.11
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Abhinav43 (3)
gmichalo (2)
jatinvinkumar (1)
drinkingxi (1)
jiaminchen-1031 (1)
MichalMalyska (1)
