https://github.com/aehrc/laat
A Label Attention Model for ICD Coding from Clinical Text
This project provides the code for our IJCAI 2020 paper, A Label Attention Model for ICD Coding from Clinical Text.
The general architecture and experimental results can be found in our paper:
@inproceedings{ijcai2020-461-vu,
title = {A Label Attention Model for ICD Coding from Clinical Text},
author = {Vu, Thanh and Nguyen, Dat Quoc and Nguyen, Anthony},
booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20}},
pages = {3335--3341},
year = {2020},
month = {7},
note = {Main track},
doi = {10.24963/ijcai.2020/461},
url = {https://doi.org/10.24963/ijcai.2020/461},
}
Please CITE our paper when this code is used to produce published results or incorporated into other software.
Requirements
- python>=3.6
- torch==1.4.0
- scikit-learn==0.23.1
- numpy==1.16.3
- scipy==1.2.1
- pandas==0.24.2
- tqdm==4.31.1
- nltk>=3.4.5
- psycopg2==2.7.7
- gensim==3.6.0
- transformers==2.11.0
Run pip install -r requirements.txt to install the required libraries.
Then start python3 and run import nltk followed by nltk.download('punkt') to fetch the tokenizer data.
Data preparation
MIMIC-III-full and MIMIC-III-50 experiments
data/mimicdata/mimic3
- The id files are from caml-mimic
- Install the MIMIC-III database with PostgreSQL following this instruction
- Generate the train/valid/test sets using src/util/mimiciii_data_processing.py (configure the PostgreSQL connection at line 139 of that script).
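The connection step above can be sketched as follows. This is a minimal illustration, not the code at line 139 of the script: the `MIMIC_DB_*` environment variable names, the default values, and both helper functions are hypothetical. Only `psycopg2.connect` and its `dbname`, `user`, `password`, `host` and `port` keywords come from the library itself.

```python
import os


def mimic_connection_kwargs(env=None):
    """Build keyword arguments for psycopg2.connect from environment variables.

    The MIMIC_DB_* names and the defaults are hypothetical placeholders;
    substitute the settings of your own PostgreSQL installation.
    """
    env = os.environ if env is None else env
    return {
        "dbname": env.get("MIMIC_DB_NAME", "mimic"),
        "user": env.get("MIMIC_DB_USER", "postgres"),
        "password": env.get("MIMIC_DB_PASSWORD", ""),
        "host": env.get("MIMIC_DB_HOST", "localhost"),
        "port": int(env.get("MIMIC_DB_PORT", "5432")),
    }


def connect_to_mimic(env=None):
    """Open a connection to the MIMIC-III database.

    psycopg2 is imported lazily so the settings helper can be used
    (and tested) without a database driver installed.
    """
    import psycopg2  # listed as psycopg2==2.7.7 in the requirements

    return psycopg2.connect(**mimic_connection_kwargs(env))
```

Keeping credentials in environment variables rather than in the script avoids committing them by accident; whether that suits your setup is a judgment call.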
MIMIC-II-full experiment
data/mimicdata/mimic2
- Place the MIMIC-II file (MIMICRAWDSUMS) in data/mimicdata/mimic2
- Generate the train/valid/test sets using src/util/mimicii_data_processing.py.
Note: the code generates three files (train.csv, valid.csv, and test.csv) for each experiment.
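Before training, it can help to confirm that all three split files exist. The helper below is a small sketch using only the standard library; `missing_split_files` is a hypothetical name and is not part of this repository. Pass it whichever data folder your configuration points at.

```python
from pathlib import Path

# The three split files named in the note above.
EXPECTED_SPLITS = ("train.csv", "valid.csv", "test.csv")


def missing_split_files(data_dir):
    """Return the expected split files that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_SPLITS if not (data_dir / name).is_file()]
```

An empty list means the folder is ready; otherwise the returned names show which processing step still needs to be run.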
Pretrained word embeddings
data/embeddings
We used gensim to train word2vec embeddings on the entire set of MIMIC-III discharge summaries.
Our code also supports subword embeddings (fastText), which can improve performance (see src/args_parser.py).
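For readers who want to train comparable embeddings, here is a rough sketch against the gensim 3.x API pinned in the requirements. It is not the authors' training script. The regex tokenizer is a stand-in, `min_count=1` and `workers=4` are arbitrary choices, and the defaults `size=100` and `sg=0` (CBOW) are only inferred from the file name `word2vec_sg0_100.model` used in the run example. In gensim 4 and later the `size` argument is called `vector_size`.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a stand-in tokenizer)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def train_word2vec(texts, output_path, size=100, sg=0):
    """Train a word2vec model on an iterable of documents and save it.

    Written for gensim 3.x; with gensim >= 4 pass vector_size instead of size.
    """
    from gensim.models import Word2Vec  # lazy import; gensim==3.6.0 in requirements

    sentences = [tokenize(text) for text in texts]
    # min_count=1 keeps every token; raise it on a real corpus to drop rare words.
    model = Word2Vec(sentences, size=size, sg=sg, min_count=1, workers=4)
    model.save(output_path)
    return model
```

A call such as `train_word2vec(summaries, "data/embeddings/word2vec_sg0_100.model")` would then produce a file in the location that the `--embedding_file` flag expects, assuming `summaries` is your own list of discharge summary strings.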
How to run
The problems and their associated configurations are defined in configuration/config.json. Each data folder is expected to contain three files (train.csv, valid.csv and test.csv).
Some hyperparameters are common to all models and others are model-specific. See src/args_parser.py for details.
Here is an example of running the framework on the MIMIC-III dataset (full codes) with hierarchical joint learning:
python -m src.run \
--problem_name mimic-iii_2_full \
--max_seq_length 4000 \
--n_epoch 50 \
--patience 5 \
--batch_size 8 \
--optimiser adamw \
--lr 0.001 \
--dropout 0.3 \
--level_projection_size 128 \
--main_metric micro_f1 \
--embedding_mode word2vec \
--embedding_file data/embeddings/word2vec_sg0_100.model \
--attention_mode label \
--d_a 512 \
RNN \
--rnn_model LSTM \
--n_layers 1 \
--bidirectional 1 \
--hidden_size 512
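The `--main_metric micro_f1` flag in the command above selects micro-averaged F1 as the main metric. As a reminder of what that metric measures, here is a plain-Python sketch of the standard definition. It is not the repository's own evaluation code, which may be implemented differently, and the function name and the set-based input format are assumptions made for illustration.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label predictions.

    y_true and y_pred are sequences of sets of codes, one set per document.
    Counts are pooled over all documents and labels before computing F1.
    """
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        tp += len(gold & pred)   # codes predicted and correct
        fp += len(pred - gold)   # codes predicted but not in the gold set
        fn += len(gold - pred)   # gold codes that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Because counts are pooled before the ratio is taken, frequent codes dominate the score, which is why ICD coding work usually reports macro-averaged metrics alongside it.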
Owner
- Name: The Australian e-Health Research Centre
- Login: aehrc
- Kind: organization
- Website: https://aehrc.com
- Twitter: ehealthresearch
- Repositories: 101
- Profile: https://github.com/aehrc
The Australian e-Health Research Centre (AEHRC) is CSIRO’s digital health research program.