https://github.com/aehrc/laat

A Label Attention Model for ICD Coding from Clinical Text

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

clinical-text icd-coding laat label-attention-classification
Last synced: 10 months ago

Repository

A Label Attention Model for ICD Coding from Clinical Text

Basic Info
  • Host: GitHub
  • Owner: aehrc
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 107 MB
Statistics
  • Stars: 69
  • Watchers: 8
  • Forks: 22
  • Open Issues: 1
  • Releases: 0
Created almost 6 years ago · Last pushed almost 4 years ago
Metadata Files
Readme License

readme.md

A Label Attention Model for ICD Coding from Clinical Text

This project provides the code for our IJCAI 2020 paper, "A Label Attention Model for ICD Coding from Clinical Text".

The general architecture and experimental results can be found in our paper:

@inproceedings{ijcai2020-461-vu,
  title     = {A Label Attention Model for ICD Coding from Clinical Text},
  author    = {Vu, Thanh and Nguyen, Dat Quoc and Nguyen, Anthony},
  booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20}},
  pages     = {3335--3341},
  year      = {2020},
  month     = {7},
  note      = {Main track},
  doi       = {10.24963/ijcai.2020/461},
  url       = {https://doi.org/10.24963/ijcai.2020/461},
}

Please CITE our paper when this code is used to produce published results or incorporated into other software.

Requirements

  • python>=3.6
  • torch==1.4.0
  • scikit-learn==0.23.1
  • numpy==1.16.3
  • scipy==1.2.1
  • pandas==0.24.2
  • tqdm==4.31.1
  • nltk>=3.4.5
  • psycopg2==2.7.7
  • gensim==3.6.0
  • transformers==2.11.0

Run pip install -r requirements.txt to install the required libraries.

Then start python3 and run import nltk followed by nltk.download('punkt') to fetch the data needed for tokenization.
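After installation, a quick check along the following lines can confirm that the installed versions match the pins listed above. This is a minimal sketch, not part of the repository: the helper name check_pins and the PINS subset are illustrative, and importlib.metadata requires Python 3.8 or newer.

```python
from importlib import metadata

# A subset of the pins from the requirements list; extend as needed.
PINS = {
    "torch": "1.4.0",
    "scikit-learn": "0.23.1",
    "numpy": "1.16.3",
    "gensim": "3.6.0",
    "transformers": "2.11.0",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Report every package whose installed version differs from its pin.
for name, (expected, installed) in check_pins(PINS).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty report means every listed package is installed at the pinned version.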

Data preparation

MIMIC-III-full and MIMIC-III-50 experiments

data/mimicdata/mimic3

  • The id files are from caml-mimic
  • Install the MIMIC-III database with PostgreSQL following this instruction
  • Generate the train/valid/test sets using src/util/mimiciii_data_processing.py. (Configure the connection to PostgreSQL at Line 139)

MIMIC-II-full experiment

data/mimicdata/mimic2

  • Place the MIMIC-II file (MIMICRAWDSUMS) in data/mimicdata/mimic2
  • Generate the train/valid/test sets using src/util/mimicii_data_processing.py.

Note: the code generates three files (train.csv, valid.csv, and test.csv) for each experiment.
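Before training, it can help to verify that the three split files exist. The sketch below is an assumption-laden helper rather than repository code: missing_splits is a hypothetical name, and the folders checked are the ones named in this README, so adjust them to wherever the processing scripts actually write their output.

```python
from pathlib import Path

# The three files each experiment is expected to produce.
EXPECTED_SPLITS = ("train.csv", "valid.csv", "test.csv")


def missing_splits(data_dir):
    """Return the names of expected split files that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_SPLITS if not (data_dir / name).is_file()]


# Folder paths are assumptions taken from the headings in this README.
for folder in ("data/mimicdata/mimic3", "data/mimicdata/mimic2"):
    missing = missing_splits(folder)
    status = "OK" if not missing else "missing: " + ", ".join(missing)
    print(f"{folder}: {status}")
```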

Pretrained word embeddings

data/embeddings

We used gensim to train the embeddings (word2vec model) using the entire MIMIC-III discharge summary data.
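For readers who want to train comparable embeddings, the following sketch shows one way to do it with the gensim 3.x API pinned in the requirements. It is not the authors' training script: the whitespace tokenisation, min_count, and workers values are illustrative, and the defaults dim=100 and sg=0 (CBOW) are only inferred from the file name word2vec_sg0_100.model used in the run command.

```python
def iter_sentences(texts):
    """Yield each non-empty text as a list of lowercased whitespace tokens."""
    for text in texts:
        tokens = text.lower().split()
        if tokens:
            yield tokens


def train_word2vec(texts, out_path, dim=100, sg=0):
    """Train and save a word2vec model (gensim 3.x, where the dimension argument is `size`)."""
    # Imported lazily so the tokenising helper works without gensim installed.
    from gensim.models import Word2Vec

    # Word2Vec iterates over the corpus more than once, so materialise the generator.
    sentences = list(iter_sentences(texts))
    # min_count and workers are illustrative values, not the authors' settings.
    model = Word2Vec(sentences, size=dim, sg=sg, min_count=1, workers=4)
    model.save(out_path)
    return model
```

Calling train_word2vec(discharge_summaries, "data/embeddings/word2vec_sg0_100.model") would then write a model file to the path the run command expects, where discharge_summaries is any list of raw text strings.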

Our code also supports subword embeddings (fastText), which help produce better performance (see src/args_parser.py).

How to run

The problem and associated configurations are defined in configuration/config.json. Note that each data folder contains three files (train.csv, valid.csv and test.csv).

Some hyperparameters are common to all models and others are model-specific. See src/args_parser.py for details.

Here is an example of using the framework on the MIMIC-III dataset (full codes) with hierarchical joint learning:

python -m src.run \
    --problem_name mimic-iii_2_full \
    --max_seq_length 4000 \
    --n_epoch 50 \
    --patience 5 \
    --batch_size 8 \
    --optimiser adamw \
    --lr 0.001 \
    --dropout 0.3 \
    --level_projection_size 128 \
    --main_metric micro_f1 \
    --embedding_mode word2vec \
    --embedding_file data/embeddings/word2vec_sg0_100.model \
    --attention_mode label \
    --d_a 512 \
    RNN \
    --rnn_model LSTM \
    --n_layers 1 \
    --bidirectional 1 \
    --hidden_size 512
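In the command above, the common options come first, then the model name (RNN), then the options specific to that model. One way such a layout can be expressed is with argparse sub-parsers. The sketch below is an illustration only and not a copy of src/args_parser.py: it covers a handful of the flags shown in the command, and every default value is an assumption.

```python
import argparse


def build_parser():
    """Build a parser with shared options and one sub-command per model family."""
    parser = argparse.ArgumentParser(prog="src.run")

    # Options shared by every model (defaults here are assumed, not the repository's).
    parser.add_argument("--problem_name", required=True)
    parser.add_argument("--max_seq_length", type=int, default=4000)
    parser.add_argument("--batch_size", type=int, default=8)
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--attention_mode", default="label")

    # The model name selects which model-specific options are accepted.
    subparsers = parser.add_subparsers(dest="model", required=True)
    rnn = subparsers.add_parser("RNN")
    rnn.add_argument("--rnn_model", default="LSTM")
    rnn.add_argument("--n_layers", type=int, default=1)
    rnn.add_argument("--bidirectional", type=int, default=1)
    rnn.add_argument("--hidden_size", type=int, default=512)
    return parser


# Parse a shortened version of the example command.
args = build_parser().parse_args(
    ["--problem_name", "mimic-iii_2_full", "--lr", "0.001", "RNN", "--hidden_size", "512"]
)
print(args.model, args.rnn_model, args.hidden_size)
```

With this layout, common flags must appear before the model name and model-specific flags after it, which matches the ordering of the example command.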

Owner

  • Name: The Australian e-Health Research Centre
  • Login: aehrc
  • Kind: organization

The Australian e-Health Research Centre (AEHRC) is CSIRO’s digital health research program.

GitHub Events

Total
  • Watch event: 3
  • Fork event: 1
Last Year
  • Watch event: 3
  • Fork event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 9
  • Total pull requests: 1
  • Average time to close issues: 15 days
  • Average time to close pull requests: less than a minute
  • Total issue authors: 6
  • Total pull request authors: 1
  • Average comments per issue: 2.11
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Abhinav43 (3)
  • gmichalo (2)
  • jatinvinkumar (1)
  • drinkingxi (1)
  • jiaminchen-1031 (1)
  • MichalMalyska (1)
Pull Request Authors
  • wren93 (1)