Repository

Basic Info
  • Host: GitHub
  • Owner: c3armaanbutt
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 86.2 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 2
  • Open Issues: 1
  • Releases: 0
Created about 4 years ago · Last pushed about 4 years ago
README.md

CS 598: Deep Learning For Healthcare Final Project

By: Armaan R. Butt and Harikrishna Bojja {arbutt2, hbojja2}@illinois.edu

Group ID: 213, Paper ID: 283

Paper: Categorization of free-text drug orders using character-level recurrent neural networks [1]


Dependencies

Computer Specs

You will need a machine with the following specs:

  • CPU: 2.9 GHz - 8 Cores
  • Memory: 64 GB

Runtime

You will need a machine with Python 3.9.7 installed.

Python Libraries

Please have the following Python libraries installed. We have provided the requirements.txt file in the project root for your convenience.

  • keras==2.8.0
  • matplotlib==3.5.1
  • nltk==3.7
  • numpy==1.22.0
  • pandarallel==1.6.1
  • pandas==1.4.2
  • scikit_learn==1.0.2
  • tensorflow==2.8.0
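
Before running the notebooks, it can help to confirm that the installed packages match the pinned versions. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `find_mismatches` are hypothetical and are not part of the project.

```python
from importlib import metadata

# Pinned versions copied from the list above.
PINNED = """
keras==2.8.0
matplotlib==3.5.1
nltk==3.7
numpy==1.22.0
pandarallel==1.6.1
pandas==1.4.2
scikit_learn==1.0.2
tensorflow==2.8.0
"""


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (wanted, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed, or not found under this name
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_mismatches(parse_pins(PINNED)).items():
    print(f"{name}: wanted {wanted}, found {installed}")
```

An empty output means every pinned package was found at the expected version.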

Data Download Instructions

Please download the data from https://physionet.org/content/mimiciii-demo/1.4/ and extract the NOTEEVENTS.csv and PRESCRIPTIONS.csv to /data/real-mimic-iii-database.
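
A quick check that both files landed in the expected directory can save a failed notebook run. This is a minimal sketch; the helper `missing_files` is hypothetical, and the path is taken literally from the instructions above (adjust it if the directory is meant to be relative to the project root).

```python
from pathlib import Path

# Directory and file names as given in the download instructions.
DATA_DIR = Path("/data/real-mimic-iii-database")
EXPECTED = ("NOTEEVENTS.csv", "PRESCRIPTIONS.csv")


def missing_files(data_dir, names=EXPECTED):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in names if not (data_dir / name).is_file()]


missing = missing_files(DATA_DIR)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All expected files are present.")
```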

Exploratory Data Analysis (EDA)

The Jupyter notebooks below show how we used Seaborn and Matplotlib to analyze the PRESCRIPTIONS and NOTEEVENTS data.

  1. /src/data_profiling/profile_prescription_data.ipynb
  2. /src/data_profiling/profile_free_text.ipynb

Data Pre-Processing Code

Run the following Jupyter notebooks in order. Data preprocessing takes approximately 1 hour to finish.

  1. /src/data_processing/extract_drug_codes.ipynb
  2. /src/data_processing/note_events_processing.ipynb

Once complete, the notebooks generate two new files in /data/processed/:

  • NOTEEVENTS_ML_DATASET.csv
  • ndc_codes_extracted.csv

Train and Evaluate Models Code

To train and evaluate the SVM and GRU models, run the following Jupyter notebooks:

  1. /src/ml/multi_class_svm.ipynb
  2. /src/ml/gru_model.ipynb

Results are persisted as two CSV files in the /data/results directory:

  • GRU_RESULTS.csv
  • SVM_results.csv

Results

Baseline SVM

The baseline SVM model was trained on the 22 most common drugs (identified by NDC) in our dataset. It uses a linear kernel (LinearSVC), with the input text vectorized at the character level using scikit-learn's TfidfVectorizer, which was configured to generate trigrams.
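
The setup described above can be sketched as a scikit-learn pipeline. This is an illustration under stated assumptions rather than the notebook's actual code: the toy texts, the labels `drug_a` and `drug_b`, and the helper `build_baseline` are all hypothetical, and every hyperparameter other than the character-trigram setting is left at its default.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_baseline():
    """Character-trigram TF-IDF features feeding a linear SVM."""
    return make_pipeline(
        # analyzer="char" with ngram_range=(3, 3) yields character trigrams only.
        TfidfVectorizer(analyzer="char", ngram_range=(3, 3)),
        LinearSVC(random_state=0),
    )


# Hypothetical toy data; the real inputs come from the preprocessing notebooks.
texts = [
    "aspirin 81 mg po daily",
    "asa 81mg by mouth once a day",
    "heparin 5000 units sc",
    "heparin flush 10 units/ml",
]
labels = ["drug_a", "drug_a", "drug_b", "drug_b"]

model = build_baseline()
model.fit(texts, labels)
predictions = model.predict(["aspirin 81 mg daily"])
print(predictions)
```

Character n-grams are a common choice for free-text orders because they tolerate misspellings and abbreviations better than word-level tokens.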

| NDC         | Accuracy (%) | Precision (%) | Recall (%) |
| ----------- | ------------ | ------------- | ---------- |
| 00713016550 | 90           | 83            | 76         |
| 00487950125 | 92           | 89            | 76         |
| 00517391025 | 92           | 92            | 91         |
| 51079001920 | 90           | 90            | 91         |
| 11098003002 | 95           | 88            | 44         |
| 00054829725 | 93           | 88            | 72         |
| 00045025501 | 98           | 80            | 5          |
| 00338055002 | 86           | 88            | 91         |
| 00409131230 | 94           | 90            | 59         |
| 00045152510 | 93           | 87            | 65         |
| 00074407532 | 87           | 87            | 77         |
| 51079080120 | 91           | 90            | 85         |
| 51079025520 | 91           | 90            | 85         |
| 00074176201 | 87           | 86            | 71         |
| 00781305714 | 97           | 87            | 13         |
| 00054465025 | 95           | 86            | 65         |
| 00008084199 | 94           | 91            | 74         |
| 58177000104 | 93           | 95            | 91         |
| 00781188313 | 96           | 93            | 55         |
| 00517293025 | 93           | 94            | 95         |
| 00338355248 | 90           | 84            | 71         |
| 00002735501 | 89           | 89            | 83         |
| Average     | 92           | 88            | 70         |
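
Some rows pair very high accuracy with very low recall (for example, 98% accuracy and 5% recall for NDC 00045025501). One way this can happen is when the positive class is rare, so that correctly rejected negatives dominate the accuracy. The sketch below shows how the three metrics relate to confusion-matrix counts; the counts are hypothetical values chosen to reproduce a similar pattern and are not taken from the project's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 80 positives among 4000 samples, only 4 of them found.
accuracy, precision, recall = binary_metrics(tp=4, fp=1, fn=76, tn=3919)
print(f"accuracy={accuracy:.1%} precision={precision:.1%} recall={recall:.1%}")
# prints: accuracy=98.1% precision=80.0% recall=5.0%
```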

GRU - RNN

| Model             | Hidden State Size | Number of Epochs | Mean Training Accuracy (%) | Mean Test Accuracy (%) | Mean Training Loss (%) | Mean Test Loss (%) |
| ----------------- | ----------------- | ---------------- | -------------------------- | ---------------------- | ---------------------- | ------------------ |
| Bidirectional GRU | 32                | 3                | 75.44                      | 75.69                  | 56.44                  | 55.49              |
| Bidirectional GRU | 64                | 3                | 75.45                      | 75.69                  | 56.22                  | 55.52              |
| Bidirectional GRU | 128               | 3                | 75.44                      | 75.69                  | 56.26                  | 55.66              |
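
For orientation, a bidirectional GRU classifier over character sequences could be assembled in Keras roughly as follows. This is a sketch under assumptions, not the architecture in gru_model.ipynb: the vocabulary size, sequence length, embedding width, number of classes, optimizer, and loss are all hypothetical, and only the three hidden state sizes come from the table above.

```python
import numpy as np
import tensorflow as tf


def build_char_gru(vocab_size, max_len, hidden_size, num_classes):
    """Embedding -> bidirectional GRU -> softmax over classes."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,), dtype="int32"),
        # Embedding width of 32 is an assumption for this sketch.
        tf.keras.layers.Embedding(vocab_size, 32),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(hidden_size)),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# One model per hidden state size listed in the table.
# vocab_size, max_len, and num_classes are assumed values.
models = {
    hidden_size: build_char_gru(
        vocab_size=128, max_len=100, hidden_size=hidden_size, num_classes=22
    )
    for hidden_size in (32, 64, 128)
}

# A random batch of integer-encoded characters, just to exercise the model.
batch = np.random.randint(1, 128, size=(2, 100))
probabilities = models[32].predict(batch, verbose=0)
print(probabilities.shape)
```

Each model could then be trained with `model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))`, where the arrays are integer-encoded character sequences and integer class labels.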

References

[1] Raiskin Y, Eickhoff C, Beeler PE. Categorization of free-text drug orders using character-level recurrent neural networks. Int J Med Inform. 2019 Sep;129:20-28. doi: 10.1016/j.ijmedinf.2019.05.020. Epub 2019 May 23. PMID: 31445256.
