dl4h_final_project

https://github.com/c3armaanbutt/dl4h_final_project


Repository

Basic Info

Host: GitHub
Owner: c3armaanbutt
Language: Jupyter Notebook
Default Branch: main
Size: 86.2 MB

Statistics

Stars: 0
Watchers: 2
Forks: 2
Open Issues: 1
Releases: 0

Created about 4 years ago · Last pushed about 4 years ago


CS 598: Deep Learning For Healthcare Final Project

By: Armaan R. Butt and Harikrishna Bojja {arbutt2, hbojja2}@illinois.edu

Group ID: 213, Paper ID: 283

Paper: Categorization of free-text drug orders using character-level recurrent neural networks [1]

Dependencies

Computer Specs

You will need a machine with the following specs:

CPU: 2.9 GHz - 8 Cores
Memory: 64 GB

Runtime

You will need a machine with Python 3.9.7 installed.

Python Libraries

Please have the following Python libraries installed. We have provided the requirements.txt file in the project root for your convenience.

keras==2.8.0
matplotlib==3.5.1
nltk==3.7
numpy==1.22.0
pandarallel==1.6.1
pandas==1.4.2
scikit_learn==1.0.2
tensorflow==2.8.0
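To confirm that the active environment matches these pins, a small check along the following lines may help. It is only a sketch using the standard library's `importlib.metadata`; the helper names `parse_pins` and `check_pins` are illustrative and not part of the project.

```python
from importlib import metadata

# Pins copied from the list above.
PINS = """
keras==2.8.0
matplotlib==3.5.1
nltk==3.7
numpy==1.22.0
pandarallel==1.6.1
pandas==1.4.2
scikit_learn==1.0.2
tensorflow==2.8.0
"""


def parse_pins(text):
    """Parse whitespace-separated 'name==version' entries into a dict."""
    pins = {}
    for entry in text.split():
        name, _, version = entry.partition("==")
        pins[name] = version
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed)} for every pin that does not match.

    `installed` is None when no distribution is found under that name.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_pins(parse_pins(PINS)).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

An empty result means every pinned version was found as listed. Depending on the Python version, a package installed as `scikit-learn` may not be found under the name `scikit_learn`, so a reported mismatch for that entry is worth checking by hand.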

Data Download Instructions

Please download the data from https://physionet.org/content/mimiciii-demo/1.4/ and extract NOTEEVENTS.csv and PRESCRIPTIONS.csv to /data/real-mimic-iii-database.
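Before running the notebooks, it can be useful to verify that both files landed in the expected folder. The sketch below assumes the directory named above; `missing_inputs` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# File names taken from the download instructions above.
REQUIRED_FILES = ("NOTEEVENTS.csv", "PRESCRIPTIONS.csv")


def missing_inputs(data_dir):
    """Return the names of required CSV files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Adjust the path if the data folder is kept relative to the project root.
    missing = missing_inputs("/data/real-mimic-iii-database")
    print("Missing files:", missing or "none")
```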

Exploratory Data Analysis (EDA)

The Jupyter notebooks below show how we used Seaborn and Matplotlib to analyze the prescriptions and note events data.

/src/data_profiling/profile_prescription_data.ipynb
/src/data_profiling/profile_free_text.ipynb

Data Pre-Processing Code

Run the following Jupyter notebooks in order. Preprocessing takes approximately 1 hour to finish.

/src/data_processing/extract_drug_codes.ipynb
/src/data_processing/note_events_processing.ipynb

Once complete, the notebooks generate two new files in /data/processed/:

NOTEEVENTS_ML_DATASET.csv
ndc_codes_extracted.csv

Train and Evaluate Models Code

To train and evaluate the SVM and GRU models, run the following Jupyter notebooks:

/src/ml/multi_class_svm.ipynb
/src/ml/gru_model.ipynb

Results are persisted as two CSV files in the /data/results directory:

GRU_RESULTS.csv
SVM_results.csv

Results

Baseline SVM

The baseline SVM model was trained on the 22 most common drugs (by NDC) in our dataset. It used a linear kernel (LinearSVC), with the input text vectorized at the character level using scikit-learn's TfidfVectorizer. TfidfVectorizer was configured to generate trigrams from the text data.
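The setup described above can be sketched roughly as follows. This is a minimal illustration, not the code from multi_class_svm.ipynb: the toy texts and the `class_a`/`class_b` labels are placeholders rather than real orders or NDC codes, and every setting other than character-level trigrams and LinearSVC is left at its scikit-learn default as an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for free-text drug orders; labels are placeholders, not NDC codes.
texts = [
    "aspirin 81 mg po daily",
    "aspirin 325 mg tablet",
    "heparin 5000 units sc",
    "heparin flush 10 units/ml",
]
labels = ["class_a", "class_a", "class_b", "class_b"]

# Character-level TF-IDF restricted to trigrams, followed by a linear SVM.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(3, 3)),
    LinearSVC(),
)
model.fit(texts, labels)

# Character trigrams let misspelled input share features with the training text.
predictions = model.predict(["asprin 81mg", "heparin 1000 units"])
print(list(predictions))
```

For more than two classes, LinearSVC trains one-vs-rest classifiers by default, so the same pipeline extends to a 22-class problem without changes.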

| NDC         | Accuracy (%) | Precision (%) | Recall (%) |
| ----------- | ------------ | ------------- | ---------- |
| 00713016550 | 90           | 83            | 76         |
| 00487950125 | 92           | 89            | 76         |
| 00517391025 | 92           | 92            | 91         |
| 51079001920 | 90           | 90            | 91         |
| 11098003002 | 95           | 88            | 44         |
| 00054829725 | 93           | 88            | 72         |
| 00045025501 | 98           | 80            | 5          |
| 00338055002 | 86           | 88            | 91         |
| 00409131230 | 94           | 90            | 59         |
| 00045152510 | 93           | 87            | 65         |
| 00074407532 | 87           | 87            | 77         |
| 51079080120 | 91           | 90            | 85         |
| 51079025520 | 91           | 90            | 85         |
| 00074176201 | 87           | 86            | 71         |
| 00781305714 | 97           | 87            | 13         |
| 00054465025 | 95           | 86            | 65         |
| 00008084199 | 94           | 91            | 74         |
| 58177000104 | 93           | 95            | 91         |
| 00781188313 | 96           | 93            | 55         |
| 00517293025 | 93           | 94            | 95         |
| 00338355248 | 90           | 84            | 71         |
| 00002735501 | 89           | 89            | 83         |
| Average     | 92           | 88            | 70         |
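Several rows pair high accuracy with low recall (for example, 98% accuracy and 5% recall for 00045025501). One plausible reading, which the README does not confirm, is that each row is scored one-vs-rest, so the many correctly rejected samples of other classes inflate accuracy. The sketch below uses synthetic labels and a hypothetical helper `one_vs_rest_metrics` to show how that pattern can arise.

```python
def one_vs_rest_metrics(y_true, y_pred, target):
    """Accuracy, precision, and recall with `target` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Synthetic data: "rare" is 5 of 100 samples and the classifier finds only one.
y_true = ["rare"] * 5 + ["other"] * 95
y_pred = ["rare"] + ["other"] * 99

print(one_vs_rest_metrics(y_true, y_pred, "rare"))  # (0.96, 1.0, 0.2)
```

Here 95 true negatives dominate the accuracy, while recall reflects that four of the five "rare" samples were missed.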

GRU - RNN

| Model             | Hidden State Size | Number of Epochs | Mean Training Accuracy (%) | Mean Test Accuracy (%) | Mean Training Loss (%) | Mean Test Loss (%) |
| ----------------- | ----------------- | ---------------- | -------------------------- | ---------------------- | ---------------------- | ------------------ |
| Bidirectional GRU | 32                | 3                | 75.44                      | 75.69                  | 56.44                  | 55.49              |
| Bidirectional GRU | 64                | 3                | 75.45                      | 75.69                  | 56.22                  | 55.52              |
| Bidirectional GRU | 128               | 3                | 75.44                      | 75.69                  | 56.26                  | 55.66              |
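A character-level recurrent model such as the one in [1] consumes text as a sequence of character indices. The README does not describe the encoding used in gru_model.ipynb, so the following is only a sketch of one common scheme: an integer id per character, a reserved padding id of 0, and a fixed sequence length. The names `build_vocab`, `encode`, and `max_len` are illustrative assumptions.

```python
def build_vocab(texts):
    """Map each character seen in texts to an integer id; 0 is reserved for padding."""
    chars = sorted(set("".join(texts)))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text, vocab, max_len):
    """Convert text to exactly max_len ids, truncating or right-padding with 0.

    Characters missing from vocab share a single unknown id.
    """
    unknown = len(vocab) + 1
    ids = [vocab.get(ch, unknown) for ch in text[:max_len]]
    return ids + [0] * (max_len - len(ids))


vocab = build_vocab(["abc", "cab d"])
print(encode("bad", vocab, max_len=5))  # [3, 2, 5, 0, 0]
```

Sequences encoded this way can be fed to an embedding layer followed by a recurrent layer; the hidden state sizes in the table above would then control the width of that recurrent layer.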

References

[1] Raiskin Y, Eickhoff C, Beeler PE. Categorization of free-text drug orders using character-level recurrent neural networks. Int J Med Inform. 2019 Sep;129:20-28. doi: 10.1016/j.ijmedinf.2019.05.020. Epub 2019 May 23. PMID: 31445256.
