dl4h_final_project
Repository
Basic Info
- Host: GitHub
- Owner: c3armaanbutt
- Language: Jupyter Notebook
- Default Branch: main
- Size: 86.2 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 2
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
CS 598: Deep Learning For Healthcare Final Project
By: Armaan R. Butt and Harikrishna Bojja {arbutt2, hbojja2}@illinois.edu
Group ID: 213, Paper ID: 283
Paper: Categorization of free-text drug orders using character-level recurrent neural networks [1]
Dependencies
Computer Specs
You will need a machine with the following specs:
- CPU: 2.9 GHz - 8 Cores
- Memory: 64 GB
Runtime
You will need a machine with Python 3.9.7 installed.
Python Libraries
Please have the following Python libraries installed. We have provided the requirements.txt file in the project root for your convenience.
keras==2.8.0
matplotlib==3.5.1
nltk==3.7
numpy==1.22.0
pandarallel==1.6.1
pandas==1.4.2
scikit_learn==1.0.2
tensorflow==2.8.0
Data Download Instructions
Please download the data from https://physionet.org/content/mimiciii-demo/1.4/ and extract the NOTEEVENTS.csv and PRESCRIPTIONS.csv to /data/real-mimic-iii-database.
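Before running the notebooks, it can help to confirm that both files landed in the expected directory. The following is a minimal sketch, assuming the data path is relative to the project root; `missing_inputs` is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path

# File names taken from the download instructions above.
REQUIRED_FILES = ("NOTEEVENTS.csv", "PRESCRIPTIONS.csv")


def missing_inputs(data_dir):
    """Return the required CSV names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is run from the project root.
    missing = missing_inputs("data/real-mimic-iii-database")
    print("Missing files:", missing if missing else "none")
```

An empty result means both input files are in place and the preprocessing notebooks should be able to find them.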
Exploratory Data Analysis (EDA)
The Jupyter notebooks below show how we used Seaborn and Matplotlib to analyze the Prescriptions and Noteevents data.
- /src/data_profiling/profile_prescription_data.ipynb
- /src/data_profiling/profile_free_text.ipynb
Data Pre-Processing Code
Run the following Jupyter notebooks in order. Data preprocessing takes approximately 1 hour to finish.
- /src/data_processing/extract_drug_codes.ipynb
- /src/data_processing/note_events_processing.ipynb
Once complete, preprocessing generates two new files in /data/processed/:
- NOTEEVENTS_ML_DATASET.csv
- ndc_codes_extracted.csv
Train and Evaluate Models Code
To train and evaluate the SVM and GRU models, run the following Jupyter notebooks:
- /src/ml/multi_class_svm.ipynb
- /src/ml/gru_model.ipynb
Results are persisted as two CSV files in the /data/results directory:
- GRU_RESULTS.csv
- SVM_results.csv
Results
Baseline SVM
The baseline SVM model was trained on the 22 most common drugs (NDC codes) in our dataset. It uses a linear kernel (LinearSVC), with the input text vectorized at the character level using scikit-learn's TfidfVectorizer, which was configured to generate trigrams from the text data.
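The setup described above can be sketched roughly as follows. This is an illustration rather than the notebook's actual code: the exact `analyzer` and `ngram_range` settings are assumptions consistent with "character-level trigrams", `build_char_trigram_svm` is a hypothetical helper, and the order strings and labels are made up (the real model predicts NDC codes).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_char_trigram_svm():
    """Character-trigram TF-IDF features feeding a linear SVM."""
    return make_pipeline(
        # Assumption: plain character trigrams, with no mixing of n-gram sizes.
        TfidfVectorizer(analyzer="char", ngram_range=(3, 3)),
        LinearSVC(),
    )


# Toy, made-up order strings and labels, for illustration only.
texts = [
    "aspirin 81 mg tablet", "aspirin 325 mg tab", "asa 81mg po daily",
    "heparin 5000 units sc", "heparin flush 10 units", "heparin sodium inj",
]
labels = ["aspirin", "aspirin", "aspirin", "heparin", "heparin", "heparin"]

model = build_char_trigram_svm()
model.fit(texts, labels)
print(model.predict(["aspirin 81 mg"]))
```

Wrapping the vectorizer and classifier in a single pipeline keeps the TF-IDF vocabulary fitted on training text only, which avoids leaking test data into the features.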
| NDC         | Accuracy (%) | Precision (%) | Recall (%) |
| ----------- | ------------ | ------------- | ---------- |
| 00713016550 | 90           | 83            | 76         |
| 00487950125 | 92           | 89            | 76         |
| 00517391025 | 92           | 92            | 91         |
| 51079001920 | 90           | 90            | 91         |
| 11098003002 | 95           | 88            | 44         |
| 00054829725 | 93           | 88            | 72         |
| 00045025501 | 98           | 80            | 5          |
| 00338055002 | 86           | 88            | 91         |
| 00409131230 | 94           | 90            | 59         |
| 00045152510 | 93           | 87            | 65         |
| 00074407532 | 87           | 87            | 77         |
| 51079080120 | 91           | 90            | 85         |
| 51079025520 | 91           | 90            | 85         |
| 00074176201 | 87           | 86            | 71         |
| 00781305714 | 97           | 87            | 13         |
| 00054465025 | 95           | 86            | 65         |
| 00008084199 | 94           | 91            | 74         |
| 58177000104 | 93           | 95            | 91         |
| 00781188313 | 96           | 93            | 55         |
| 00517293025 | 93           | 94            | 95         |
| 00338355248 | 90           | 84            | 71         |
| 00002735501 | 89           | 89            | 83         |
| Average     | 92           | 88            | 70         |
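Several rows in the table above pair high accuracy with low recall (for example, 00045025501 at 98% accuracy and 5% recall). That pattern is what one would expect if each code is scored one-vs-rest, which is an assumption here, since the README does not say how the metrics were computed. A rare class then contributes many true negatives, so accuracy stays high even when most positives are missed. A minimal sketch with made-up labels (`one_vs_rest_metrics` is a hypothetical helper):

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, precision and recall with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up labels: class "A" is rare (10 of 100 samples) and the
# classifier finds only one of them, with no false alarms.
y_true = ["A"] * 10 + ["B"] * 90
y_pred = ["A"] * 1 + ["B"] * 99

acc, prec, rec = one_vs_rest_metrics(y_true, y_pred, positive="A")
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
# prints: accuracy=0.91 precision=1.00 recall=0.10
```

For rare drug codes, recall is therefore the more informative column when comparing models.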
GRU - RNN
| Model             | Hidden State Size | Number of Epochs | Mean Training Accuracy (%) | Mean Test Accuracy (%) | Mean Training Loss (%) | Mean Test Loss (%) |
| ----------------- | ----------------- | ---------------- | -------------------------- | ---------------------- | ---------------------- | ------------------ |
| Bidirectional GRU | 32                | 3                | 75.44                      | 75.69                  | 56.44                  | 55.49              |
| Bidirectional GRU | 64                | 3                | 75.45                      | 75.69                  | 56.22                  | 55.52              |
| Bidirectional GRU | 128               | 3                | 75.44                      | 75.69                  | 56.26                  | 55.66              |
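For orientation, a bidirectional GRU classifier of the kind compared above could be assembled in Keras along these lines. This is a sketch under stated assumptions, not the architecture in gru_model.ipynb: the embedding layer, its dimension, the vocabulary size, the 22 output classes, the optimizer, and the loss are all illustrative choices, and `build_bigru_classifier` is a hypothetical helper. Only the hidden state sizes (32, 64, 128) come from the table.

```python
import tensorflow as tf


def build_bigru_classifier(vocab_size, num_classes, hidden_size=32, embed_dim=16):
    """Character-level bidirectional GRU classifier (illustrative only)."""
    model = tf.keras.Sequential([
        # Maps each integer character ID to a dense vector (assumed layer).
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # Reads the sequence in both directions; outputs are concatenated.
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(hidden_size)),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Build one model per hidden state size listed in the table.
for hidden_size in (32, 64, 128):
    model = build_bigru_classifier(
        vocab_size=128, num_classes=22, hidden_size=hidden_size
    )
    # Dummy batch: 2 sequences of 50 character IDs each.
    probs = model(tf.zeros((2, 50), dtype=tf.int32))
    print(hidden_size, tuple(probs.shape))
```

Training would then call `model.fit` with integer-encoded character sequences and integer class labels for the chosen number of epochs.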
References
[1] Raiskin Y, Eickhoff C, Beeler PE. Categorization of free-text drug orders using character-level recurrent neural networks. Int J Med Inform. 2019 Sep;129:20-28. doi: 10.1016/j.ijmedinf.2019.05.020. Epub 2019 May 23. PMID: 31445256.