ordinal_gose_prediction
The leap to ordinal: functional prognosis after traumatic brain injury using artificial intelligence
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 8 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.3%) to scientific vocabulary
Repository
The leap to ordinal: functional prognosis after traumatic brain injury using artificial intelligence
Basic Info
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Baseline ordinal prediction of functional outcomes after traumatic brain injury (TBI) in the ICU
The leap to ordinal: Detailed functional prognosis after traumatic brain injury with a flexible modelling approach (https://doi.org/10.1371/journal.pone.0270973)
Contents
Overview
This repository contains the code underlying the article entitled The leap to ordinal: Detailed functional prognosis after traumatic brain injury with a flexible modelling approach from the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) consortium. In this file, we present the abstract, to outline the motivation for the work and the findings, followed by a brief description of the code with which we generate these findings. The code in this repository is commented throughout to provide a description of each step alongside the code which achieves it.
Abstract
After a traumatic brain injury (TBI), outcome prognosis within 24 hours of intensive care unit (ICU) admission is essential for baseline risk adjustment and shared decision making. TBI outcomes are commonly categorised by the Glasgow Outcome Scale – Extended (GOSE) into eight, ordered levels of functional recovery at 6 months after injury. Existing ICU prognostic models predict binary outcomes at a certain threshold of GOSE (e.g., functional independence [GOSE > 4]). We aimed to develop ordinal prediction models that concurrently predict probabilities of each GOSE score. From the ICU strata of the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) project dataset (65 centres), we extracted all baseline clinical information (1,151 predictors) and 6-month GOSE scores from a prospective cohort of 1,550 adult TBI patients. We analysed the effect of two design elements on ordinal model performance: (1) the baseline predictor set, ranging from a concise set of ten validated predictors to a token-embedded representation of all possible predictors, and (2) the modelling strategy, from ordinal logistic regression to multinomial deep learning. With repeated k-fold cross-validation, we found that expanding the baseline predictor set significantly improved ordinal prediction performance while increasing analytical complexity did not. Half of these gains could be achieved with the addition of eight high-impact predictors (2 demographic variables, 4 protein biomarkers, and 2 severity assessments) to the concise set. At best, ordinal models achieved 0.76 (95% CI: 0.74 – 0.77) ordinal discrimination ability (ordinal c-index) and 57% (95% CI: 54% – 60%) explanation of ordinal variation in 6-month GOSE (Somers’ Dxy). Model performance and the effect of expanding the predictor set decreased at higher GOSE thresholds, indicating the difficulty of predicting better functional outcomes shortly after ICU admission. Our results motivate the search for informative predictors that improve confidence in prognosis of higher GOSE and the development of ordinal dynamic prediction models.
Code
All of the code used in this work can be found in the ./scripts directory as Python (.py), R (.R), or bash (.sh) scripts. Moreover, custom classes have been saved in the ./scripts/classes sub-directory, custom functions have been saved in the ./scripts/functions sub-directory, and custom PyTorch models have been saved in the ./scripts/models sub-directory.
1. Extract study sample from CENTER-TBI dataset
In this .py file, we extract the study sample from the CENTER-TBI dataset, filter patients by our study criteria, and convert ICU timestamps to machine-readable format.
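As a purely illustrative sketch of the timestamp conversion this step describes (the column names below are invented for the example and are not taken from the CENTER-TBI dataset):

```python
# Minimal sketch of converting ICU admission timestamps to a machine-readable
# format; the column names ('ICUAdmDate', 'ICUAdmTime') are hypothetical.
import pandas as pd

def normalise_timestamps(df):
    combined = df['ICUAdmDate'].astype(str) + ' ' + df['ICUAdmTime'].astype(str)
    # errors='coerce' turns unparseable entries into NaT rather than raising
    df['ICUAdmTimestamp'] = pd.to_datetime(combined, errors='coerce')
    return df
```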
2. Partition CENTER-TBI for stratified, repeated k-fold cross-validation
In this .py file, we create 100 partitions, stratified by 6-month GOSE, for repeated k-fold cross-validation, and save the splits into a dataframe for subsequent scripts.
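A minimal sketch of this partitioning step, assuming the 100 partitions arise from 20 repeats of stratified 5-fold cross-validation (the repeat/fold breakdown and seed are assumptions for illustration):

```python
# Sketch of building stratified repeated k-fold partitions, stratified by
# 6-month GOSE, and saving the splits in long format for subsequent scripts.
import numpy as np
import pandas as pd
from sklearn.model_selection import RepeatedStratifiedKFold

def build_partitions(patient_ids, gose, n_splits=5, n_repeats=20, seed=2022):
    patient_ids = np.asarray(patient_ids)
    rskf = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                   random_state=seed)
    rows = []
    for i, (train_idx, test_idx) in enumerate(rskf.split(patient_ids, gose)):
        repeat, fold = divmod(i, n_splits)
        for idx in train_idx:
            rows.append({'repeat': repeat + 1, 'fold': fold + 1,
                         'GUPI': patient_ids[idx], 'set': 'train'})
        for idx in test_idx:
            rows.append({'repeat': repeat + 1, 'fold': fold + 1,
                         'GUPI': patient_ids[idx], 'set': 'test'})
    return pd.DataFrame(rows)
```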
3. Prepare concise predictor set for ordinal prediction
In this .R file, we perform multiple imputation with chained equations (MICE, m = 100) on the concise predictor set for CPM training. The training set for each repeated k-fold CV partition is used to train an independent predictive mean matching imputation transformation for that partition. The result is 100 imputations, one for each repeated k-fold cross validation partition.
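The repository performs this step in R with the mice package; purely as an analogue, here is a minimal Python sketch (scikit-learn's IterativeImputer stands in for predictive mean matching) of the key design point: the imputation model is fit on each partition's training set only and then applied to its held-out test set.

```python
# Analogue only: the actual pipeline uses R's mice with predictive mean
# matching. This sketch illustrates train-only fitting per CV partition.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def impute_partition(X_train, X_test, seed):
    # Fit the imputation model on this partition's training set only,
    # then apply the learned transformation to the held-out test set
    imputer = IterativeImputer(random_state=seed, sample_posterior=True)
    X_train_imp = imputer.fit_transform(X_train)
    X_test_imp = imputer.transform(X_test)
    return X_train_imp, X_test_imp
```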
4. Train logistic regression concise-predictor-based models (CPM)
In this .py file, we define a function to train logistic regression CPMs given the repeated cross-validation dataframe. Then we perform parallelised training of logistic regression CPMs and testing set prediction. Finally, we compile testing set predictions.
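A hedged sketch of the two regression strategies trained here, multinomial logistic regression (CPM_MNLR) and proportional-odds logistic regression (CPM_POLR); data handling is simplified relative to the repository's scripts.

```python
# Illustrative training of the two logistic strategies; not taken verbatim
# from the repository's scripts.
from sklearn.linear_model import LogisticRegression
from statsmodels.miscmodels.ordinal_model import OrderedModel

def train_mnlr(X_train, y_train, X_test):
    # Multinomial logistic regression: one probability per GOSE level
    mnlr = LogisticRegression(multi_class='multinomial', max_iter=1000)
    mnlr.fit(X_train, y_train)
    return mnlr.predict_proba(X_test)

def train_polr(X_train, y_train, X_test):
    # Proportional-odds (cumulative logit) ordinal logistic regression
    polr = OrderedModel(y_train, X_train, distr='logit')
    fit = polr.fit(method='bfgs', disp=False)
    return fit.predict(X_test)  # predicted probability of each GOSE level
```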
5. Assess CPM_MNLR and CPM_POLR performance
In this .py file, we create and save the bootstrapping resamples used for all model performance evaluation. We then prepare compiled CPM_MNLR and CPM_POLR testing set predictions and calculate/save performance metrics.
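A minimal sketch of generating shared patient-level bootstrap resamples, so that every model is evaluated on identical resamples (the resample count and seed are assumptions):

```python
# Sketch of creating common bootstrap resamples of study patients (GUPIs).
import numpy as np
import pandas as pd

def make_bootstrap_resamples(gupis, n_resamples=1000, seed=2022):
    rng = np.random.default_rng(seed)
    gupis = np.asarray(gupis)
    rows = []
    for b in range(n_resamples):
        # Sample patients with replacement to the original cohort size
        sample = rng.choice(gupis, size=len(gupis), replace=True)
        rows.append(pd.DataFrame({'resample': b + 1, 'GUPI': sample}))
    return pd.concat(rows, ignore_index=True)
```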
6. Train and optimise CPM_DeepMN and CPM_DeepOR
Train deep learning concise-predictor-based models (CPM)
In this .py file, we first create a grid of tuning configuration-cross-validation combinations and train CPM_DeepMN or CPM_DeepOR models based on the provided hyperparameter row index. This is run, with multi-array indexing, on the HPC using a bash script.
Perform inter-repeat hyperparameter configuration dropout on deep learning concise-predictor-based models (CPM)
In this .py file, we calculate the ORC of extant validation predictions, prepare bootstrapping resamples for configuration dropout, and drop out configurations that are consistently (α = .05) inferior in performance.
Calculate ORC in bootstrapping resamples to determine dropout configurations
In this .py file, we calculate the ORC in each resample and compare it to the 'optimal' configuration. This is run, with multi-array indexing, on the HPC using a bash script.
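For orientation, one common formulation of the ordinal c-index (ORC) averages dichotomous AUCs across the seven GOSE thresholds; the sketch below illustrates that formulation and is not necessarily the repository's exact implementation.

```python
# Illustrative ORC: mean of dichotomous AUCs over each GOSE threshold.
import numpy as np
from sklearn.metrics import roc_auc_score

def ordinal_c_index(y_true, prob_matrix, levels):
    # prob_matrix[i, k] = predicted probability that patient i has levels[k];
    # levels must be sorted in ascending order
    y_true = np.asarray(y_true)
    aucs = []
    for t in range(len(levels) - 1):
        y_bin = (y_true > levels[t]).astype(int)  # outcome above threshold?
        if y_bin.min() == y_bin.max():
            continue  # threshold not represented in this sample
        p_above = prob_matrix[:, t + 1:].sum(axis=1)
        aucs.append(roc_auc_score(y_bin, p_above))
    return float(np.mean(aucs))
```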
7. Calculate and compile CPM_DeepMN and CPM_DeepOR metrics
Assess CPM_DeepMN and CPM_DeepOR performance
In this .py file, we calculate performance metrics on the resamples. This is run, with multi-array indexing, on the HPC using a bash script.
Compile CPM_DeepMN and CPM_DeepOR performance metrics and calculate confidence intervals
In this .py file, we compile all CPM_DeepMN and CPM_DeepOR performance metrics and calculate confidence intervals on all CPM performance metrics.
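Confidence intervals at this stage are typically derived with the percentile method over the bootstrap distribution; a minimal sketch follows (the column names are illustrative):

```python
# Sketch of percentile-method 95% CIs from compiled bootstrap metrics.
import pandas as pd

def percentile_ci(metrics_df, value_col='value',
                  group_cols=('model', 'metric')):
    # metrics_df holds one row per (model, metric, resample) combination
    return (metrics_df
            .groupby(list(group_cols))[value_col]
            .quantile([0.025, 0.5, 0.975])
            .unstack()
            .rename(columns={0.025: 'lo', 0.5: 'median', 0.975: 'hi'}))
```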
8. Prepare predictor tokens for the training of all-predictor-based models (APMs)
In this .R file, we load and prepare formatted CENTER-TBI predictor tokens. Then, we convert the formatted predictors to tokens for each repeated cross-validation partition.
9. Train APM dictionaries and convert tokens to embedding layer indices
In this .py file, we train APM dictionaries per repeated cross-validation partition and convert tokens to indices.
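A minimal sketch of the dictionary training and index conversion, assuming a simple string-token vocabulary with reserved padding and unknown-token indices (the reserved indices are assumptions for illustration):

```python
# Sketch: build a per-partition token dictionary from training-set tokens,
# then map token strings to embedding-layer indices.
def build_vocabulary(train_token_lists):
    vocab = {'<pad>': 0, '<unk>': 1}  # reserved indices are assumptions
    for tokens in train_token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def tokens_to_indices(token_lists, vocab):
    # Tokens unseen in training map to the unknown-token index
    return [[vocab.get(tok, vocab['<unk>']) for tok in tokens]
            for tokens in token_lists]
```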
10. Train and optimise APM_MN and APM_OR
Train deep learning all-predictor-based models (APM)
In this .py file, we first create a grid of tuning configuration-cross-validation combinations and train APM_MN or APM_OR models based on the provided hyperparameter row index. This is run, with multi-array indexing, on the HPC using a bash script.
Perform inter-repeat hyperparameter configuration dropout on deep learning all-predictor-based models (APM)
In this .py file, we calculate the ORC of extant validation predictions, prepare bootstrapping resamples for configuration dropout, and drop out configurations that are consistently (α = .05) inferior in performance.
Calculate ORC in bootstrapping resamples to determine dropout configurations
In this .py file, we calculate the ORC in each resample and compare it to the 'optimal' configuration. This is run, with multi-array indexing, on the HPC using a bash script.
11. Calculate and compile APM_MN and APM_OR metrics
Assess APM_MN and APM_OR performance
In this .py file, we calculate performance metrics on the resamples. This is run, with multi-array indexing, on the HPC using a bash script.
Compile APM_MN and APM_OR performance metrics and calculate confidence intervals
In this .py file, we compile all APM_MN and APM_OR performance metrics and calculate confidence intervals on all APM performance metrics.
12. Assess feature significance in APM_MN
Calculate SHAP values for APM_MN
In this .py file, we find all top-performing model checkpoint files for SHAP calculation and calculate SHAP values based on the given parameters. This is run, with multi-array indexing, on the HPC using a bash script.
Compile SHAP values for each GUPI-output type combination from APM_MN
In this .py file, we find all files storing calculated SHAP values, create combinations with the study GUPIs, and compile SHAP values for the given GUPI and output type combination. This is run, with multi-array indexing, on the HPC using a bash script.
Summarise SHAP values across study set
In this .py file, we find all files storing GUPI-specific SHAP values and compile and save summary SHAP values across the study patient set.
Summarise aggregation weights across trained APM set
In this .py file, we compile significance weights across trained APMs and summarise the significance weights.
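A heavily simplified sketch of SHAP calculation for a trained PyTorch checkpoint using the shap library's DeepExplainer; the repository's handling of embedding-layer inputs and per-output-type compilation is considerably more involved.

```python
# Simplified SHAP sketch for a PyTorch model; the real pipeline must handle
# embedding inputs and multiple output types per patient (GUPI).
import shap
import torch  # model is assumed to be a torch.nn.Module

def compute_shap_values(model, background, samples):
    # background and samples are tensors shaped like the model's input
    model.eval()
    explainer = shap.DeepExplainer(model, background)
    # Returns one attribution array per output class (GOSE level)
    return explainer.shap_values(samples)
```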
13. Prepare extended concise predictor set for ordinal prediction
In this .R file, we load IMPACT variables from CENTER-TBI, load and prepare the added variables from CENTER-TBI, and multiply impute the extended concise predictor set in parallel. The training set for each repeated k-fold CV partition is used to train an independent predictive mean matching imputation transformation for that partition. The result is 100 imputations, one for each repeated k-fold cross validation partition.
14. Train logistic regression extended concise-predictor-based models (eCPM)
In this .py file, we define a function to train logistic regression eCPMs given the repeated cross-validation dataframe. Then we perform parallelised training of logistic regression eCPMs and testing set prediction. Finally, we compile testing set predictions.
15. Assess eCPM_MNLR and eCPM_POLR performance
In this .py file, we load the common bootstrapping resamples (which are used for all model performance evaluation), prepare compiled eCPM_MNLR and eCPM_POLR testing set predictions, and calculate/save performance metrics.
16. Train and optimise eCPM_DeepMN and eCPM_DeepOR
Train deep learning extended concise-predictor-based models (eCPM)
In this .py file, we first create a grid of tuning configuration-cross-validation combinations and train eCPM_DeepMN or eCPM_DeepOR models based on the provided hyperparameter row index. This is run, with multi-array indexing, on the HPC using a bash script.
Perform inter-repeat hyperparameter configuration dropout on deep learning extended concise-predictor-based models (eCPM)
In this .py file, we calculate the ORC of extant validation predictions, prepare bootstrapping resamples for configuration dropout, and drop out configurations that are consistently (α = .05) inferior in performance.
Calculate ORC in bootstrapping resamples to determine dropout configurations
In this .py file, we calculate the ORC in each resample and compare it to the 'optimal' configuration. This is run, with multi-array indexing, on the HPC using a bash script.
17. Calculate and compile eCPM_DeepMN and eCPM_DeepOR metrics
Assess eCPM_DeepMN and eCPM_DeepOR performance
In this .py file, we calculate performance metrics on the resamples. This is run, with multi-array indexing, on the HPC using a bash script.
Compile eCPM_DeepMN and eCPM_DeepOR performance metrics and calculate confidence intervals
In this .py file, we compile all eCPM_DeepMN and eCPM_DeepOR performance metrics and calculate confidence intervals on all eCPM performance metrics.
18. Perform ordinal regression analysis on study characteristics and predictors
In this .py file, we perform ordinal regression analyses on the summary characteristics, the CPM characteristics, and the eCPM characteristics.
19. Visualise study results for manuscript
In this .R file, we produce the figures for the manuscript and the supplementary figures. The large majority of the quantitative figures in the manuscript are produced using the ggplot2 package.
Citation
@article{10.1371/journal.pone.0270973,
doi = {10.1371/journal.pone.0270973},
author = {Bhattacharyay, Shubhayu AND Milosevic, Ioan AND Wilson, Lindsay AND Menon, David K. AND Stevens, Robert D. AND Steyerberg, Ewout W. AND Nelson, David W. AND Ercole, Ari AND the CENTER-TBI investigators and participants},
journal = {PLOS ONE},
publisher = {Public Library of Science},
title = {The leap to ordinal: Detailed functional prognosis after traumatic brain injury with a flexible modelling approach},
year = {2022},
month = {07},
volume = {17},
url = {https://doi.org/10.1371/journal.pone.0270973},
pages = {1-29},
abstract = {When a patient is admitted to the intensive care unit (ICU) after a traumatic brain injury (TBI), an early prognosis is essential for baseline risk adjustment and shared decision making. TBI outcomes are commonly categorised by the Glasgow Outcome Scale–Extended (GOSE) into eight, ordered levels of functional recovery at 6 months after injury. Existing ICU prognostic models predict binary outcomes at a certain threshold of GOSE (e.g., prediction of survival [GOSE > 1]). We aimed to develop ordinal prediction models that concurrently predict probabilities of each GOSE score. From a prospective cohort (n = 1,550, 65 centres) in the ICU stratum of the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) patient dataset, we extracted all clinical information within 24 hours of ICU admission (1,151 predictors) and 6-month GOSE scores. We analysed the effect of two design elements on ordinal model performance: (1) the baseline predictor set, ranging from a concise set of ten validated predictors to a token-embedded representation of all possible predictors, and (2) the modelling strategy, from ordinal logistic regression to multinomial deep learning. With repeated k-fold cross-validation, we found that expanding the baseline predictor set significantly improved ordinal prediction performance while increasing analytical complexity did not. Half of these gains could be achieved with the addition of eight high-impact predictors to the concise set. At best, ordinal models achieved 0.76 (95% CI: 0.74–0.77) ordinal discrimination ability (ordinal c-index) and 57% (95% CI: 54%– 60%) explanation of ordinal variation in 6-month GOSE (Somers’ Dxy). Model performance and the effect of expanding the predictor set decreased at higher GOSE thresholds, indicating the difficulty of predicting better functional outcomes shortly after ICU admission. Our results motivate the search for informative predictors that improve confidence in prognosis of higher GOSE and the development of ordinal dynamic prediction models.},
number = {7}
}
Owner
- Name: Shubhayu Bhattacharyay
- Login: sbhattacharyay
- Kind: user
- Location: uk
- Company: cambridge
- Twitter: shubhayu_neuro
- Repositories: 2
- Profile: https://github.com/sbhattacharyay
PhD candidate
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2