https://github.com/datasig-ac-uk/sepsis_label_extraction

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (Found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (14.8%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: datasig-ac-uk
License: mit
Language: Jupyter Notebook
Default Branch: main
Size: 18.1 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme License

Subtle Variation in Sepsis-III Definitions Influences Predictive Performance of Machine Learning

The early detection of sepsis is a key research priority because it facilitates timely intervention. The criteria used to identify the onset time of sepsis from health records vary, which hinders comparison and progress in this field. We considered the effects of variations in the sepsis onset definition on the predictive performance of three representative models for early sepsis detection: a light gradient boosting machine (LGBM), a long short-term memory network (LSTM), and a Cox proportional-hazards model (CoxPHM).

This repository is the official implementation of the paper entitled "Subtle Variation of Sepsis-III Definitions Influences Predictive Performance of Machine Learning".

This repository contains code for the following parts in our experimental pipeline:

1. Extracting the sepsis labelling from the MIMIC-III data based on three sepsis criteria H1-3 and their variants (see src/database).
2. Training three types of models (i.e. LGBM, LSTM and CoxPHM) for the early sepsis prediction on the datasets produced in Step 1 (see src/models).
3. Evaluating each trained model using the test metrics (e.g. AUROC) and producing the visualization plots (see src/visualization).
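As a reminder of what the AUROC metric named in Step 3 measures, here is a minimal, dependency-free sketch. It is an illustration only, not the evaluation code in src/visualization or src/models; the function name `auroc` and the toy labels and scores are assumptions made for this example. In practice, a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive case is scored above a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = sepsis onset within the prediction window, 0 = no onset.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
result = auroc(labels, scores)
print(result)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration but too slow for large test sets.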

Environment Setup

The code has been tested successfully using Python 3.7; thus we suggest using this version or a later version of Python. A typical process for installing the package dependencies involves creating a new Python virtual environment.

To install the required packages, run the following:

```console
pip install -r requirements.txt
```

Finally, to prepare the environment for running the code, run the following:

```console
source pythonpath.sh
```

Data Extraction Pipeline

To train and evaluate our models, we will change the relational format of the MIMIC-III database to a pivoted view which includes key demographic information, vital signs, and laboratory readings. We will also create tables for the possible sepsis onset times of each patient. We will subsequently output the pivoted data to comma-separated value (CSV) files, which serve as input for model training and evaluation.
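To make the idea of a pivoted view concrete, the following sketch turns long-format measurement rows into one wide record per patient and time step, then writes CSV text. It is a simplified illustration under stated assumptions: the column names (`patient_id`, `hour`), the measurement names, and the helper functions `pivot` and `to_csv` are hypothetical, and the actual extraction in src/database is performed against the PostgreSQL database rather than in pure Python.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format rows: (patient_id, hour, measurement, value).
# These names are illustrative and are not MIMIC-III table or column names.
long_rows = [
    (1, 0, "heart_rate", 88.0),
    (1, 0, "lactate", 1.9),
    (1, 1, "heart_rate", 92.0),
    (2, 0, "heart_rate", 75.0),
]


def pivot(rows):
    """Group long-format rows into one wide record per (patient_id, hour)."""
    wide = defaultdict(dict)
    for patient_id, hour, name, value in rows:
        wide[(patient_id, hour)][name] = value
    columns = sorted({name for _, _, name, _ in rows})
    table = []
    for (patient_id, hour), values in sorted(wide.items()):
        record = {"patient_id": patient_id, "hour": hour}
        # Measurements not taken at this time step stay missing (None).
        record.update({col: values.get(col) for col in columns})
        table.append(record)
    return table


def to_csv(table):
    """Serialise the pivoted records to CSV text; None becomes an empty field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(table[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


pivoted = pivot(long_rows)
csv_text = to_csv(pivoted)
print(csv_text)
```

Each measurement becomes its own column, so a model can read one row per patient and time step; measurements that were not recorded appear as empty fields.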

Prior to running any of the data extraction commands, make sure to change to the src/database subdirectory:

```console
cd src/database
```

Next, follow the instructions in the data extraction README.md, using the sections that match your setup: either a PostgreSQL installation on your own machine or a Docker container.

Model Training and Testing Pipeline

Feature Extraction

To generate the derived features mentioned in our paper, run the following:

```console
python3 src/features/generate_features.py
```

The preceding command saves the features required for model tuning, training, and evaluation to data/processed.

Model tuning/training/evaluation

Initiate model tuning, training and evaluation using the main.py script. This script takes four optional arguments: --model, --step, --n_cpus, and --n_gpus:

```console
python3 src/models/main.py --model MODEL_NAME --step STEP_NAME --n_cpus N_CPUS --n_gpus N_GPUS
```

where MODEL_NAME is one of LGBM, LSTM, or CoxPHM, and STEP_NAME is one of tune, train, or eval. N_CPUS is the number of CPUs and N_GPUS is the number of GPUs.
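For readers who want to see how such a command line is typically parsed, here is a minimal `argparse` sketch. It is not the repository's main.py: the function name `build_parser` is hypothetical, and the defaults (all models and all steps when an option is omitted, 1 CPU and 1 GPU) are assumptions inferred from the description of running main.py without arguments.

```python
import argparse

MODELS = ["LGBM", "LSTM", "CoxPHM"]
STEPS = ["tune", "train", "eval"]


def build_parser():
    """Build a parser with the four optional arguments described above."""
    parser = argparse.ArgumentParser(description="Sketch of a main.py-style CLI")
    # Assumption: None means "run every model" or "run every step".
    parser.add_argument("--model", choices=MODELS, default=None)
    parser.add_argument("--step", choices=STEPS, default=None)
    # Assumption: one CPU and one GPU when nothing is specified.
    parser.add_argument("--n_cpus", type=int, default=1)
    parser.add_argument("--n_gpus", type=int, default=1)
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--model", "LGBM", "--step", "tune", "--n_cpus", "4"]
)
print(args.model, args.step, args.n_cpus, args.n_gpus)
```

Restricting `--model` and `--step` with `choices` makes a misspelled value fail immediately with a usage message instead of partway through a long run.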

For each of the three models (LGBM, LSTM, and CoxPHM), the required sequence of steps is tune, train, eval:

1. tune: For a given model, running the tuning step computes and saves optimal hyperparameters for subsequent training and evaluation.
2. train: The model is trained and saved to the model/ directory for subsequent evaluation.
3. eval: Evaluation involves generating numerical results and predictions, which are respectively saved to outputs/results and outputs/predictions.
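The required ordering can be sketched as a small driver that builds one main.py command per model and step. This is an illustrative helper, not part of the repository: the function name `pipeline_commands` is hypothetical, and the sketch only prints the commands rather than running them.

```python
MODELS = ["LGBM", "LSTM", "CoxPHM"]
STEPS = ["tune", "train", "eval"]  # must run in this order for each model


def pipeline_commands(models=MODELS, steps=STEPS, n_cpus=1, n_gpus=1):
    """Return one main.py command per (model, step), preserving the step order."""
    commands = []
    for model in models:
        for step in steps:
            commands.append([
                "python3", "src/models/main.py",
                "--model", model,
                "--step", step,
                "--n_cpus", str(n_cpus),
                "--n_gpus", str(n_gpus),
            ])
    return commands


commands = pipeline_commands()
for command in commands:
    print(" ".join(command))
    # To execute instead of printing, one could call
    # subprocess.run(command, check=True) so that a failed step stops the run.
```

Stopping on the first failure matters here because train depends on the hyperparameters saved by tune, and eval depends on the model saved by train.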

Note: To run all three steps above in the required order for all three models on 1 CPU and 1 GPU, run main.py without any arguments:

```console
python3 src/models/main.py
```

The full pipeline can take several days to complete. Alternatively, you can download our pretrained models and obtain the results directly with the following commands:

```console
bash pretrained_models.sh
python3 src/models/main.py --model MODEL_NAME --step eval
```

Visualizations

To reproduce all the plots in the paper, first run the model evaluation step, then run the following command:

```console
python3 src/visualization/main_plots.py
```

Owner

Name: DataSig
Login: datasig-ac-uk
Kind: organization

Website: https://datasig.web.ox.ac.uk/
Repositories: 3
Profile: https://github.com/datasig-ac-uk

A rough path between mathematics and data science

GitHub Events

Total

Push event: 2
Create event: 2

Last Year

Push event: 2
Create event: 2

Dependencies

src/database/docker/Dockerfile docker

postgres latest build

src/database/mimic-code/buildmimic/docker/Dockerfile docker

postgres latest build

poetry.lock pypi

aiosignal 1.3.2
attrs 25.1.0
autograd 1.7.0
autograd-gamma 0.5.0
beautifulsoup4 4.13.3
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
colorama 0.4.6
contourpy 1.3.1
cycler 0.12.1
dill 0.3.9
filelock 3.17.0
fonttools 4.56.0
formulaic 1.1.1
frozenlist 1.5.0
fsspec 2025.2.0
gdown 5.2.0
idna 3.10
iisignature 0.24
interface-meta 1.2.5
interface-meta 1.3.0
jinja2 3.1.5
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
kiwisolver 1.4.8
lifelines 0.30.0
lightgbm 4.6.0
markupsafe 3.0.2
matplotlib 3.10.0
matplotlib-venn 1.1.2
mpmath 1.3.0
msgpack 1.1.0
networkx 3.4.2
numpy 2.2.3
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
packaging 24.2
pandas 2.2.3
pillow 11.1.0
protobuf 5.29.3
pyparsing 3.2.1
pysocks 1.7.1
python-dateutil 2.9.0.post0
pytz 2025.1
pyyaml 6.0.2
ray 2.42.1
referencing 0.36.2
requests 2.32.3
rpds-py 0.23.1
scikit-learn 1.6.1
scipy 1.15.2
seaborn 0.13.2
setuptools 75.8.1
six 1.17.0
soupsieve 2.6
sympy 1.13.1
threadpoolctl 3.5.0
torch 2.6.0
tqdm 4.67.1
triton 3.2.0
typing-extensions 4.12.2
tzdata 2025.1
urllib3 2.3.0
wrapt 1.17.2

pyproject.toml pypi

dill (>=0.3.9,<0.4.0)
gdown (>=5.2.0,<6.0.0)
iisignature @ git+https://github.com/bottler/iisignature.git
joblib (>=1.4.2,<2.0.0)
lifelines (>=0.30.0,<0.31.0)
lightgbm (>=4.6.0,<5.0.0)
matplotlib (>=3.10.0,<4.0.0)
matplotlib-venn (>=1.1.2,<2.0.0)
numpy (>=2.2.3,<3.0.0)
pandas (>=2.2.3,<3.0.0)
pillow (>=11.1.0,<12.0.0)
ray (>=2.42.1,<3.0.0)
requests (>=2.32.3,<3.0.0)
scikit-learn (>=1.6.1,<2.0.0)
scipy (>=1.15.2,<2.0.0)
seaborn (>=0.13.2,<0.14.0)
torch (>=2.6.0,<3.0.0)

requirements.txt pypi

dill ==0.3.1.1
gdown ==3.13.0
iisignature ==0.24
joblib ==0.14.0
lifelines ==0.26.0
lightgbm ==2.3.1
matplotlib ==3.1.3
matplotlib_venn ==0.11.6
numpy ==1.17.4
pandas ==1.2.4
pillow ==8.3.1
ray ==0.8.6
requests ==2.26.0
scikit_learn ==0.23.2
scipy ==1.3.3
seaborn ==0.11.1
torch ==1.6.0

src/database/requirements.txt pypi

matplotlib ==3.4.2
numpy ==1.20.3
pandas ==1.2.4
psycopg2 ==2.8.6
scikit-learn ==0.24.2
