https://github.com/cosbidev/ostransformer
Official implementation for the paper "A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values"
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values
This document describes the PyTorch implementation of "A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values". The proposed architecture is designed specifically for survival analysis and handles missing values in clinical data without any imputation strategy.
Our approach adapts the transformer encoder architecture to tabular data through a novel positional encoding for tabular features, and uses padding to mask missing features within the attention module so that the model ignores them.
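To make the masking idea concrete, the following is a minimal NumPy sketch of single-head self-attention in which missing features are excluded as keys. It is an illustration under stated assumptions, not the repository's implementation: the function name `masked_self_attention`, the toy embedding, and the zero filler used for missing values are all hypothetical.

```python
import numpy as np

def masked_self_attention(tokens, missing):
    """Single-head scaled dot-product self-attention that ignores missing tokens.

    tokens:  (n_features, d) array of feature embeddings; rows of missing
             features must hold a finite filler value (assumption: zeros).
    missing: (n_features,) boolean array, True where the feature is missing.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)        # (n_features, n_features)
    scores[:, missing] = -np.inf                   # no query may attend to a missing key
    scores -= scores.max(axis=-1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)                       # exp(-inf) == 0 for masked keys
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens, weights

rng = np.random.default_rng(0)
x = rng.random(5)                 # one sample with 5 features
x[[1, 3]] = np.nan                # two features are missing
missing = np.isnan(x)
# Toy embedding (hypothetical): scale a random direction by each feature value
tokens = np.nan_to_num(x)[:, None] * rng.random((1, 4))
attended, weights = masked_self_attention(tokens, missing)
```

Because the attention weights on missing columns are exactly zero, the filler placed in missing positions does not influence the attended output of observed features, provided the filler is finite.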
Usage
Here we provide a brief guide on how to use the code, covering the label encoding, the model, the losses employed during training, and the final metric used to measure performance.
The user can use a dataset of clinical features, but as an example we generate random data.
```python
import numpy as np

# DATA
n_samples = 100
n_features = 37
data = np.random.rand(n_samples, n_features)
```
To simulate a realistic scenario, we then introduce some missing values into the data.
```python
data[np.random.choice((0, 1), (n_samples, n_features), p=(0.8, 0.2)) == 1] = np.nan  # Introduce missing values
```
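As a quick sanity check on the masking step above, the sketch below rebuilds the same kind of random data with a seeded generator and inspects the resulting missingness pattern. It is a self-contained illustration: the seeded `rng` and the names `missing_mask`, `missing_rate`, and `all_missing_rows` are assumptions introduced here, not part of the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 100, 37
data = rng.random((n_samples, n_features))
# Mark roughly 20% of the entries as missing
data[rng.random((n_samples, n_features)) < 0.2] = np.nan

missing_mask = np.isnan(data)                 # True where a feature is missing
missing_rate = missing_mask.mean()            # overall fraction of missing entries
all_missing_rows = missing_mask.all(axis=1)   # samples with no observed feature at all
print(f"missing rate: {missing_rate:.3f}, fully missing samples: {all_missing_rows.sum()}")
```

A boolean mask of this shape is the natural input for a padding-style attention mask, and checking for fully missing samples is worthwhile because a sample with no observed feature leaves the attention module nothing to attend to.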
We also generate random labels for the survival analysis. Each label is a pair holding the event and its respective time.
```python
# LABELS
events = ("censored", "uncensored")
num_events = len(events) - 1  # The first event is the censored one, so we do not consider it
max_time = 72  # Maximum time to consider for the survival analysis
# Use max_survival to generate labels; the analysis then considers only times up to max_time,
# setting those patients who survived longer than max_time to "censored"
max_survival = 100
labels = np.hstack(
    (
        np.random.choice(events, (n_samples, 1)),
        np.random.rand(n_samples, 1) * max_survival,
    ),
    dtype=object,
)
```
Given the labels, we can encode them in a format suitable for the survival analysis.
More specifically, events are encoded as integers starting from 0, which denotes the censored event, and event times are floored. For patients who survived longer than max_time, the time is set to max_time and the event is set to 0, that is, the patient is treated as censored.
```python
survival_label_function = np.vectorize(
    lambda label: label_to_survival(label, events, max_time),
    signature="(n)->(m)",
)
survival_labels = survival_label_function(labels)
```
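The encoding rule described above can be sketched in a few lines of plain Python. The function `encode_label` below is a hypothetical stand-in written for illustration; it is not the repository's `label_to_survival`, and both the `(event, time)` output order and the handling of a time exactly equal to `max_time` are assumptions.

```python
import math

def encode_label(label, events, max_time):
    """Hypothetical illustration of the encoding rule (not the repository's code)."""
    event_name, time = label
    event = events.index(event_name)   # "censored" -> 0, first real event -> 1, ...
    time = float(time)
    if time > max_time:                # survived past the horizon: clip and censor
        return (0, max_time)
    return (event, math.floor(time))   # otherwise keep the event and floor the time

events = ("censored", "uncensored")
max_time = 72
examples = [("uncensored", 12.7), ("censored", 40.2), ("uncensored", 95.3)]
encoded = [encode_label(label, events, max_time) for label in examples]
print(encoded)  # [(1, 12), (0, 40), (0, 72)]
```

The third example shows the administrative censoring step: an observed event at 95.3 lies beyond the 72-unit horizon, so it is recorded as censored at `max_time`.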
Next, we report some code snippets to include in a training script and in the evaluation used to test the model. To instantiate the model, we first define the parameters for the shared net, which is the transformer's encoder, and then the parameters for the cause-specific subnets, which are the MLPs that output the risk probabilities for each event under consideration.
```python
# MODEL
# OSTransformer (shared net)
emb_dim = n_features + 1
n_heads = emb_dim // 2
shared_net_params = dict(emb_dim=emb_dim, num_heads=n_heads, output_size=emb_dim)
# CustomMLP (CS subnets)
hidden_sizes = [400, 200]
cs_subnet_params = dict(hidden_sizes=hidden_sizes)

model = SurvivalWrapper(num_events=num_events, max_time=max_time, shared_net_params=shared_net_params, cs_subnets_params=cs_subnet_params)
```
Now we feed the data to the model and obtain the predictions, which are the risk probabilities for each event in consideration.
Note that the predictions for the censored event are not considered since the event is not observed, and the time is not relevant for the analysis.
```python
import torch

# OUTPUTS
# Forward pass
outputs = model(torch.from_numpy(data).float())
# Predictions
predictions = survival_prediction(outputs)
```
We can now compute the losses for the survival analysis, which are the survival log-likelihood loss and the survival ranking loss.
```python
# Survival Losses
loss = 0
criterion1 = SurvivalLogLikelihoodLoss(num_events=num_events, max_time=max_time)
loss += criterion1(outputs, torch.from_numpy(survival_labels).float().unsqueeze(dim=1))

criterion2 = SurvivalRankingLoss(num_events=num_events, max_time=max_time)
loss += criterion2(outputs, torch.from_numpy(survival_labels).float().unsqueeze(dim=1))
```
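For readers unfamiliar with survival log-likelihood losses, the sketch below shows one common discrete-time formulation (the one popularized by DeepHit-style models) in NumPy. Whether `SurvivalLogLikelihoodLoss` uses exactly this form is not confirmed by the text above; the function name `log_likelihood_loss`, the `(n, K, T)` layout of the probabilities, and the separate `events`/`times` arrays are assumptions made for illustration.

```python
import numpy as np

def log_likelihood_loss(probs, events, times, eps=1e-8):
    """Negative log-likelihood for discrete-time survival with censoring.

    probs:  (n, K, T) joint probabilities over (event, time), summing to 1 per sample.
    events: (n,) ints, 0 = censored, k >= 1 = event k.
    times:  (n,) ints, index of the observed time bin.
    """
    idx = np.arange(probs.shape[0])
    cif = np.cumsum(probs, axis=-1)                         # cumulative incidence per event
    # Uncensored: probability mass placed on the observed (event, time) cell
    p_event = probs[idx, np.maximum(events - 1, 0), times]
    # Censored: probability that no event has occurred up to the censoring time
    p_survive = 1.0 - cif[idx, :, times].sum(axis=-1)
    likelihood = np.where(events > 0, p_event, p_survive)
    return -np.mean(np.log(likelihood + eps))

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 1, 6))                         # 4 samples, 1 event, 6 time bins
probs = np.exp(logits) / np.exp(logits).sum(axis=(1, 2), keepdims=True)
events = np.array([1, 0, 1, 0])
times = np.array([2, 4, 0, 3])
loss = log_likelihood_loss(probs, events, times)
```

The key design point is that censored samples still contribute: they reward the model for keeping probability mass beyond the censoring time instead of being discarded.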
Finally, we can compute the performance of the model using the Ct-index, a time-dependent variant of the C-index. We take the cumulative sum of the risk probabilities over time to obtain the cumulative incidence function (CIF) and then compute the Ct-index.
```python
# Performance
outputs = torch.cumsum(outputs, dim=-1)  # Compute the cumulative incidence function (CIF) by cumulatively summing the output probabilities
performance = Ct_index(survival_labels, outputs.detach().numpy(), num_events)
```
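To illustrate what a time-dependent concordance measures, here is a small NumPy sketch for a single event type. A pair (i, j) is comparable when subject i has an observed event at time t_i and subject j is still event-free after t_i; it is concordant when the CIF at t_i is higher for i than for j. The function `ct_index_sketch` is a hypothetical simplification; the repository's `Ct_index` may differ in tie handling, weighting, and multi-event support.

```python
import numpy as np

def ct_index_sketch(events, times, cif, event=1):
    """Time-dependent concordance for one event type.

    events: (n,) ints, 0 = censored. times: (n,) int time bins.
    cif:    (n, T) cumulative incidence of the event for each subject.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != event:
            continue                      # only observed events can anchor a pair
        for j in range(len(times)):
            if times[j] > times[i]:       # j was still event-free at time t_i
                comparable += 1
                risk_i, risk_j = cif[i, times[i]], cif[j, times[i]]
                concordant += 1.0 if risk_i > risk_j else 0.5 if risk_i == risk_j else 0.0
    return concordant / comparable if comparable else float("nan")

probs = np.array([
    [0.10, 0.60, 0.20, 0.10],   # event at t=1: mass concentrated early
    [0.10, 0.10, 0.20, 0.60],   # event at t=3: mass concentrated late
    [0.25, 0.25, 0.25, 0.25],   # censored at t=2
])
cif = np.cumsum(probs, axis=-1)            # CIF as the cumulative sum over time
events = np.array([1, 1, 0])
times = np.array([1, 3, 2])
score = ct_index_sketch(events, times, cif)
print(score)  # 1.0
```

In this toy case both comparable pairs are anchored on the first subject, whose CIF at t=1 (0.7) exceeds that of the other two subjects (0.2 and 0.5), so the ranking is perfectly concordant.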
Contact
For any questions, please contact camillomaria.caruso@unicampus.it and valerio.guarrasi@unicampus.it.
Citation
```bibtex
@article{CARUSO2024108308,
  title = {A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values},
  journal = {Computer Methods and Programs in Biomedicine},
  volume = {254},
  pages = {108308},
  year = {2024},
  issn = {0169-2607},
  doi = {https://doi.org/10.1016/j.cmpb.2024.108308},
  url = {https://www.sciencedirect.com/science/article/pii/S016926072400302X},
  author = {Camillo Maria Caruso and Valerio Guarrasi and Sara Ramella and Paolo Soda},
  keywords = {Survival analysis, Missing data, Precision medicine, Oncology},
}
```
Owner
- Name: CoSBi.dev
- Login: cosbidev
- Kind: organization
- Location: Università Campus Bio-Medico di Roma
- Website: https://www.unicampus.it/ricerca/unita-di-ricerca/sistemi-di-elaborazione-e-bioinformatica
- Repositories: 3
- Profile: https://github.com/cosbidev
GitHub Events
Total
- Push event: 2
- Fork event: 1
Last Year
- Push event: 2
- Fork event: 1
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- numpy ==1.26.4
- torch ==2.3.0