https://github.com/cosbidev/ostransformer

Official implementation for the paper "A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values"


Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, sciencedirect.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Official implementation for the paper "A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values"

Basic Info
  • Host: GitHub
  • Owner: cosbidev
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 40 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed 9 months ago
Metadata Files
Readme

README.md

A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values


This document describes the implementation of "A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values" in PyTorch. The proposed approach is an architecture specifically designed for survival analysis, with a focus on addressing missing values in clinical data without the need for any imputation strategy.

Our approach adapts the transformer encoder architecture to tabular data via a novel positional encoding for tabular features, and uses padding to mask any missing features within the attention module, enabling the model to ignore them effectively.
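To make the masking idea concrete, here is a minimal sketch of how missing (NaN) features can be hidden from attention with a key padding mask. This uses a toy per-feature embedding and `torch.nn.MultiheadAttention`; it is an illustration of the general technique, not the repository's code.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the repository's code): treat each tabular feature as a
# token and exclude NaN features from attention via key_padding_mask.
torch.manual_seed(0)
n_samples, n_features, emb_dim = 4, 6, 8

x = torch.rand(n_samples, n_features)
x[0, 2] = float("nan")                      # simulate a missing value

missing = torch.isnan(x)                    # True where the feature is absent
# Toy per-feature embedding: broadcast each scalar feature to emb_dim channels
tokens = torch.nan_to_num(x).unsqueeze(-1) * torch.ones(emb_dim)

attn = nn.MultiheadAttention(emb_dim, num_heads=2, batch_first=True)
# Masked positions are ignored as keys, so no query ever attends to a missing feature
out, _ = attn(tokens, tokens, tokens, key_padding_mask=missing)
print(out.shape)  # torch.Size([4, 6, 8])
```

The same principle generalizes to a full encoder: the NaN pattern of each sample determines its padding mask, so no imputation is needed.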

Usage

Here we provide a brief guide on how to use the code, covering label encoding, the model, the losses employed during training, and the final metric used to compute performance.

The user can provide a dataset of clinical features; as an example, we generate random data.

```python
import numpy as np

# DATA
n_samples = 100
n_features = 37
data = np.random.rand(n_samples, n_features)
```

To simulate a real-world scenario, we then introduce some missing values into the data.

```python
data[np.random.choice((0, 1), (n_samples, n_features), p=(0.8, 0.2)) == 1] = np.nan  # Introduce missing values
```

We also generate random labels for the survival analysis. In particular, each label is a tuple containing the event and its respective time.

```python
# LABELS
events = ("censored", "uncensored")
num_events = len(events) - 1  # The first event is the censored one, so we do not consider it
max_time = 72       # Maximum time to consider for the survival analysis
max_survival = 100  # Used to generate labels; the analysis only considers times up to max_time,
                    # setting patients who survived longer than max_time to "censored"
labels = np.hstack(
    (
        np.random.choice(events, (n_samples, 1)),
        np.random.rand(n_samples, 1) * max_survival
    ),
    dtype=object
)
```

Given the labels, we can encode them in a format suitable for survival analysis. More specifically, we encode the events as integers starting from 0, where 0 corresponds to the censored event, and we floor the time of the event, capping patients who survived longer than max_time at max_time. In the latter case, we also set the event to 0, i.e., the censored event.

```python
survival_label_function = np.vectorize(
    lambda label: label_to_survival(label, events, max_time),
    signature="(n)->(m)"
)
survival_labels = survival_label_function(labels)
```
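For reference, a `label_to_survival` helper following the encoding described above could look like the sketch below. This is an illustrative assumption, not the repository's implementation.

```python
import math
import numpy as np

def label_to_survival(label, events, max_time):
    """Hypothetical sketch: encode (event_name, time) as (event_index, floored_time)."""
    event_name, time = label
    event = events.index(event_name)   # "censored" -> 0, "uncensored" -> 1
    time = math.floor(float(time))
    if time > max_time:                # survived past the horizon: censor at max_time
        event, time = 0, max_time
    return np.array([event, time], dtype=float)
```

With `events = ("censored", "uncensored")` and `max_time = 72`, the label `("uncensored", 35.7)` would encode to `[1., 35.]`, while `("uncensored", 90.2)` would become `[0., 72.]`.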

Afterward, we report some pieces of code to include in a training script and in the evaluation script. We first define the parameters for the shared net (the transformer encoder), then the parameters for the cause-specific subnets (the MLPs that output the risk probabilities for each event under consideration), and use them to instantiate the model.

```python
# MODEL
# OSTransformer (shared net)
emb_dim = n_features + 1
n_heads = emb_dim // 2
shared_net_params = dict(emb_dim=emb_dim, num_heads=n_heads, output_size=emb_dim)

# CustomMLP (CS subnets)
hidden_sizes = [400, 200]
cs_subnet_params = dict(hidden_sizes=hidden_sizes)

model = SurvivalWrapper(
    num_events=num_events,
    max_time=max_time,
    shared_net_params=shared_net_params,
    cs_subnets_params=cs_subnet_params,
)
```

Now we feed the data to the model and obtain the predictions, i.e., the risk probabilities for each event under consideration. Note that the predictions for the censored event are not considered, since the event is not observed and its time is not relevant for the analysis.

```python
import torch

# OUTPUTS
# Forward pass
outputs = model(torch.from_numpy(data).float())

# Predictions
predictions = survival_prediction(outputs)
```

We can now compute the losses for the survival analysis: the survival log-likelihood loss and the survival ranking loss.

```python
# Survival Losses
loss = 0
criterion1 = SurvivalLogLikelihoodLoss(num_events=num_events, max_time=max_time)
loss += criterion1(outputs, torch.from_numpy(survival_labels).float().unsqueeze(dim=1))

criterion2 = SurvivalRankingLoss(num_events=num_events, max_time=max_time)
loss += criterion2(outputs, torch.from_numpy(survival_labels).float().unsqueeze(dim=1))
```
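The summed loss would typically be minimized inside a standard PyTorch training loop. Below is a minimal skeleton in which a toy linear model and MSE loss stand in for `SurvivalWrapper` and the two survival losses; everything here is illustrative, not the repository's training script.

```python
import torch

# Illustrative training-loop skeleton (all names and shapes are toy assumptions)
torch.manual_seed(0)
model = torch.nn.Linear(37, 73)   # stand-in for the survival model
data = torch.rand(100, 37)
target = torch.rand(100, 73)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
losses = []
for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(data)
    # In the real script this would be criterion1(...) + criterion2(...)
    loss = torch.nn.functional.mse_loss(outputs, target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```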

Finally, we can compute the performance of the model using the Ct-index, a time-dependent variant of the C-index. We take the cumulative sum of the risk probabilities to obtain the cumulative incidence function (CIF) and then compute the Ct-index.

```python
# Performance
outputs = torch.cumsum(outputs, dim=-1)  # Cumulative incidence function (CIF): cumulative sum of the output probabilities
performance = Ct_index(survival_labels, outputs.detach().numpy(), num_events)
```


Contact

For any questions, please contact camillomaria.caruso@unicampus.it and valerio.guarrasi@unicampus.it.


Citation

```bibtex
@article{CARUSO2024108308,
  title = {A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values},
  journal = {Computer Methods and Programs in Biomedicine},
  volume = {254},
  pages = {108308},
  year = {2024},
  issn = {0169-2607},
  doi = {https://doi.org/10.1016/j.cmpb.2024.108308},
  url = {https://www.sciencedirect.com/science/article/pii/S016926072400302X},
  author = {Camillo Maria Caruso and Valerio Guarrasi and Sara Ramella and Paolo Soda},
  keywords = {Survival analysis, Missing data, Precision medicine, Oncology},
}
```

Owner

  • Name: CoSBi.dev
  • Login: cosbidev
  • Kind: organization
  • Location: Università Campus Bio-Medico di Roma

GitHub Events

Total
  • Push event: 2
  • Fork event: 1
Last Year
  • Push event: 2
  • Fork event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 4
  • Total Committers: 1
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 3
  • Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • cm.caruso (1****o): 4 commits

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • numpy ==1.26.4
  • torch ==2.3.0