https://github.com/cosbidev/ostransformer

Official implementation for the paper "A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values"


Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, sciencedirect.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Official implementation for the paper "A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values"

Basic Info
  • Host: GitHub
  • Owner: cosbidev
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 40 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed 9 months ago
Metadata Files
Readme

README.md

A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values


This document describes the implementation of "A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values" in PyTorch. The proposed approach is an architecture specifically designed for survival analysis, with a focus on addressing missing values in clinical data without the need for any imputation strategy.

Our approach adapts the transformer encoder architecture to tabular data via a novel positional encoding for tabular features, and uses padding to mask any missing features within the attention module, enabling the model to ignore them effectively.
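To make the masking idea concrete, here is a minimal sketch of how missing (NaN) features can be hidden from attention with a key padding mask. This uses a toy per-feature embedding and `torch.nn.MultiheadAttention`; it is an illustration of the general technique, not the repository's code.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the repository's code): treat each tabular feature as a
# token and exclude NaN features from attention via key_padding_mask.
torch.manual_seed(0)
n_samples, n_features, emb_dim = 4, 6, 8

x = torch.rand(n_samples, n_features)
x[0, 2] = float("nan")                      # simulate a missing value

missing = torch.isnan(x)                    # True where the feature is absent
# Toy per-feature embedding: broadcast each scalar feature to emb_dim channels
tokens = torch.nan_to_num(x).unsqueeze(-1) * torch.ones(emb_dim)

attn = nn.MultiheadAttention(emb_dim, num_heads=2, batch_first=True)
# Masked positions are ignored as keys, so no query ever attends to a missing feature
out, _ = attn(tokens, tokens, tokens, key_padding_mask=missing)
print(out.shape)  # torch.Size([4, 6, 8])
```

The same principle generalizes to a full encoder: the NaN pattern of each sample determines its padding mask, so no imputation is needed.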

Usage

Here we provide a brief guide on how to use the code, covering label encoding, the model, the losses employed during training, and the final metric used to compute performance.

The user can provide a dataset of clinical features; as an example, we generate random data.

```python
import numpy as np

# DATA
n_samples = 100
n_features = 37
data = np.random.rand(n_samples, n_features)
```

To simulate a real-world scenario, we then introduce some missing values into the data.

```python
data[np.random.choice((0, 1), (n_samples, n_features), p=(0.8, 0.2)) == 1] = np.nan  # Introduce missing values
```

We also generate random labels for the survival analysis. In particular, each label is a tuple containing the event and its respective time.

```python
# LABELS
events = ("censored", "uncensored")
num_events = len(events) - 1  # The first event is the censored one, so we do not consider it
max_time = 72       # Maximum time to consider for the survival analysis
max_survival = 100  # Used to generate labels; the analysis only considers times up to max_time,
                    # setting patients who survived longer than max_time to "censored"
labels = np.hstack(
    (
        np.random.choice(events, (n_samples, 1)),
        np.random.rand(n_samples, 1) * max_survival
    ),
    dtype=object
)
```

Given the labels, we can encode them in a format suitable for survival analysis. More specifically, we encode the events as integers starting from 0, where 0 corresponds to the censored event, and we floor the time of the event, capping patients who survived longer than max_time at max_time. In the latter case, we also set the event to 0, i.e., the censored event.

```python
survival_label_function = np.vectorize(
    lambda label: label_to_survival(label, events, max_time),
    signature="(n)->(m)"
)
survival_labels = survival_label_function(labels)
```
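For reference, a `label_to_survival` helper following the encoding described above could look like the sketch below. This is an illustrative assumption, not the repository's implementation.

```python
import math
import numpy as np

def label_to_survival(label, events, max_time):
    """Hypothetical sketch: encode (event_name, time) as (event_index, floored_time)."""
    event_name, time = label
    event = events.index(event_name)   # "censored" -> 0, "uncensored" -> 1
    time = math.floor(float(time))
    if time > max_time:                # survived past the horizon: censor at max_time
        event, time = 0, max_time
    return np.array([event, time], dtype=float)
```

With `events = ("censored", "uncensored")` and `max_time = 72`, the label `("uncensored", 35.7)` would encode to `[1., 35.]`, while `("uncensored", 90.2)` would become `[0., 72.]`.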

Afterward, we report some pieces of code to include in a training script and in the evaluation script. We first define the parameters for the shared net (the transformer encoder), then the parameters for the cause-specific subnets (the MLPs that output the risk probabilities for each event under consideration), and use them to instantiate the model.

```python
# MODEL
# OSTransformer (shared net)
emb_dim = n_features + 1
n_heads = emb_dim // 2
shared_net_params = dict(emb_dim=emb_dim, num_heads=n_heads, output_size=emb_dim)

# CustomMLP (CS subnets)
hidden_sizes = [400, 200]
cs_subnet_params = dict(hidden_sizes=hidden_sizes)

model = SurvivalWrapper(
    num_events=num_events,
    max_time=max_time,
    shared_net_params=shared_net_params,
    cs_subnets_params=cs_subnet_params,
)
```

Now we feed the data to the model and obtain the predictions, i.e., the risk probabilities for each event under consideration. Note that the predictions for the censored event are not considered, since the event is not observed and its time is not relevant for the analysis.

```python
import torch

# OUTPUTS
# Forward pass
outputs = model(torch.from_numpy(data).float())

# Predictions
predictions = survival_prediction(outputs)
```

We can now compute the losses for the survival analysis: the survival log-likelihood loss and the survival ranking loss.

```python
# Survival Losses
loss = 0
criterion1 = SurvivalLogLikelihoodLoss(num_events=num_events, max_time=max_time)
loss += criterion1(outputs, torch.from_numpy(survival_labels).float().unsqueeze(dim=1))

criterion2 = SurvivalRankingLoss(num_events=num_events, max_time=max_time)
loss += criterion2(outputs, torch.from_numpy(survival_labels).float().unsqueeze(dim=1))
```
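The summed loss would typically be minimized inside a standard PyTorch training loop. Below is a minimal skeleton in which a toy linear model and MSE loss stand in for `SurvivalWrapper` and the two survival losses; everything here is illustrative, not the repository's training script.

```python
import torch

# Illustrative training-loop skeleton (all names and shapes are toy assumptions)
torch.manual_seed(0)
model = torch.nn.Linear(37, 73)   # stand-in for the survival model
data = torch.rand(100, 37)
target = torch.rand(100, 73)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
losses = []
for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(data)
    # In the real script this would be criterion1(...) + criterion2(...)
    loss = torch.nn.functional.mse_loss(outputs, target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```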

Finally, we can compute the performance of the model using the Ct-index, a time-dependent variant of the C-index. We take the cumulative sum of the risk probabilities to obtain the cumulative incidence function (CIF) and then compute the Ct-index.

```python
# Performance
outputs = torch.cumsum(outputs, dim=-1)  # Cumulative incidence function (CIF): cumulative sum of the output probabilities
performance = Ct_index(survival_labels, outputs.detach().numpy(), num_events)
```


Contact

For any questions, please contact camillomaria.caruso@unicampus.it and valerio.guarrasi@unicampus.it.


Citation

```bibtex
@article{CARUSO2024108308,
  title = {A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values},
  journal = {Computer Methods and Programs in Biomedicine},
  volume = {254},
  pages = {108308},
  year = {2024},
  issn = {0169-2607},
  doi = {https://doi.org/10.1016/j.cmpb.2024.108308},
  url = {https://www.sciencedirect.com/science/article/pii/S016926072400302X},
  author = {Camillo Maria Caruso and Valerio Guarrasi and Sara Ramella and Paolo Soda},
  keywords = {Survival analysis, Missing data, Precision medicine, Oncology},
}
```

Owner

  • Name: CoSBi.dev
  • Login: cosbidev
  • Kind: organization
  • Location: Università Campus Bio-Medico di Roma

GitHub Events

Total
  • Push event: 2
  • Fork event: 1
Last Year
  • Push event: 2
  • Fork event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 4
  • Total Committers: 1
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 3
  • Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • cm.caruso (1****o): 4 commits

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • numpy ==1.26.4
  • torch ==2.3.0