dl-prediction-brca-tiph

Implementation of the paper "Deep Learning-based Prediction of Breast Cancer Tumor and Immune Phenotypes from Histopathology" by Tiago Gonçalves, Dagoberto Pulido-Arias, Julian Willett, Katharina V. Hoebel, Mason Cleveland, Syed Rakin Ahmed, Elizabeth Gerstner, Jayashree Kalpathy-Cramer, Jaime S. Cardoso, Christopher P. Bridge and Albert E. Kim.

https://github.com/qtim-lab/dl-prediction-brca-tiph

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, ncbi.nlm.nih.gov, nature.com
  • Committers with academic emails
    3 of 4 committers (75.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.1%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: QTIM-Lab
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 16.1 MB
Statistics
  • Stars: 5
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

Deep Learning-based Prediction of Breast Cancer Tumor and Immune Phenotypes from Histopathology

Paper accepted at the First Workshop on Imageomics (Imageomics-AAAI-24) - Discovering Biological Knowledge from Images using AI, held as part of the 38th Annual AAAI Conference on Artificial Intelligence.

paper | poster

Abstract

The interactions between tumor cells and the tumor microenvironment (TME) dictate therapeutic efficacy of radiation and many systemic therapies in breast cancer. However, to date, there is no widely available method to reproducibly measure tumor and immune phenotypes for each patient's tumor. Given this unmet clinical need, we applied multiple instance learning (MIL) algorithms to assess the activity of ten biologically relevant pathways from the hematoxylin and eosin (H&E) slide of primary breast tumors. We employed different feature extraction approaches and state-of-the-art model architectures. Using binary classification, our models attained area under the receiver operating characteristic curve (AUROC) scores above 0.70 for nearly all gene expression pathways and, in some cases, exceeded 0.80. Attention maps suggest that our trained models recognize biologically relevant spatial patterns of cell sub-populations from H&E. These efforts represent a first step towards developing computational H&E biomarkers that reflect facets of the TME and hold promise for augmenting precision oncology.
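
The AUROC values quoted above can be computed directly from binary labels and model scores. The following is a minimal sketch using the rank-sum (Mann-Whitney U) formulation; the function name `auroc` and the toy inputs are illustrative assumptions and are not taken from this repository.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of model scores (higher means "more positive").
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)

    # Assign 1-based ranks, averaging the rank over tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (illustrative values only).
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.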

Data Availability

If you have any questions regarding data availability, we kindly ask you to contact Albert Kim (akim46@mgh.harvard.edu) or Christopher Bridge (cbridge@mgh.harvard.edu).

Data Preparation

Preprocessing

HistoQC Analysis

We start by running the HistoQC package to obtain a list of the good-quality whole-slide images (WSIs).

To run the HistoQC package on the data, you can run the following script:

```bash
#!/bin/bash

echo 'Started HistoQC on TCGA-BRCA Database.'

cd code/preprocessing/histoqc
python runhistoqcanalysis.py --database 'TCGABRCAData' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --outdir 'results/HistoQC/TCGA-BRCA/mmxbrcp'

echo 'Finished HistoQC on TCGA-BRCA Database.'
```

To generate the .CSV with the list of good-quality WSIs, you can run the following script:

```bash
#!/bin/bash

echo 'Started HistoQC Quality File on TCGA-BRCA Database.'

python code/preprocessing/histoqc/generatequalityfilelist.py --database 'TCGABRCAData' --histoqcresults_path 'results/HistoQC/TCGA-BRCA/mmxbrcp'

echo 'Finished HistoQC Quality File on TCGA-BRCA Database.'
```
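
Downstream steps only need the names of the slides that passed quality control. As a rough sketch of how such a quality file could be consumed, the snippet below filters a CSV by a quality flag. The column names `slide_name` and `good_quality` are hypothetical, since the actual layout of the generated file is not described here.

```python
import csv
import io


def good_quality_slides(csv_text, name_column="slide_name", flag_column="good_quality"):
    """Return the slide names whose quality flag is truthy.

    Both column names are assumptions made for illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    truthy = {"1", "true", "yes"}
    return [
        row[name_column]
        for row in reader
        if row[flag_column].strip().lower() in truthy
    ]


# Toy CSV content (illustrative, not real HistoQC output).
example = "slide_name,good_quality\nslide_a.svs,1\nslide_b.svs,0\nslide_c.svs,true\n"
print(good_quality_slides(example))  # ['slide_a.svs', 'slide_c.svs']
```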

Patch (and Segmentation Mask) Creation using CLAM

Then, we obtain the patches (and segmentation masks) from the WSIs.

To obtain the patches of the good-quality WSIs, using the CLAM framework, you can run the following script:

```bash
#!/bin/bash

echo 'Started CLAM (createpatchesfp.py) on TCGA-BRCA Database.'

# Using CLAM + HistoQC Segmentation and Patch Pipeline
python code/preprocessing/patchandsegmentation/createpatchesfp.py --sourcedir 'data/TCGA-BRCA' --datasetname 'TCGA-BRCA' --tcgabrcaexpstr 'DiagnosticSlide' --savedir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC' --patchsize 256 --preset 'code/preprocessing/patchandsegmentation/presets/tcga.csv' --patch --stitch --use_histoqc_quality_file 'results/HistoQC/TCGA-BRCA/mmxbrcp/hqc_quality_file.csv' --use_histoqc_seg_masks --verbose

python code/preprocessing/patchandsegmentation/createpatchesfp.py --sourcedir 'data/TCGA-BRCA' --datasetname 'TCGA-BRCA' --tcgabrcaexpstr 'TissueSlide' --savedir 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC' --patchsize 256 --preset 'code/preprocessing/patchandsegmentation/presets/tcga.csv' --patch --stitch --use_histoqc_quality_file 'results/HistoQC/TCGA-BRCA/mmxbrcp/hqc_quality_file.csv' --use_histoqc_seg_masks --verbose

echo 'Finished CLAM (createpatchesfp.py) on TCGA-BRCA Database.'
```

Alternatively, you can ignore the HistoQC results and let CLAM perform the segmentation itself by running the following script:

```bash
#!/bin/bash

echo 'Started CLAM (createpatchesfp.py) on TCGA-BRCA Database.'

# Using CLAM: Segmentation and Patch Pipeline
python code/preprocessing/patchandsegmentation/createpatchesfp.py --sourcedir 'data/TCGA-BRCA' --datasetname 'TCGA-BRCA' --tcgabrcaexpstr 'DiagnosticSlide' --savedir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationCLAM' --patchsize 256 --preset 'code/preprocessing/patchandsegmentation/presets/tcga.csv' --seg --savemask --patch --stitch --verbose

python code/preprocessing/patchandsegmentation/createpatchesfp.py --sourcedir 'data/TCGA-BRCA' --datasetname 'TCGA-BRCA' --tcgabrcaexpstr 'TissueSlide' --savedir 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationCLAM' --patchsize 256 --preset 'code/preprocessing/patchandsegmentation/presets/tcga.csv' --seg --savemask --patch --stitch --verbose

echo 'Finished CLAM (createpatchesfp.py) on TCGA-BRCA Database.'
```
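
To give an intuition for what patching with a patch size of 256 produces, the sketch below enumerates the top-left coordinates of non-overlapping patches on a rectangular region. It is a simplified illustration under stated assumptions: the function `patch_grid` is hypothetical, and the actual pipeline additionally restricts patches to segmented tissue, which this sketch ignores.

```python
def patch_grid(width, height, patch_size=256, step=None):
    """List (x, y) top-left corners of patches that fit fully inside the region.

    step defaults to patch_size, which gives non-overlapping patches.
    """
    step = step or patch_size
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, step)
        for x in range(0, width - patch_size + 1, step)
    ]


# A 512 x 512 region yields four 256 x 256 patches.
print(patch_grid(512, 512))  # [(0, 0), (256, 0), (0, 256), (256, 256)]
```

Regions smaller than one patch produce an empty list, so no partial patches are emitted in this sketch.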

Feature Extraction

Next, we proceed to feature extraction.

Using CLAM

To perform feature extraction using CLAM, you can run the following script:

```bash
#!/bin/bash

echo 'Started CLAM (extractfeaturesclam.py) on TCGA-BRCA Database.'

python code/featureextraction/extractfeaturesclam.py --gpu_id 0 --datah5dir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC' --processlistcsvpath 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/processlistautogen.csv' --featdir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features/clam' --batchsize 512 --numworkers 10 --pin_memory --verbose

python code/featureextraction/extractfeaturesclam.py --gpu_id 0 --datah5dir 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC' --processlistcsvpath 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/processlistautogen.csv' --featdir 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features/clam' --batchsize 512 --numworkers 10 --pin_memory --verbose

echo 'Finished CLAM (extractfeaturesclam.py) on TCGA-BRCA Database.'
```

Using PLIP

To perform feature extraction using PLIP, you can run the following script:

```bash
#!/bin/bash

echo 'Started feature extraction using PLIP on TCGA-BRCA Database.'

python code/featureextraction/extractfeaturesplip.py --gpu_id 0 --datah5dir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC' --processlistcsvpath 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/processlistautogen.csv' --featdir 'results/PLIP/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features/plip' --batchsize 4096 --numworkers 12 --pin_memory --verbose

python code/featureextraction/extractfeaturesplip.py --gpu_id 0 --datah5dir 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC' --processlistcsvpath 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/processlistautogen.csv' --featdir 'results/PLIP/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features/plip' --batchsize 4096 --numworkers 12 --pin_memory --verbose

echo 'Finished feature extraction using PLIP on TCGA-BRCA Database.'
```

Models

Training

With the features extracted, we can move on to model training.

Using CLAM framework

To train the AM-SB/AM-MB models, you can run the following script:

```bash
#!/bin/bash

echo 'Started CLAM (trainmodelfp.py) on TCGA-BRCA Database.'

# List of labels for this project
LABELS=(
    'hallmarkangiogenesis'
    'hallmarkepithelialmesenchymaltransition'
    'hallmarkfattyacidmetabolism'
    'hallmarkoxidativephosphorylation'
    'hallmarkglycolysis'
    'keggantigenprocessingandpresentation'
    'gobptcellmediatedcytotoxicity'
    'gobpbcellproliferation'
    'keggcell_cycle'
    'immunosuppression'
)

for label in "${LABELS[@]}"
do
    echo "Started CLAM Training for label: $label"

    # CLAM (ResNet50) Features
    python code/models/clam/train_val_model_fp.py --gpu_id 0 --results_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints' --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features' --label "$label" --config_json 'code/models/clam/config/tcgabrca_clam_fts_am_sb_config.json'

    python code/models/clam/train_val_model_fp.py --gpu_id 0 --results_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints' --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features' --label "$label" --config_json 'code/models/clam/config/tcgabrca_clam_fts_am_mb_config.json'

    # PLIP Features
    python code/models/clam/train_val_model_fp.py --gpu_id 0 --results_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints' --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/PLIP/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/PLIP/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features' --label "$label" --config_json 'code/models/clam/config/tcgabrca_plip_fts_am_sb_config.json'

    python code/models/clam/train_val_model_fp.py --gpu_id 0 --results_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints' --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/PLIP/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/PLIP/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features' --label "$label" --config_json 'code/models/clam/config/tcgabrca_plip_fts_am_mb_config.json'

    echo "Finished CLAM Training for label: $label"
done

echo 'Finished CLAM Training on TCGA-BRCA Database.'
```
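
The models trained above are multiple instance learning models that aggregate many patch-level feature vectors into one slide-level representation. As a minimal, framework-free sketch of the general idea behind attention-based pooling, the snippet below weights each patch by a softmax over attention scores. It is not the repository's implementation: in a real model the attention scores come from a learned network, whereas here they are passed in directly, and the names `softmax` and `attention_pool` are illustrative.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, attention_scores):
    """Aggregate patch features into one slide-level vector.

    features: list of equal-length patch feature vectors.
    attention_scores: one raw (unnormalized) score per patch.
    Returns (pooled_vector, attention_weights).
    """
    weights = softmax(attention_scores)
    dim = len(features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return pooled, weights


# Two toy patches with equal attention contribute equally.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(pooled, weights)  # [0.5, 0.5] [0.5, 0.5]
```

The attention weights are also what heatmaps typically visualize, since they indicate how much each patch contributes to the slide-level prediction.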

Using TransMIL

To train the TransMIL models, you can run the following script:

```bash
#!/bin/bash

echo 'Started TransMIL Training on TCGA-BRCA Database.'

# List of labels for this project
LABELS=(
    'hallmarkangiogenesis'
    'hallmarkepithelialmesenchymaltransition'
    'hallmarkfattyacidmetabolism'
    'hallmarkoxidativephosphorylation'
    'hallmarkglycolysis'
    'keggantigenprocessingandpresentation'
    'gobptcellmediatedcytotoxicity'
    'gobpbcellproliferation'
    'keggcell_cycle'
    'immunosuppression'
)

for label in "${LABELS[@]}"
do
    echo "Started TransMIL Training for label: $label"

    # CLAM (ResNet50) Features
    python code/models/transmil/train_test_model_fp.py --gpu_id 0 --results_dir 'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints' --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features' --label "$label" --config_json 'code/models/transmil/config/tcgabrca_clam_fts_transmil_config.json' --train_or_test 'train'

    # PLIP Features
    python code/models/transmil/train_test_model_fp.py --gpu_id 0 --results_dir 'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints' --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/PLIP/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/PLIP/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features' --label "$label" --config_json 'code/models/transmil/config/tcgabrca_plip_fts_transmil_config.json' --train_or_test 'train'

    echo "Finished TransMIL Training for label: $label"
done

echo 'Finished TransMIL Training on TCGA-BRCA Database.'
```
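
When launching many training runs, it can help to assemble and inspect each command before executing it. The sketch below builds the argument list for one TransMIL training run using the script path and flags shown in the training script above, and prints it as a dry run. The helper `build_transmil_train_command` is a hypothetical convenience function, not part of the repository, and nothing is executed.

```python
import shlex


def build_transmil_train_command(label, config_json, features_dirs, gpu_id=0):
    """Return the argv list for one TransMIL training run (dry run only)."""
    return [
        "python", "code/models/transmil/train_test_model_fp.py",
        "--gpu_id", str(gpu_id),
        "--results_dir", "results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints",
        "--dataset", "TCGA-BRCA",
        "--base_data_path", "data/TCGA-BRCA",
        "--experimental_strategy", "All",
        "--features_h5_dir", *features_dirs,
        "--label", label,
        "--config_json", config_json,
        "--train_or_test", "train",
    ]


command = build_transmil_train_command(
    label="immunosuppression",
    config_json="code/models/transmil/config/tcgabrca_plip_fts_transmil_config.json",
    features_dirs=[
        "results/PLIP/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features",
        "results/PLIP/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features",
    ],
)

# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute the run; keeping arguments as a list avoids shell quoting problems with labels or paths.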

Testing

Afterward, we can move forward to model testing.

Using CLAM framework

To test the AM-SB/AM-MB models, you can run the following script:

```bash
#!/bin/bash

echo 'Started CLAM Testing on TCGA-BRCA Database.'

# List of checkpoint directories for AM-SB (CLAM/ResNet50 Features)
CHECKPOINTDIRS=(
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkangiogenesis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkepithelialmesenchymaltransition/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkfattyacidmetabolism/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkoxidativephosphorylation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkglycolysis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggantigenprocessingandpresentation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobptcellmediatedcytotoxicity/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobpbcellproliferation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggcellcycle/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/immunosuppression/YYYY-MM-DD_hh-mm-ss'
)

for checkpoint_dir in "${CHECKPOINTDIRS[@]}"
do
    echo "Started CLAM Testing for checkpoint: $checkpoint_dir"

    # CLAM/ResNet50 Features
    python code/models/clam/test_model_fp.py --gpu_id 0 --checkpoint_dir "$checkpoint_dir" --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features'

    echo "Finished CLAM Testing for checkpoint: $checkpoint_dir"
done

# List of checkpoint directories for AM-MB (CLAM/ResNet50 Features)
CHECKPOINTDIRS=(
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobpbcellproliferation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobptcellmediatedcytotoxicity/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkangiogenesis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkepithelialmesenchymaltransition/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkfattyacidmetabolism/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkglycolysis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkoxidativephosphorylation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/immunosuppression/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggantigenprocessingandpresentation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggcellcycle/YYYY-MM-DD_hh-mm-ss'
)

for checkpoint_dir in "${CHECKPOINTDIRS[@]}"
do
    echo "Started CLAM Testing for checkpoint: $checkpoint_dir"

    # CLAM/ResNet50 Features
    python code/models/clam/test_model_fp.py --gpu_id 0 --checkpoint_dir "$checkpoint_dir" --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features'

    echo "Finished CLAM Testing for checkpoint: $checkpoint_dir"
done

# List of checkpoint directories for AM-SB and AM-MB (PLIP Features)
CHECKPOINTDIRS=(
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobpbcellproliferation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobpbcellproliferation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobptcellmediatedcytotoxicity/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobptcellmediatedcytotoxicity/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkangiogenesis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkangiogenesis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkepithelialmesenchymaltransition/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkepithelialmesenchymaltransition/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkfattyacidmetabolism/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkfattyacidmetabolism/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkglycolysis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkglycolysis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkoxidativephosphorylation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkoxidativephosphorylation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/immunosuppression/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/immunosuppression/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggantigenprocessingandpresentation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggantigenprocessingandpresentation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggcellcycle/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggcellcycle/YYYY-MM-DD_hh-mm-ss'
)

for checkpoint_dir in "${CHECKPOINTDIRS[@]}"
do
    echo "Started CLAM Testing for checkpoint: $checkpoint_dir"

    # PLIP Features
    python code/models/clam/test_model_fp.py --gpu_id 0 --checkpoint_dir "$checkpoint_dir" --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/PLIP/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/PLIP/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features'

    echo "Finished CLAM Testing for checkpoint: $checkpoint_dir"
done

echo 'Finished CLAM Testing on TCGA-BRCA Database.'
```

Using TransMIL

To test the TransMIL models, you can run the following script:

```bash
#!/bin/bash

echo 'Started TransMIL Testing on TCGA-BRCA Database.'

# List of checkpoint directories for this project (CLAM/ResNet50-based features)
CHECKPOINTDIRS=(
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/gobpbcellproliferation/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/gobptcellmediatedcytotoxicity/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkangiogenesis/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkepithelialmesenchymaltransition/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkfattyacidmetabolism/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkglycolysis/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkoxidativephosphorylation/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/immunosuppression/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/keggantigenprocessingandpresentation/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/keggcellcycle/YYYY-MM-DD_hh-mm-ss'
)

for checkpoint_dir in "${CHECKPOINTDIRS[@]}"
do
    echo "Started TransMIL Testing for checkpoint directory: $checkpoint_dir"

    # CLAM/ResNet50-based features
    python code/models/transmil/train_test_model_fp.py --gpu_id 0 --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_pt_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features/clam/pt_files' 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features/clam/pt_files' --checkpoint_dir "$checkpoint_dir" --train_or_test 'test'

    echo "Finished TransMIL Testing for checkpoint directory: $checkpoint_dir"
done

# List of checkpoint directories for this project (PLIP-based features)
CHECKPOINTDIRS=(
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/gobpbcellproliferation/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/gobptcellmediatedcytotoxicity/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkangiogenesis/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkepithelialmesenchymaltransition/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkfattyacidmetabolism/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkglycolysis/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkoxidativephosphorylation/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/immunosuppression/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/keggantigenprocessingandpresentation/YYYY-MM-DD_hh-mm-ss'
    'results/TransMIL/TCGA-BRCA/mmxbrcp/All/checkpoints/keggcellcycle/YYYY-MM-DD_hh-mm-ss'
)

for checkpoint_dir in "${CHECKPOINTDIRS[@]}"
do
    echo "Started TransMIL Testing for checkpoint directory: $checkpoint_dir"

    # PLIP-based features
    python code/models/transmil/train_test_model_fp.py --gpu_id 0 --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/PLIP/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/PLIP/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features' --checkpoint_dir "$checkpoint_dir" --train_or_test 'test'

    echo "Finished TransMIL Testing for checkpoint directory: $checkpoint_dir"
done

echo 'Finished TransMIL Testing on TCGA-BRCA Database.'
```
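
The testing scripts expect the timestamped run directory of each label to be filled in by hand. As a small sketch of how the most recent run could be located automatically, the snippet below picks the newest subdirectory whose name parses as a timestamp. It assumes run directories are named in the form `YYYY-MM-DD_hh-mm-ss` and sit directly under the label directory; the helper `latest_run_dir` is hypothetical and not part of the repository.

```python
import tempfile
from datetime import datetime
from pathlib import Path

# Assumed naming scheme of run directories: YYYY-MM-DD_hh-mm-ss.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def latest_run_dir(label_dir):
    """Return the newest timestamp-named subdirectory, or None if there is none."""
    runs = []
    for child in Path(label_dir).iterdir():
        if not child.is_dir():
            continue
        try:
            stamp = datetime.strptime(child.name, TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip directories that are not named like a timestamp
        runs.append((stamp, child))
    if not runs:
        return None
    return max(runs, key=lambda run: run[0])[1]


# Demonstration on a throwaway directory with two fake runs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2024-01-02_10-00-00", "2024-03-01_09-30-00", "notes"):
        (Path(tmp) / name).mkdir()
    print(latest_run_dir(tmp).name)  # 2024-03-01_09-30-00
```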

Heatmap Generation

After training the models, we can generate attention heatmaps to inspect which regions of a slide the models rely on.

Using CLAM framework and CLAM/ResNet50 features

To generate heatmaps for the CLAM/ResNet50 features, using the CLAM framework, you can run the following script:

```bash
#!/bin/bash

echo 'Started CLAM (create_heatmaps_fp.py) on TCGA-BRCA Database.'

# List of checkpoint directories for AM-SB and AM-MB (CLAM/ResNet50 Features)
CHECKPOINTDIRS=(
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkangiogenesis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkepithelialmesenchymaltransition/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkfattyacidmetabolism/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkoxidativephosphorylation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkglycolysis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggantigenprocessingandpresentation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobptcellmediatedcytotoxicity/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobpbcellproliferation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggcellcycle/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/immunosuppression/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobpbcellproliferation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/gobptcellmediatedcytotoxicity/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkangiogenesis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkepithelialmesenchymaltransition/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkfattyacidmetabolism/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkglycolysis/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/hallmarkoxidativephosphorylation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/immunosuppression/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggantigenprocessingandpresentation/YYYY-MM-DD_hh-mm-ss'
    'results/CLAM/TCGA-BRCA/mmxbrcp/All/checkpoints/keggcellcycle/YYYY-MM-DD_hh-mm-ss'
)

for checkpoint_dir in "${CHECKPOINTDIRS[@]}"
do
    echo "Started CLAM Heatmap Generation for checkpoint: $checkpoint_dir"

    # CLAM Features
    python code/models/clam/create_heatmaps_fp.py --gpu_id 0 --checkpoint_dir "$checkpoint_dir" --dataset 'TCGA-BRCA' --base_data_path 'data/TCGA-BRCA' --experimental_strategy 'All' --features_h5_dir 'results/CLAM/TCGA-BRCA/mmxbrcp/DiagnosticSlide/SegmentationHistoQC/features' 'results/CLAM/TCGA-BRCA/mmxbrcp/TissueSlide/SegmentationHistoQC/features' --generate_heatmaps_for 'test' --heatmap_config_file 'code/models/clam/config/tcgabrca_clam_fts_am_sb_heatmap_config.json' --use_histoqc_quality_file 'results/HistoQC/TCGA-BRCA/mmxbrcp/hqc_quality_file.csv' --use_histoqc_seg_masks --verbose

    echo "Finished CLAM Heatmap Generation for checkpoint: $checkpoint_dir"
done

echo 'Finished CLAM Heatmap Generation on TCGA-BRCA Database.'
```
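
Raw attention scores are not directly comparable across slides, so heatmaps are often drawn from rank-based values instead. The sketch below shows one common way to do this by converting scores to percentile ranks in (0, 1]. Whether this repository normalizes its heatmaps this way is an assumption, and the function `to_percentiles` is illustrative only.

```python
def to_percentiles(scores):
    """Map each score to its percentile rank in (0, 1].

    The highest score maps to 1.0. Tied scores keep their input order,
    so they receive adjacent (not identical) ranks in this simple sketch.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    percentiles = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        percentiles[idx] = rank / n
    return percentiles


# Four toy attention scores (illustrative values only).
print(to_percentiles([0.2, 0.9, 0.1, 0.5]))  # [0.5, 1.0, 0.25, 0.75]
```

The resulting values can be mapped to a color scale and painted at each patch location to obtain a heatmap overlay.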

Using CLAM framework and PLIP features

WORK IN PROGRESS

Using TransMIL and CLAM/ResNet50 features

WORK IN PROGRESS

Using TransMIL and PLIP features

WORK IN PROGRESS

Credits and Acknowledgments

HistoQC

This framework is related to the papers "HistoQC: An Open-Source Quality Control Tool for Digital Pathology Slides" by Janowczyk A., Zuo R., Gilmore H., Feldman M. and Madabhushi A., and "Assessment of a computerized quantitative quality control tool for kidney whole slide image biopsies" by Chen Y., Zee J., Smith A., Jayapandian C., Hodgin J., Howell D., Palmer M., Thomas D., Cassol C., Farris A., Perkinson K., Madabhushi A., Barisoni L. and Janowczyk A.

CLAM

This model and associated code are related to the paper "Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images" by Ming Y. Lu, Drew F. K. Williamson, Tiffany Y. Chen, Richard J. Chen, Matteo Barbieri and Faisal Mahmood.

TransMIL

This model and associated code are related to the paper "Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification" by Zhuchen Shao, Hao Bian, Yang Chen, Yifeng Wang, Jian Zhang, Xiangyang Ji and Yongbing Zhang.

Citation

If you use this repository in your research work, please cite this paper:

@misc{gonçalves2024deep,
      title={{Deep Learning-based Prediction of Breast Cancer Tumor and Immune Phenotypes from Histopathology}},
      author={Tiago Gonçalves and Dagoberto Pulido-Arias and Julian Willett and Katharina V. Hoebel and Mason Cleveland and Syed Rakin Ahmed and Elizabeth Gerstner and Jayashree Kalpathy-Cramer and Jaime S. Cardoso and Christopher P. Bridge and Albert E. Kim},
      year={2024},
      eprint={2404.16397},
      archivePrefix={{arXiv}},
      primaryClass={{eess.IV}}
}

Owner

  • Name: QTIM Lab
  • Login: QTIM-Lab
  • Kind: organization
  • Email: qtimlab@gmail.com
  • Location: Boston, MA

The Quantitative Translational Imaging in Medicine Lab at the Martinos Center (MGH/HST)

GitHub Events

Total
  • Watch event: 2
  • Push event: 1
  • Pull request event: 1
Last Year
  • Watch event: 2
  • Push event: 1
  • Pull request event: 1

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 294
  • Total Committers: 4
  • Avg Commits per committer: 73.5
  • Development Distribution Score (DDS): 0.058
Past Year
  • Commits: 235
  • Committers: 4
  • Avg Commits per committer: 58.75
  • Development Distribution Score (DDS): 0.072
Top Committers
Name Email Commits
TiagoFilipeSousaGoncalves t****s@h****m 277
Tiago Filipe Sousa Goncalves t****0@b****u 12
Tiago Filipe Sousa Goncalves t****0@g****u 3
Dagoberto Pulido Arias d****o@m****u 2
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • TiagoFilipeSousaGoncalves (2)
Top Labels
Issue Labels
Pull Request Labels