pnet_robustness
Reliable interpretability of biology-inspired deep neural networks
This code supplements the publication by Esser-Skala and Fortelny (2023).
Folders
(Not all of these folders are included in the git repository.)
- data: output files generated by P-NET and DTox (available from https://doi.org/10.5281/zenodo.7760561)
- doc: project documentation
- docker: files for creating Docker containers with P-NET or DTox installed
- literature: relevant publications
- plots: generated plots
- pnet_data: P-NET data files
- renv: R environment data
- scripts: bash and R scripts
Preparation
P-NET and DTox
Download the provided Docker containers from the GitHub Container registry:
bash
docker pull ghcr.io/csbg/pnet-container:1.0.0
docker pull ghcr.io/csbg/dtox-container:1.0.0
Alternatively, pull these containers with Apptainer/Singularity:
bash
singularity pull docker://ghcr.io/csbg/pnet-container:1.0.0
singularity pull docker://ghcr.io/csbg/dtox-container:1.0.0
When using the latter container format, replace all calls to run_[pnet/dtox]_docker.sh with run_[pnet/dtox]_singularity.sh.
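The choice between the Docker and Singularity wrapper scripts can be automated. The following is a minimal sketch, not part of the repository: `runner_script` is a hypothetical helper that returns the wrapper name for a given tool, either for an explicitly requested runtime or for whichever runtime it finds on the `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): return the wrapper script
# that matches the requested or the locally available container runtime.
runner_script() {
  local tool=$1          # "pnet" or "dtox"
  local runtime=${2:-}   # optional: "docker" or "singularity"
  if [ -z "$runtime" ]; then
    if command -v docker >/dev/null 2>&1; then
      runtime=docker
    elif command -v singularity >/dev/null 2>&1 ||
         command -v apptainer >/dev/null 2>&1; then
      runtime=singularity
    else
      echo "neither docker nor singularity/apptainer found" >&2
      return 1
    fi
  fi
  echo "scripts/run_${tool}_${runtime}.sh"
}

# Explicitly request the Singularity wrapper for P-NET.
runner_script pnet singularity   # prints scripts/run_pnet_singularity.sh
```

Calling `"$(runner_script pnet)" -e pnet_original` would then run the experiment with whichever runtime was detected.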
R
In order to run the R scripts, you will need one of the following:
- an installation of R 4.3.1; restore required packages from renv.lock via
  bash
  Rscript -e "renv::restore()"
- the Docker container available from the GitHub Container registry:
  bash
  docker pull ghcr.io/csbg/r_pnet_robustness:1.0.0
  Replace calls to Rscript below by scripts/run_rscript_docker.sh.
- the Apptainer/Singularity container:
  bash
  singularity pull docker://ghcr.io/csbg/r_pnet_robustness:1.0.0
  Replace calls to Rscript below by scripts/run_rscript_singularity.sh.
Datasets
Download the MSK-IMPACT 2017 dataset:
bash
wget https://cbioportal-datahub.s3.amazonaws.com/msk_impact_2017.tar.gz
tar xzf msk_impact_2017.tar.gz -C pnet_data
Run P-NET experiments
Generally, each experiment comprises the following steps:
- Load P-NET input data via load_data_[dataset].R.
- Optionally, modify input data via modify_data_[technique].R.
- Run P-NET via Docker using the provided bash script run_pnet_docker.sh. This script has three arguments:
  - -e experiment: experiment name, required
  - -l [n]: lower seed, optional (default: -1, which uses the original seeds)
  - -u [n]: upper seed, optional (default: 49)
Within each experiment, results from each run are saved in a subfolder indicating the two random seeds used (e.g., data/pnet_original/0_0).
utils.R is required by all data preparation scripts.
Original setup
Run P-NET with the original setup as described in the publication.
bash
Rscript scripts/load_data_original.R
scripts/run_pnet_docker.sh -e pnet_original
Deterministic inputs
Input data is modified so that presence of mutation and copy number amplification is perfectly correlated with class label 1 (copy number deletion is always 0).
bash
Rscript scripts/load_data_original.R
Rscript scripts/modify_data_deterministic.R
scripts/run_pnet_docker.sh -e pnet_deterministic
Shuffled labels
Shuffle training/test labels before each run using uniform class frequencies.
bash
for seed in {-1..49}; do
  Rscript scripts/load_data_original.R
  Rscript scripts/modify_data_shuffled.R FALSE $seed
  scripts/run_pnet_docker.sh -e pnet_shuffled_each -l $seed -u $seed
done
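Because the loop above starts 51 container runs, it can be useful to preview the commands first. The following is a minimal sketch: `print_shuffled_commands` is a hypothetical dry-run helper that only prints the three commands per seed and executes nothing.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of the repository): print, without
# executing, the commands of the shuffled-label loop for an inclusive seed range.
print_shuffled_commands() {
  local lower=$1 upper=$2 seed
  for ((seed = lower; seed <= upper; seed++)); do
    echo "Rscript scripts/load_data_original.R"
    echo "Rscript scripts/modify_data_shuffled.R FALSE $seed"
    echo "scripts/run_pnet_docker.sh -e pnet_shuffled_each -l $seed -u $seed"
  done
}

# Preview the first two seeds (-1 and 0).
print_shuffled_commands -1 0
```

The same helper can be used to rerun a single failed seed, e.g. `print_shuffled_commands 17 17 | bash`.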
MSK-IMPACT 2017 dataset
```bash
Rscript scripts/load_data_mskimpact.R "Non-Small Cell Lung Cancer"
scripts/run_pnet_docker.sh -e mskimpactnsclcoriginal

for seed in {-1..49}; do
  Rscript scripts/load_data_mskimpact.R "Non-Small Cell Lung Cancer"
  Rscript scripts/modify_data_shuffled.R FALSE $seed
  scripts/run_pnet_docker.sh -e mskimpactnsclcshuffled -l $seed -u $seed
done

Rscript scripts/load_data_mskimpact.R "Breast Cancer"
scripts/run_pnet_docker.sh -e mskimpactbcoriginal

for seed in {-1..49}; do
  Rscript scripts/load_data_mskimpact.R "Breast Cancer"
  Rscript scripts/modify_data_shuffled.R FALSE $seed
  scripts/run_pnet_docker.sh -e mskimpactbcshuffled -l $seed -u $seed
done

Rscript scripts/load_data_mskimpact.R "Colorectal Cancer"
scripts/run_pnet_docker.sh -e mskimpactccoriginal

for seed in {-1..49}; do
  Rscript scripts/load_data_mskimpact.R "Colorectal Cancer"
  Rscript scripts/modify_data_shuffled.R FALSE $seed
  scripts/run_pnet_docker.sh -e mskimpactccshuffled -l $seed -u $seed
done

Rscript scripts/load_data_mskimpact.R "Prostate Cancer"
scripts/run_pnet_docker.sh -e mskimpactpcoriginal

for seed in {-1..49}; do
  Rscript scripts/load_data_mskimpact.R "Prostate Cancer"
  Rscript scripts/modify_data_shuffled.R FALSE $seed
  scripts/run_pnet_docker.sh -e mskimpactpcshuffled -l $seed -u $seed
done
```
Run DTox experiments
Run DTox with seeds ranging from 0 (i.e., the original seed) to 50.
bash
scripts/run_dtox_docker.sh
Results from each run are saved in a subfolder indicating the random seed used (e.g., data/dtox/0).
Analyze results
plot_figures.R generates all figures shown in the publication, using files in data (described below):
bash
Rscript scripts/plot_figures.R
styling.R is required by this script.
P-NET
After each run, the following files are copied from the P-NET output folders:
- analysis/extracted/node_importance_graph_adjusted.csv (renamed to node_importance.csv): contains node importance scores, with the following columns:
  - (first, unnamed): node name
  - coef: original node importance scores
  - coef_graph: indegree plus outdegree of node
  - coef_combined: adjusted node importance score (= coef / coef_graph if coef_graph > mean(coef_graph) + 5 * sd(coef_graph) in the respective layer)
  - coef_combined_zscore: scaled coef_combined
  - coef_combined2: z(z(coef_graph) - z(coef))
  - layer: layer of the node
- _logs/p1000/pnet/onsplit_average_reg_10_tanh_large_testing/P-net_ALL_testing.csv (renamed to predictions_test.csv): predictions for the test set, with the following columns:
  - (first, unnamed): sample name
  - pred: predicted class (unfortunately, encoded as a double, 1.0 or 0.0)
  - pred_scores: probability of the predicted class
  - y: true class (encoded as integer 1 or 0)
- _logs/p1000/pnet/onsplit_average_reg_10_tanh_large_testing/P-net_ALL_training.csv (renamed to predictions_train.csv): predictions for the training set (same columns as above)
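Since pred is stored as a double and y as an integer, comparing the two columns as strings would never match. The following is a minimal sketch of a numeric comparison with awk. It assumes a header line and comma-separated columns in the order listed above (sample name, pred, pred_scores, y); `accuracy` is a hypothetical helper, and the demo values are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): fraction of samples whose
# predicted class equals the true class.
# Assumed column order: sample name, pred, pred_scores, y (header in line 1).
accuracy() {
  awk -F, 'NR > 1 {
             n++
             # Adding 0 forces a numeric comparison, so "1.0" equals "1".
             if ($2 + 0 == $4 + 0) correct++
           }
           END { if (n > 0) printf "%.3f\n", correct / n }' "$1"
}

# Demo on a small made-up file (illustrative values, not real results).
demo=$(mktemp)
printf '%s\n' ',pred,pred_scores,y' \
  's1,1.0,0.91,1' 's2,0.0,0.20,1' 's3,0.0,0.05,0' 's4,1.0,0.77,1' > "$demo"
accuracy "$demo"   # prints 0.750
rm -f "$demo"
```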
DTox
The following files generated by DTox are required for subsequent analyses:
- module_relevance.tsv: contains node importance scores, with the following columns:
  - (first, unnamed): compound identifier
  - remaining columns: node identifiers (UniProt and Reactome IDs)
- test_labels.csv: predictions for the test set, with two columns:
  - truth: true label (0 or 1)
  - predicted: predicted label (decimal number between 0 and 1)
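Before running the analysis, it may help to confirm that every DTox run produced both files. The following is a minimal sketch, assuming (this is not stated above) that the two files sit directly inside each seed folder, e.g. data/dtox/0/, and that seeds run from 0 to 50; `missing_dtox_runs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report DTox runs that lack
# one of the two files needed for the analysis.
# Assumption: files are located at <root>/<seed>/<file> for seeds 0..50.
missing_dtox_runs() {
  local root=${1:-data/dtox} seed file
  for ((seed = 0; seed <= 50; seed++)); do
    for file in module_relevance.tsv test_labels.csv; do
      if [ ! -f "$root/$seed/$file" ]; then
        echo "$seed: $file missing"
      fi
    done
  done
}

# Prints one line per missing file; prints nothing if all runs are complete.
missing_dtox_runs data/dtox
```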
Appendix: How to build Docker images
P-NET
The folder docker/pnet contains everything needed for building a Docker image with P-NET installed:
- Dockerfile: instructions for assembling the image
- entrypoint.sh: script for running P-NET; used as entrypoint in the container
- environment_pnet.yml: conda environment specification
- patch_seeds.diff: patch that allows changing the random seed for P-NET
- setup.sh: executed during image assembly; installs P-NET (with input data) and conda
Build and deploy this image via
bash
docker build --tag ghcr.io/csbg/pnet-container:1.0.0 .
docker push ghcr.io/csbg/pnet-container:1.0.0
DTox
The folder docker/dtox contains everything needed for building a Docker image with DTox installed:
- Dockerfile: instructions for assembling the image
- entrypoint.sh: entrypoint in the container; activates conda environment
- environment_dtox.yml: conda environment specification
- patch_seeds.diff: patch that allows changing the random seed for DTox and saves predicted labels for the test set
- run_dtox.py: executes the DTox workflow as described in the tutorial available in the DTox GitHub repository
- setup.sh: executed during image assembly; installs DTox and conda
Build and deploy this image via
bash
docker build --tag ghcr.io/csbg/dtox-container:1.0.0 .
docker push ghcr.io/csbg/dtox-container:1.0.0
R
The folder docker/r contains the Dockerfile needed for building a Docker image with R and required packages installed.
Build and deploy this image via
bash
cp ../../renv.lock .
docker build --tag ghcr.io/csbg/r_pnet_robustness:1.0.0 .
docker push ghcr.io/csbg/r_pnet_robustness:1.0.0
Appendix: Description of files required by P-NET
- genes/: only the genes present in both of the following two files will be analyzed:
  - (a) tcga_prostate_expressed_genes_and_cancer_genes.csv
  - (b) HUGO_genes/protein-coding_gene_with_coordinate_minimal.txt (TSV, no column names; meaning of columns: chromosome, start, end, gene name)
- pathways/:
  - pathways_short_names.xlsx: short pathway names for figure labels
  - Reactome/ReactomePathways.gmt: genes associated with Reactome pathways; TSV, no column names, variable number of columns: (1) pathway name (2) reactome id (3) type (unused) (4ff) associated genes
  - Reactome/ReactomePathways.txt: TSV mapping Reactome ids to names; loaded by P-NET but apparently not used (?)
  - Reactome/ReactomePathwaysRelation.txt: TSV specifying the Reactome pathway hierarchy as edge list; columns indicate parent and child; only human pathways are used (i.e., the child id has to start with "HSA")
- prostate/:
  - processed/:
    - P1000_final_analysis_set_cross_important_only.csv: mutation data; first column contains sample name, remaining columns represent genes, cells indicate number of mutations; data is preprocessed to a binary matrix, indicating presence/absence of at least one mutation (i.e., 1 if original >= 1)
    - P1000_data_CNA_paper.csv: CNV data; first column (unnamed) contains sample name, remaining columns represent genes, cells indicate copy number status; data is preprocessed to two binary matrices: one indicates presence of copy number amplification (1 if original > 1.5), the other indicates presence of CN deletion (1 if original < -1.5)
    - response_paper.csv: input labels, two columns: (1) id – sample name (2) response – sample label (1 = metastatic tumor)
  - splits/: splits of input data; all files have three columns: (1) [unnamed] – running number starting at zero (2) id – sample name (3) response – sample label (column is NOT used by P-NET!)
    - test_set.csv: samples in the test set
    - training_set_0.csv: samples in the training set
    - validation_set.csv: samples in the validation set
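The copy number thresholds described above can be illustrated on a small table. The following is a minimal sketch in awk, not P-NET's own preprocessing code: `binarize_cna` is a hypothetical helper, the input is assumed to be a comma-separated file with a header line and the sample name in the first column, and the gene names and values in the demo are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not P-NET's implementation): turn copy number values
# into a binary matrix using the thresholds given in the text.
#   mode "amp": 1 if value > 1.5, else 0
#   mode "del": 1 if value < -1.5, else 0
binarize_cna() {
  local mode=$1 file=$2
  awk -F, -v OFS=, -v mode="$mode" '
    NR == 1 { print; next }           # keep the header (gene names) unchanged
    {
      for (i = 2; i <= NF; i++) {     # column 1 holds the sample name
        v = $i + 0                    # force numeric; empty cells become 0
        if (mode == "amp") $i = (v > 1.5) ? 1 : 0
        else               $i = (v < -1.5) ? 1 : 0
      }
      print
    }' "$file"
}

# Demo with made-up values.
demo=$(mktemp)
printf '%s\n' ',TP53,AR' 's1,2.0,-2.0' 's2,1.0,0.0' > "$demo"
binarize_cna amp "$demo"   # rows: s1,1,0 and s2,0,0
binarize_cna del "$demo"   # rows: s1,0,1 and s2,0,0
rm -f "$demo"
```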
Owner
- Name: Computational Systems Biology Group
- Login: csbg
- Kind: organization
- Location: Salzburg, AT
- Website: www.plus.ac.at/fortelny
- Repositories: 1
- Profile: https://github.com/csbg
The Fortelny Lab at the University of Salzburg
Citation (CITATION.cff)
cff-version: 1.2.0
message: If you use this software, please cite both the article from preferred-citation and the software itself.
title: Code for the publication "Reliable interpretability of biology-inspired deep neural networks"
authors:
  - family-names: Esser-Skala
    given-names: Wolfgang
    orcid: https://orcid.org/0000-0002-7350-4045
  - family-names: Fortelny
    given-names: Nikolaus
    orcid: https://orcid.org/0000-0003-4025-9968
version: 1.2.0
type: software
license: MIT
doi: 10.5281/zenodo.8386694
repository-code: https://github.com/csbg/pnet_robustness
preferred-citation:
  type: article
  authors:
    - family-names: Esser-Skala
      given-names: Wolfgang
      affiliation: Department of Biosciences and Medical Biology, University of Salzburg, Salzburg, Austria
      orcid: https://orcid.org/0000-0002-7350-4045
    - family-names: Fortelny
      given-names: Nikolaus
      affiliation: Department of Biosciences and Medical Biology, University of Salzburg, Salzburg, Austria
      orcid: https://orcid.org/0000-0003-4025-9968
      email: nikolaus.fortelny@plus.ac.at
  title: Reliable interpretability of biology-inspired deep neural networks
  journal: npj Systems Biology and Applications
  volume: 9
  pages: 50
  year: 2023
  doi: 10.1038/s41540-023-00310-8
  license: CC-BY-4.0
Dependencies
- continuumio/miniconda3 23.3.1-0 build
- continuumio/miniconda3 23.3.1-0 build
- rocker/r-ver 4.3.1 build