https://github.com/bhklab/predictio_nextflow
Repository
Basic Info
- Host: GitHub
- Owner: bhklab
- Language: Nextflow
- Default Branch: main
- Size: 11.5 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
PredictioR Nextflow Pipeline
Overview
The PredictioR Nextflow pipeline is designed to analyze immunotherapy responses and identify biomarkers across various cancers. It utilizes Nextflow for workflow management and Docker for reproducibility, focusing on handling SummarizedExperiment objects for in-depth biomarker analysis.
The main.nf script integrates three key analysis steps:
1. Gene Level Analysis
2. Signature Level Analysis
3. Meta Analysis
Software Requirements and Installation Instructions
Nextflow
- Version: 24.04.2
- Installation and Resources:
- Setup Instructions: For detailed installation steps, please refer to the Nextflow Setup Guide.
- Documentation and Resources:
- General Documentation
- Nextflow Training
Docker
- Purpose: Ensures computational reproducibility by containerizing the environment.
- Installation Guide: Install Docker
- PredictioR Docker Image:
  ```bash
  docker pull bhklab/nextflow-env
  ```
- Docker Hub: bhklab/nextflow-env
Reference Resources
- GitHub Repository: PredictioR GitHub Repository
- Key Publication: PubMed Reference, PMID: 36055464
Data Directory Configuration
Gene Level Analysis
- Input Data Directory:
  ```bash
  params.gene_data_dir = './ICB_data'
  ```
- Example Data Files: Includes files such as `ICB_small_Hugo.rda` and `ICB_small_Mariathasan.rda`, which are SummarizedExperiment objects. These files are located within the `ICB_data` directory at the bhklab PredictioR data repository.
- Output Data Directory:
  ```bash
  params.out_dir = './output/main_output'
  ```
- Output Details: The results of the Gene Level Analysis are stored in the `main_output` directory, stratified by study ID for clarity and ease of reference.
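Before launching the gene-level step, it can help to confirm that the input directory exists and actually contains `.rda` files. The following is a minimal sketch, not part of the pipeline: the helper name `list_rda_files` is hypothetical, and it only inspects file names, not whether each file holds a valid SummarizedExperiment object.

```bash
# Hypothetical helper: list the .rda files in an input directory,
# or fail with a clear message when the directory is missing or empty.
# Usage: list_rda_files DIR
list_rda_files() {
  local dir="$1" files
  if [ ! -d "$dir" ]; then
    echo "error: input directory not found: $dir" >&2
    return 1
  fi
  # Look only at the top level of the directory; sort for stable output.
  files=$(find "$dir" -maxdepth 1 -type f -name '*.rda' | LC_ALL=C sort)
  if [ -z "$files" ]; then
    echo "error: no .rda files in $dir" >&2
    return 1
  fi
  # Strip the directory part so only file names are printed.
  printf '%s\n' "$files" | sed 's|.*/||'
}
```

Calling `list_rda_files ./ICB_data` prints one file name per line and returns a non-zero status if nothing usable is found, so it can guard a later `nextflow run` call.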
Signature Level Analysis
- Input Data Directory:
  ```bash
  params.signature_data_dir = './SIG_data'
  ```
- Example Data Files: Files like `CYT_Rooney.rda`, `EMT_Thompson.rda`, and `PredictIO_Bareche.rda` are data frames with columns such as:
  - `signature_name`: Name of the signature
  - `gene_name`: Name of the gene
  - `weight`: Weight assigned to each gene within the signature

  For the remaining columns, see the bhklab SignatureSets GitHub repository, from which these files are sourced.
  Each .rda file stores its data frame in an object named `sig`. Please follow the same format for consistency.
- Output Data Directory:
  ```bash
  params.out_dir = './output/main_output'
  ```
- Output Details: The results of the Signature Level Analysis are stored in the `main_output` directory, stratified by study ID for clarity and ease of reference.
Meta Analysis
- Input Data Directory:
  - The meta-analysis step uses the results from both gene-level and signature-level analyses.
  - Input Directories:
    - Gene level: `./output/main_output`
    - Signature level: `./output/main_output`
- Output Data Directory:
  ```bash
  params.out_dir = './output/main_output'
  ```
Input Data Specifications
ICB Data Information
This table summarizes each dataset by study and treatment type, along with cancer types, clinical and molecular data availability, and relevant PMID references. Required columns include 'treatment' and 'cancer type'.
| Dataset | Patients [#] | Cancer type | Treatment | Clinical endpoints | Molecular data | PMID |
|---|---|---|---|---|---|---|
| ICB_small_Hugo | 27 | Melanoma | PD-1/PD-L1 | OS | RNA | 26997480 |
| ICB_small_Liu | 121 | Melanoma | PD-1/PD-L1 | PFS/OS | RNA/DNA | 31792460 |
| ICB_small_Miao | 33 | Kidney | PD-1/PD-L1 | PFS/OS | RNA/DNA | 29301960 |
| ICB_small_Nathanson | 24 | Melanoma | CTLA4 | OS | RNA/DNA | 27956380 |
| ICB_small_Padron | 45 | Pancreas | PD-1/PD-L1 | PFS/OS | RNA | 35662283 |
| ICB_small_Riaz | 46 | Melanoma | PD-1/PD-L1 | OS | RNA/DNA | 29033130 |
| ICB_small_VanAllen | 42 | Melanoma | CTLA4 | PFS/OS | RNA/DNA | 26359337 |
| ICB_small_Mariathasan | 195 | Bladder | PD-1/PD-L1 | OS | RNA/DNA | 29443960 |
Make sure the clinical data includes every required column listed below, plus any recommended fields that are available, so that the analysis runs on complete and consistent inputs.
Required Columns:
- `patientid`: Unique identifier for patients
- `treatmentid`: Details of the treatment regimen
- `response`: Patient response to treatment (Responder 'R', Non-responder 'NR')
- `tissueid`: Standardized cancer type
- `survival_time_pfs`: Progression-free survival time (for example, 2.6 months)
- `survival_time_os`: Overall survival time
- `survival_unit`: Measurement unit for survival times, typically months
- `event_occurred_pfs`: Binary indicator of whether a PFS event occurred (1, 0)
- `event_occurred_os`: Binary indicator of whether an OS event occurred (1, 0)
Additional Recommended Fields: Include sex, age, histo (histological type), stage of cancer, dna, and rna details among others as necessary.
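The required columns above can be checked mechanically before a dataset is packaged. The sketch below assumes the clinical table has been exported to a comma-separated file with a header row; that export step, the variable `REQUIRED_COLUMNS`, and the helper name `missing_clinical_columns` are all illustrative assumptions and not part of the pipeline, which reads the clinical data from the SummarizedExperiment objects.

```bash
# Required clinical columns, as listed in this README.
REQUIRED_COLUMNS="patientid treatmentid response tissueid survival_time_pfs survival_time_os survival_unit event_occurred_pfs event_occurred_os"

# Hypothetical helper: print every required column that is missing from
# the header row of a CSV file. Returns 0 when nothing is missing, 1 otherwise.
# Usage: missing_clinical_columns FILE.csv
missing_clinical_columns() {
  local csv="$1" header col missing=0
  # Read the first line and drop carriage returns and double quotes.
  header=$(head -n 1 "$csv" | tr -d '\r"')
  for col in $REQUIRED_COLUMNS; do
    # Wrap both sides in commas so only whole column names match.
    case ",$header," in
      *",$col,"*) ;;
      *) echo "$col"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

For example, `missing_clinical_columns clinical.csv` prints nothing for a complete table and prints one missing column name per line otherwise.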
Signature Information
This table summarizes each signature name by study and PMID references, the method for computing the signature score, and the corresponding score function.
| Signature | DNA/RNA | RNA Type | Method | Cancer Type | Score Function | PMID |
|---|---|---|---|---|---|---|
| ADO_Sidders | RNA | Count RNA-seq/TPM | GSVA | Multiple | geneSigGSVA | 31953314 |
| APM_Thompson | RNA | log CPM | GSVA | Lung, melanoma | geneSigGSVA | 33028693 |
| APM_Wang | RNA | Microarray | GSVA | Multiple | geneSigGSVA | 31767055 |
| Bcell_Budczies | RNA | Microarray | GSVA | Lung | geneSigGSVA | 33520406 |
| Bcell_Helmink | RNA | log FPKM | GSVA | Melanoma, kidney | geneSigGSVA | 31942075 |
| Blood_Friedlander | RNA | Microarray | GSVA | Melanoma | geneSigGSVA | 28807052 |
| C-ECM_Chakravarthy | RNA | Normalized counts | ssGSEA | Multiple | geneSigssGSEA | 30410077 |
| CCL5-CXCL9_Dangaj | RNA | | GSVA | Multiple | geneSigGSVA | 31185212 |
| CD39-CD8Tcell_Chow | RNA | RNA-seq count | GSVA | Lung | geneSigGSVA | 36574773 |
Required Columns:
- signature: Name of the signature, matching the file names in './SIG_data'
- method: Method used to calculate the signature score
- score function: Name of the function the R script should use to compute the score
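Because each entry in the `signature` column is expected to match a file in `./SIG_data`, a quick cross-check can catch mismatched names before a run. This is a sketch under stated assumptions: the signature information file is a plain CSV with a header containing a `signature` column, field values contain no embedded commas, and each signature is stored as `NAME.rda`. The helper name `signatures_without_file` is hypothetical.

```bash
# Hypothetical helper: print the names from the "signature" column that
# have no matching NAME.rda file in the signature directory.
# Usage: signatures_without_file INFO.csv SIG_DIR
signatures_without_file() {
  local csv="$1" sig_dir="$2" name
  # Drop carriage returns and quotes, find the "signature" column in the
  # header, then print that field for every data row.
  tr -d '\r"' < "$csv" | awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "signature") col = i; next }
    col     { print $col }
  ' | while IFS= read -r name; do
    [ -n "$name" ] || continue
    [ -f "$sig_dir/$name.rda" ] || echo "$name"
  done
}
```

Running `signatures_without_file signature_information.csv ./SIG_data` (the CSV file name here is a placeholder) prints nothing when every listed signature has a file, and otherwise lists the names that need attention.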
For detailed information on the more than 50 signatures used in the pipeline, refer to the Signature Information CSV.
Running the Pipeline
Run the pipeline with the configured parameters using Nextflow:
```bash
nextflow run main.nf
```
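Nextflow maps a double-dash command-line option such as `--out_dir` to the matching `params.out_dir` value, so the directories described earlier can be overridden without editing the scripts. The wrapper below is a sketch, assuming `main.nf` reads `params.gene_data_dir`, `params.signature_data_dir`, and `params.out_dir` as shown in this README; the function name `run_pipeline` and the `DRY_RUN` switch are hypothetical conveniences, not part of the pipeline.

```bash
# Hypothetical wrapper: launch the pipeline with explicit parameter overrides.
# Set DRY_RUN=1 to print the command instead of running it.
# Usage: run_pipeline [GENE_DIR] [SIG_DIR] [OUT_DIR]
run_pipeline() {
  local gene_dir="${1:-./ICB_data}"
  local sig_dir="${2:-./SIG_data}"
  local out_dir="${3:-./output/main_output}"
  # Rebuild the positional parameters as the full nextflow command line.
  set -- nextflow run main.nf \
    --gene_data_dir "$gene_dir" \
    --signature_data_dir "$sig_dir" \
    --out_dir "$out_dir"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

With `DRY_RUN=1` set, calling `run_pipeline` with no arguments prints the command built from the default directories, which is a cheap way to review the parameters before a long run.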
Additional Notes
- Necessary R packages and dependencies are installed as specified in `load_libraries.R` and included in the BHK Docker.
- Customize the `nextflow.config` file to specify any additional parameters or configurations required for your specific analysis needs.
Owner
- Name: BHKLAB
- Login: bhklab
- Kind: organization
- Location: Toronto, Ontario, Canada
- Website: http://www.pmgenomics.ca/bhklab/
- Repositories: 168
- Profile: https://github.com/bhklab
The Haibe-Kains Laboratory @ Princess Margaret Cancer Centre
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Nasim Bondar Sahebi | 1****i | 13 |
| Nasim Sahebi | s****i@m****a | 9 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- CRMacPherson (2)
Dependencies
- rocker/rstudio 4.3.2 build