https://github.com/bhklab/predictio_nextflow

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bhklab
  • Language: Nextflow
  • Default Branch: main
  • Size: 11.5 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 12 months ago
Metadata Files
Readme

README.md

PredictioR Nextflow Pipeline

Overview

The PredictioR Nextflow pipeline analyzes immunotherapy responses and identifies biomarkers across multiple cancer types. It uses Nextflow for workflow management and Docker for reproducibility, and it takes SummarizedExperiment objects as the input for biomarker analysis.

The main.nf script integrates three analysis steps:

  1. Gene Level Analysis
  2. Signature Level Analysis
  3. Meta Analysis

Software Requirements and Installation Instructions

Nextflow

Docker

  • Purpose: Ensures computational reproducibility by containerizing the environment.
  • Installation Guide: Install Docker
  • PredictioR Docker Image: docker pull bhklab/nextflow-env
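The following is a minimal sketch for running the whole workflow inside this image from the shell. It assumes Docker is not already enabled in the pipeline's nextflow.config. `-with-docker` is a standard `nextflow run` option; the helper name `run_in_docker` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pull the PredictioR image and run main.nf inside it.
# Assumes Docker and Nextflow are installed and on PATH.
run_in_docker() {
  image="${1:-bhklab/nextflow-env}"   # default image name from the README
  docker pull "$image" || return 1     # stop if the pull fails
  # -with-docker runs every pipeline process in the given container
  nextflow run main.nf -with-docker "$image"
}
```

Calling `run_in_docker` uses the default image. `run_in_docker <image>` substitutes a different one.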

Reference Resources

Data Directory Configuration

Gene Level Analysis

  • Input Data Directory: params.gene_data_dir = './ICB_data'
  • Example Data Files: ICB_small_Hugo.rda, ICB_small_Mariathasan.rda, and similar files, each containing a SummarizedExperiment object. These files are available in the ICB_data directory of the bhklab PredictioR data repository.
  • Output Data Directory: params.out_dir = './output/main_output'
  • Output Details: Gene Level Analysis results are written to the main_output directory, organized by study ID.
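Nextflow lets any `params.*` value be overridden on the command line with a double-dash option. The directories above can therefore be changed for a single run without editing any file. The following is a minimal sketch that assumes the parameter names shown in this README. The wrapper `run_with_dirs` and the paths in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run main.nf with custom input/output directories.
# Arguments: gene data dir, signature data dir, output dir.
run_with_dirs() {
  # Double-dash options override the matching params.* values
  # set in nextflow.config or main.nf.
  nextflow run main.nf \
    --gene_data_dir "$1" \
    --signature_data_dir "$2" \
    --out_dir "$3"
}
```

For example: `run_with_dirs ./my_ICB_data ./SIG_data ./output/run2`.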

Signature Level Analysis

  • Input Data Directory: params.signature_data_dir = './SIG_data'
  • Example Data Files: CYT_Rooney.rda, EMT_Thompson.rda, PredictIO_Bareche.rda, and similar files contain data frames with columns including:
    • signature_name: Name of the signature
    • gene_name: Name of the gene
    • weight: Weight assigned to each gene within the signature

These files are sourced from the bhklab SignatureSets GitHub repository, which documents the remaining columns. Each .rda file stores its data frame in an object named sig; new signature files should follow the same format.

  • Output Data Directory: params.out_dir = './output/main_output'
  • Output Details: Signature Level Analysis results are written to the main_output directory, organized by study ID.
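Each signature .rda file stores its data frame in an object named sig. A new or downloaded file can therefore be checked against the expected columns by loading it with Rscript. The following is a minimal sketch that assumes R is available, for example inside the bhklab/nextflow-env container. `show_signature` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the column names and first rows of the `sig`
# data frame stored in a signature .rda file.
show_signature() {
  rda_file="$1"
  [ -f "$rda_file" ] || { echo "not found: $rda_file" >&2; return 1; }
  # load() restores the objects saved in the file (here, `sig`);
  # the path is passed as a trailing argument and read via commandArgs().
  Rscript -e 'args <- commandArgs(trailingOnly = TRUE); load(args[1]); print(colnames(sig)); print(head(sig))' "$rda_file"
}
```

For example, `show_signature SIG_data/CYT_Rooney.rda` should list signature_name, gene_name, weight, and any other columns stored in the file.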

Meta Analysis

  • Input Data Directory: The meta-analysis step uses the results of both the gene-level and signature-level analyses, read from:
    • Gene level: ./output/main_output
    • Signature level: ./output/main_output
  • Output Data Directory: params.out_dir = './output/main_output'

Input Data Specifications

ICB Data Information

This table lists each example dataset with its number of patients, cancer type, treatment, clinical endpoints, available molecular data, and PMID reference. Required columns include 'treatment' and 'cancer type'.

| Dataset | Patients (n) | Cancer type | Treatment | Clinical endpoints | Molecular data | PMID |
|---|---|---|---|---|---|---|
| ICB_small_Hugo | 27 | Melanoma | PD-1/PD-L1 | OS | RNA | 26997480 |
| ICB_small_Liu | 121 | Melanoma | PD-1/PD-L1 | PFS/OS | RNA/DNA | 31792460 |
| ICB_small_Miao | 33 | Kidney | PD-1/PD-L1 | PFS/OS | RNA/DNA | 29301960 |
| ICB_small_Nathanson | 24 | Melanoma | CTLA4 | OS | RNA/DNA | 27956380 |
| ICB_small_Padron | 45 | Pancreas | PD-1/PD-L1 | PFS/OS | RNA | 35662283 |
| ICB_small_Riaz | 46 | Melanoma | PD-1/PD-L1 | OS | RNA/DNA | 29033130 |
| ICB_small_VanAllen | 42 | Melanoma | CTLA4 | PFS/OS | RNA/DNA | 26359337 |
| ICB_small_Mariathasan | 195 | Bladder | PD-1/PD-L1 | OS | RNA/DNA | 29443960 |

Clinical data should contain all of the required columns below, plus any of the recommended additional fields that are available, so that the analysis runs on complete and consistent inputs.

  • Required Columns:

    • patientid: Unique patient identifier
    • treatmentid: Treatment regimen
    • response: Response to treatment (responder 'R', non-responder 'NR')
    • tissueid: Standardized cancer type
    • survival_time_pfs: Progression-free survival time (e.g., 2.6 months)
    • survival_time_os: Overall survival time
    • survival_unit: Unit of the survival times, typically months
    • event_occurred_pfs: Binary PFS event indicator (1 = event occurred, 0 = no event)
    • event_occurred_os: Binary OS event indicator (1 = event occurred, 0 = no event)
  • Additional Recommended Fields: sex, age, histo (histological type), cancer stage, dna, and rna details, among others as needed.
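The required columns can be checked before a run is launched. The pipeline reads clinical data from SummarizedExperiment objects. This sketch therefore assumes, hypothetically, that the clinical table has been exported to a comma-separated file with a header row, for example with R's write.csv. `check_clinical_columns` is a hypothetical helper, and it checks column names only, not values.

```shell
#!/bin/sh
# Hypothetical check: report required clinical columns missing from a CSV header.
# Assumes a comma-separated file whose first line holds the column names.
REQUIRED_CLINICAL="patientid treatmentid response tissueid survival_time_pfs survival_time_os survival_unit event_occurred_pfs event_occurred_os"

check_clinical_columns() {
  # Split the header into one name per line, dropping quotes and CR characters.
  header=$(head -n 1 "$1" | tr -d '"\r' | tr ',' '\n')
  missing=""
  for col in $REQUIRED_CLINICAL; do
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    printf '%s\n' "$header" | grep -qxF "$col" || missing="$missing $col"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```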

Signature Information

This table lists each signature with its data type, the RNA data type (for example, microarray or log CPM), the method used to compute the signature score, the cancer type, the corresponding score function, and its PMID reference.

| Signature | DNA/RNA | RNA type | Method | Cancer type | Score function | PMID |
|---|---|---|---|---|---|---|
| ADO_Sidders | RNA | Count RNA-seq/TPM | GSVA | Multiple | geneSigGSVA | 31953314 |
| APM_Thompson | RNA | log CPM | GSVA | Lung, melanoma | geneSigGSVA | 33028693 |
| APM_Wang | RNA | Microarray | GSVA | Multiple | geneSigGSVA | 31767055 |
| Bcell_Budczies | RNA | Microarray | GSVA | Lung | geneSigGSVA | 33520406 |
| Bcell_Helmink | RNA | log FPKM | GSVA | Melanoma, kidney | geneSigGSVA | 31942075 |
| Blood_Friedlander | RNA | Microarray | GSVA | Melanoma | geneSigGSVA | 28807052 |
| C-ECM_Chakravarthy | RNA | Normalized counts | ssGSEA | Multiple | geneSigssGSEA | 30410077 |
| CCL5-CXCL9_Dangaj | RNA | | GSVA | Multiple | geneSigGSVA | 31185212 |
| CD39-CD8Tcell_Chow | RNA | RNA-seq count | GSVA | Lung | geneSigGSVA | 36574773 |

  • Required Columns:
    • signature: Name of the signature; must match the signature names in './SIG_data'
    • method: Method used to calculate the signature score
    • score function: Function that the R script should use to compute the score

For details on all signatures used in the pipeline (more than 50), see the Signature Information CSV.
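Each signature name must match a signature in './SIG_data'. A quick consistency check can compare the names in the signature information CSV with the files on disk. The following is a minimal sketch that makes three assumptions: the signature name is the CSV's first column, names contain no commas or spaces, and each signature is stored as <signature>.rda (as with CYT_Rooney.rda). `check_signature_files` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical check: list signatures from an info CSV that have no matching
# <name>.rda file in the signature directory.
# Arguments: signature info CSV, signature directory (default ./SIG_data).
check_signature_files() {
  csv="$1"
  sig_dir="${2:-./SIG_data}"
  status=0
  # Skip the header row, take the first comma-separated field, drop quotes/CRs.
  for name in $(tail -n +2 "$csv" | cut -d, -f1 | tr -d '"\r'); do
    if [ ! -f "$sig_dir/$name.rda" ]; then
      echo "no file for signature: $name"
      status=1
    fi
  done
  return "$status"
}
```

For example: `check_signature_files signature_information.csv ./SIG_data`, where the CSV file name is a placeholder.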

Running the Pipeline

Run the pipeline with the configured parameters using Nextflow:

nextflow run main.nf
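If a run is interrupted, Nextflow's standard `-resume` option reuses cached task results from the previous run. Tasks whose inputs and scripts have not changed are not recomputed, provided the work directory is still present. The following is a minimal sketch; the wrapper name `resume_run` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: re-run main.nf, reusing cached results from the
# previous run for tasks whose inputs and scripts are unchanged.
# Extra arguments (e.g. --out_dir ./output/main_output) are passed through.
resume_run() {
  nextflow run main.nf -resume "$@"
}
```

For example, `resume_run --out_dir ./output/main_output` combines resuming with a parameter override.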

Additional Notes

  • The required R packages and dependencies are specified in load_libraries.R and pre-installed in the BHK Docker image (bhklab/nextflow-env).
  • Customize the nextflow.config file to set any additional parameters or configuration your analysis requires.
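You can customize parameters without editing the shipped nextflow.config by writing a separate config file and passing it with Nextflow's `-c` option. Settings in that file take precedence over the project's nextflow.config. The following is a minimal sketch that assumes the parameter names used in this README. `write_custom_config` and `custom.config` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: write an extra Nextflow config that overrides the data
# and output directories.
# Arguments: gene dir, signature dir, output dir, config file (default custom.config).
write_custom_config() {
  cat > "${4:-custom.config}" <<EOF
params.gene_data_dir = '$1'
params.signature_data_dir = '$2'
params.out_dir = '$3'
EOF
}

# Usage (not run here):
#   write_custom_config ./ICB_data ./SIG_data ./output/custom
#   nextflow -c custom.config run main.nf
```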

Owner

  • Name: BHKLAB
  • Login: bhklab
  • Kind: organization
  • Location: Toronto, Ontario, Canada

The Haibe-Kains Laboratory @ Princess Margaret Cancer Centre

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 22
  • Total Committers: 2
  • Avg Commits per committer: 11.0
  • Development Distribution Score (DDS): 0.409
Past Year
  • Commits: 22
  • Committers: 2
  • Avg Commits per committer: 11.0
  • Development Distribution Score (DDS): 0.409
Top Committers
| Name | Email | Commits |
|---|---|---|
| Nasim Bondar Sahebi | 1****i | 13 |
| Nasim Sahebi | s****i@m****a | 9 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 days
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 days
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • CRMacPherson (2)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

Dockerfile docker
  • rocker/rstudio 4.3.2 build