icgc-argo-mutational-signatures

ICGC ARGO Mutational Signatures Workflow Packages

https://github.com/icgc-argo-workflows/icgc-argo-mutational-signatures

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: icgc-argo-workflows
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Size: 9.79 MB
Statistics
  • Stars: 2
  • Watchers: 13
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md

nf-core/icgcargomutsig

Introduction

nf-core/icgcargomutsig is a bioinformatics pipeline that converts GDC MAF files or a collection of VCF files into mutational count matrices, performs signature assignment using SigProfiler and signature.tools.lib, and calculates error statistics to assess the assignment performance.

  1. Generate SBS96, DBS78 and ID83 count matrices using SigProfilerMatrixGenerator.
  2. Assess row orders to ensure full compatibility between the reference catalogues of each assignment tool and the input data.
  3. Assign SBS signatures to the COSMIC mutational signature catalogue using SigProfilerExtractor and signature.tools.lib.
  4. Calculate error thresholds using Kullback-Leibler divergence, root-mean-square error, sum of absolute distances and Hellinger distance.
  5. Generate a MultiQC report containing run information and log data.
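
As a rough illustration of step 4, the four error measures can be computed between an observed mutational profile and the profile reconstructed from the assigned signatures. This is a minimal sketch, not the pipeline's own implementation: the function names, the choice to normalise both profiles to probabilities before every measure, and the pseudocount `eps` are assumptions made for this example.

```python
import math


def normalise(counts):
    """Scale a vector of non-negative counts so that it sums to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("profile must contain at least one mutation")
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q); eps avoids log(0) (assumption)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def rmse(p, q):
    """Root-mean-square error across all mutation channels."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p))


def sum_abs_diff(p, q):
    """Sum of absolute distances between the two profiles."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def hellinger(p, q):
    """Hellinger distance, bounded between 0 and 1 for probability vectors."""
    return math.sqrt(
        sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)) / 2
    )


def error_stats(observed, reconstructed):
    """Return all four measures for one sample (both inputs are raw counts)."""
    if len(observed) != len(reconstructed):
        raise ValueError("profiles must have the same number of channels")
    p, q = normalise(observed), normalise(reconstructed)
    return {
        "kl": kl_divergence(p, q),
        "rmse": rmse(p, q),
        "sad": sum_abs_diff(p, q),
        "hellinger": hellinger(p, q),
    }
```

For an SBS96 sample, `observed` and `reconstructed` would each hold 96 values in the same channel order, which is why the row-order check in step 2 matters before any of these measures is meaningful.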

Usage

For more details and a quick start guide, please refer to the usage documentation and the parameter documentation.

Global options

  • --input (required): Absolute path to your input MAF, matrix or the folder containing the VCFs for analysis
  • --output_pattern (required): Output naming convention for the analysis
  • --outdir (required): Relative or absolute path to the desired output destination

SigProfiler tool options (SigProfilerMatrixGenerator and Assignment)

  • --filetype (required): Defines which input type is passed to the SigProfiler tools. Currently supported options are 'MAF', 'Matrix' and 'VCF'.
  • --ref (required): Defines the reference genome from which the data was generated. Currently supported options include 'GRCh37' and 'GRCh38'.
  • --exome: Defines whether the SigProfiler tools should run against the COSMIC exome/panel reference instead of the WGS reference. Activate with --exome true. [default: false]
  • --context: Defines which sequence context types should be assigned to the respective COSMIC catalogues for the SigProfiler Assignment module. Valid options include "96", "288", "1536", "DINUC", and "ID". Running the pipeline with default parameters will perform only SBS96 signature assignment. [default: '96']

signature.tools.lib options

  • --n_boots: Defines how many NMF iterations should be performed by signature.tools.lib Fit before the model converges. [default: 100]
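
The options listed above can be checked before a run is launched. The following is a minimal sketch of such a check and is not part of the pipeline: `check_params` is a hypothetical helper, the parameter names and defaults are taken from the lists above, and it assumes that only the listed values are valid and that `--context` takes a single value (the README does not say whether several contexts can be combined).

```python
# Parameter names and defaults as documented above.
REQUIRED = ("input", "output_pattern", "outdir", "filetype", "ref")
DEFAULTS = {"exome": False, "context": "96", "n_boots": 100}

# Assumption: only the values named in the README are accepted.
ALLOWED = {
    "filetype": {"MAF", "Matrix", "VCF"},
    "ref": {"GRCh37", "GRCh38"},
    "context": {"96", "288", "1536", "DINUC", "ID"},
}


def check_params(params):
    """Merge params with the documented defaults and validate them.

    Returns the merged dictionary, or raises ValueError listing every problem.
    """
    merged = {**DEFAULTS, **params}
    problems = [f"--{name} is required" for name in REQUIRED
                if not merged.get(name)]
    for name, allowed in ALLOWED.items():
        value = merged.get(name)
        if value is not None and str(value) not in allowed:
            problems.append(
                f"--{name} must be one of {sorted(allowed)}, got {value!r}"
            )
    n_boots = merged["n_boots"]
    if isinstance(n_boots, bool) or not isinstance(n_boots, int) or n_boots < 1:
        problems.append("--n_boots must be a positive integer")
    if problems:
        raise ValueError("; ".join(problems))
    return merged
```

Collecting all problems before raising means a user sees every invalid option at once instead of fixing them one run at a time.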

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

Warning: Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
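
One way to follow this advice is to keep the parameters in a JSON file and pass it with -params-file, which Nextflow accepts in JSON or YAML form. The sketch below only builds the file contents and the command line; the paths and values are placeholders, `render_params` and `nextflow_command` are hypothetical helpers, and the pipeline identifier and the `docker` profile passed to `nextflow run` are assumptions that may need to be adjusted for your installation.

```python
import json
import shlex
from pathlib import Path


def render_params(params):
    """Serialise pipeline parameters as JSON text for -params-file."""
    return json.dumps(params, indent=2, sort_keys=True) + "\n"


def save_params(params, path):
    """Write the parameters to disk and return the path that was written."""
    target = Path(path)
    target.write_text(render_params(params), encoding="utf-8")
    return target


def nextflow_command(params_file,
                     pipeline="nf-core/icgcargomutsig",  # assumed identifier
                     profile="docker"):                   # assumed profile
    """Build the argument list for a run that reads its parameters from a file."""
    return ["nextflow", "run", pipeline,
            "-profile", profile,
            "-params-file", str(params_file)]


# Placeholder values; replace them with your own data locations.
example = {
    "input": "/absolute/path/to/cohort.maf",
    "output_pattern": "my_cohort",
    "outdir": "results",
    "filetype": "MAF",
    "ref": "GRCh38",
}

print(render_params(example))
print(shlex.join(nextflow_command("params.json")))
```

Calling `save_params(example, "params.json")` writes the file that the printed command refers to. Keeping parameters in a file also leaves a record of exactly what a given run used.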

Frequently Asked Questions and known "bugs":

  • The Error Thresholding Module breaks due to the error Error: dims [product 31] do not match the length of object [96]: An edge case for the mutational signature assignment pipeline is providing a single sample for analysis. The error thresholding module expects a matrix as input for calculating error statistics, so a single sample will be parsed as a vector and break the analysis. Please provide more than one sample to the pipeline to avoid this error.

  • The Error Thresholding Module breaks due to a lexical error in the read_json step: This error occurs due to a "lower bound limitation" on the number of mutations per sample that signature.tools.lib requires to fully assign the input activities to all reference signatures without producing NaN values. We have not tested the full range of inputs to identify the lower bound for our pipeline, but recommend providing only data with at least 50 mutations per sample.
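
Both issues above can be caught before a run by inspecting the count matrix. The following is a minimal sketch of such a pre-flight check and is not part of the pipeline: `preflight` is a hypothetical helper, and it assumes a tab-separated matrix whose first column holds the mutation type and whose remaining columns hold one sample each. The threshold of 50 mutations is the recommendation given above, not a tested lower bound.

```python
import csv
import io

MIN_SAMPLES = 2      # a single sample triggers the dims error described above
MIN_MUTATIONS = 50   # lower bound recommended in the FAQ


def preflight(matrix_text, min_samples=MIN_SAMPLES, min_mutations=MIN_MUTATIONS):
    """Return a list of human-readable problems found in a count matrix.

    matrix_text is the content of a tab-separated matrix: a header row with
    the sample names, then one row of integer counts per mutation channel.
    """
    rows = list(csv.reader(io.StringIO(matrix_text), delimiter="\t"))
    if not rows:
        return ["matrix is empty"]

    samples = rows[0][1:]  # first column is the mutation type label
    problems = []
    if len(samples) < min_samples:
        problems.append(
            f"found {len(samples)} sample(s), need at least {min_samples}"
        )

    # Sum the counts of every channel for each sample.
    totals = dict.fromkeys(samples, 0)
    for row in rows[1:]:
        for name, value in zip(samples, row[1:]):
            totals[name] += int(value)

    for name, total in totals.items():
        if total < min_mutations:
            problems.append(
                f"sample {name} has {total} mutations, fewer than {min_mutations}"
            )
    return problems
```

An empty list means the matrix passes both checks; otherwise each entry names the sample or condition that would likely cause the error thresholding module to fail.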

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

We thank the following people for their extensive assistance in the development of this pipeline:

  • Lancelot Seillier
  • Paula Stancl
  • Felix Beaudry
  • Sandesh Memane
  • Shawn Zamani
  • Alvin Ng
  • Linda Xiang
  • Kjong Lehmann

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: ICGC ARGO Workflows
  • Login: icgc-argo-workflows
  • Kind: organization
  • Location: Toronto, Ontario

Home of the ICGC ARGO (Accelerate Research in Genomic Oncology) Data Platform Scientific Workflows

Citation (CITATIONS.md)

# nf-core/icgcargomutsig: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [SigProfilerMatrixGenerator](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6041-2)

  > Bergstrom, E.N., Huang, M.N., Mahto, U. et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genomics 20, 685 (2019). https://doi.org/10.1186/s12864-019-6041-2

- [SigProfilerExtractor](https://www.sciencedirect.com/science/article/pii/S2666979X22001240?via%3Dihub)

  > S.M. Ashiqul Islam, Marcos Díaz-Gay, Yang Wu, Mark Barnes, Raviteja Vangara, Erik N. Bergstrom, Yudou He, Mike Vella, Jingwei Wang, Jon W. Teague, Peter Clapham, Sarah Moody, Sergey Senkin, Yun Rose Li, Laura Riva, Tongwu Zhang, Andreas J. Gruber, Christopher D. Steele, Burçak Otlu, Azhar Khandekar, Ammal Abbasi, Laura Humphreys, Natalia Syulyukina, Samuel W. Brady, Boian S. Alexandrov, Nischalan Pillay, Jinghui Zhang, David J. Adams, Iñigo Martincorena, David C. Wedge, Maria Teresa Landi, Paul Brennan, Michael R. Stratton, Steven G. Rozen, Ludmil B. Alexandrov. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics. Volume 2, Issue 11, 2022, 100179, ISSN 2666-979X, https://doi.org/10.1016/j.xgen.2022.100179.

- [signature.tools.lib](https://www.science.org/doi/10.1126/science.abl9283)

  > A. Degasperi et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science, doi:10.1126/science.abl9283, 2022.

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Watch event: 1
  • Create event: 2
Last Year
  • Watch event: 1
  • Create event: 2

Dependencies

.github/workflows/build-test-release.yml actions
  • ASzc/change-string-case-action v1 composite
  • actions/checkout v2 composite
  • actions/create-release v1 composite
  • actions/setup-python v2 composite
  • actions/upload-release-asset v1 composite
  • docker/build-push-action v2 composite
  • docker/login-action v1 composite
matrixgenerator/Dockerfile docker
  • python 3.9 build
signaturetoolslib/Dockerfile docker
  • ${BASE_IMAGE} centos7 build
  • bioconductor/bioconductor_docker devel build