icgc-argo-mutational-signatures
ICGC ARGO Mutational Signatures Workflow Packages
https://github.com/icgc-argo-workflows/icgc-argo-mutational-signatures
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.4%) to scientific vocabulary
Repository
ICGC ARGO Mutational Signatures Workflow Packages
Basic Info
- Host: GitHub
- Owner: icgc-argo-workflows
- License: mit
- Language: Nextflow
- Default Branch: main
- Size: 9.79 MB
Statistics
- Stars: 2
- Watchers: 13
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Introduction
nf-core/icgcargomutsig is a bioinformatics pipeline that converts GDC MAF files or a collection of VCF files into mutational count matrices, assigns signatures using SigProfiler and signature.tools.lib, and calculates error statistics for the assignment performance.

- Generation of SBS96, DBS78 and ID83 count matrices using SigProfilerMatrixGenerator.
- Assessment of row orders to ensure full compatibility between the reference catalogues of each assignment tool and the input data.
- Assignment of SBS signatures to the COSMIC mutational signature catalogue using SigProfilerExtractor and signature.tools.lib.
- Calculation of error thresholds using Kullback-Leibler divergence, root-mean-square error, sum of absolute distances and Hellinger distance.
- Generation of a MultiQC report containing run information and log data.
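The four error statistics named in the list above can be sketched in plain Python. This is a minimal illustration of the textbook definitions, not the pipeline's own implementation: the function names, the normalisation of counts to probabilities for the two divergence measures, the natural logarithm, and the `eps` guard against zero entries are all assumptions made for this example.

```python
import math


def _check(observed, reconstructed):
    # Both catalogues must have the same number of channels (e.g. 96 for SBS96).
    if len(observed) != len(reconstructed):
        raise ValueError("catalogues must have the same length")


def _normalise(counts):
    # Turn raw mutation counts into a probability distribution.
    total = sum(counts)
    if total <= 0:
        raise ValueError("catalogue must contain at least one mutation")
    return [c / total for c in counts]


def kl_divergence(observed, reconstructed, eps=1e-12):
    # Kullback-Leibler divergence D(P || Q) with natural logarithm.
    # eps (an assumption of this sketch) avoids division by zero in Q.
    _check(observed, reconstructed)
    p, q = _normalise(observed), _normalise(reconstructed)
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def rmse(observed, reconstructed):
    # Root-mean-square error on the raw counts.
    _check(observed, reconstructed)
    squared = sum((o - r) ** 2 for o, r in zip(observed, reconstructed))
    return math.sqrt(squared / len(observed))


def sum_abs_distance(observed, reconstructed):
    # Sum of absolute differences on the raw counts.
    _check(observed, reconstructed)
    return sum(abs(o - r) for o, r in zip(observed, reconstructed))


def hellinger(observed, reconstructed):
    # Hellinger distance between the normalised catalogues, in [0, 1].
    _check(observed, reconstructed)
    p, q = _normalise(observed), _normalise(reconstructed)
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)
```

Here `observed` would be one sample's mutation counts and `reconstructed` the counts rebuilt from the assigned signatures and their activities. All four statistics are zero when the reconstruction matches the input exactly.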
Usage
For more details and a quick start guide, please refer to the usage documentation and the parameter documentation.
Global options
- --input (required): Absolute path to your input MAF, matrix or the folder containing the VCFs for analysis
- --output_pattern (required): Output naming convention for the analysis
- --outdir (required): Relative or absolute path to the desired output destination
SigProfiler tool options (SigProfiler Matrixgenerator and Assignment)
- --filetype (required): Defines which input type is passed to the SigProfiler tools; currently supported options are 'MAF', 'Matrix' or 'VCF'
- --ref (required): Defines the reference genome from which the data was generated; currently supported options include 'GRCh37' and 'GRCh38'
- --exome: This flag defines if the SigProfiler tools should run against the COSMIC exome/panel reference instead of the WGS reference; activate with --exome true. [default: false]
- --context: Defines which sequence context types should be assigned to the respective COSMIC catalogues for the SigProfiler Assignment module. Valid options include "96", "288", "1536", "DINUC", and "ID". Running the pipeline with default parameters will perform only SBS96 signature assignment. [default: '96']
signature.tools.lib options
- --n_boots: Defines how many NMF iterations should be performed by signature.tools.lib Fit before the model converges. [default: 100]
Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
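As an illustration of the -params-file route, the following sketch collects the options documented above into a JSON file, a format Nextflow accepts for -params-file. The helper name `write_params`, the example paths and the output pattern are hypothetical. The validation only mirrors the required options and allowed values stated in this README and is not part of the pipeline.

```python
import json
from pathlib import Path

# Values documented as supported in this README.
REQUIRED = ("input", "output_pattern", "outdir", "filetype", "ref")
FILETYPES = {"MAF", "Matrix", "VCF"}
REFERENCES = {"GRCh37", "GRCh38"}

# Hypothetical example values; replace them with your own data.
example_params = {
    "input": "/abs/path/to/cohort.maf",
    "output_pattern": "my_cohort",
    "outdir": "results",
    "filetype": "MAF",
    "ref": "GRCh38",
    "exome": False,    # documented default
    "context": "96",   # documented default
    "n_boots": 100,    # documented default
}


def write_params(params, path="params.json"):
    """Validate the documented options and write them as a JSON params file."""
    missing = [key for key in REQUIRED if not params.get(key)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    if params["filetype"] not in FILETYPES:
        raise ValueError(f"unsupported filetype: {params['filetype']!r}")
    if params["ref"] not in REFERENCES:
        raise ValueError(f"unsupported reference genome: {params['ref']!r}")
    target = Path(path)
    target.write_text(json.dumps(params, indent=2) + "\n")
    return target
```

After calling `write_params(example_params)`, the resulting file would be passed to the run as `-params-file params.json`, alongside whichever profile you use.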
Frequently Asked Questions and known "bugs":
- The Error Thresholding Module breaks due to the error Error: dims [product 31] do not match the length of object [96]: An edge case for the mutational signature assignment pipeline is providing a single sample for analysis. The error thresholding module expects a matrix as input for calculating error statistics, so a single sample is parsed as a vector and breaks the analysis. Please provide more than one sample to the pipeline to avoid this error.
- The Error Thresholding Module breaks due to a lexical error in the read_json step: This error occurs when a sample has too few mutations. signature.tools.lib requires a minimum number of mutations per sample to fully assign the input activities to all reference signatures without producing NaN values. We have not systematically determined this lower bound for our pipeline, but recommend providing only data with at least 50 mutations per sample.
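Both failure modes can be caught before a run with a small pre-flight check on the count matrix. The sketch below is not part of the pipeline. It assumes a tab-separated matrix whose first column holds the mutation type and whose remaining columns hold one sample each, and the function name `check_matrix` is hypothetical. The threshold of 50 is the recommendation given above.

```python
import csv
import io

MIN_MUTATIONS = 50  # lower bound recommended in the FAQ above


def check_matrix(text, min_mutations=MIN_MUTATIONS):
    """Return a list of problems found in a tab-separated count matrix.

    Assumed layout: a header row with a mutation-type column followed by
    one column per sample, then one row of integer counts per channel.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = dict.fromkeys(samples, 0)
    for row in reader:
        if not row:
            continue  # skip blank lines
        for sample, value in zip(samples, row[1:]):
            totals[sample] += int(value)

    problems = []
    if len(samples) < 2:
        problems.append(
            f"only {len(samples)} sample(s) found; the error thresholding "
            "module needs more than one"
        )
    for sample, total in totals.items():
        if total < min_mutations:
            problems.append(
                f"{sample}: only {total} mutations (fewer than {min_mutations})"
            )
    return problems
```

An empty list means neither of the two known problems was detected. For a matrix on disk, pass the file contents, for example `check_matrix(Path("cohort.SBS96.all").read_text())`, where the file name is only an example.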
Pipeline output
To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
Credits
We thank the following people for their extensive assistance in the development of this pipeline:
- Lancelot Seillier
- Paula Stancl
- Felix Beaudry
- Sandesh Memane
- Shawn Zamani
- Alvin Ng
- Linda Xiang
- Kjong Lehmann
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Name: ICGC ARGO Workflows
- Login: icgc-argo-workflows
- Kind: organization
- Location: Toronto, Ontario
- Website: https://www.icgc-argo.org
- Repositories: 26
- Profile: https://github.com/icgc-argo-workflows
Home of the ICGC ARGO (Accelerate Research in Genomic Oncology) Data Platform Scientific Workflows
Citation (CITATIONS.md)
# nf-core/icgcargomutsig: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [SigProfilerMatrixGenerator](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6041-2)

  > Bergstrom, E.N., Huang, M.N., Mahto, U. et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genomics 20, 685 (2019). https://doi.org/10.1186/s12864-019-6041-2

- [SigProfilerExtractor](https://www.sciencedirect.com/science/article/pii/S2666979X22001240?via%3Dihub)

  > S.M. Ashiqul Islam, Marcos Díaz-Gay, Yang Wu, Mark Barnes, Raviteja Vangara, Erik N. Bergstrom, Yudou He, Mike Vella, Jingwei Wang, Jon W. Teague, Peter Clapham, Sarah Moody, Sergey Senkin, Yun Rose Li, Laura Riva, Tongwu Zhang, Andreas J. Gruber, Christopher D. Steele, Burçak Otlu, Azhar Khandekar, Ammal Abbasi, Laura Humphreys, Natalia Syulyukina, Samuel W. Brady, Boian S. Alexandrov, Nischalan Pillay, Jinghui Zhang, David J. Adams, Iñigo Martincorena, David C. Wedge, Maria Teresa Landi, Paul Brennan, Michael R. Stratton, Steven G. Rozen, Ludmil B. Alexandrov. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics. Volume 2, Issue 11, 2022, 100179, ISSN 2666-979X, https://doi.org/10.1016/j.xgen.2022.100179.

- [signature.tools.lib](https://www.science.org/doi/10.1126/science.abl9283)

  > A. Degasperi et al. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science, doi:10.1126/science.abl9283, 2022.

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Watch event: 1
- Create event: 2
Last Year
- Watch event: 1
- Create event: 2
Dependencies
- ASzc/change-string-case-action v1 composite
- actions/checkout v2 composite
- actions/create-release v1 composite
- actions/setup-python v2 composite
- actions/upload-release-asset v1 composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- python 3.9 build
- ${BASE_IMAGE} centos7 build
- bioconductor/bioconductor_docker devel build
