Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.3%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: Phuong-Le
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 40.4 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md


Introduction

nf-core/strelkasomatic is a bioinformatics pipeline that calls somatic mutations from BAM files using Illumina's Manta and Strelka2.

The pipeline has two steps:

  1. Manta
    • Manta calls candidate small indels, which Strelka then uses when calling somatic SNVs and indels. Output files are published in a "results" directory under "manta_out/${sample_id}"; details are in docs/output.md.
  2. Strelka
    • Strelka calls the final SNVs and indels as VCF files. Output files are published in a "results" directory under "strelka_out/${sample_id}"; details are in docs/output.md.
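To illustrate the output layout described in the two steps, here is a minimal sketch that builds the per-sample result directories. It assumes that "manta_out" and "strelka_out" sit directly under the directory given to --outdir, which is not stated here; docs/output.md remains the authority. The function name `expected_result_dirs` is hypothetical and not part of the pipeline.

```python
from pathlib import PurePosixPath


def expected_result_dirs(outdir, sample_id):
    """Return the assumed per-sample "results" directories for each step.

    Assumption: manta_out/ and strelka_out/ are direct children of outdir.
    """
    base = PurePosixPath(outdir)
    return {
        "manta": base / "manta_out" / sample_id / "results",
        "strelka": base / "strelka_out" / sample_id / "results",
    }


dirs = expected_result_dirs("/path/to/outdir", "S1")
print(dirs["manta"])
print(dirs["strelka"])
```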

Dependencies

  • Nextflow >= 24.04.2
  • Python 2.6+

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow.
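If you want to check a Nextflow version string against the minimum above, a small sketch follows. It only handles plain dotted numeric versions such as 24.04.2; suffixed versions (for example edge releases) are not handled. The helper name `meets_minimum` is hypothetical.

```python
MINIMUM_NEXTFLOW = "24.04.2"


def meets_minimum(version, minimum=MINIMUM_NEXTFLOW):
    """Compare two dotted numeric version strings component by component."""
    def as_tuple(text):
        return tuple(int(part) for part in text.split("."))

    return as_tuple(version) >= as_tuple(minimum)


print(meets_minimum("24.10.2"))  # True: the version loaded in the Sanger example
print(meets_minimum("23.10.1"))  # False: older than the minimum
```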

Installation

git clone git@github.com:Phuong-Le/StrelkaSomatic.git

Usage

[NOT PUBLIC YET] Make sure to test your setup with -profile test before running the workflow on actual data.

The input sample sheet must be either tab-delimited (extension must be .tsv) or comma-delimited (extension must be .csv), like samplesheet.tsv. It must contain the following columns. Column names must match exactly, but the order does not matter, and any additional columns are ignored.

| Column | Description |
| --------------- | ----------------------------------------------- |
| sample_id | sample ID, must be unique |
| match_normal_id | ID for your matched normal sample |
| bam | BAM file for sample_id, must exist |
| bai | BAM index file (.bai) for bam, must exist |
| bam_match | BAM file for match_normal_id, must exist |
| bai_match | BAM index file (.bai) for bam_match, must exist |
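Before launching a run, it can help to check a sample sheet against the rules above. The sketch below is not part of the pipeline: the names `check_samplesheet` and `delimiter_for` are hypothetical, and the pipeline's own validation may differ. It checks the required columns, the uniqueness of sample_id, and optionally whether the listed files exist.

```python
import csv
import io
from pathlib import Path

REQUIRED_COLUMNS = ["sample_id", "match_normal_id", "bam", "bai", "bam_match", "bai_match"]


def delimiter_for(path):
    """Pick the delimiter from the file extension (.tsv or .csv)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".tsv":
        return "\t"
    if suffix == ".csv":
        return ","
    raise ValueError(f"sample sheet must end in .tsv or .csv, got {path!r}")


def check_samplesheet(handle, delimiter, check_files=False):
    """Return a list of problems; an empty list means no problem was found."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = row["sample_id"]
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        seen.add(sample_id)
        if check_files:
            for col in ("bam", "bai", "bam_match", "bai_match"):
                value = row[col] or ""  # short rows yield None
                if not Path(value).is_file():
                    problems.append(f"line {line_no}: {col} not found: {value}")
    return problems


# Small in-memory demo; extra columns would simply be ignored.
demo = io.StringIO(
    "sample_id\tmatch_normal_id\tbam\tbai\tbam_match\tbai_match\n"
    "S1\tN1\ts1.bam\ts1.bam.bai\tn1.bam\tn1.bam.bai\n"
)
print(check_samplesheet(demo, delimiter_for("samplesheet.tsv")))
```

For a real file, open it with `open(path, newline="")` and pass `check_files=True` to also verify that the BAM and index files exist.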

Detailed instructions for running the pipeline, including the input parameters, are in docs/usage.md. You can run the pipeline using:

```bash
nextflow run /path/to/StrelkaSomatic/main.nf \
    -profile <profile> \
    --input /path/to/samplesheet.csv \
    --fasta /path/to/fasta/genome.fa \
    --fai /path/to/fai/genome.fa.fai \
    --outdir /path/to/outdir
```

If using igenomes, add the following to the nextflow run command. Note that igenomes does not provide the FASTA index (.fai) file for $fasta, so you will have to specify it, for example:

```bash
--igenomes_ignore false \
--fai /path/to/fai/genome.fa.fai \
--genome your_genome_label
```

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
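Nextflow's -params-file option accepts a JSON or YAML file. As a minimal sketch, the following writes such a file from Python using only the parameters shown in the run command above. The function name `write_params_file` and the file name params.json are arbitrary choices for illustration.

```python
import json
from pathlib import Path


def write_params_file(path, input_sheet, fasta, fai, outdir):
    """Write pipeline parameters as JSON for use with -params-file."""
    params = {
        "input": input_sheet,
        "fasta": fasta,
        "fai": fai,
        "outdir": outdir,
    }
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params


write_params_file(
    "params.json",
    "/path/to/samplesheet.csv",
    "/path/to/fasta/genome.fa",
    "/path/to/fai/genome.fa.fai",
    "/path/to/outdir",
)
```

The resulting file can then replace the individual `--input`, `--fasta`, `--fai` and `--outdir` options by passing `-params-file params.json` to `nextflow run`.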

Sanger users

Sanger users can run the pipeline as follows. Please refer to docs/sanger.md to make sure your setup is correct.

```bash
module load cellgen/nextflow/24.10.2

outdir=/path/to/outdir
workdir=/path/to/working_directory # where a work directory is created
script=/path/to/StrelkaSomatic/main.nf
mkdir -p $workdir
input=/path/to/samplesheet.tsv
fasta=/path/to/fasta/genome.fa
fai=/path/to/fasta/index/genome.fa.fai

config=/path/to/StrelkaSomatic/conf/sanger_lfs.config

bsub -cwd $workdir -q long -o %J.o -e %J.e -R'span[hosts=1] select[mem>10000] rusage[mem=10000]' -M10000 -env "all" \
    "nextflow run ${script} -c ${config} --input ${input} --outdir ${outdir} --fasta ${fasta} --fai ${fai} -resume"
```

or as follows

```bash
module load cellgen/nextflow/24.10.2

outdir=/path/to/outdir
workdir=/path/to/workingdirectory # where a work directory is created
script=/path/to/StrelkaSomatic/main.nf
mkdir -p $workdir
input=/path/to/samplesheet.tsv
customgenomebase=/lustre/scratch124/casm/team78pipelines/canpipe/live/ref/Homosapiens # please let me know if you're using a different genome so I can update the config for you
usecustomgenome=true
genome=GRCh38fullanalysissetplusdecoyhla

config=/path/to/StrelkaSomatic/conf/sanger_lfs.config

bsub -cwd $workdir -q long -o %J.o -e %J.e -R'span[hosts=1] select[mem>10000] rusage[mem=10000]' -M10000 -env "all" \
    "nextflow run $script -c ${config} --input ${input} --outdir ${outdir} --usecustomgenome ${usecustomgenome} --customgenomebase ${customgenomebase} --genome ${genome} -resume"
```

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/strelkasomatic was originally written by Phuong Le.

We thank the following people for their extensive assistance in the development of this pipeline:

Stephen Fitzgerald

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Owner

  • Name: Anh Phuong Le (Phuong)
  • Login: Phuong-Le
  • Kind: user
  • Location: Cambridge, UK
  • Company: Wellcome Sanger Institute

Citation (CITATIONS.md)

# nf-core/strelkasomatic: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [Manta](https://academic.oup.com/bioinformatics/article/32/8/1220/1743909?login=true)

> Chen, X. et al. (2016) Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics, 32, 1220-1222.

- [Strelka2](https://www.nature.com/articles/s41592-018-0051-x)

> Kim, S., Scheffler, K. et al. (2018) Strelka2: fast and accurate calling of germline and somatic variants. Nature Methods, 15, 591-594.

<!-- ## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. -->

GitHub Events

Total
  • Push event: 2
  • Public event: 1
Last Year
  • Push event: 2
  • Public event: 1