covpipe2
SARS-CoV-2 genome reconstruction for Illumina data in Nextflow
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.3%) to scientific vocabulary
Repository
Statistics
- Stars: 4
- Watchers: 3
- Forks: 1
- Open Issues: 12
- Releases: 26
Metadata Files
README.md
CoVpipe2
CoVpipe2 is a Nextflow pipeline for reference-based genome reconstruction of SARS-CoV-2 from NGS data. In principle, it can also be used for other viruses.
Table of contents
- [CoVpipe2](#covpipe2)
- [Quick installation](#quick-installation)
- [Call help](#call-help)
- [Test run](#test-run)
- [Update the pipeline](#update-the-pipeline)
- [Use a certain release](#use-a-certain-release)
- [Quick run examples](#quick-run-examples)
- [Example 1:](#example-1)
- [Example 2:](#example-2)
- [Example sample sheet](#example-sample-sheet)
- [Manual](#manual)
- [Changes to CoVpipe](#changes-to-covpipe)
- [Workflow](#workflow)
- [Citations](#citations)
- [Acknowledgement, props and inspiration](#acknowledgement-props-and-inspiration)

Quick installation
The pipeline is written in Nextflow, which runs on any POSIX-compatible system (Linux, OS X, etc.). Windows is supported through WSL. To run the pipeline steps, you need Nextflow and one of conda, Docker, or Singularity:
Install Nextflow via self-installing package:
```bash
wget -qO- https://get.nextflow.io | bash
# In case you don't have wget
curl -s https://get.nextflow.io | bash
```
OR
Install Nextflow via conda (example for Miniconda3, Linux 64-bit):
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda create -n nextflow -c bioconda nextflow
conda activate nextflow
```
All other dependencies and tools will be installed within the pipeline via conda, Docker or Singularity.
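Before a first run, it can help to confirm that these prerequisites are actually on your `PATH`. The sketch below is a hypothetical helper (`check_covpipe2_prereqs` is not part of CoVpipe2); it assumes that finding the executables is enough and does not check versions or whether the Docker daemon is running.

```bash
# Minimal sketch: check that Nextflow and at least one software backend
# (conda, Docker, or Singularity) can be found before running CoVpipe2.
# check_covpipe2_prereqs is a hypothetical helper, not part of the pipeline.
check_covpipe2_prereqs() {
  local backend found_backend=""

  # Nextflow itself is always required.
  if ! command -v nextflow >/dev/null 2>&1; then
    echo "missing: nextflow"
    return 1
  fi

  # Any one backend is enough; report the first one found.
  # (command -v also succeeds if conda is defined as a shell function.)
  for backend in conda docker singularity; do
    if command -v "$backend" >/dev/null 2>&1; then
      found_backend="$backend"
      break
    fi
  done

  if [ -z "$found_backend" ]; then
    echo "missing: conda, docker, or singularity"
    return 1
  fi

  echo "ok: nextflow + $found_backend"
}
```

For example, `check_covpipe2_prereqs && nextflow run rki-mf1/CoVpipe2 -r <version> --help` only calls the pipeline when both checks pass.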
:warning: Important for conda/mamba users: make sure that your conda channels are configured according to the bioconda usage instructions:
Check your current channel list:
```bash
conda config --show channels
```
Change your channel list:
```bash
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
```
Please check the bioconda usage instructions for the latest configuration!
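Because `conda config --add channels` puts each newly added channel at the top of the list, the commands above leave the priority order conda-forge, bioconda, defaults. The sketch below checks that order from the output of `conda config --show channels`. It assumes the output lists one channel per line as `  - <name>` under `channels:`, and `check_channel_order` is a hypothetical helper, not part of conda or CoVpipe2.

```bash
# Minimal sketch: verify channel priority as printed by `conda config --show channels`.
# Assumption: each channel appears on its own line as "  - <name>".
# check_channel_order is a hypothetical helper.
check_channel_order() {
  local channels cf bc df
  # Keep only the channel names, in priority order (first = highest).
  channels=$(sed -n 's/^[[:space:]]*-[[:space:]]*//p')

  # Position of each channel in the list (empty if absent).
  cf=$(printf '%s\n' "$channels" | grep -nx 'conda-forge' | cut -d: -f1)
  bc=$(printf '%s\n' "$channels" | grep -nx 'bioconda' | cut -d: -f1)
  df=$(printf '%s\n' "$channels" | grep -nx 'defaults' | cut -d: -f1)

  if [ -z "$cf" ] || [ -z "$bc" ]; then
    echo "missing conda-forge or bioconda"
    return 1
  fi
  if [ "$cf" -gt "$bc" ]; then
    echo "conda-forge must come before bioconda"
    return 1
  fi
  # 'defaults' is optional here, but if present it must rank below bioconda.
  if [ -n "$df" ] && [ "$df" -lt "$bc" ]; then
    echo "defaults must come after bioconda"
    return 1
  fi
  echo "ok"
}
```

Usage: `conda config --show channels | check_channel_order`.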
Call help
```bash
nextflow run rki-mf1/CoVpipe2 -r <version> --help
```
Test run
Validate your installation with a test run:
```bash
# for a conda installation
# (the conda channel configuration needs to be bioconda-conformant)
nextflow run rki-mf1/CoVpipe2 -r
# for a Singularity installation
nextflow run rki-mf1/CoVpipe2 -r
# for a Docker installation
nextflow run rki-mf1/CoVpipe2 -r
```
For more configuration options, see here.
Update the pipeline
```bash
nextflow pull rki-mf1/CoVpipe2
```
Use a certain release
We recommend using a stable release of the pipeline:
```bash
nextflow pull rki-mf1/CoVpipe2 -r <RELEASE>
```
Quick run examples
Example 1:
```bash
nextflow run rki-mf1/CoVpipe2 -r <version> \
    --reference 'sars-cov-2' \
    --fastq my_samples.csv --list \
    --kraken \
    --cores 4 --max_cores 8
```
- Read input from a sample sheet
- Perform taxonomic classification to remove non-SARS-CoV-2 reads
- Local execution with conda, using at most 8 cores in total
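The two core options interact. Assuming `--cores` is the number of CPUs each process requests and `--max_cores` is the total CPU budget for local execution (an interpretation of the option names, not taken from the CoVpipe2 help), Example 1 can run at most 8 / 4 = 2 such processes at the same time. The hypothetical helper below just does that integer division; processes that request fewer CPUs would allow more parallelism.

```bash
# Minimal sketch of the core-budget arithmetic behind Example 1.
# Assumption: every process requests --cores CPUs and local execution never
# uses more than --max_cores CPUs at once. max_parallel_tasks is hypothetical.
max_parallel_tasks() {
  local cores=$1 max_cores=$2
  if [ "$cores" -gt "$max_cores" ]; then
    echo "error: --cores ($cores) exceeds --max_cores ($max_cores)" >&2
    return 1
  fi
  # Integer division: how many full per-process CPU requests fit into the budget.
  echo $(( max_cores / cores ))
}

max_parallel_tasks 4 8    # prints 2
```

On Linux, `max_parallel_tasks 4 "$(nproc)"` gives the same estimate for the whole machine (`nproc` is from GNU coreutils).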
Example 2:
```bash
nextflow run rki-mf1/CoVpipe2 -r <version> \
    --reference 'sars-cov-2' \
    --fastq '*R{1,2}.fastq.gz' \
    --adapter /path/to/repo/data/adapters/NexteraTransposase.fasta \
    --primer_version V4.1 \
    -profile slurm,singularity
```
- Remove adapters
- Clip primers (ARTIC version V4.1)
- Execution on a SLURM system with Singularity
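The `--fastq` pattern in Example 2 is single-quoted so that the shell passes it to Nextflow unexpanded and the pipeline resolves it, presumably into R1/R2 pairs. To preview which files would pair up before launching a run, here is a rough bash approximation. `preview_fastq_pairs` is a hypothetical helper and does not replicate Nextflow's matching rules exactly; it also ignores R2 files without an R1 mate.

```bash
# Minimal sketch: preview how files matching '*R{1,2}.fastq.gz' pair up and
# flag R1 files without an R2 mate. Approximates, but is not, Nextflow's pairing.
# preview_fastq_pairs is a hypothetical helper, not part of CoVpipe2.
preview_fastq_pairs() {
  local dir=${1:-.} r1 prefix r2 status=0
  for r1 in "$dir"/*R1.fastq.gz; do
    # If nothing matches, the literal pattern is returned; skip it.
    [ -e "$r1" ] || continue
    prefix=${r1%R1.fastq.gz}          # everything before "R1.fastq.gz"
    r2="${prefix}R2.fastq.gz"
    if [ -e "$r2" ]; then
      echo "pair: $(basename "$r1") $(basename "$r2")"
    else
      echo "missing mate: $(basename "$r1")"
      status=1
    fi
  done
  return "$status"
}
```

Run it in the read directory (`preview_fastq_pairs .`); a non-zero exit status means at least one R1 file has no R2 partner.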
Example sample sheet
CoVpipe2 accepts a sample sheet in CSV format as input. It should look like this:
```
sample,fastq_1,fastq_2
sample1,/path/to/reads/id1_1.fastq.gz,/path/to/reads/id1_2.fastq.gz
sample2,/path/to/reads/id2_1.fastq.gz,/path/to/reads/id2_2.fastq.gz
sample3,/path/to/reads/id3_1.fastq.gz,/path/to/reads/id3_2.fastq.gz
sample4,/path/to/reads/id4_1.fastq.gz,/path/to/reads/id4_2.fastq.gz
```
The header is required. Make sure to use unique sample names!
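Writing the sample sheet by hand gets error-prone for many samples. The sketch below generates one from a read directory and separately checks for duplicate sample names, which matters mainly when sheets from several runs are merged. It assumes reads are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` (as in the example paths) and that the file-name prefix is a suitable sample name; both helper names are hypothetical.

```bash
# Minimal sketch: build a sample sheet (sample,fastq_1,fastq_2) from paired reads.
# Assumption: files are named <sample>_1.fastq.gz / <sample>_2.fastq.gz.
# make_sample_sheet and check_unique_samples are hypothetical helpers.
make_sample_sheet() {
  local dir r1 r2 sample
  # Resolve to an absolute path, like the paths in the example sheet.
  dir=$(cd "${1:-.}" && pwd) || return 1
  echo "sample,fastq_1,fastq_2"               # the header is required
  for r1 in "$dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue                  # no matching files
    sample=$(basename "$r1" _1.fastq.gz)
    r2="${r1%_1.fastq.gz}_2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "missing mate for $r1" >&2
      return 1
    fi
    echo "$sample,$r1,$r2"
  done
}

# Report sample names (first column, header skipped) that occur more than once.
check_unique_samples() {
  local dups
  dups=$(tail -n +2 "$1" | cut -d, -f1 | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "duplicate sample names: $dups"
    return 1
  fi
  echo "ok"
}
```

A typical sequence is `make_sample_sheet /path/to/reads > my_samples.csv` followed by `check_unique_samples my_samples.csv`, then passing `--fastq my_samples.csv --list` as in Example 1.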
Manual
Help message (excerpt):
```
Robert Koch Institute, MF1 Bioinformatics

Workflow: CoVpipe2

Usage examples:
nextflow run CoVpipe2.nf --fastq '*R{1,2}.fastq.gz' --cores 4 --max_cores 8
or
nextflow run rki-mf1/CoVpipe2 -r
```

Changes to CoVpipe
- Workflow management framework: snakemake -> Nextflow
- Docker/Singularity and conda support for each step
- Container/conda update feature for pangolin and nextclade
- HPC/SLURM profile provided
- Fixes:
  - Subtract only deletions from the low-coverage mask for consensus generation
- New features:
  - nextclade (mutation calling, clade assignment)
  - LCS (lineage decomposition)
  - Restructured report
  - krona plots (visualization of Kraken2 output)
  - president (genome quality control)
- Version updates (status CoVpipe2 v0.2.1):
  - bcftools: 1.11 -> 1.14
    - Note: https://github.com/samtools/bcftools/issues/1708
  - liftoff: 1.5.2 -> 1.6.2
  - kraken2: 2.1.0 -> 2.1.2
  - freebayes: 1.3.2 -> 1.3.6
  - fastp: 0.20.1 -> 0.23.2
  - bedtools: 2.29.2 -> 2.30.0
Workflow
Workflow overview:
Components originally designed by James A. Fellows Yates & nf-core under a CC0 license (public domain)
More detailed overview with process names:
Components originally designed by James A. Fellows Yates & nf-core under a CC0 license (public domain)
Even more detailed overview with process names and parameters:
Components originally designed by James A. Fellows Yates & nf-core under a CC0 license (public domain)

Citations
If you use CoVpipe2 in your work, please consider citing our publication:
Lataretu, M., Drechsel, O., Kmiecinski, R., Trappe, K., Hölzer, M., & Fuchs, S.
Lessons learned: overcoming common challenges in reconstructing the SARS-CoV-2 genome from short-read sequencing data via CoVpipe2 [version 1; peer review: awaiting peer review].
F1000Research 2023, 12:1091 (https://doi.org/10.12688/f1000research.136683.1)
Additionally, an extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
Acknowledgement, props and inspiration
Owner
- Name: RKI MF1 Bioinformatics
- Login: rki-mf1
- Kind: organization
- Location: Germany
- Repositories: 9
- Profile: https://github.com/rki-mf1
Bioinformatics code of MF1
Citation (CITATIONS.md)
# CoVpipe2: Citations

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [ARTIC network](https://github.com/artic-network)
- [BAMClipper](https://pubmed.ncbi.nlm.nih.gov/28484262/)
  > Au CH, Ho DN, Kwong A, Chan TL, Ma ESK. BAMClipper: removing primers from alignments to minimize false-negative mutations in amplicon next-generation sequencing. Sci Rep. 2017 May 8;7(1):1567. doi: 10.1038/s41598-017-01703-6. PMID: 28484262; PMCID: PMC5431517.
- [BCFtools](https://www.ncbi.nlm.nih.gov/pubmed/21903627/)
  > Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011 Nov 1;27(21):2987-93. doi: 10.1093/bioinformatics/btr509. Epub 2011 Sep 8. PubMed PMID: 21903627; PubMed Central PMCID: PMC3198575.
- [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)
  > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
- [BWA](https://arxiv.org/abs/1303.3997)
  > Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]
- [fastp](https://www.ncbi.nlm.nih.gov/pubmed/30423086/)
  > Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018 Sep 1;34(17):i884-i890. doi: 10.1093/bioinformatics/bty560. PubMed PMID: 30423086; PubMed Central PMCID: PMC6129281.
- [freebayes](https://arxiv.org/abs/1207.3907)
  > Garrison E, Marth G. (2012) Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN]
- [Kraken 2](https://www.ncbi.nlm.nih.gov/pubmed/31779668/)
  > Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019 Nov 28;20(1):257. doi: 10.1186/s13059-019-1891-0. PubMed PMID: 31779668; PubMed Central PMCID: PMC6883579.
- [Krona](https://pubmed.ncbi.nlm.nih.gov/21961884/)
  > Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. doi: 10.1186/1471-2105-12-385. PMID: 21961884; PMCID: PMC3190407.
- [LCS](https://pubmed.ncbi.nlm.nih.gov/35104309/)
  > Valieris R, Drummond RD, Defelicibus A, Dias-Neto E, Rosales RA, Tojal da Silva I. A mixture model for determining SARS-Cov-2 variant composition in pooled samples. Bioinformatics. 2022 Mar 28;38(7):1809-1815. doi: 10.1093/bioinformatics/btac047. PMID: 35104309.
- [ncov-recombinant](https://github.com/ktmeaton/ncov-recombinant)
- [Nextstrain](https://pubmed.ncbi.nlm.nih.gov/29790939/)
  > Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher RA. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018 Dec 1;34(23):4121-4123. doi: 10.1093/bioinformatics/bty407. PubMed PMID: 29790939; PubMed Central PMCID: PMC6247931.
- [pangolin](https://github.com/cov-lineages/pangolin)
  > Áine O'Toole, Emily Scher, Anthony Underwood, Ben Jackson, Verity Hill, JT McCrone, Chris Ruis, Khali Abu-Dahab, Ben Taylor, Corin Yeats, Louis du Plessis, David Aanensen, Eddie Holmes, Oliver Pybus, Andrew Rambaut. pangolin: lineage assignment in an emerging pandemic as an epidemiological tool. Publication in preparation.
- [PRESIDENT](https://github.com/rki-mf1/president)
- [Python](https://www.python.org/)
- [SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/)
  > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.
- [sc2rf](https://github.com/lenaschimmel/sc2rf)
- [SnpEff](https://www.ncbi.nlm.nih.gov/pubmed/22728672/)
  > Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012 Apr-Jun;6(2):80-92. doi: 10.4161/fly.19695. PubMed PMID: 22728672; PubMed Central PMCID: PMC3679285.

## R packages

- [R](https://www.R-project.org/)
  > R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
- [ggplot2](https://cran.r-project.org/web/packages/ggplot2/index.html)
  > H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)
  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)
  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)
- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Watch event: 1
- Delete event: 1
- Push event: 6
- Create event: 3
Last Year
- Watch event: 1
- Delete event: 1
- Push event: 6
- Create event: 3
Dependencies
- actions/checkout v2 composite
- actions/cache v2 composite
- actions/checkout v3 composite
- actions/setup-python v2 composite
- actions/upload-artifact v2 composite
- actions/checkout v4 composite
- conda-incubator/setup-miniconda v3 composite