covidww
A Nextflow pipeline to quantify SARS-CoV-2 lineages from wastewater samples
Introduction
covidww is a bioinformatics pipeline, built with Nextflow following the nf-core template, that determines the relative abundance of SARS-CoV-2 lineages in wastewater samples. It takes FASTQ files, primers in a BED file, a reference sequence (NC_045512.2 by default), and optional metadata as input. It performs quality control (QC), trimming, alignment, and deconvolution, then produces demixing reports and visualizations as well as a detailed QC report.

- Read QC (FastQC)
- SARS-CoV-2 genome indexing (BWA-mem2)
- Quality and adapter trimming (Fastp)
- Sequence alignment (BWA-mem2)
- Alignment indexing (SAMtools)
- Alignment QC (SAMtools)
- Primer trimming (iVar)
- Sorting (SAMtools)
- Variant calling (Freyja)
- Demixing (Freyja)
- Demix cleaning
- Summary plotting
- Map plotting
- Present QC for raw reads (MultiQC)
Usage
[NOTE] Nextflow and Anaconda are required to run this pipeline. Most processes can be run through containers; the container software you choose (Docker or Singularity) must be installed before running.
[NOTE] If you are new to Nextflow, please refer to this page on how to set up Nextflow.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
csv
sample,fastq_1,fastq_2
Sample1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
Each row represents a pair of FASTQ files.
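Before launching a run, it can help to sanity-check the samplesheet. The following is a minimal sketch, not part of covidww: the function name `check_samplesheet` and the specific checks are assumptions for illustration, based only on the three columns shown above. The pipeline may perform its own, stricter validation.

```python
import csv
import io

# Column names taken from the samplesheet example above.
REQUIRED_COLUMNS = ["sample", "fastq_1", "fastq_2"]


def check_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (empty if none).

    Hypothetical helper: it only checks that the three columns exist, that no
    field is empty, and that a row does not list the same file twice.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        values = {col: (row[col] or "").strip() for col in REQUIRED_COLUMNS}
        for col, value in values.items():
            if not value:
                problems.append(f"line {line_no}: {col} is empty")
        if values["fastq_1"] and values["fastq_1"] == values["fastq_2"]:
            problems.append(f"line {line_no}: fastq_1 and fastq_2 are the same file")
    return problems
```

To check a file on disk, read it first, for example `check_samplesheet(open("samplesheet.csv").read())`, and inspect the returned list.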
Optionally, you can prepare a metadata file that looks as follows:
metadata.csv:
csv
Sample,City,State
Sample1,Indianapolis,Indiana
The metadata is used to plot the data on a map.
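Samples that lack a metadata row presumably cannot be placed on the map, so it may be worth comparing the two files before a run. This sketch assumes the files are joined on the sample name (`sample` in the samplesheet, `Sample` in the metadata, as in the examples above). That join key and the helper names are assumptions, not documented pipeline behavior.

```python
import csv
import io


def column_values(csv_text, column):
    """Collect the non-empty, stripped values of one column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = {(row.get(column) or "").strip() for row in reader}
    values.discard("")
    return values


def samples_without_metadata(samplesheet_text, metadata_text):
    """Return sorted sample names that appear in the samplesheet only.

    Assumes (hypothetically) that metadata is matched to samples by name.
    """
    sheet_samples = column_values(samplesheet_text, "sample")
    metadata_samples = column_values(metadata_text, "Sample")
    return sorted(sheet_samples - metadata_samples)
```

An empty list means every sample in the samplesheet has a matching metadata row under this assumption.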
After cloning this repository, you can run the pipeline using:
bash
nextflow run covidww \
-profile <docker/conda/singularity> \
--input samplesheet.csv \
--primers [primers.bed] \
--outdir [OUTDIR]
You can optionally include the metadata by adding the --metadata parameter:
bash
nextflow run covidww \
-profile <docker/conda/singularity> \
--input samplesheet.csv \
--primers [primers.bed] \
--metadata metadata.csv \
--outdir [OUTDIR]
Using the profile test or test_full starts a small run to confirm that everything is working properly. It uses input from example, and the output of those tests is also available.
bash
nextflow run covidww \
-profile <test/test_full>,<singularity/docker/conda> \
--outdir [OUTDIR]
Additional covidww parameters
--intermediate True saves all the intermediate files
--adapter_fasta [fasta file] tells Fastp to also trim the adapters listed in this FASTA file
--save_trim_fail True tells Fastp to save reads that fail trimming
--save_merge True tells Fastp to save merged reads
--radius [float] sets the radius of the pie charts used for map plotting
--reference_genome [fasta file] replaces the default reference genome with this FASTA file
[WARNING] Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
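Nextflow's -params-file option accepts a JSON or YAML file of parameters. As an illustration, the sketch below builds such a file from the parameters documented above. The helper names `build_params` and `render_params` and the file name `params.json` are hypothetical; only the parameter keys come from this README.

```python
import json


def build_params(input_csv, primers_bed, outdir, metadata_csv=None):
    """Collect pipeline parameters into a dict keyed by their CLI names."""
    params = {
        "input": input_csv,      # --input
        "primers": primers_bed,  # --primers
        "outdir": outdir,        # --outdir
    }
    if metadata_csv is not None:
        params["metadata"] = metadata_csv  # --metadata (optional)
    return params


def render_params(params):
    """Serialize the parameters as JSON text suitable for -params-file."""
    return json.dumps(params, indent=2) + "\n"
```

Writing the result of `render_params(build_params("samplesheet.csv", "primers.bed", "results"))` to `params.json` would then allow a run such as `nextflow run covidww -profile docker -params-file params.json`.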
Pipeline output
Upon completion, the following output files are saved in OUTDIR, optionally along with the intermediate files generated by the pipeline. When using --intermediate True, OUTDIR is populated in the structure outlined in output.md, which also contains full descriptions of these files.
- multiqc/multiqc_report.html
- wastewateranalysis<run date>.csv
- demixsummary<run date>.pdf
- abundancemap<run date>.png (only with metadata)
- abundance_bar<run date>.png (only with metadata)
- metadatamergeddemixresult<run date>.csv (only with metadata)
Credits
covidww was originally written by David Schaeper.
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Name: IDOHLabs-Bioinformatics
- Login: IDOHLabs-Bioinformatics
- Kind: organization
- Repositories: 1
- Profile: https://github.com/IDOHLabs-Bioinformatics
Citation (CITATIONS.md)
# covidww: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Fastp](https://github.com/OpenGene/fastp)

  > Shifu Chen. 2023. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2: e107. https://doi.org/10.1002/imt2.107

- [BWA-mem2](https://github.com/bwa-mem2/bwa-mem2)

  > Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019. doi: 10.1109/IPDPS.2019.00041

- [SAMtools](https://www.htslib.org/)

  > Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li, Twelve years of SAMtools and BCFtools, GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008

- [iVar](https://andersen-lab.github.io/ivar/html/index.html)

  > Grubaugh, N.D., Gangavarapu, K., Quick, J. et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol 20, 8 (2019). https://doi.org/10.1186/s13059-018-1618-7

- [Freyja](https://github.com/andersen-lab/Freyja)

  > Karthikeyan, S., Levy, J.I., De Hoff, P. et al. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature 609, 101–108 (2022). https://doi.org/10.1038/s41586-022-05049-6

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
Dependencies
- actions/checkout b4ffde65f46336ab88eb53be808477a3936bae11 composite
- jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
- nf-core/setup-nextflow v1 composite
- actions/stale 28ca1036281a5e5922ead5184a1bbf96e5fc984e composite
- htslib 1.21.*
- samtools 1.21.*