covidww

A Nextflow pipeline to quantify SARS-CoV-2 lineages from wastewater samples

https://github.com/idohlabs-bioinformatics/covidww

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
✓ DOI references: found 4 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low (14.3%)

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: IDOHLabs-Bioinformatics
License: mit
Language: Nextflow
Default Branch: main
Size: 22.9 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 1

Created over 2 years ago · Last pushed 10 months ago

Metadata Files

Readme, Changelog, License, Citation

Introduction

covidww is a bioinformatics pipeline, built with Nextflow following the nf-core template, that determines the relative abundance of SARS-CoV-2 lineages in wastewater samples. It takes FASTQ files, primers in a BED file, a reference sequence (NC_045512.2 by default), and optional metadata as input; performs quality control (QC), trimming, alignment, and deconvolution; and produces demixing reports and visualizations along with a detailed QC report.

[Figure: covidww tube map]

The pipeline performs the following steps:

Read QC (FastQC)
SARS-CoV-2 genome indexing (BWA-mem2)
Quality and adapter trimming (Fastp)
Sequence alignment (BWA-mem2)
Alignment indexing (SAMtools)
Alignment QC (SAMtools)
Primer trimming (iVar)
Sorting (SAMtools)
Variant calling (Freyja)
Demixing (Freyja)
Demix cleaning
Summary plotting
Map plotting
Present QC for raw reads (MultiQC)
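The demixing step estimates what fraction of a sample each lineage makes up from the observed mutation frequencies. Freyja does this by solving a constrained (non-negative, unit-sum) de-mixing problem against lineage barcodes. The sketch below is a deliberately simplified stand-in, not Freyja's implementation: it fits ordinary least squares on a made-up barcode matrix, clips negative estimates to zero and renormalizes. The function name `toy_demix` and all data are hypothetical.

```python
import numpy as np


def toy_demix(barcodes: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """Estimate lineage proportions from per-site mutation frequencies.

    barcodes: (n_sites, n_lineages) 0/1 matrix; entry [i, j] is 1 if
              lineage j carries the mutation at site i (hypothetical data).
    freqs:    (n_sites,) observed mutation frequencies in the sample.
    """
    # Unconstrained least-squares fit of freqs ~ barcodes @ props
    props, *_ = np.linalg.lstsq(barcodes, freqs, rcond=None)
    # Crude stand-in for the non-negativity constraint: clip negatives
    props = np.clip(props, 0.0, None)
    # Crude stand-in for the unit-sum constraint: renormalize
    total = props.sum()
    return props / total if total > 0 else props


# Hypothetical barcode matrix: 4 mutation sites x 3 lineages
barcodes = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)
true_props = np.array([0.5, 0.3, 0.2])
freqs = barcodes @ true_props  # noise-free synthetic observation

print(np.round(toy_demix(barcodes, freqs), 3))
```

With noise-free synthetic frequencies the fit recovers the mixing proportions exactly; real data are noisy and depth-dependent, which is why a properly constrained solver such as Freyja's is used in the pipeline.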

Usage

[NOTE] Nextflow and Anaconda are required to run this pipeline. Most processes can also run in containers; to use them, the corresponding container software (for example Docker or Singularity) must be installed before running.

[NOTE] If you are new to Nextflow, please refer to the nf-core installation guide on how to set up Nextflow.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

```csv
sample,fastq_1,fastq_2
Sample1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
```

Each row represents one sample and its pair of FASTQ files.
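When there are many samples, the samplesheet can be generated rather than written by hand. A minimal sketch, assuming Illumina-style file names (`<sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz`, as in the example row) and using the part before `_S<n>_` as the sample name; both conventions are assumptions, not covidww requirements. `pair_fastqs` and `write_samplesheet` are hypothetical helpers.

```python
import csv
import re
from pathlib import Path

# Assumed Illumina naming: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+?)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$"
)


def pair_fastqs(filenames: list[str]) -> list[list[str]]:
    """Group R1/R2 files by sample; return samplesheet rows incl. header."""
    pairs: dict[str, dict[str, str]] = {}
    for name in sorted(filenames):
        match = FASTQ_RE.match(Path(name).name)
        if match is None:
            continue  # ignore files that do not follow the assumed naming
        reads = pairs.setdefault(match["sample"], {})
        if match["read"] in reads:
            # e.g. the same sample sequenced on several lanes
            raise ValueError(f"{match['sample']}: several R{match['read']} files")
        reads[match["read"]] = name

    rows = [["sample", "fastq_1", "fastq_2"]]
    for sample, reads in sorted(pairs.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError(f"{sample}: expected R1 and R2, found {sorted(reads)}")
        rows.append([sample, reads["1"], reads["2"]])
    return rows


def write_samplesheet(fastq_dir: str, out_csv: str) -> None:
    """Scan fastq_dir and write a samplesheet with absolute FASTQ paths."""
    names = [str(p.resolve()) for p in Path(fastq_dir).iterdir() if p.is_file()]
    with open(out_csv, "w", newline="") as handle:
        csv.writer(handle).writerows(pair_fastqs(names))
```

For example, `write_samplesheet("fastq", "samplesheet.csv")` would write one row per sample found in a hypothetical `fastq/` directory. Keeping the pairing logic in a pure function makes it easy to check without touching the file system.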

Optionally, you can prepare a metadata file that looks as follows:

metadata.csv:

```csv
Sample,City,State
Sample1,Indianapolis,Indiana
```

The metadata is used to plot the results on a map.
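Judging from the two example files, each metadata `Sample` value is expected to match a `sample` name in the samplesheet; this matching rule is an assumption, not something this README states. A minimal sketch that lists samplesheet samples lacking a complete City/State row; `missing_metadata` is a hypothetical helper.

```python
import csv
from typing import Iterable


def missing_metadata(samplesheet_lines: Iterable[str],
                     metadata_lines: Iterable[str]) -> list[str]:
    """Return samplesheet samples with no complete City/State metadata row.

    Assumes metadata 'Sample' values are matched against samplesheet
    'sample' values (inferred from the example files).
    """
    samples = [row["sample"] for row in csv.DictReader(samplesheet_lines)]
    located = {
        row["Sample"]
        for row in csv.DictReader(metadata_lines)
        # DictReader fills missing fields with None, so guard before strip()
        if (row.get("City") or "").strip() and (row.get("State") or "").strip()
    }
    return [sample for sample in samples if sample not in located]
```

Both arguments can be open file handles, for example `open("samplesheet.csv", newline="")` and `open("metadata.csv", newline="")`; an empty result means every sample can be placed on the map.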

After cloning this repository, you can run the pipeline using:

```bash
nextflow run covidww \
  -profile <docker/conda/singularity> \
  --input samplesheet.csv \
  --primers [primers.bed] \
  --outdir [OUTDIR]
```

To include the optional metadata, add the --metadata parameter:

```bash
nextflow run covidww \
  -profile <docker/conda/singularity> \
  --input samplesheet.csv \
  --primers [primers.bed] \
  --metadata metadata.csv \
  --outdir [OUTDIR]
```
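When launching runs from a script, the same command line can be assembled as an argument list, adding `--metadata` only when a metadata file is available. A minimal sketch; `covidww_command` is a hypothetical helper, and the default `docker` profile is just an example choice.

```python
import shlex
from typing import Optional


def covidww_command(
    input_csv: str,
    primers_bed: str,
    outdir: str,
    profile: str = "docker",
    metadata_csv: Optional[str] = None,
) -> list[str]:
    """Assemble the `nextflow run covidww` arguments shown in this README."""
    cmd = [
        "nextflow", "run", "covidww",
        "-profile", profile,          # single dash: a Nextflow option
        "--input", input_csv,         # double dash: a pipeline parameter
        "--primers", primers_bed,
    ]
    if metadata_csv is not None:
        cmd += ["--metadata", metadata_csv]  # only needed for map plotting
    cmd += ["--outdir", outdir]
    return cmd


print(shlex.join(covidww_command("samplesheet.csv", "primers.bed", "results")))
```

Passing the list to `subprocess.run(..., check=True)` avoids shell quoting issues with paths that contain spaces.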

The test and test_full profiles start a small run to check that everything is working properly. They use the example input provided with the pipeline, and the outputs of those test runs are also available.

```bash
nextflow run covidww \
  -profile <test/test_full>,<singularity/docker/conda> \
  --outdir [OUTDIR]
```

Additional covidww parameters

--intermediate True saves all intermediate files

--adapter_fasta [fasta file] tells Fastp to also search for and trim the adapters in this FASTA file

--save_trim_fail True tells Fastp to save reads that fail trimming

--save_merge True tells Fastp to save merged reads

--radius [float] sets the radius of the pie charts in the map plot

--reference_genome [fasta file] replaces the default reference genome (NC_045512.2) with this FASTA file

[WARNING] Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided with the Nextflow -c option, can be used to provide any configuration except parameters; see the nf-core configuration documentation.
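Nextflow's `-params-file` option accepts a JSON (or YAML) file whose keys are parameter names without the leading `--`. A minimal sketch that serializes parameters named in this README; the values are placeholders, `radius: 0.5` is hypothetical, and `--intermediate` is left out because this README does not say whether it expects a boolean or the string "True". `params_json` is a hypothetical helper.

```python
import json


def params_json(params: dict[str, object]) -> str:
    """Serialize covidww parameters for `nextflow run ... -params-file`.

    Leading dashes are stripped from keys (so "--input" becomes "input"),
    and parameters set to None are dropped so pipeline defaults apply.
    """
    cleaned = {
        key.lstrip("-"): value
        for key, value in params.items()
        if value is not None
    }
    return json.dumps(cleaned, indent=2)


print(params_json({
    "input": "samplesheet.csv",
    "primers": "primers.bed",       # placeholder file name
    "outdir": "results",            # placeholder output directory
    "metadata": None,               # omitted: no map plots for this run
    "radius": 0.5,                  # hypothetical pie-chart radius
}))
```

Saved as `params.json`, the file would be passed as `nextflow run covidww -profile docker -params-file params.json`.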

Pipeline output

Upon completion, the following output files are saved in OUTDIR, optionally along with the intermediate files generated by the pipeline. With --intermediate True, OUTDIR is populated following the structure outlined in output.md, which also contains full descriptions of these files.

multiqc/multiqc_report.html
wastewateranalysis<run date>.csv
demixsummary<run date>.pdf
abundancemap<run date>.png (only with metadata)
abundance_bar<run date>.png (only with metadata)
metadatamergeddemixresult<run date>.csv (only with metadata)

Credits

covidww was originally written by David Schaeper.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

Name: IDOHLabs-Bioinformatics
Login: IDOHLabs-Bioinformatics
Kind: organization

Repositories: 1
Profile: https://github.com/IDOHLabs-Bioinformatics

Citation (CITATIONS.md)

# covidww: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Fastp](https://github.com/OpenGene/fastp)

  > Shifu Chen. 2023. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2: e107. https://doi.org/10.1002/imt2.107

- [BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2)

  > Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019. doi: 10.1109/IPDPS.2019.00041

- [SAMtools](https://doi.org/10.1093/gigascience/giab008)

  > Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li, Twelve years of SAMtools and BCFtools, GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008

- [iVar](https://andersen-lab.github.io/ivar/html/index.html)

  > Grubaugh, N.D., Gangavarapu, K., Quick, J. et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol 20, 8 (2019). https://doi.org/10.1186/s13059-018-1618-7

- [Freyja](https://github.com/andersen-lab/Freyja)

  > Karthikeyan, S., Levy, J.I., De Hoff, P. et al. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature 609, 101–108 (2022). https://doi.org/10.1038/s41586-022-05049-6

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total

Push event: 9

Last Year

Push event: 9

Dependencies

.github/workflows/ci.yml actions

actions/checkout b4ffde65f46336ab88eb53be808477a3936bae11 composite
jlumbroso/free-disk-space 54081f138730dfa15788a46383842cd2f914a1be composite
nf-core/setup-nextflow v1 composite

.github/workflows/clean-up.yml actions

actions/stale 28ca1036281a5e5922ead5184a1bbf96e5fc984e composite

modules/nf-core/bwamem2/index/meta.yml cpan

modules/nf-core/bwamem2/mem/meta.yml cpan

modules/nf-core/fastp/meta.yml cpan

modules/nf-core/freyja/variants/meta.yml cpan

modules/nf-core/multiqc/meta.yml cpan

modules/nf-core/samtools/index/meta.yml cpan

modules/nf-core/samtools/sort/meta.yml cpan

modules/nf-core/samtools/stats/meta.yml cpan

subworkflows/nf-core/utils_nextflow_pipeline/meta.yml cpan

subworkflows/nf-core/utils_nfcore_pipeline/meta.yml cpan

subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml cpan

pyproject.toml pypi

modules/nf-core/samtools/fixmate/meta.yml cpan

modules/nf-core/samtools/markdup/meta.yml cpan

modules/nf-core/samtools/fixmate/environment.yml conda

htslib 1.21.*
samtools 1.21.*

modules/nf-core/samtools/markdup/environment.yml conda

htslib 1.21.*
samtools 1.21.*

modules/local/clean/environment.yml pypi

modules/local/freyja/demix/environment.yml pypi

modules/local/ivar/environment.yml pypi

modules/local/map_plot/environment.yml pypi

modules/local/seqtk/environment.yml pypi

modules/local/summary/environment.yml pypi

modules/nf-core/bwamem2/index/environment.yml pypi

modules/nf-core/bwamem2/mem/environment.yml pypi

modules/nf-core/fastp/environment.yml pypi

modules/nf-core/freyja/variants/environment.yml pypi

modules/nf-core/multiqc/environment.yml pypi

modules/nf-core/samtools/index/environment.yml pypi

modules/nf-core/samtools/sort/environment.yml pypi

modules/nf-core/samtools/stats/environment.yml pypi
