nf-virontus
Oxford Nanopore reference mapping, taxonomic classification, de novo assembly workflow primarily for viral sequence data
Statistics
- Stars: 6
- Watchers: 4
- Forks: 3
- Open Issues: 2
- Releases: 5
Metadata Files
README.md
CFIA-NCFAD/nf-virontus
Oxford Nanopore viral sequence analysis pipeline.
Introduction
This pipeline performs read mapping with Minimap2 and variant calling with Clair3. A consensus sequence is generated with Bcftools from major variants and variants that would not cause potential frameshift mutations; regions with low coverage depth are masked with N characters.
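As a rough illustration of the consensus rules above, here is a minimal sketch; it is not the pipeline's actual implementation. It assumes a simplified tab-separated variant table (POS, REF, ALT, alternate allele fraction) and one depth value per reference position. The function names `filter_variants` and `mask_low_depth` and the cutoffs (0.5 allele fraction, depth 10) are hypothetical. The sketch reads the rule as "keep major variants whose length change preserves the reading frame", which is only one interpretation, and it ignores whether an indel actually falls inside a coding region.

```bash
# Hypothetical helpers for illustration only; not part of nf-virontus.

# Keep variants that are "major" (allele fraction >= cutoff, default 0.5)
# and whose REF/ALT length difference is a multiple of 3 (0 for SNVs).
# stdin: tab-separated POS, REF, ALT, AF
filter_variants() {
  local min_af="${1:-0.5}"
  awk -F'\t' -v min_af="$min_af" '
    {
      delta = length($3) - length($2)      # length change introduced by the variant
      if (delta < 0) delta = -delta
      if ($4 + 0 >= min_af + 0 && delta % 3 == 0) print
    }'
}

# Replace bases at low-depth positions with N.
# $1 = sequence, $2 = minimum depth; stdin: one depth value per line, in position order
mask_low_depth() {
  awk -v seq="$1" -v min_depth="$2" '
    {
      base = ($1 + 0 < min_depth + 0) ? "N" : substr(seq, NR, 1)
      out = out base
    }
    END { print out }'
}

# Example: position 2 has depth 3, below the assumed cutoff of 10.
# printf '12\n3\n40\n' | mask_low_depth ACG 10    # prints ANG
```

In practice these steps operate on VCF and coverage BED files rather than plain tables; the sketch only shows the decision logic.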
Optionally, amplicon primers can be trimmed with iVar if a BED file of primer coordinates is supplied.
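Primer coordinates in a BED file are only meaningful relative to the reference they were designed against. A small pre-flight check could confirm that every record names the expected reference ID and has start < end. The sketch below assumes a standard tab-separated BED with chrom, start and end in the first three columns; the function name `bed_matches_reference` is hypothetical and not part of the pipeline.

```bash
# Hypothetical helper for illustration only; not part of nf-virontus.
# $1 = primer BED path, $2 = expected reference sequence ID (e.g. MN908947.3)
# Returns 0 when the file has at least one record and every record names
# that reference with start < end; returns 1 otherwise.
bed_matches_reference() {
  awk -F'\t' -v ref="$2" '
    NF == 0 || /^#/ { next }                       # skip blank and comment lines
    { n++ }                                        # count data records
    $1 != ref || $2 + 0 >= $3 + 0 { bad = 1 }      # wrong sequence name or empty interval
    END { exit (bad || n == 0) }
  ' "$1"
}
```

For example, `bed_matches_reference artic-ncov2019/primer_schemes/nCoV-2019/V3/nCoV-2019.bed MN908947.3 || echo "BED does not match reference" >&2` would flag a mismatch before the pipeline is launched.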
If reads are mapped against the SARS-CoV-2 reference genome Wuhan-Hu-1 (MN908947.3), Pangolin lineage assignment and Nextclade analysis are also performed.
Pipeline Overview
NOTE: This pipeline is still a work-in-progress. The following diagram shows the planned features and steps of the workflow:
```mermaid
flowchart LR
classDef input fill:#fcba64,color:black
classDef output fill:#b1fc9c,color:black
subgraph legend["Legend"]
style legend fill:white,fill-opacity:0.5
input([Input]):::input
output([Output]):::output
process[Process]
end
subgraph rqc["<b>fa:fa-dna Reads QA/QC</b>"]
A([fa:fa-file Filtered Reads FASTQ]):::output
FR["fa:fa-check Read QC & Filtering <br><small> fastp, nanoqc"]
HR["fa:fa-cancel Dehosting <br><small> Kraken2 (optional)"]
RR([fa:fa-file Raw Reads FASTQ]):::input --> FR
FR --> HR
HR --> A
end
rqc --> rs
subgraph rs[<b>fa:fa-crosshairs Reference Selection]
RS["fa:fa-filter Ref Seq Selection <br><small> de novo assembly & BLAST, Mash, Kraken2"]
frs_rs([fa:fa-file Reads FASTQ]):::input
frs_rs --> RS
R([fa:fa-file Ref Seqs FASTA]):::input --> RS
RS --> T([fa:fa-file Top Ref Seq FASTA]):::output
end
rs --> rma
rqc --> rma
subgraph rma[<b>fa:fa-industry Reference Mapped Assembly]
direction TB
trs([fa:fa-file Top Ref FASTA]):::input
fr([fa:fa-file FASTQ]):::input
RM["fa:fa-bars-staggered Read Mapping <br><small> Minimap2"]
PT["fa:fa-scissors Primer Trimming <br><small> iVar (optional)"]
VC["fa:fa-code-compare Variant Calling <br><small> Clair3, Medaka"]
VE["fa:fa-flask Variant Effect <br><small> SnpEff, SnpSift (optional)"]
BAM([fa:fa-file BAM]):::output
D["fa:fa-chart-area Coverage Stats <br><small> Mosdepth, Samtools"]
CS["fa:fa-code-merge Make Consensus Sequence <br><small> Bcftools"]
vcf([fa:fa-file VCF]):::output
muts([fa:fa-table Amino Acid Mutations]):::output
covbed([fa:fa-file Coverage BED]):::output
fr --> RM
trs --> RM
trs --> VC
RM --> BAM
BAM --> PT
PT --> BAM
BAM --> D
BAM --> VC
vcf --> VE
VE --> vcf
VE --> muts
VC --> vcf
vcf --> CS
trs --> CS
D --> covbed
covbed --> CS
CS --> csf([fa:fa-file Consensus Sequence FASTA]):::output
end
rqc --> reporting
rs --> reporting
rma --> reporting
subgraph reporting[<b>fa:fa-clipboard Reporting & Visualization]
MQC[fa:fa-stethoscope MultiQC]
MQCR([fa:fa-file MultiQC HTML Report]):::output
CP[fa:fa-chart-area Seq Coverage Plots]
png([fa:fa-image PNG]):::output
pdf([fa:fa-file PDF]):::output
MQC --> MQCR
CP --> png & pdf
end
```
Installation
You will need to install Nextflow in order to run the Virontus pipeline.
NB: Singularity or Docker is recommended for portable and reproducible execution of the pipeline with the `-profile singularity` or `-profile docker` command-line argument.
1) Install Nextflow
If you have Conda installed, you can install Nextflow with the following command:
```bash
conda install -c bioconda -c conda-forge nextflow
```
2) Install Docker and/or Singularity
Installing Docker and/or Singularity is optional but recommended for portability and reproducibility of results.
3) Install Virontus
Nextflow will automatically download the latest version of Virontus. You can show the Virontus help message with usage information with:
```bash
nextflow run CFIA-NCFAD/nf-virontus --help
```
Usage
Basic usage for mapping to the SARS-CoV-2 reference genome MN908947.3 with ARTIC V3 protocol primers:
```bash
nextflow run CFIA-NCFAD/nf-virontus \
  --input samplesheet.csv \
  --genome MN908947.3 \
  --bed artic-ncov2019/primer_schemes/nCoV-2019/V3/nCoV-2019.bed
```
This can be simplified with:
```bash
nextflow run CFIA-NCFAD/nf-virontus \
  --input samplesheet.csv \
  --scov2 \
  --artic_v3
# or `--freed` for Freed et al. (2020) 1200bp amplicon method
```
Show usage information with:
```bash
nextflow run CFIA-NCFAD/nf-virontus --help
```
NB: See the usage docs for more info.
Output
See the output docs for more info.
Credits
CFIA-NCFAD/nf-virontus was originally written by Peter Kruczkiewicz.
Bootstrapped with nf-core/tools `nf-core create`.
Thank you to the nf-core/tools team for a great tool for bootstrapping the creation of production-ready Nextflow workflows.
Owner
- Name: CFIA NCFAD - Genomics Unit
- Login: CFIA-NCFAD
- Kind: organization
Canadian Food Inspection Agency National Centre for Foreign Animal Disease
Citation (CITATIONS.md)
# peterk87/nf-virontus: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

* [BCFtools](https://www.ncbi.nlm.nih.gov/pubmed/21903627/)
  > Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011 Nov 1;27(21):2987-93. doi: 10.1093/bioinformatics/btr509. Epub 2011 Sep 8. PubMed PMID: 21903627; PubMed Central PMCID: PMC3198575.

* [iVar](https://www.ncbi.nlm.nih.gov/pubmed/30621750/)
  > Grubaugh ND, Gangavarapu K, Quick J, Matteson NL, De Jesus JG, Main BJ, Tan AL, Paul LM, Brackney DE, Grewal S, Gurfield N, Van Rompay KKA, Isern S, Michael SF, Coffey LL, Loman NJ, Andersen KG. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019 Jan 8;20(1):8. doi: 10.1186/s13059-018-1618-7. PubMed PMID: 30621750; PubMed Central PMCID: PMC6325816.

* [IQ-TREE](http://www.iqtree.org/)
  > B.Q. Minh, H.A. Schmidt, O. Chernomor, D. Schrempf, M.D. Woodhams, A. von Haeseler, R. Lanfear (2020) IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol., 37:1530-1534. doi: [10.1093/molbev/msaa015](https://doi.org/10.1093/molbev/msaa015)

  > L.-T. Nguyen, H.A. Schmidt, A. von Haeseler, B.Q. Minh (2015) IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies. Mol. Biol. Evol., 32:268-274. doi: [10.1093/molbev/msu300](https://doi.org/10.1093/molbev/msu300)

* [Kraken 2](https://www.ncbi.nlm.nih.gov/pubmed/31779668/)
  > Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019 Nov 28;20(1):257. doi: 10.1186/s13059-019-1891-0. PubMed PMID: 31779668; PubMed Central PMCID: PMC6883579.

* [Longshot](https://www.nature.com/articles/s41467-019-12493-y)
  > Edge, P. and Bansal, V., 2019. Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing. Nature communications, 10(1), pp.1-10. doi: [10.1038/s41467-019-12493-y](https://doi.org/10.1038/s41467-019-12493-y)

* [MAFFT](https://mafft.cbrc.jp/alignment/software/)
  > Katoh K and Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution, 30:772-780. doi: [10.1093/molbev/mst010](https://doi.org/10.1093/molbev/mst010)

* [Medaka](https://github.com/nanoporetech/medaka)

* [Minimap2](https://www.ncbi.nlm.nih.gov/pubmed/29750242/)
  > Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191. PubMed PMID: 29750242; PubMed Central PMCID: PMC6137996.

* [mosdepth](https://www.ncbi.nlm.nih.gov/pubmed/29096012)
  > Pedersen BS, Quinlan AR. Mosdepth: Quick Coverage Calculation for Genomes and Exomes. Bioinformatics. 2018 Mar 1;34(5):867-868. doi: 10.1093/bioinformatics/btx699. PMID: 29096012 PMCID: PMC6030888.

* [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/)
  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

* [pangolin](https://github.com/cov-lineages/pangolin)
  > Áine O'Toole, Emily Scher, Anthony Underwood, Ben Jackson, Verity Hill, JT McCrone, Chris Ruis, Khali Abu-Dahab, Ben Taylor, Corin Yeats, Louis du Plessis, David Aanensen, Eddie Holmes, Oliver Pybus, Andrew Rambaut. pangolin: lineage assignment in an emerging pandemic as an epidemiological tool. Publication in preparation.

* [SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/)
  > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.

* [SnpEff](https://www.ncbi.nlm.nih.gov/pubmed/22728672/)
  > Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012 Apr-Jun;6(2):80-92. doi: 10.4161/fly.19695. PubMed PMID: 22728672; PubMed Central PMCID: PMC3679285.

* [SnpSift](https://www.ncbi.nlm.nih.gov/pubmed/22435069/)
  > Cingolani P, Patel VM, Coon M, Nguyen T, Land SJ, Ruden DM, Lu X. Using Drosophila melanogaster as a Model for Genotoxic Chemical Mutational Studies with a New Program, SnpSift. Front Genet. 2012 Mar 15;3:35. doi: 10.3389/fgene.2012.00035. eCollection 2012. PubMed PMID: 22435069; PubMed Central PMCID: PMC3304048.

## Software packaging/containerisation tools

* [Anaconda](https://anaconda.com)
  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

* [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)
  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

* [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)
  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

* [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
Dependencies
- actions/checkout v2 composite
- nf-core/setup-nextflow v1 composite
- actions/checkout v2 composite
- actions/checkout v1 composite
- actions/setup-node v1 composite
- mshick/add-pr-comment v1 composite
- dawidd6/action-download-artifact v2 composite
- marocchino/sticky-pull-request-comment v2 composite