v-met

A bare-bones, ridiculously simple metagenomics pipeline for viruses using Kraken 2 written in Nextflow.

https://github.com/ksumngs/v-met

Keywords

bioinformatics kraken2 metagenomics nextflow pipeline viral

Repository

Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 1
  • Open Issues: 2
  • Releases: 0
Created about 5 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md

v-met

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

This project follows the semver pro forma and uses the git-flow branching model.

Installation

  1. Install Nextflow
  2. Install Conda
  3. Install one or more of the container engines accepted by -profile: Singularity, Podman, or Docker
  4. Download a Kraken2 database
  5. Download a BLAST database

Check out the Installation docs for a more nuanced take on the requirements.
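Before a first run, it can help to confirm that the tools from the list above are actually on the PATH. The following is a minimal sketch, not part of v-met: the function names `find_container_engine` and `check_prerequisites` are hypothetical, and the check covers only Nextflow and the three engines named in the `-profile` option. It does not verify Conda or either database.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with v-met.

# Print the first container engine found on PATH; return 1 if none is found.
find_container_engine() {
  local engine
  for engine in singularity podman docker; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine"
      return 0
    fi
  done
  return 1
}

# Return 0 only when Nextflow and at least one container engine are available.
check_prerequisites() {
  local status=0
  if ! command -v nextflow >/dev/null 2>&1; then
    echo "missing: nextflow" >&2
    status=1
  fi
  if ! find_container_engine >/dev/null; then
    echo "missing: one of singularity, podman, docker" >&2
    status=1
  fi
  return "$status"
}
```

After sourcing the file, `check_prerequisites && echo "ready"` prints `ready` only when both checks pass, and the engine printed by `find_container_engine` is a reasonable candidate for `-profile`.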

Usage

Syntax

```bash
nextflow run ksumngs/v-met \
  -profile <singularity,podman,docker> \
  --platform <illumina,nanopore> \
  --kraken2_db /path/to/kraken2/database \
  --blast_db /path/to/blast/database \
  [--input /path/to/reads/folder] \
  [--blast_target list] \
  [--outdir /path/to/output]
```

Example: Illumina reads with a relatively complete Kraken2 database

```bash
nextflow run ksumngs/v-met \
  -profile singularity \
  --platform illumina \
  --kraken2_db /databases/kraken2/nt \
  --blast_target 'none'
```

Example: Nanopore reads with a viral-only Kraken2 database

```bash
nextflow run ksumngs/v-met \
  -profile podman \
  --platform nanopore \
  --kraken2_db /databases/kraken2/viral \
  --blast_db /databases/blast/ \
  --blast_target classified
```

There are way more parameters than listed here. For a more complete description, please read the docs on Usage and Parameters.
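When the pipeline is launched from scripts, a small wrapper can catch a mistyped profile or platform before Nextflow starts. The sketch below is illustrative only: `vmet_command` is a hypothetical name, it assumes exactly one value is chosen for `-profile` and for `--platform` from the options in the syntax above, and it prints the command instead of running it so the command can be reviewed first. Paths containing spaces are not quoted in the printed output.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not shipped with v-met.
# Usage: vmet_command PROFILE PLATFORM KRAKEN2_DB BLAST_DB
vmet_command() {
  local profile="$1"
  local platform="$2"
  local kraken2_db="$3"
  local blast_db="$4"

  # Accept only the container profiles listed in the syntax section.
  case "$profile" in
    singularity|podman|docker) ;;
    *) echo "unsupported profile: $profile" >&2; return 1 ;;
  esac

  # Accept only the sequencing platforms listed in the syntax section.
  case "$platform" in
    illumina|nanopore) ;;
    *) echo "unsupported platform: $platform" >&2; return 1 ;;
  esac

  # Print the assembled command rather than executing it.
  echo "nextflow run ksumngs/v-met -profile $profile --platform $platform --kraken2_db $kraken2_db --blast_db $blast_db"
}
```

For example, `vmet_command podman nanopore /databases/kraken2/viral /databases/blast/` prints the Nanopore command shown earlier without the `--blast_target` option, while an unknown platform makes the function return a non-zero status.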

Owner

  • Name: KSU Molecular NGS Lab
  • Login: ksumngs
  • Kind: organization

The Molecular Next-Generation Sequencing lab at the Kansas State University College of Veterinary Medicine

Citation (CITATIONS.md)

# v-met: Citations

This pipeline uses code and infrastructure developed and maintained by the
[nf-core](https://nf-co.re) community, reused here under the [MIT
license](https://github.com/nf-core/tools/blob/master/LICENSE).

> The nf-core framework for community-curated bioinformatics pipelines.
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes
> Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow
> enables reproducible computational workflows. Nat Biotechnol. 2017 Apr
> 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [BLAST](https://www.ncbi.nlm.nih.gov/pubmed/20003500/)

  > Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden
  > TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009 Dec
  > 15;10:421. doi: 10.1186/1471-2105-10-421. PubMed PMID: 20003500; PubMed
  > Central PMCID: PMC2803857.

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

- [Kraken 2](https://www.ncbi.nlm.nih.gov/pubmed/31779668/)

  > Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2.
  > Genome Biol. 2019 Nov 28;20(1):257. doi: 10.1186/s13059-019-1891-0. PubMed
  > PMID: 31779668; PubMed Central PMCID: PMC6883579.

- [Krona](https://www.ncbi.nlm.nih.gov/pubmed/21961884/)

  > Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization
  > in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. doi:
  > 10.1186/1471-2105-12-385. PMID: 21961884; PMCID: PMC3190407.

- [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis
  > results for multiple tools and samples in a single report. Bioinformatics.
  > 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016
  > Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [NanoFilt](https://www.ncbi.nlm.nih.gov/pubmed/29547981/)

  > De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack:
  > visualizing and processing long-read sequencing data. Bioinformatics. 2018
  > Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149. PMID:
  > 29547981; PMCID: PMC6061794.

- [NanoStat](https://www.ncbi.nlm.nih.gov/pubmed/29547981/)

  > De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack:
  > visualizing and processing long-read sequencing data. Bioinformatics. 2018
  > Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149. PMID:
  > 29547981; PMCID: PMC6061794.

- [seqkit](https://www.ncbi.nlm.nih.gov/pubmed/27706213/)

  > Shen W, Le S, Li Y, Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit
  > for FASTA/Q File Manipulation. PLoS One. 2016 Oct 5;11(10):e0163962. doi:
  > 10.1371/journal.pone.0163962. PMID: 27706213; PMCID: PMC5051824.

- [seqtk](https://github.com/lh3/seqtk)

- [Trimmomatic](https://www.ncbi.nlm.nih.gov/pubmed/24695404/)

  > Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina
  > sequence data. Bioinformatics. 2014 Aug 1;30(15):2114-20. doi:
  > 10.1093/bioinformatics/btu170. Epub 2014 Apr 1. PMID: 24695404; PMCID:
  > PMC4103590.

## Software packaging/containerization tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0.
  > Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH,
  > Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and
  > comprehensive software distribution for the life sciences. Nat Methods.
  > 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes
  > H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T,
  > Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y.
  > BioContainers: an open-source and community-driven framework for software
  > standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi:
  > 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central
  > PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for
  > mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi:
  > 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014;
  > PubMed Central PMCID: PMC5426675.

Dependencies

docs/requirements.txt pypi
  • myst-parser ==0.17.2
  • nf-core ==2.4
  • sphinx ==4.5
  • sphinx-multiversion ==0.2.4
  • sphinx_rtd_theme ==1.0