rvfvtyping

Classification and phylogenetic lineage assignment of Rift Valley fever virus consensus genomes using the glycoprotein Gn/G2 gene found within the M-segment of the virus genome

https://github.com/ajodeh-juma/rvfvtyping

Keywords

bioinformatics glycoprotein lineage pipeline rvfv workflow

Repository

Basic Info
  • Host: GitHub
  • Owner: ajodeh-juma
  • License: gpl-3.0
  • Language: Nextflow
  • Default Branch: master
  • Homepage:
  • Size: 94.4 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created about 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md

Introduction

rvfvtyping is a bioinformatics analysis pipeline for classification and phylogenetic lineage assignment of Rift Valley fever virus consensus genomes using the glycoprotein Gn/G2 gene found within the M-segment of the virus genome.

Classifying query sequences involves two steps: first, identification of the virus species; second, assignment of Rift Valley fever virus lineages through phylogenetic analysis. Species classification is performed with diamond, while phylogenetic assignment uses iqtree and is largely adapted from the pangolin method originally developed by Áine O'Toole.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.

A web application of the pipeline is hosted on a dedicated server at the University of KwaZulu-Natal and can be found here.

Installation

rvfvtyping runs on UNIX/Linux systems. Install Miniconda3 first; once it is installed, proceed with the pipeline installation:

    git clone https://github.com/ajodeh-juma/rvfvtyping.git
    cd rvfvtyping
    conda env create -n rvfvtyping-env -f environment.yml
    conda activate rvfvtyping-env

Testing

  • Optional: Test the installation on a single FASTA

    nextflow run main.nf -profile test

  • Optional: Test the installation on several FASTA sequence files

    nextflow run main.nf -profile test_full

Usage

For minimal pipeline options, use the --help flag e.g.

nextflow run main.nf --help

To see all the options, use the --show_hidden_params flag e.g.

nextflow run main.nf --help --show_hidden_params

A typical command to classify and assign lineages using the glycoprotein (Gn) classifier:

    nextflow run main.nf \
        --input 'data/test/*.fa' \
        --segment Gn \
        --outdir output-dir \
        -work-dir work-dir

Method details

The pipeline offers several parameters, including those highlighted below:

```
Input/output options
  --input         [string]   Input Fasta file for typing
  --segment       [string]   genomic segment of the virus. options are 'Gn', 'S', 'M' and 'L'
  --outdir        [string]   The output directory where the results will be saved. [default: ./results]
  --email         [string]   Email address for completion summary.

Diamond options
  --skip_diamond  [boolean]  Skip all DIAMOND BLAST against the pre-configured database.
```

Mandatory parameters

| parameter | description | type |
|-----------|:-----------:|:----:|
| --input | Input FASTA file(s), in .fa or .fasta format, for typing | string |
| --segment | Genomic segment of the virus: Gn, S, M or L | string |
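When the pipeline is launched from a script, the two mandatory parameters can be checked before Nextflow starts. The sketch below is not part of rvfvtyping: `build_command` is a hypothetical helper that assembles the command line from the documented options (`--input`, `--segment`, `--outdir`) and rejects segment values other than the four listed above.

```python
import shlex

# Segment values documented for --segment.
VALID_SEGMENTS = ("Gn", "S", "M", "L")


def build_command(input_glob, segment, outdir="./results"):
    """Return the nextflow command as an argument list (hypothetical helper)."""
    if not input_glob:
        raise ValueError("--input is mandatory")
    if segment not in VALID_SEGMENTS:
        raise ValueError(
            f"--segment must be one of {VALID_SEGMENTS}, got {segment!r}"
        )
    return [
        "nextflow", "run", "main.nf",
        "--input", input_glob,
        "--segment", segment,
        "--outdir", outdir,
    ]


cmd = build_command("data/test/*.fa", "Gn", outdir="output-dir")
# shlex.join quotes the glob so the shell hands it to Nextflow unexpanded.
print(shlex.join(cmd))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` from the repository root, assuming `nextflow` is on the PATH of the activated conda environment.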

Output

Several output files are generated, including a comma-separated values file (lineages.csv) that lists, one input query sequence per line, the taxon name and the lineage assigned to it.

e.g.

| Query | Lineage | aLRT | UFbootstrap | Length | Ns(%) | Note | Yearfirst | Yearlast | Countries |
| ----------- |:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
| DQ380218 | G | 84 | 70 | 3885 | 0.00 | assigned (bootstrap value >= 70) | 1969 | 1993 | Senegal;CAR;Zimbabwe;Guinea |
| HM587118 | L | 99 | 100 | 490 | 0.00 | assigned (bootstrap value >= 70) | 1963 | 1995 | Zimbabwe;Egypt;South Africa;Kenya |
| DQ380221 | D | 92 | 98 | 3885 | 0.00 | assigned (bootstrap value >= 70) | 1973 | 1973 | CAR |
| DQ380222 | J | 77 | 27 | 3885 | 0.00 | unassigned (bootstrap value < 70) | | | |
| HM587045 | B | 89 | 97 | 490 | 0.00 | assigned (bootstrap value >= 70) | 1972 | 1972 | Kenya |
| DQ380189 | L | 99 | 100 | 3885 | 0.00 | assigned (bootstrap value >= 70) | 1963 | 1995 | Zimbabwe;Egypt;South Africa;Kenya |
| HM587125 | O | 92 | 98 | 490 | 0.00 | assigned (bootstrap value >= 70) | 1951 | 1951 | South Africa |
| HM587108 | I | 87 | 90 | 490 | 0.00 | assigned (bootstrap value >= 70) | 1955 | 1956 | South Africa |
| MG972973 | C | 88 | 96 | 3852 | 0.00 | assigned (bootstrap value >= 70) | 1976 | 2016 | South Africa;Somalia;Uganda;Angola;Madagascar;Sudan;Zimbabwe;Mauritania;Saudi Arabia;Kenya |
| AF134496 | N | 88 | 84 | 738 | 0.00 | assigned (bootstrap value >= 70) | 1975 | 1993 | Senegal;Mauritania;Burkina Faso |
| EU574086.1 | J | 74 | 33 | 1690 | 0.00 | unassigned (bootstrap value < 70) | | | |
| RVFVNamibia2011MT561463NAM_2011 | C | 89 | 95 | 3830 | 0.00 | assigned (bootstrap value >= 70) | 1976 | 2016 | South Africa;Somalia;Uganda;Angola;Madagascar;Sudan;Zimbabwe;Mauritania;Saudi Arabia;Kenya |
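The Note column reflects a support threshold of 70 on the UFbootstrap value. As an illustration, the sketch below reads a lineages.csv-style table with Python's `csv` module and separates assigned from unassigned queries. It assumes the CSV header matches the column names shown in the table (Query, Lineage, UFbootstrap); the two rows are copied from that example, and `split_by_support` is a hypothetical helper, not part of the pipeline.

```python
import csv
import io

# Two rows copied from the example table; a real run would open lineages.csv.
EXAMPLE = """Query,Lineage,aLRT,UFbootstrap,Length,Ns(%),Note,Yearfirst,Yearlast,Countries
DQ380221,D,92,98,3885,0.00,assigned (bootstrap value >= 70),1973,1973,CAR
DQ380222,J,77,27,3885,0.00,unassigned (bootstrap value < 70),,,
"""


def split_by_support(handle, threshold=70.0):
    """Split queries into assigned (query -> lineage) and unassigned lists."""
    assigned, unassigned = {}, []
    for row in csv.DictReader(handle):
        if float(row["UFbootstrap"]) >= threshold:
            assigned[row["Query"]] = row["Lineage"]
        else:
            unassigned.append(row["Query"])
    return assigned, unassigned


assigned, unassigned = split_by_support(io.StringIO(EXAMPLE))
print("assigned:", assigned)
print("unassigned:", unassigned)
```

To run this against real output, replace the `io.StringIO(EXAMPLE)` argument with an open file handle, for example `open("lineages.csv", newline="")`; the exact location of the file under the output directory is not specified here.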

Unless --skip_diamond is used, the classification file diamond_results.csv is also generated:

| QueryID | Length | SubjectID | Segment | Product | PercentIdentity | Mismatches | Gaps |
| ----------- |:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
| HM587118 | 489 | YP003848705.1 | M | glycoprotein | 100 | 0 | 0 |
| MG972973 | 3591 | YP003848705.1 | M | glycoprotein | 99.3 | 8 | 0 |
| DQ380221 | 3591 | YP003848705.1 | M | glycoprotein | 99.4 | 7 | 0 |
| AF134496 | 738 | YP003848705.1 | M | glycoprotein | 98.8 | 3 | 0 |
| DQ380222 | 3591 | YP003848705.1 | M | glycoprotein | 99.2 | 9 | 0 |
| EU574086.1 | 795 | YP003848706.1 | S | non-structural protein | 97.4 | 7 | 0 |
| EU574086.1 | 735 | YP003848707.1 | S | nucleocapsid | 99.6 | 1 | 0 |
| RVFVNamibia2011MT561463NAM2011 | 3558 | YP003848705.1 | M | glycoprotein | 99.2 | 9 | 0 |
| DQ380218 | 3591 | YP003848705.1 | M | glycoprotein | 99.5 | 6 | 0 |
| HM587108 | 489 | YP003848705.1 | M | glycoprotein | 100 | 0 | 0 |
| DQ380189 | 3591 | YP003848705.1 | M | glycoprotein | 98.9 | 13 | 0 |
| HM587125 | 489 | YP003848705.1 | M | glycoprotein | 99.4 | 1 | 0 |
| HM587045 | 489 | YP003848705.1 | M | glycoprotein | 100 | 0 | 0 |
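Because a query can hit more than one reference protein (EU574086.1 matches two S-segment products in the example), it can help to summarise the DIAMOND table per query before interpreting lineage calls. The sketch below assumes diamond_results.csv uses the column names shown in the table (QueryID, Segment, Product); the rows are copied from the example, and `segments_per_query` is a hypothetical helper rather than anything shipped with the pipeline.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from the example table; a real run would open
# diamond_results.csv instead.
EXAMPLE = """QueryID,Length,SubjectID,Segment,Product,PercentIdentity,Mismatches,Gaps
HM587118,489,YP003848705.1,M,glycoprotein,100,0,0
EU574086.1,795,YP003848706.1,S,non-structural protein,97.4,7,0
EU574086.1,735,YP003848707.1,S,nucleocapsid,99.6,1,0
"""


def segments_per_query(handle):
    """Map each query ID to its list of (segment, product) hits."""
    hits = defaultdict(list)
    for row in csv.DictReader(handle):
        hits[row["QueryID"]].append((row["Segment"], row["Product"]))
    return dict(hits)


hits = segments_per_query(io.StringIO(EXAMPLE))
# Queries with no M-segment hit are unlikely to carry the Gn gene.
no_m_hit = [q for q, found in hits.items() if all(seg != "M" for seg, _ in found)]
print(no_m_hit)
```

In the example data, EU574086.1 has only S-segment hits, and it is also one of the queries reported as unassigned in lineages.csv.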

Web application

The tool is also implemented as a web application at https://www.genomedetective.com/app/typingtool/rvfv/

Pipeline Summary

By default, the pipeline currently performs the following:

  • Classification of query sequence(s) (diamond)
  • Phylogenetic typing (iqtree)

Credits

rvfvtyping was originally written by John Juma.

We thank the following people for their extensive assistance in the development of this pipeline:

  • Vagner Fonseca
  • Peter Van Heusden

License

rvfvtyping is free software, licensed under GPLv3.

Issues

Please report any issues to the issues page.

Contribute

If you wish to fix a bug or add new features to the software, we welcome Pull Requests. We use GitHub Flow style development. Please fork the repo, make the change, then submit a Pull Request against our master branch, with details about what the change is and what it fixes or adds. We will then review your changes and merge them, or provide feedback on enhancements.

Citations

rvfvtyping pipeline uses the following software:

Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12(1), 59–60. https://doi.org/10.1038/nmeth.3176

Guindon, S., & Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology, 52(5), 696–704. https://doi.org/10.1080/10635150390235520

Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q., & Vinh, L. S. (2018). UFBoot2: Improving the Ultrafast Bootstrap Approximation. Molecular Biology and Evolution, 35(2), 518–522. https://doi.org/10.1093/molbev/msx281

Huelsenbeck, J. P., & Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8), 754–755. https://doi.org/10.1093/bioinformatics/17.8.754

Katoh, K. (2002). MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059–3066. https://doi.org/10.1093/nar/gkf436

Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R., Thompson, J. D., Gibson, T. J., & Higgins, D. G. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23(21), 2947–2948. https://doi.org/10.1093/bioinformatics/btm404

Vilsker, M., Moosa, Y., Nooij, S., Fonseca, V., Ghysens, Y., Dumon, K., Pauwels, R., Alcantara, L. C., Vanden Eynden, E., Vandamme, A.-M., Deforche, K., & de Oliveira, T. (2019). Genome Detective: An automated system for virus identification from high-throughput sequencing data. Bioinformatics, 35(5), 871–873. https://doi.org/10.1093/bioinformatics/bty695

Yu, G., Smith, D. K., Zhu, H., Guan, Y., & Lam, T. T.-Y. (2017). ggtree: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution, 8(1), 28–36. https://doi.org/10.1111/2041-210X.12628

seqmagick: An imagemagick-like frontend to Biopython SeqIO.

Owner

  • Name: JJ
  • Login: ajodeh-juma
  • Kind: user
  • Location: Nairobi, KE

A biologist with interest in computational biology and bioinformatics.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Juma
    given-names: John
    orcid: https://orcid.org/0000-0002-1481-5337
title: "RVFV classification and Lineage assignment"
version: 1.0.0
doi: 10.5281/zenodo.6121759
date-released: 2022-02-16
