revica

A reference-based viral consensus genome assembly pipeline

https://github.com/asereewit/revica

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: asereewit
  • Language: Nextflow
  • Default Branch: main
  • Size: 288 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme Citation

README.md

REVICA

Revica is a reference-based viral consensus genome assembly pipeline for some of the most common respiratory viruses. Revica currently supports genome assembly of:

  • Enterovirus (EV)
  • Seasonal human coronavirus (HCOV)
  • Human metapneumovirus (HMPV)
  • Human respiratory syncytial virus (HRSV)
  • Human parainfluenza virus (HPIV)
  • Measles morbillivirus (MeV)
  • Influenza A virus (IAV)
  • Influenza B virus (IBV)
  • Human adenovirus (HAdV)

Workflow

Usage

Install Nextflow

Install Docker

To run Revica:

nextflow run asereewit/revica -r main -latest --input example_samplesheet.csv --output example_output -profile docker

To run Revica on AWS:

nextflow run asereewit/revica -r main -latest --input example_samplesheet.csv --output example_output -profile docker -c your_nextflow_aws.config

Options

| Option | Explanation |
|--------|-------------|
| `--input` | samplesheet in CSV format with FASTQ information |
| `--output` | output directory (default: revica_output) |
| `--db` | (multi)FASTA file to overwrite the bundled viral database |
| `--run_name` | name for the summary TSV file (default: 'run') |
| `--skip_fastqc` | skip quality control using FastQC (default: false) |
| `--skip_fastp` | skip adapter and read trimming using fastp (default: false) |
| `--run_kraken2` | run Kraken2 for classifying reads (default: false) |
| `--kraken2_db` | Kraken2 database for read classification; must be specified when using `--run_kraken2` |
| `--kraken2_variants_host_filter` | use reads that didn't map to the Kraken2 database for downstream consensus calling |
| `--save_kraken2_unclassified_reads` | save reads that didn't map to the specified Kraken2 database |
| `--save_kraken2_classified_reads` | save reads that map to the specified Kraken2 database |
| `--trim_len` | minimum read length to keep (default: 50) |
| `--save_trimmed_reads` | save trimmed FASTQ |
| `--sample` | downsample FASTQ to a certain fraction or number of reads |
| `--ref_min_median_cov` | minimum median coverage on a reference for consensus assembly (default: 3) |
| `--ref_min_genome_cov` | minimum reference coverage percentage for consensus assembly (default: 60%) |
| `--ivar_consensus_t` | minimum frequency threshold to call consensus (default: 0.6) |
| `--ivar_consensus_q` | minimum quality score threshold to call consensus (default: 20) |
| `--ivar_consensus_m` | minimum depth to call consensus (default: 5) |
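As an illustration of how the two reference-selection thresholds in the table (minimum median coverage and minimum reference coverage percentage) could interact, here is a minimal Python sketch. It is not Revica's code: the function name, the per-position depth list, and the rule that a position counts as covered at depth 1 or more are all assumptions made for the example.

```python
from statistics import median


def passes_reference_filters(depths, min_median_cov=3, min_genome_cov=60.0):
    """Return True if a reference meets both coverage thresholds.

    depths: per-position read depths along one reference (hypothetical input).
    min_median_cov: minimum median depth (table default: 3).
    min_genome_cov: minimum percentage of covered positions (table default: 60).
    Assumption: a position counts as covered when its depth is at least 1.
    """
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= 1)
    genome_cov = 100.0 * covered / len(depths)
    return median(depths) >= min_median_cov and genome_cov >= min_genome_cov


# 70% of positions covered at depth 5: both thresholds are met.
print(passes_reference_filters([5] * 70 + [0] * 30))
# Only 40% of positions covered: the median depth drops to 0, so it fails.
print(passes_reference_filters([5] * 40 + [0] * 60))
```

Under these assumptions, a reference is kept for consensus assembly only when both conditions hold, which is why raising either threshold makes reference selection stricter.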

Usage notes

  • Samplesheet example: assets/samplesheet.csv
  • You can create a samplesheet using the bundled Python script: python bin/fastq_dir_samplesheet.py fastq_dir samplesheet_name.csv
  • Memory and CPU usage for pipeline processes can be adjusted in conf/base.config
  • Process arguments can be adjusted in conf/modules.config
  • You can use your own reference(s) for consensus genome assembly by specifying the --db parameter followed by your fasta file.
    • reference header format: >reference_accession reference_tag reference_header_info
    • It is important to tag the FASTA sequences for the same species or gene segment with the same name or abbreviation in the header. Otherwise, the pipeline generates a consensus genome for every reference whose median coverage in the first alignment exceeds the specified threshold (default: 3).
    • Revica works with segmented viral genomes: keep the different gene segments as separate sequences and tag each one in the reference FASTA file.
  • If you are using Docker on Linux, check out these post-installation steps (especially cgroup swap limit capabilities support) for configuring Linux to work better with Docker.
  • By default, Docker has access to all of the host's RAM and CPU resources. If you are using macOS, however, go to Settings -> Resources in Docker Desktop and make sure enough resources are allocated to Docker containers.
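
Before passing a custom reference file with --db, it can help to check that the headers follow the convention described above (accession, then tag, then free-text information). The following Python sketch groups accessions by tag so that mismatched or missing tags are easy to spot. The function name is hypothetical and the accessions and tags in the example are made-up placeholders; this is not part of Revica.

```python
from collections import defaultdict


def group_references_by_tag(fasta_text):
    """Map each reference tag to the accessions that carry it.

    Assumes headers look like '>accession tag free-text info' and are
    split on whitespace. Raises ValueError for a header without a tag.
    """
    groups = defaultdict(list)
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split(maxsplit=2)
        if len(fields) < 2:
            raise ValueError(f"header lacks a tag: {line!r}")
        accession, tag = fields[0], fields[1]
        groups[tag].append(accession)
    return dict(groups)


# Placeholder records: two references share one tag, a third has its own.
example = """>ACC0001 HRSV example strain one
ACGT
>ACC0002 HRSV example strain two
ACGT
>ACC0003 IAV_HA example segment
ACGT
"""
print(group_references_by_tag(example))
```

References that should compete for the same consensus genome ought to appear under a single key. A species or segment split across several keys usually points to inconsistent tagging.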

Contact

For bug reports, please email aseree@uw.edu or raise an issue on GitHub.

Owner

  • Login: asereewit
  • Kind: user

Citation (CITATION.cff)

message: "If you use this software, please cite it as below."
title: "Revica"
abstract: Revica is a reference-based viral consensus genome assembly pipeline
authors:
  - family-names: Sereewit
    given-names: Jaydee
    orcid: https://orcid.org/0000-0002-7937-6398
  - family-names: Greninger
    given-names: Alexander 
    orcid: https://orcid.org/0000-0002-7443-0527
date-released: 2022-04-26
repository-code: https://github.com/greninger-lab/revica
