revica

A reference-based viral consensus genome assembly pipeline

https://github.com/asereewit/revica

Basic Info

Host: GitHub
Owner: asereewit
Language: Nextflow
Default Branch: main
Size: 288 MB

Statistics

Stars: 0
Watchers: 1
Forks: 3
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 2 years ago

REVICA

Revica is a reference-based viral consensus genome assembly pipeline for some of the most common respiratory viruses. Revica currently supports genome assembly of:

- Enterovirus (EV)
- Seasonal human coronavirus (HCOV)
- Human metapneumovirus (HMPV)
- Human respiratory syncytial virus (HRSV)
- Human parainfluenza virus (HPIV)
- Measles morbillivirus (MeV)
- Influenza A virus (IAV)
- Influenza B virus (IBV)
- Human adenovirus (HAdV)

Workflow

Usage

Install Nextflow

Install Docker

To run Revica:

nextflow run asereewit/revica -r main -latest --input example_samplesheet.csv --output example_output -profile docker

To run Revica on AWS:

nextflow run asereewit/revica -r main -latest --input example_samplesheet.csv --output example_output -profile docker -c your_nextflow_aws.config

Options

| Option | Explanation |
|------|-----------|
| `--input` | samplesheet in csv format with fastq information |
| `--output` | output directory (default: revica_output) |
| `--db` | (multi)fasta file to overwrite the bundled viral database |
| `--run_name` | name for the summary tsv file (default: 'run') |
| `--skip_fastqc` | skip quality control using FastQC (default: false) |
| `--skip_fastp` | skip adapter and read trimming using fastp (default: false) |
| `--run_kraken2` | run Kraken2 for classifying reads (default: false) |
| `--kraken2_db` | Kraken2 database for read classification; required when using `--run_kraken2` |
| `--kraken2_variants_host_filter` | use reads that didn't map to the Kraken2 database for downstream consensus calling |
| `--save_kraken2_unclassified_reads` | save reads that didn't map to the specified Kraken2 database |
| `--save_kraken2_classified_reads` | save reads that map to the specified Kraken2 database |
| `--trim_len` | minimum read length to keep (default: 50) |
| `--save_trimmed_reads` | save trimmed fastq |
| `--sample` | downsample fastq to a certain fraction or number of reads |
| `--ref_min_median_cov` | minimum median coverage on a reference for consensus assembly (default: 3) |
| `--ref_min_genome_cov` | minimum reference coverage percentage for consensus assembly (default: 60%) |
| `--ivar_consensus_t` | minimum frequency threshold to call consensus (default: 0.6) |
| `--ivar_consensus_q` | minimum quality score threshold to call consensus (default: 20) |
| `--ivar_consensus_m` | minimum depth to call consensus (default: 5) |
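To make the two reference coverage thresholds in the table more concrete (minimum median coverage, default 3, and minimum reference coverage percentage, default 60%), here is a minimal Python sketch of how such a filter could work. It is an illustration under stated assumptions, not Revica's actual implementation: the function name `reference_passes`, the per-position depth list as input, the "covered means depth above zero" definition, and the choice to require both thresholds together are all assumptions.

```python
from statistics import median


def reference_passes(depths, min_median_cov=3, min_genome_cov=60.0):
    """Decide whether a reference has enough coverage for consensus assembly.

    depths: per-position read depth along the reference (hypothetical input).
    min_median_cov: assumed meaning of the minimum median coverage option.
    min_genome_cov: assumed meaning of the minimum coverage percentage option.
    """
    if not depths:
        return False
    median_cov = median(depths)
    # Assumption: a position counts as covered when at least one read maps to it.
    pct_covered = 100.0 * sum(1 for d in depths if d > 0) / len(depths)
    # Assumption: both thresholds must be met for the reference to be kept.
    return median_cov >= min_median_cov and pct_covered >= min_genome_cov


well_covered = [5] * 70 + [0] * 30   # median depth 5, 70% of positions covered
half_covered = [5] * 50 + [0] * 50   # median depth 2.5, 50% of positions covered
print(reference_passes(well_covered))
print(reference_passes(half_covered))
```

Under these assumptions the first reference is kept and the second is dropped, because both its median depth and its covered fraction fall below the defaults.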

Usage notes

- Samplesheet example: assets/samplesheet.csv
- You can create a samplesheet using the bundled Python script: python bin/fastq_dir_samplesheet.py fastq_dir samplesheet_name.csv
- Memory and CPU usage for pipeline processes can be adjusted in conf/base.config
- Process arguments can be adjusted in conf/modules.config
- You can use your own reference(s) for consensus genome assembly by passing your fasta file to the --db parameter.
  - Reference header format: >reference_accession reference_tag reference_header_info
  - Tag fasta sequences for the same species or gene segment with the same name or abbreviation in the header. Otherwise, the pipeline will generate a consensus genome for every reference where the median coverage of the first alignment exceeds the specified threshold (default: 3).
  - Revica works with segmented viral genomes: keep the different gene segments as separate sequences and tag them in the reference fasta file.
- If you are using Docker on Linux, see Docker's Linux post-installation steps (especially cgroup swap limit capabilities support) for configuring Linux to work better with Docker.
- By default, Docker has access to all RAM and CPU resources of the host. If you are using macOS, go to Settings -> Resources in Docker Desktop and make sure enough resources are allocated to Docker containers.
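The reference header format described in the notes above (accession, then tag, then free-text info, separated by whitespace) can be checked before running the pipeline. The following Python sketch groups accessions by tag so you can see which references would be treated as alternatives for the same species or segment. The function names, the accessions `ACC0001` to `ACC0003`, and the tags in the example are hypothetical placeholders, and the whitespace-splitting rule is an assumption based on the header format shown, not Revica's own parser.

```python
from collections import defaultdict


def parse_header(line):
    """Split a '>accession tag info...' header into (accession, tag, info)."""
    fields = line.lstrip(">").strip().split(maxsplit=2)
    if len(fields) < 2:
        raise ValueError(f"header needs an accession and a tag: {line!r}")
    accession, tag = fields[0], fields[1]
    info = fields[2] if len(fields) > 2 else ""
    return accession, tag, info


def group_by_tag(fasta_lines):
    """Map each tag to the list of accessions that carry it."""
    groups = defaultdict(list)
    for line in fasta_lines:
        if line.startswith(">"):
            accession, tag, _ = parse_header(line)
            groups[tag].append(accession)
    return dict(groups)


# Hypothetical reference file content; sequences are shortened placeholders.
example = [
    ">ACC0001 HRSV hypothetical strain A",
    "ACGT",
    ">ACC0002 HRSV hypothetical strain B",
    "ACGT",
    ">ACC0003 IAV_HA hypothetical segment 4",
    "ACGT",
]
print(group_by_tag(example))
# → {'HRSV': ['ACC0001', 'ACC0002'], 'IAV_HA': ['ACC0003']}
```

A tag that appears only once where you expected several references, or a header that raises `ValueError`, is a hint that the fasta file needs retagging before it is passed to the pipeline.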

Contact

For bug reports, please email aseree@uw.edu or raise an issue on GitHub.

Owner

Login: asereewit
Kind: user

Repositories: 1
Profile: https://github.com/asereewit

Citation (CITATION.cff)

message: "If you use this software, please cite it as below."
title: "Revica"
abstract: Revica is a reference-based viral consensus genome assembly pipeline
authors:
  - family-names: Sereewit
    given-names: Jaydee
    orcid: https://orcid.org/0000-0002-7937-6398
  - family-names: Greninger
    given-names: Alexander
    orcid: https://orcid.org/0000-0002-7443-0527
date-released: 2022-04-26
repository-code: https://github.com/greninger-lab/revica
