iva

de novo virus assembler of Illumina paired reads

https://github.com/sanger-pathogens/iva

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
    6 of 11 committers (54.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Keywords

bioinformatics bioinformatics-pipeline genomics global-health infectious-diseases next-generation-sequencing pathogen research sequencing
Last synced: 6 months ago

Repository

Statistics
  • Stars: 56
  • Watchers: 12
  • Forks: 19
  • Open Issues: 16
  • Releases: 0
Created almost 12 years ago · Last pushed almost 5 years ago

README.md

IVA

Iterative Virus Assembler - de novo virus assembler of Illumina paired reads.

PLEASE NOTE: we currently do not have the resources to provide support for IVA, so please do not expect a reply if you flag any issue.

Introduction

IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.

For more information, please read the IVA publication.

Installation

For installation instructions, please refer to the IVA website.

Running the tests

The tests can be run from the top-level directory:

python setup.py test

Usage

```
usage: iva [options] {-f reads_fwd -r reads_rev | --fr reads}

positional arguments:
  Output directory      Name of output directory (must not already exist)

optional arguments:
  -h, --help            show this help message and exit

Input and output:
  -f filename[.gz], --reads_fwd filename[.gz]
                        Name of forward reads fasta/q file. Must be used in
                        conjunction with --reads_rev
  -r filename[.gz], --reads_rev filename[.gz]
                        Name of reverse reads fasta/q file. Must be used in
                        conjunction with --reads_fwd
  --fr filename[.gz]    Name of interleaved fasta/q file
  --keep_files          Keep intermediate files (could be many!). Default is
                        to delete all unnecessary files
  --contigs filename[.gz]
                        Fasta file of contigs to be extended. Incompatible
                        with --reference
  --reference filename[.gz]
                        EXPERIMENTAL! This option is EXPERIMENTAL, not
                        recommended, and has not been tested! Fasta file of
                        reference genome, or parts thereof. IVA will try to
                        assemble one contig per sequence in this file.
                        Incompatible with --contigs
  -v, --verbose         Be verbose by printing messages to stdout. Use up to
                        three times for increasing verbosity.

SMALT mapping options:
  -k INT, --smalt_k INT
                        kmer hash length in SMALT (the -k option in smalt
                        index) [19]
  -s INT, --smalt_s INT
                        kmer hash step size in SMALT (the -s option in smalt
                        index) [11]
  -y FLOAT, --smalt_id FLOAT
                        Minimum identity threshold for mapping to be reported
                        (the -y option in smalt map) [0.5]

Contig options:
  --ctg_first_trim INT  Number of bases to trim off the end of every contig
                        before extending for the first time [25]
  --ctg_iter_trim INT   During iterative extension, number of bases to trim
                        off the end of a contig when extension fails (then
                        try extending again) [10]
  --ext_min_cov INT     Minimum kmer depth needed to use that kmer to extend
                        a contig [10]
  --ext_min_ratio FLOAT
                        Sets N, where kmer for extension must be at least N
                        times more abundant than next most common kmer [4]
  --ext_max_bases INT   Maximum number of bases to try to extend on each
                        iteration [100]
  --ext_min_clip INT    Set minimum number of bases soft clipped off a read
                        for those bases to be used for extension [3]
  --max_contigs INT     Maximum number of contigs allowed in the assembly. No
                        more seeds generated if the cutoff is reached [50]

Seed generation options:
  --make_new_seeds      When no more contigs can be extended, generate a new
                        seed. This is forced to be true when --contigs is not
                        used
  --seed_start_length INT
                        When making a seed sequence, use the most common kmer
                        of this length. Default is to use the minimum of
                        (median read length, 95). Warning: it is not
                        recommended to set this higher than 95
  --seed_stop_length INT
                        Stop extending seed using perfect matches from reads
                        when this length is reached. Future extensions are
                        then made by treating the seed as a contig
                        [0.9*max_insert]
  --seed_min_kmer_cov INT
                        Minimum kmer coverage of initial seed [25]
  --seed_max_kmer_cov INT
                        Maximum kmer coverage of initial seed [1000000]
  --seed_ext_max_bases INT
                        Maximum number of bases to try to extend on each
                        iteration [50]
  --seed_overlap_length INT
                        Number of overlapping bases needed between read and
                        seed to use that read to extend [seed_start_length]
  --seed_ext_min_cov INT
                        Minimum kmer depth needed to use that kmer to extend
                        a contig [10]
  --seed_ext_min_ratio FLOAT
                        Sets N, where kmer for extension must be at least N
                        times more abundant than next most common kmer [4]

Read trimming options:
  --trimmomatic FILENAME
                        Provide location of trimmomatic.jar file to enable
                        read trimming. Required if --adapters used
  --trimmo_qual STRING  Trimmomatic options used to quality trim reads
                        [LEADING:10 TRAILING:10 SLIDINGWINDOW:4:20]
  --adapters FILENAME   Fasta file of adapter sequences to be trimmed off
                        reads. If used, must also use --trimmomatic. Default
                        is file of adapters supplied with IVA
  --min_trimmed_length INT
                        Minimum length of read after trimming [50]
  --pcr_primers FILENAME
                        FASTA file of primers. The first perfect match found
                        to a sequence in the primers file will be trimmed off
                        the start of each read. This is run after trimmomatic
                        (if --trimmomatic used)

Other options:
  -i INT, --max_insert INT
                        Maximum insert size (includes read length). Reads
                        with inferred insert size more than the maximum will
                        not be used to extend contigs [800]
  -t INT, --threads INT
                        Number of threads to use [1]
  --kmc_onethread       Force kmc to use one thread. By default the value of
                        -t/--threads is used when running kmc
  --strand_bias FLOAT in [0,0.5]
                        Set strand bias cutoff of mapped reads when trimming
                        contig ends, in the interval [0,0.5]. A value of x
                        means that a base needs min(fwd_depth, rev_depth) /
                        total_depth <= x. The only time this should be used
                        is with libraries with overlapping reads (ie fragment
                        length < 2*read length), and even then, it can make
                        results worse. If used, try a low value like 0.1
                        first [0]
  --test                Run using built in test data. All other options will
                        be ignored, except the mandatory output directory,
                        and --trimmomatic and --threads can be also be used
  --version             show program's version number and exit
```
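As a rough illustration of the usage text above, the following Python sketch assembles an argument list for a paired-file run. It uses only the short options `-f`, `-r`, `-i` and `-t` plus `--trimmomatic` and `--adapters` from the help output. The function name `build_iva_command` and the file names are hypothetical and are not part of IVA.

```python
import shlex


def build_iva_command(reads_fwd, reads_rev, outdir, threads=1, max_insert=800,
                      trimmomatic_jar=None, adapters=None):
    """Return an argv list for a paired-file IVA run (hypothetical helper)."""
    if adapters is not None and trimmomatic_jar is None:
        # The help text says --adapters must be used together with --trimmomatic.
        raise ValueError("--adapters requires --trimmomatic")
    cmd = ["iva", "-f", reads_fwd, "-r", reads_rev,
           "-t", str(threads), "-i", str(max_insert)]
    if trimmomatic_jar is not None:
        cmd += ["--trimmomatic", trimmomatic_jar]
    if adapters is not None:
        cmd += ["--adapters", adapters]
    # The output directory is the single positional argument and,
    # per the help text, must not already exist.
    cmd.append(outdir)
    return cmd


# Placeholder file names; substitute real read files.
cmd = build_iva_command("reads_1.fastq.gz", "reads_2.fastq.gz", "iva_out", threads=4)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where `iva` is installed. Building a list rather than a single string avoids shell quoting problems with unusual file names.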

For usage help and examples, see the IVA wiki page.
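The contig options in the usage text describe two thresholds for extension: a minimum kmer depth (default 10) and a minimum ratio N between the most abundant kmer and the next most common one (default 4). The sketch below shows one way such a rule could be applied to a table of kmer counts. It is an illustration of the rule as worded in the help text, not IVA's actual implementation, and the names `choose_extension_kmer`, `min_cov` and `min_ratio` are hypothetical.

```python
from collections import Counter


def choose_extension_kmer(kmer_counts, min_cov=10, min_ratio=4.0):
    """Return the kmer to extend with, or None if no kmer passes both thresholds.

    Illustrative only: mirrors the wording of the help text, not IVA's code.
    """
    ranked = Counter(kmer_counts).most_common(2)
    if not ranked:
        return None
    best_kmer, best_count = ranked[0]
    # Depth threshold: the candidate must be seen often enough.
    if best_count < min_cov:
        return None
    # Ratio threshold: the candidate must be at least min_ratio times
    # more abundant than the next most common kmer, if there is one.
    if len(ranked) > 1 and best_count < min_ratio * ranked[1][1]:
        return None
    return best_kmer


print(choose_extension_kmer({"ACGT": 40, "ACGA": 9}))
print(choose_extension_kmer({"ACGT": 40, "ACGA": 20}))
```

In the first call the top kmer has depth 40, which is above the depth threshold and more than four times the runner-up, so it is returned. In the second call the runner-up is too abundant and the function returns `None`, which corresponds to the case where extension is ambiguous.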

License

IVA is free software, licensed under GPLv3.

Feedback/Issues

Please report any issues to the issues page.

PLEASE NOTE: we currently do not have the resources to provide support for IVA, so please do not expect a reply if you flag any issue.

Citation

If you use this software please cite:

IVA: accurate de novo assembly of RNA virus genomes.
Hunt M, Gall A, Ong SH, Brener J, Ferns B, Goulder P, Nastouli E, Keane JA, Kellam P, Otto TD.
Bioinformatics. 2015 Jul 15;31(14):2374-6. doi: 10.1093/bioinformatics/btv120. Epub 2015 Feb 28.

Adapter sequences:
Optimal enzymes for amplifying sequencing libraries.
Quail, M. A. et al. Nat. Methods 9, 10-1 (2012).

GAGE:
GAGE: A critical evaluation of genome assemblies and assembly algorithms.
Salzberg, S. L. et al. Genome Res. 22, 557-67 (2012).

KMC:
Disk-based k-mer counting on a PC.
Deorowicz, S., Debudaj-Grabysz, A. & Grabowski, S. BMC Bioinformatics 14, 160 (2013).

Kraken:
Kraken: ultrafast metagenomic sequence classification using exact alignments.
Wood, D. E. & Salzberg, S. L. Genome Biol. 15, R46 (2014).

MUMmer:
Versatile and open software for comparing large genomes.
Kurtz, S. et al. Genome Biol. 5, R12 (2004).

R:
R: A language and environment for statistical computing.
R Core Team (2013). R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

RATT:
RATT: Rapid Annotation Transfer Tool.
Otto, T. D., Dillon, G. P., Degrave, W. S. & Berriman, M. Nucleic Acids Res. 39, e57 (2011).

SAMtools:
The Sequence Alignment/Map format and SAMtools.
Li, H. et al. Bioinformatics 25, 2078-9 (2009).

Trimmomatic:
Trimmomatic: A flexible trimmer for Illumina Sequence Data.
Bolger, A. M., Lohse, M. & Usadel, B. Bioinformatics 1-7 (2014).

Owner

  • Name: Pathogen Informatics, Wellcome Sanger Institute
  • Login: sanger-pathogens
  • Kind: organization
  • Location: Hinxton, Cambs., UK

GitHub Events

Total
  • Watch event: 1
  • Member event: 2
  • Pull request event: 1
  • Fork event: 1
Last Year
  • Watch event: 1
  • Member event: 2
  • Pull request event: 1
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 218
  • Total Committers: 11
  • Avg Commits per committer: 19.818
  • Development Distribution Score (DDS): 0.495
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Martin Hunt | m****2@s****k | 110 |
| martinghunt | m****t@g****m | 71 |
| Sara Sjunnebo | s****4@s****k | 8 |
| Sascha Steinbiss | s****a@s****e | 8 |
| andrewjpage | a****e@g****m | 7 |
| puethe | c****5@s****k | 4 |
| Olivier Seret | o****7@s****k | 4 |
| donkirkby | d****y@g****m | 3 |
| Gareth Peat | g****6@s****k | 1 |
| Martin Aslett | m****a@s****k | 1 |
| Michael R. Crusoe | m****e@g****m | 1 |
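The all-time committer statistics can be reproduced from the per-committer commit counts listed above. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is an assumption, not something this page states, although it does reproduce the reported value.

```python
# Commit counts per committer, copied from the Top Committers list.
commits = [110, 71, 8, 8, 7, 4, 4, 3, 1, 1, 1]

total = sum(commits)            # total commits across all committers
avg = total / len(commits)      # average commits per committer

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits) / total

print(total, round(avg, 3), round(dds, 3))
```

Under this definition the score treats each listed identity as a separate committer. The first two rows look like one contributor under two email addresses. If they were merged (an assumption, not something the page states), the top share would be 181 of 218 commits and the score would fall to about 0.17.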

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 28
  • Total pull requests: 75
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 16 days
  • Total issue authors: 24
  • Total pull request authors: 11
  • Average comments per issue: 2.04
  • Average comments per pull request: 0.05
  • Merged pull requests: 72
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • george-githinji (2)
  • donkirkby (2)
  • ChrisHIV (2)
  • antoine4ucsd (2)
  • AntonioBaeza (1)
  • migrau (1)
  • eccen (1)
  • rmart300 (1)
  • sdwfrost (1)
  • lmoncla (1)
  • waqasuddinkhan (1)
  • manisenthils (1)
  • martinghunt (1)
  • nbbosa (1)
  • el-mat (1)
Pull Request Authors
  • martinghunt (50)
  • ssjunnebo (7)
  • satta (5)
  • donkirkby (3)
  • emollier (2)
  • andrewjpage (2)
  • puethe (2)
  • mr-c (1)
  • garethpeat (1)
  • aslett1 (1)
  • seretol (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 51 last-month
  • Total docker downloads: 11
  • Total dependent packages: 0
  • Total dependent repositories: 5
  • Total versions: 16
  • Total maintainers: 2
pypi.org: iva

Iterative Virus Assembler

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 5
  • Downloads: 51 Last month
  • Docker Downloads: 11
Rankings
Docker downloads count: 3.9%
Dependent repos count: 6.7%
Forks count: 8.9%
Stargazers count: 9.3%
Average: 9.9%
Dependent packages count: 10.0%
Downloads: 20.8%
Maintainers (2)
Last synced: 6 months ago

Dependencies

setup.py pypi
  • networkx *
  • packaging *
  • pyfastaq *
  • pysam *
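A quick way to check whether these four Python dependencies are present in the current environment is sketched below. It assumes each package's import name matches its PyPI name. The helper `missing_modules` is hypothetical, and the check does not cover the external programs mentioned in the usage text, such as SMALT, KMC or Trimmomatic.

```python
import importlib.util

# Python dependencies listed in setup.py; import names are assumed
# to match the PyPI package names.
IVA_PYTHON_DEPS = ["networkx", "packaging", "pyfastaq", "pysam"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(IVA_PYTHON_DEPS)
if missing:
    print("Missing Python modules:", ", ".join(missing))
else:
    print("All listed Python dependencies were found.")
```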