iva

de novo virus assembler of Illumina paired reads

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to nature.com
✓ Committers with academic emails: 6 of 11 committers (54.5%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.3%) to scientific vocabulary

Keywords

bioinformatics bioinformatics-pipeline genomics global-health infectious-diseases next-generation-sequencing pathogen research sequencing

Last synced: 6 months ago

Repository

de novo virus assembler of Illumina paired reads

Basic Info

Host: GitHub
Owner: sanger-pathogens
License: other
Language: Python
Default Branch: master
Homepage: http://sanger-pathogens.github.io/iva/
Size: 8.46 MB

Statistics

Stars: 56
Watchers: 12
Forks: 19
Open Issues: 16
Releases: 0

Topics

bioinformatics bioinformatics-pipeline genomics global-health infectious-diseases next-generation-sequencing pathogen research sequencing

Created almost 12 years ago · Last pushed almost 5 years ago

Metadata Files

Readme Changelog License

IVA

Iterative Virus Assembler - de novo virus assembler of Illumina paired reads.

PLEASE NOTE: we currently do not have the resources to provide support for IVA, so please do not expect a reply if you flag any issue.

Introduction
Installation
Running the tests
Usage
License
Feedback/Issues
Citation

Introduction

IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.

For more information, please read the IVA publication.

Installation

For installation instructions, please refer to the IVA website (http://sanger-pathogens.github.io/iva/).

Running the tests

The tests can be run from the top-level directory with:

python setup.py test

Usage

```
usage: iva [options] {-f readsfwd -r readsrev | --fr reads}

positional arguments:
  Output directory      Name of output directory (must not already exist)

optional arguments:
  -h, --help            show this help message and exit

Input and output:
  -f filename[.gz], --readsfwd filename[.gz]
                        Name of forward reads fasta/q file. Must be used in
                        conjunction with --readsrev
  -r filename[.gz], --readsrev filename[.gz]
                        Name of reverse reads fasta/q file. Must be used in
                        conjunction with --readsfwd
  --fr filename[.gz]    Name of interleaved fasta/q file
  --keep_files          Keep intermediate files (could be many!). Default is
                        to delete all unnecessary files
  --contigs filename[.gz]
                        Fasta file of contigs to be extended. Incompatible
                        with --reference
  --reference filename[.gz]
                        EXPERIMENTAL! This option is EXPERIMENTAL, not
                        recommended, and has not been tested! Fasta file of
                        reference genome, or parts thereof. IVA will try to
                        assemble one contig per sequence in this file.
                        Incompatible with --contigs
  -v, --verbose         Be verbose by printing messages to stdout. Use up to
                        three times for increasing verbosity.

SMALT mapping options:
  -k INT, --smaltk INT  kmer hash length in SMALT (the -k option in smalt
                        index) [19]
  -s INT, --smalts INT  kmer hash step size in SMALT (the -s option in smalt
                        index) [11]
  -y FLOAT, --smalt_id FLOAT
                        Minimum identity threshold for mapping to be reported
                        (the -y option in smalt map) [0.5]

Contig options:
  --ctgfirsttrim INT    Number of bases to trim off the end of every contig
                        before extending for the first time [25]
  --ctgitertrim INT     During iterative extension, number of bases to trim
                        off the end of a contig when extension fails (then
                        try extending again) [10]
  --extmincov INT       Minimum kmer depth needed to use that kmer to extend
                        a contig [10]
  --extminratio FLOAT   Sets N, where kmer for extension must be at least N
                        times more abundant than next most common kmer [4]
  --extmaxbases INT     Maximum number of bases to try to extend on each
                        iteration [100]
  --extminclip INT      Set minimum number of bases soft clipped off a read
                        for those bases to be used for extension [3]
  --max_contigs INT     Maximum number of contigs allowed in the assembly. No
                        more seeds generated if the cutoff is reached [50]

Seed generation options:
  --makenewseeds        When no more contigs can be extended, generate a new
                        seed. This is forced to be true when --contigs is not
                        used
  --seedstartlength INT
                        When making a seed sequence, use the most common kmer
                        of this length. Default is to use the minimum of
                        (median read length, 95). Warning: it is not
                        recommended to set this higher than 95
  --seedstoplength INT  Stop extending seed using perfect matches from reads
                        when this length is reached. Future extensions are
                        then made by treating the seed as a contig
                        [0.9*maxinsert]
  --seedminkmercov INT  Minimum kmer coverage of initial seed [25]
  --seedmaxkmercov INT  Maximum kmer coverage of initial seed [1000000]
  --seedextmaxbases INT
                        Maximum number of bases to try to extend on each
                        iteration [50]
  --seedoverlaplength INT
                        Number of overlapping bases needed between read and
                        seed to use that read to extend [seedstartlength]
  --seedextmincov INT   Minimum kmer depth needed to use that kmer to extend
                        a contig [10]
  --seedextminratio FLOAT
                        Sets N, where kmer for extension must be at least N
                        times more abundant than next most common kmer [4]

Read trimming options:
  --trimmomatic FILENAME
                        Provide location of trimmomatic.jar file to enable
                        read trimming. Required if --adapters used
  --trimmoqual STRING   Trimmomatic options used to quality trim reads
                        [LEADING:10 TRAILING:10 SLIDINGWINDOW:4:20]
  --adapters FILENAME   Fasta file of adapter sequences to be trimmed off
                        reads. If used, must also use --trimmomatic. Default
                        is file of adapters supplied with IVA
  --mintrimmedlength INT
                        Minimum length of read after trimming [50]
  --pcrprimers FILENAME
                        FASTA file of primers. The first perfect match found
                        to a sequence in the primers file will be trimmed off
                        the start of each read. This is run after trimmomatic
                        (if --trimmomatic used)

Other options:
  -i INT, --maxinsert INT
                        Maximum insert size (includes read length). Reads
                        with inferred insert size more than the maximum will
                        not be used to extend contigs [800]
  -t INT, --threads INT
                        Number of threads to use [1]
  --kmconethread        Force kmc to use one thread. By default the value of
                        -t/--threads is used when running kmc
  --strandbias FLOAT in [0,0.5]
                        Set strand bias cutoff of mapped reads when trimming
                        contig ends, in the interval [0,0.5]. A value of x
                        means that a base needs min(fwddepth, revdepth) /
                        totaldepth <= x. The only time this should be used is
                        with libraries with overlapping reads (ie fragment
                        length < 2*read length), and even then, it can make
                        results worse. If used, try a low value like 0.1
                        first [0]
  --test                Run using built in test data. All other options will
                        be ignored, except the mandatory output directory, and
                        --trimmomatic and --threads can be also be used
  --version             show program's version number and exit
```
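The contig and seed options above state their rules only in prose. The Python sketch below restates three of them (the --extmincov/--extminratio acceptance test and the documented defaults of --seedstartlength and --seedstoplength) so the numbers can be checked by hand. It is not IVA's implementation: the function names are hypothetical, and it assumes that "at least N times more abundant" means best count >= N × runner-up count and that 0.9 × maxinsert is rounded down.

```python
import statistics
from collections import Counter
from typing import Iterable, Optional


def choose_extension_kmer(
    kmer_counts: Counter, min_cov: int = 10, min_ratio: float = 4.0
) -> Optional[str]:
    """Pick the k-mer to extend with, or None if the rule rejects every candidate.

    Hypothetical helper. Assumed reading of --extmincov / --extminratio: the most
    common k-mer needs depth >= min_cov and must be at least min_ratio times as
    abundant as the next most common k-mer.
    """
    ranked = kmer_counts.most_common(2)
    if not ranked:
        return None  # no candidate k-mers at all
    best_kmer, best_count = ranked[0]
    if best_count < min_cov:
        return None  # not enough depth to trust this k-mer
    if len(ranked) > 1 and best_count < min_ratio * ranked[1][1]:
        return None  # ambiguous: the runner-up is too close in abundance
    return best_kmer


def default_seed_start_length(read_lengths: Iterable[int]) -> int:
    """--seedstartlength default: minimum of (median read length, 95)."""
    return int(min(statistics.median(read_lengths), 95))


def default_seed_stop_length(max_insert: int = 800) -> int:
    """--seedstoplength default: 0.9 * maxinsert, rounded down here (an assumption)."""
    return (9 * max_insert) // 10
```

With the defaults, a k-mer seen 40 times is accepted over a runner-up seen 5 times, but one seen 12 times is rejected because 12 < 4 × 5. The seed-extension options (--seedextmincov, --seedextminratio) are described with the same wording and defaults, so the same check would apply to them under this reading. With the default --maxinsert of 800, the seed stop length works out to 720.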

For usage help and examples, see the IVA wiki page.
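When IVA is driven from a Python pipeline, it can help to build and validate the command line before running it. The sketch below uses only options listed in the help text (-f/-r, --fr, -t and the positional output directory) and enforces two constraints stated there: -f and -r go together, and the output directory must not already exist. The helpers build_iva_command and run_iva are hypothetical and not part of IVA.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_iva_command(
    outdir: str,
    reads_fwd: Optional[str] = None,
    reads_rev: Optional[str] = None,
    interleaved: Optional[str] = None,
    threads: int = 1,
) -> List[str]:
    """Return an argument list for the iva CLI (hypothetical helper)."""
    # -f and -r must be used in conjunction with each other
    if (reads_fwd is None) != (reads_rev is None):
        raise ValueError("-f and -r must be used together")
    paired = reads_fwd is not None
    # Exactly one input mode: forward/reverse files, or one interleaved file
    if paired == (interleaved is not None):
        raise ValueError("give either reads_fwd + reads_rev or interleaved, not both or neither")
    # IVA requires an output directory that does not exist yet
    if os.path.exists(outdir):
        raise FileExistsError(f"output directory already exists: {outdir}")

    cmd = ["iva", "-t", str(threads)]
    if paired:
        cmd += ["-f", reads_fwd, "-r", reads_rev]
    else:
        cmd += ["--fr", interleaved]
    cmd.append(outdir)  # positional output directory goes last
    return cmd


def run_iva(cmd: List[str]) -> None:
    """Run a command from build_iva_command; raise if iva is missing or fails."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)  # check=True raises CalledProcessError on failure
```

For example, assuming iva_out does not exist yet, build_iva_command("iva_out", reads_fwd="reads_1.fq.gz", reads_rev="reads_2.fq.gz", threads=4) returns the argument list for `iva -t 4 -f reads_1.fq.gz -r reads_2.fq.gz iva_out`. Passing a list rather than a single string to subprocess.run avoids shell quoting problems with unusual file names.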

License

IVA is free software, licensed under GPLv3.

Feedback/Issues

Please report any issues to the issues page.

PLEASE NOTE: we currently do not have the resources to provide support for IVA, so please do not expect a reply if you flag any issue.

Citation

If you use this software please cite:

IVA: accurate de novo assembly of RNA virus genomes.
Hunt M, Gall A, Ong SH, Brener J, Ferns B, Goulder P, Nastouli E, Keane JA, Kellam P, Otto TD.
Bioinformatics. 2015 Jul 15;31(14):2374-6. doi: 10.1093/bioinformatics/btv120. Epub 2015 Feb 28.

Adapter sequences:
Optimal enzymes for amplifying sequencing libraries.
Quail, M. A. et al. Nat. Methods 9, 10-1 (2012).

GAGE:
GAGE: A critical evaluation of genome assemblies and assembly algorithms.
Salzberg, S. L. et al. Genome Res. 22, 557-67 (2012).

KMC:
Disk-based k-mer counting on a PC.
Deorowicz, S., Debudaj-Grabysz, A. & Grabowski, S. BMC Bioinformatics 14, 160 (2013).

Kraken:
Kraken: ultrafast metagenomic sequence classification using exact alignments.
Wood, D. E. & Salzberg, S. L. Genome Biol. 15, R46 (2014).

MUMmer:
Versatile and open software for comparing large genomes.
Kurtz, S. et al. Genome Biol. 5, R12 (2004).

R:
R: A language and environment for statistical computing.
R Core Team (2013). R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

RATT:
RATT: Rapid Annotation Transfer Tool.
Otto, T. D., Dillon, G. P., Degrave, W. S. & Berriman, M. Nucleic Acids Res. 39, e57 (2011).

SAMtools:
The Sequence Alignment/Map format and SAMtools.
Li, H. et al. Bioinformatics 25, 2078-9 (2009).

Trimmomatic:
Trimmomatic: A flexible trimmer for Illumina Sequence Data.
Bolger, A. M., Lohse, M. & Usadel, B. Bioinformatics 1-7 (2014).

Owner

Name: Pathogen Informatics, Wellcome Sanger Institute
Login: sanger-pathogens
Kind: organization
Location: Hinxton, Cambs., UK

Website: http://www.sanger.ac.uk/science/groups/pathogen-informatics
Repositories: 54
Profile: https://github.com/sanger-pathogens

GitHub Events

Total

Watch event: 1
Member event: 2
Pull request event: 1
Fork event: 1

Last Year

Watch event: 1
Member event: 2
Pull request event: 1
Fork event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 218
Total Committers: 11
Avg Commits per committer: 19.818
Development Distribution Score (DDS): 0.495
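Both derived figures can be reproduced from the totals and the Top Committers list, assuming DDS is defined as one minus the top committer's share of all commits (an assumption; the definition is not stated on this page):

$$
\text{Avg commits per committer} = \frac{218}{11} \approx 19.818,
\qquad
\text{DDS} = 1 - \frac{110}{218} \approx 0.495
$$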

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Martin Hunt	m**2@s**k	110
martinghunt	m**t@g**m	71
Sara Sjunnebo	s**4@s**k	8
Sascha Steinbiss	s**a@s**e	8
andrewjpage	a**e@g**m	7
puethe	c**5@s**k	4
Olivier Seret	o**7@s**k	4
donkirkby	d**y@g**m	3
Gareth Peat	g**6@s**k	1
Martin Aslett	m**a@s**k	1
Michael R. Crusoe	m**e@g**m	1

Committer Domains (Top 20 + Academic)

sanger.ac.uk: 6
steinbiss.name: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 28
Total pull requests: 75
Average time to close issues: about 2 months
Average time to close pull requests: 16 days
Total issue authors: 24
Total pull request authors: 11
Average comments per issue: 2.04
Average comments per pull request: 0.05
Merged pull requests: 72
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

george-githinji (2)
donkirkby (2)
ChrisHIV (2)
antoine4ucsd (2)
AntonioBaeza (1)
migrau (1)
eccen (1)
rmart300 (1)
sdwfrost (1)
lmoncla (1)
waqasuddinkhan (1)
manisenthils (1)
martinghunt (1)
nbbosa (1)
el-mat (1)

Pull Request Authors

martinghunt (50)
ssjunnebo (7)
satta (5)
donkirkby (3)
emollier (2)
andrewjpage (2)
puethe (2)
mr-c (1)
garethpeat (1)
aslett1 (1)
seretol (1)

Packages

Total packages: 1
Total downloads:
- pypi 51 last-month
Total docker downloads: 11

Total dependent packages: 0
Total dependent repositories: 5
Total versions: 16
Total maintainers: 2

pypi.org: iva

Iterative Virus Assembler

Homepage: https://github.com/sanger-pathogens/iva
Documentation: https://iva.readthedocs.io/
License: GPLv3
Latest release: 1.0.9
published about 8 years ago

Versions: 16
Dependent Packages: 0
Dependent Repositories: 5
Downloads: 51 Last month
Docker Downloads: 11

Rankings

Docker downloads count: 3.9%

Dependent repos count: 6.7%

Forks count: 8.9%

Stargazers count: 9.3%

Average: 9.9%

Dependent packages count: 10.0%

Downloads: 20.8%

Maintainers (2)

sanger-pathogens martinghunt

Last synced: 6 months ago

Dependencies

setup.py pypi

networkx *
packaging *
pyfastaq *
pysam *
