Science Score: 33.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to nature.com
- ✓ Committers with academic emails: 6 of 11 committers (54.5%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.3%) to scientific vocabulary
Repository
de novo virus assembler of Illumina paired reads
Basic Info
- Host: GitHub
- Owner: sanger-pathogens
- License: other
- Language: Python
- Default Branch: master
- Homepage: http://sanger-pathogens.github.io/iva/
- Size: 8.46 MB
Statistics
- Stars: 56
- Watchers: 12
- Forks: 19
- Open Issues: 16
- Releases: 0
Metadata Files
README.md
IVA
Iterative Virus Assembler - de novo virus assembler of Illumina paired reads.
PLEASE NOTE: we currently do not have the resources to provide support for IVA, so please do not expect a reply if you flag any issue.
Introduction
IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high and variable depth.
For more information, please read the IVA publication.
Installation
For installation instructions, please refer to the IVA website (http://sanger-pathogens.github.io/iva/).
Running the tests
The tests can be run from the top-level directory:
python setup.py test
Usage
```
usage: iva [options] {-f reads_fwd -r reads_rev | --fr reads} <output directory>

positional arguments:
  Output directory      Name of output directory (must not already exist)

optional arguments:
  -h, --help            show this help message and exit

Input and output:
  -f filename[.gz], --reads_fwd filename[.gz]
                        Name of forward reads fasta/q file. Must be used in
                        conjunction with --reads_rev
  -r filename[.gz], --reads_rev filename[.gz]
                        Name of reverse reads fasta/q file. Must be used in
                        conjunction with --reads_fwd
  --fr filename[.gz]    Name of interleaved fasta/q file
  --keep_files          Keep intermediate files (could be many!). Default is
                        to delete all unnecessary files
  --contigs filename[.gz]
                        Fasta file of contigs to be extended. Incompatible
                        with --reference
  --reference filename[.gz]
                        EXPERIMENTAL! This option is EXPERIMENTAL, not
                        recommended, and has not been tested! Fasta file of
                        reference genome, or parts thereof. IVA will try to
                        assemble one contig per sequence in this file.
                        Incompatible with --contigs
  -v, --verbose         Be verbose by printing messages to stdout. Use up to
                        three times for increasing verbosity.

SMALT mapping options:
  -k INT, --smalt_k INT
                        kmer hash length in SMALT (the -k option in smalt
                        index) [19]
  -s INT, --smalt_s INT
                        kmer hash step size in SMALT (the -s option in smalt
                        index) [11]
  -y FLOAT, --smalt_id FLOAT
                        Minimum identity threshold for mapping to be reported
                        (the -y option in smalt map) [0.5]

Contig options:
  --ctg_first_trim INT  Number of bases to trim off the end of every contig
                        before extending for the first time [25]
  --ctg_iter_trim INT   During iterative extension, number of bases to trim
                        off the end of a contig when extension fails (then try
                        extending again) [10]
  --ext_min_cov INT     Minimum kmer depth needed to use that kmer to extend a
                        contig [10]
  --ext_min_ratio FLOAT
                        Sets N, where kmer for extension must be at least N
                        times more abundant than next most common kmer [4]
  --ext_max_bases INT   Maximum number of bases to try to extend on each
                        iteration [100]
  --ext_min_clip INT    Set minimum number of bases soft clipped off a read
                        for those bases to be used for extension [3]
  --max_contigs INT     Maximum number of contigs allowed in the assembly. No
                        more seeds generated if the cutoff is reached [50]

Seed generation options:
  --make_new_seeds      When no more contigs can be extended, generate a new
                        seed. This is forced to be true when --contigs is not
                        used
  --seed_start_length INT
                        When making a seed sequence, use the most common kmer
                        of this length. Default is to use the minimum of
                        (median read length, 95). Warning: it is not
                        recommended to set this higher than 95
  --seed_stop_length INT
                        Stop extending seed using perfect matches from reads
                        when this length is reached. Future extensions are
                        then made by treating the seed as a contig
                        [0.9*max_insert]
  --seed_min_kmer_cov INT
                        Minimum kmer coverage of initial seed [25]
  --seed_max_kmer_cov INT
                        Maximum kmer coverage of initial seed [1000000]
  --seed_ext_max_bases INT
                        Maximum number of bases to try to extend on each
                        iteration [50]
  --seed_overlap_length INT
                        Number of overlapping bases needed between read and
                        seed to use that read to extend [seed_start_length]
  --seed_ext_min_cov INT
                        Minimum kmer depth needed to use that kmer to extend a
                        contig [10]
  --seed_ext_min_ratio FLOAT
                        Sets N, where kmer for extension must be at least N
                        times more abundant than next most common kmer [4]

Read trimming options:
  --trimmomatic FILENAME
                        Provide location of trimmomatic.jar file to enable
                        read trimming. Required if --adapters used
  --trimmo_qual STRING  Trimmomatic options used to quality trim reads
                        [LEADING:10 TRAILING:10 SLIDINGWINDOW:4:20]
  --adapters FILENAME   Fasta file of adapter sequences to be trimmed off
                        reads. If used, must also use --trimmomatic. Default
                        is file of adapters supplied with IVA
  --min_trimmed_length INT
                        Minimum length of read after trimming [50]
  --pcr_primers FILENAME
                        FASTA file of primers. The first perfect match found
                        to a sequence in the primers file will be trimmed off
                        the start of each read. This is run after trimmomatic
                        (if --trimmomatic used)

Other options:
  -i INT, --max_insert INT
                        Maximum insert size (includes read length). Reads with
                        inferred insert size more than the maximum will not be
                        used to extend contigs [800]
  -t INT, --threads INT
                        Number of threads to use [1]
  --kmc_onethread       Force kmc to use one thread. By default the value of
                        -t/--threads is used when running kmc
  --strand_bias FLOAT in [0,0.5]
                        Set strand bias cutoff of mapped reads when trimming
                        contig ends, in the interval [0,0.5]. A value of x
                        means that a base needs min(fwd_depth, rev_depth) /
                        total_depth <= x. The only time this should be used is
                        with libraries with overlapping reads (ie fragment
                        length < 2*read length), and even then, it can make
                        results worse. If used, try a low value like 0.1 first
                        [0]
  --test                Run using built in test data. All other options will
                        be ignored, except the mandatory output directory, and
                        --trimmomatic and --threads can be also be used
  --version             show program's version number and exit
```
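The contig extension thresholds in the usage text above (a minimum kmer depth, default 10, and a minimum ratio N over the next most common kmer, default 4) can be illustrated with a minimal sketch. This is not IVA's implementation: the function name `pick_extension_kmer` and its parameters are hypothetical, and it assumes the candidate kmer counts are already available in a `Counter`.

```python
from collections import Counter
from typing import Optional


def pick_extension_kmer(
    kmer_counts: Counter,
    min_cov: int = 10,       # assumed to mirror the documented default [10]
    min_ratio: float = 4.0,  # assumed to mirror the documented default [4]
) -> Optional[str]:
    """Return the kmer to extend with, or None if there is no clear winner."""
    ranked = kmer_counts.most_common(2)
    if not ranked:
        return None

    best_kmer, best_count = ranked[0]

    # The most common kmer must reach the minimum depth.
    if best_count < min_cov:
        return None

    # It must also be at least min_ratio times more abundant than the runner-up.
    if len(ranked) == 2:
        second_count = ranked[1][1]
        if best_count < min_ratio * second_count:
            return None

    return best_kmer
```

With the default thresholds, counts of 40 and 5 select the first kmer, while counts of 40 and 20 yield no extension because the leading kmer is only twice as abundant as the next one.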
For usage help and examples, see the IVA wiki page.
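As a rough illustration of the usage text, the sketch below assembles an `iva` command line from Python using only short options listed in the help output (`-f`, `-r`, `-t`, `-i`) plus `--trimmomatic`. The helper name `build_iva_command`, the read file names, and the output directory are hypothetical; the script only prints the command and does not run IVA.

```python
import shlex
from typing import List, Optional


def build_iva_command(
    reads_fwd: str,
    reads_rev: str,
    outdir: str,
    threads: int = 1,        # documented default for -t
    max_insert: int = 800,   # documented default for -i
    trimmomatic_jar: Optional[str] = None,
) -> List[str]:
    """Build an argument list for a paired-read IVA run (hypothetical helper)."""
    cmd = [
        "iva",
        "-f", reads_fwd,
        "-r", reads_rev,
        "-t", str(threads),
        "-i", str(max_insert),
    ]
    # Read trimming is only enabled when a trimmomatic.jar path is given.
    if trimmomatic_jar is not None:
        cmd += ["--trimmomatic", trimmomatic_jar]
    # The output directory is the positional argument and must not already exist.
    cmd.append(outdir)
    return cmd


if __name__ == "__main__":
    # Example file names are placeholders, not files shipped with IVA.
    command = build_iva_command(
        "reads_1.fastq.gz", "reads_2.fastq.gz", "iva_out", threads=4
    )
    print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so the same list could be passed to `subprocess.run` without shell quoting concerns.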
License
IVA is free software, licensed under GPLv3.
Feedback/Issues
Please report any issues to the issues page.
Citation
If you use this software please cite:
IVA: accurate de novo assembly of RNA virus genomes.
Hunt M, Gall A, Ong SH, Brener J, Ferns B, Goulder P, Nastouli E, Keane JA, Kellam P, Otto TD.
Bioinformatics. 2015 Jul 15;31(14):2374-6. doi: 10.1093/bioinformatics/btv120. Epub 2015 Feb 28.
Adapter sequences:
Optimal enzymes for amplifying sequencing libraries.
Quail, M. A. et al. Nat. Methods 9, 10-1 (2012).
GAGE:
GAGE: A critical evaluation of genome assemblies and assembly algorithms.
Salzberg, S. L. et al. Genome Res. 22, 557-67 (2012).
KMC:
Disk-based k-mer counting on a PC.
Deorowicz, S., Debudaj-Grabysz, A. & Grabowski, S. BMC Bioinformatics 14, 160 (2013).
Kraken:
Kraken: ultrafast metagenomic sequence classification using exact alignments.
Wood, D. E. & Salzberg, S. L. Genome Biol. 15, R46 (2014).
MUMmer:
Versatile and open software for comparing large genomes.
Kurtz, S. et al. Genome Biol. 5, R12 (2004).
R:
R: A language and environment for statistical computing.
R Core Team (2013). R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
RATT:
RATT: Rapid Annotation Transfer Tool.
Otto, T. D., Dillon, G. P., Degrave, W. S. & Berriman, M. Nucleic Acids Res. 39, e57 (2011).
SAMtools:
The Sequence Alignment/Map format and SAMtools.
Li, H. et al. Bioinformatics 25, 2078-9 (2009).
Trimmomatic:
Trimmomatic: A flexible trimmer for Illumina Sequence Data.
Bolger, A. M., Lohse, M. & Usadel, B. Bioinformatics 1-7 (2014).
Owner
- Name: Pathogen Informatics, Wellcome Sanger Institute
- Login: sanger-pathogens
- Kind: organization
- Location: Hinxton, Cambs., UK
- Website: http://www.sanger.ac.uk/science/groups/pathogen-informatics
- Repositories: 54
- Profile: https://github.com/sanger-pathogens
GitHub Events
Total
- Watch event: 1
- Member event: 2
- Pull request event: 1
- Fork event: 1
Last Year
- Watch event: 1
- Member event: 2
- Pull request event: 1
- Fork event: 1
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martin Hunt | m****2@s****k | 110 |
| martinghunt | m****t@g****m | 71 |
| Sara Sjunnebo | s****4@s****k | 8 |
| Sascha Steinbiss | s****a@s****e | 8 |
| andrewjpage | a****e@g****m | 7 |
| puethe | c****5@s****k | 4 |
| Olivier Seret | o****7@s****k | 4 |
| donkirkby | d****y@g****m | 3 |
| Gareth Peat | g****6@s****k | 1 |
| Martin Aslett | m****a@s****k | 1 |
| Michael R. Crusoe | m****e@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 28
- Total pull requests: 75
- Average time to close issues: about 2 months
- Average time to close pull requests: 16 days
- Total issue authors: 24
- Total pull request authors: 11
- Average comments per issue: 2.04
- Average comments per pull request: 0.05
- Merged pull requests: 72
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- george-githinji (2)
- donkirkby (2)
- ChrisHIV (2)
- antoine4ucsd (2)
- AntonioBaeza (1)
- migrau (1)
- eccen (1)
- rmart300 (1)
- sdwfrost (1)
- lmoncla (1)
- waqasuddinkhan (1)
- manisenthils (1)
- martinghunt (1)
- nbbosa (1)
- el-mat (1)
Pull Request Authors
- martinghunt (50)
- ssjunnebo (7)
- satta (5)
- donkirkby (3)
- emollier (2)
- andrewjpage (2)
- puethe (2)
- mr-c (1)
- garethpeat (1)
- aslett1 (1)
- seretol (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 51 last month
- Total docker downloads: 11
- Total dependent packages: 0
- Total dependent repositories: 5
- Total versions: 16
- Total maintainers: 2
pypi.org: iva
Iterative Virus Assembler
- Homepage: https://github.com/sanger-pathogens/iva
- Documentation: https://iva.readthedocs.io/
- License: GPLv3
- Latest release: 1.0.9 (published about 8 years ago)
Dependencies
- networkx *
- packaging *
- pyfastaq *
- pysam *