insilicoseq

:rocket: A sequencing simulator

https://github.com/hadrieng/insilicoseq

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.0%) to scientific vocabulary

Keywords

bioinformatics illumina-sequencing metagenomics sequencing simulation

Keywords from Contributors

annotation taxonomic-classification
Last synced: 6 months ago

Repository

Statistics
  • Stars: 207
  • Watchers: 11
  • Forks: 37
  • Open Issues: 31
  • Releases: 39
Created over 9 years ago · Last pushed 8 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

InSilicoSeq

A sequencing simulator

InSilicoSeq is a sequencing simulator producing realistic Illumina reads. Primarily intended for simulating metagenomic samples, it can also be used to produce sequencing data from a single genome.

InSilicoSeq is written in Python and uses kernel density estimators to model the read quality of real sequencing data.
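To illustrate the idea of modelling read quality with a kernel density estimator, here is a minimal standard-library sketch. It is not InSilicoSeq's implementation: the function names, the bandwidth of 1.5, the clipping range and the observed scores are all hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_kde_pdf(x, samples, bandwidth):
    """Evaluate a Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def sample_quality(samples, bandwidth, rng, q_min=2, q_max=40):
    """Draw one quality score from the KDE.

    Sampling from a Gaussian KDE amounts to picking an observed value at
    random and adding Gaussian noise with standard deviation `bandwidth`.
    The result is rounded and clipped to an assumed Phred range.
    """
    q = rng.gauss(rng.choice(samples), bandwidth)
    return min(q_max, max(q_min, round(q)))


# Hypothetical Phred scores observed at one read position in real data.
observed = [38, 37, 38, 36, 30, 38, 37, 12, 38, 35]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
simulated = [sample_quality(observed, 1.5, rng) for _ in range(1000)]
```

The simulated scores follow the shape of the observed distribution, including its occasional low-quality values, rather than a single fixed average.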

InSilicoSeq supports substitution, insertion and deletion errors. If you do not need insertion and deletion errors, a basic error model is also provided.
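The three error types can be sketched as follows. This is an illustrative toy, not InSilicoSeq's code: the function names and the per-base insertion and deletion rates are assumptions. The only fixed relationship used is the standard Phred definition, where a quality score Q corresponds to an error probability of 10^(-Q/10).

```python
import random

BASES = "ACGT"


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def introduce_substitutions(read, qualities, rng):
    """Replace each base with a different one with probability 10^(-Q/10)."""
    out = []
    for base, q in zip(read, qualities):
        if rng.random() < phred_to_error_prob(q):
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def introduce_indels(read, ins_rate, del_rate, rng):
    """Apply deletions and insertions at hypothetical per-base rates."""
    out = []
    for base in read:
        if rng.random() < del_rate:
            continue  # deletion: drop this base
        out.append(base)
        if rng.random() < ins_rate:
            out.append(rng.choice(BASES))  # insertion after this base
    return "".join(out)


rng = random.Random(0)
read = "ACGTACGTAC"
with_subs = introduce_substitutions(read, [30] * len(read), rng)
with_indels = introduce_indels(with_subs, 0.01, 0.01, rng)
```

Setting both indel rates to zero reduces the sketch to a substitution-only model, which mirrors the idea of a simpler error model for cases where indels are not needed.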

Installation

InSilicoSeq is available in Bioconda.

To install with conda:

```shell
conda install -c bioconda insilicoseq
```

Or with pip:

```shell
pip install InSilicoSeq
```

Note: InSilicoSeq requires Python >= 3.5.

Alternatively, with Docker:

```shell
docker pull quay.io/biocontainers/insilicoseq:2.0.0--pyh7cba7a3_0
```

For more installation options, please refer to the full documentation.

Usage

InSilicoSeq has two subcommands: iss generate to generate Illumina reads and iss model to create an error model from which the reads will take their characteristics.

InSilicoSeq comes with pre-computed error models that should be sufficient for most use cases.

Generate reads with a pre-computed error model

To generate 1 million reads modelling a MiSeq instrument:

```shell
curl -O -J -L https://osf.io/thser/download  # download the example data
iss generate --genomes SRS121011.fasta --model miseq --output miseq_reads
```

where SRS121011.fasta should be replaced by a (multi-)FASTA file containing the reference genome(s) from which the simulated reads will be generated.
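As a quick sanity check on the generated reads, the number of records in a FASTQ file can be counted with a few lines of Python. This sketch assumes unwrapped FASTQ with four lines per record. The output file name mentioned below (miseq_reads_R1.fastq) is an assumption derived from the --output prefix, so check the full documentation for the actual naming.

```python
import io


def count_fastq_records(handle):
    """Count records in a FASTQ stream with four lines per record."""
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    return n_lines // 4


# Small in-memory example standing in for a real output file.
example = io.StringIO(
    "@read_0/1\nACGT\n+\nIIII\n"
    "@read_1/1\nTTGA\n+\nIIII\n"
)
n_records = count_fastq_records(example)
```

For a real run, open the file instead, for example `with open("miseq_reads_R1.fastq") as handle:` (assumed file name), and pass the handle to `count_fastq_records`.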

InSilicoSeq comes with 3 error models: MiSeq, HiSeq and NovaSeq.

If you have built your own model, pass the .npz file to the --model argument to simulate reads from your own error model.

For 10 million reads and a custom error model:

```shell
curl -O -J -L https://osf.io/thser/download  # download the example data
iss generate -g SRS121011.fasta -n 10m --model my_model.npz --output /path/to/my_reads
```

provided you have built my_model.npz with iss model (see below).
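The read count above is written in shorthand, where 10m stands for 10 million. A minimal sketch of how such shorthand could be expanded is shown below. The function `parse_read_count` is hypothetical and is not InSilicoSeq's own parser. The set of accepted suffixes and the handling of fractional values are assumptions.

```python
def parse_read_count(text):
    """Expand shorthand such as '10m' or '500k' into an integer count."""
    multipliers = {"k": 10**3, "m": 10**6, "g": 10**9}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        # Assumption: a fractional prefix such as '1.5m' is allowed.
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


n_reads = parse_read_count("10m")
```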

For more examples and a full list of options, please refer to the full documentation.

Generate reads without input genomes

We can download some for you! InSilicoSeq can download random genomes from the NCBI using the infamous E-utilities.

The command

```shell
iss generate --ncbi bacteria -u 10 --model MiSeq --output ncbi_reads
```

will generate 1 million reads from 10 random bacterial genomes.

For more examples and a full list of options, please refer to the full documentation.

Create your own error model

If you do not wish to use the pre-computed error models provided with InSilicoSeq, it is possible to create your own.

Say you have a reference metagenome called genomes.fasta, and read pairs reads_R1.fastq.gz and reads_R2.fastq.gz.

Align your reads against the reference:

```shell
bowtie2-build genomes.fasta genomes
bowtie2 -x genomes -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz | \
    samtools view -bS | samtools sort -o genomes.bam
samtools index genomes.bam
```

Then build the model:

```shell
iss model -b genomes.bam -o genomes
```

which will create a genomes.npz file containing your newly built model.

License

Code is under the MIT license.

Issues

Found a bug or have a question? Please open an issue.

Contributing

We welcome contributions from the community! See our Contributing guidelines.

Citation

If you use our software, please cite us!

Gourlé H, Karlsson-Lindsjö O, Hayer J and Bongcam-Rudloff E, Simulating Illumina data with InSilicoSeq. Bioinformatics (2018) doi:10.1093/bioinformatics/bty630

Owner

  • Name: Hadrien Gourlé
  • Login: HadrienG
  • Kind: user
  • Location: Sweden
  • Company: Folkhalsomyndigheten

Bioinformatician, focusing on machine learning and building software for public health.

GitHub Events

Total
  • Issues event: 5
  • Watch event: 17
  • Delete event: 2
  • Issue comment event: 5
  • Pull request event: 5
  • Fork event: 5
  • Create event: 2
Last Year
  • Issues event: 5
  • Watch event: 17
  • Delete event: 2
  • Issue comment event: 5
  • Pull request event: 5
  • Fork event: 5
  • Create event: 2

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 452
  • Total Committers: 9
  • Avg Commits per committer: 50.222
  • Development Distribution Score (DDS): 0.128
Past Year
  • Commits: 11
  • Committers: 1
  • Avg Commits per committer: 11.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Hadrien Gourlé g****n@g****m 394
dependabot-preview[bot] 2****] 31
Hadrien Gourlé h****e@u****e 11
dependabot[bot] s****t@d****m 8
Étienne Mollier e****r@m****g 4
Richel Bilderbeek r****k 1
Daniel Standage d****e@n****v 1
alienzj a****j@g****m 1
Hadrien Gourlé H****G 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 80
  • Total pull requests: 64
  • Average time to close issues: 9 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 68
  • Total pull request authors: 12
  • Average comments per issue: 2.13
  • Average comments per pull request: 1.14
  • Merged pull requests: 38
  • Bot issues: 0
  • Bot pull requests: 33
Past Year
  • Issues: 8
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: 9 months
  • Issue authors: 8
  • Pull request authors: 2
  • Average comments per issue: 1.13
  • Average comments per pull request: 0.5
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
  • HadrienG (4)
  • gregoruar (3)
  • izaak-coleman (2)
  • dpellow (2)
  • Tj-Idowu (2)
  • slvrshot (2)
  • Naturalist1986 (2)
  • lmanchon (2)
  • sergioSEa (2)
  • vr1087 (1)
  • cimendes (1)
  • bramvandijk88 (1)
  • dturaev (1)
  • Shruteek (1)
  • till-bornemann (1)
Pull Request Authors
  • dependabot-preview[bot] (26)
  • HadrienG (17)
  • dependabot[bot] (13)
  • ThijsMaas (12)
  • StefanLelieveld (4)
  • sebschmi (2)
  • emollier (1)
  • richelbilderbeek (1)
  • pabviana (1)
  • apcamargo (1)
  • tessad (1)
Top Labels
Issue Labels
bug (13) question (11) need answer (10) enhancement (8) on hold (4) doc (2) dependencies (1)
Pull Request Labels
dependencies (39) python (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 262 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 4
  • Total versions: 39
  • Total maintainers: 1
pypi.org: insilicoseq

a sequencing simulator

  • Versions: 39
  • Dependent Packages: 0
  • Dependent Repositories: 4
  • Downloads: 262 Last month
Rankings
Stargazers count: 5.8%
Forks count: 7.1%
Dependent repos count: 7.5%
Average: 8.7%
Dependent packages count: 10.1%
Downloads: 13.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/pythonpackage.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
  • codecov/codecov-action v1.0.2 composite
Dockerfile docker
  • python 3 build
Pipfile pypi
  • codecov * develop
  • nose * develop
  • pep8 * develop
  • pycodestyle * develop
  • biopython ==1.78
  • future *
  • joblib *
  • numpy *
  • pysam ==0.15.4
  • requests *
  • scipy *
  • urllib3 >=1.26.5
Pipfile.lock pypi
  • certifi ==2021.5.30 develop
  • charset-normalizer ==2.0.4 develop
  • codecov ==2.1.12 develop
  • coverage ==5.5 develop
  • idna ==3.2 develop
  • nose ==1.3.7 develop
  • pep8 ==1.7.1 develop
  • pycodestyle ==2.7.0 develop
  • requests ==2.26.0 develop
  • urllib3 ==1.26.6 develop
  • biopython ==1.78
  • certifi ==2021.5.30
  • charset-normalizer ==2.0.4
  • future ==0.18.2
  • idna ==3.2
  • joblib ==1.0.1
  • numpy ==1.21.2
  • pysam ==0.15.4
  • requests ==2.26.0
  • scipy ==1.7.1
  • urllib3 ==1.26.6
doc/requirements.txt pypi
  • cython *
  • pip *
setup.py pypi
  • biopython <=1.78
  • future *
  • joblib *
  • numpy *
  • pysam >=0.15.1
  • requests *
  • scipy *