svtyper

Bayesian genotyper for structural variants

https://github.com/hall-lab/svtyper

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
    6 of 10 committers (60.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

bioinformatics genomics genotype vcf

Keywords from Contributors

structural-variation
Last synced: 6 months ago

Repository

Bayesian genotyper for structural variants

Basic Info
  • Host: GitHub
  • Owner: hall-lab
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 2.21 MB
Statistics
  • Stars: 134
  • Watchers: 9
  • Forks: 56
  • Open Issues: 49
  • Releases: 0
Topics
bioinformatics genomics genotype vcf
Created over 11 years ago · Last pushed almost 5 years ago
Metadata Files
Readme License

README.md

SVTyper

Bayesian genotyper for structural variants

Overview

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data. Users must supply a VCF file of sites to genotype (which may be generated by LUMPY) as well as a BAM/CRAM file of Illumina paired-end reads aligned with BWA-MEM. SVTyper assesses discordant and concordant reads from paired-end and split-read alignments to infer genotypes at each site. Algorithm details and benchmarking are described in Chiang et al., 2015.
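
As a rough illustration of how counts of reference-supporting and alternate-supporting reads can be turned into a genotype call, the sketch below scores three genotype hypotheses with a binomial likelihood and a uniform prior. It is a simplified sketch, not SVTyper's implementation: the function names and the per-genotype alternate-read probabilities in `ALT_READ_PROB` are assumptions chosen for illustration. See Chiang et al., 2015 for the actual model.

```python
from math import exp, lgamma, log

# Assumed probability that a read at the breakpoint supports the alternate
# allele under each genotype (illustrative values, not SVTyper's own).
ALT_READ_PROB = {"0/0": 0.05, "0/1": 0.5, "1/1": 0.95}


def log_binomial(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1.0 - p)


def genotype_posteriors(ref_reads, alt_reads):
    """Return posterior probabilities for 0/0, 0/1 and 1/1 (uniform prior)."""
    total = ref_reads + alt_reads
    log_liks = {
        gt: log_binomial(alt_reads, total, p) for gt, p in ALT_READ_PROB.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_liks.values())
    weights = {gt: exp(ll - peak) for gt, ll in log_liks.items()}
    norm = sum(weights.values())
    return {gt: w / norm for gt, w in weights.items()}


posteriors = genotype_posteriors(ref_reads=12, alt_reads=10)
best_call = max(posteriors, key=posteriors.get)
print(best_call, posteriors)
```

With 12 reference-supporting and 10 alternate-supporting reads, the heterozygous hypothesis (`0/1`) receives the highest posterior under these assumed probabilities.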

[Figure: NA12878 heterozygous deletion]

Installation

Requirements:

- Python 2.7.x

Install via pip

pip install git+https://github.com/hall-lab/svtyper.git

svtyper depends on pysam (version 0.15.0 or newer), numpy, and scipy; svtyper-sso additionally depends on cytoolz. If the dependencies aren't already available on your system, pip will attempt to download and install them.

svtyper vs svtyper-sso

svtyper is the original implementation of the genotyping algorithm and works with multiple samples. svtyper-sso is an alternative implementation optimized for genotyping a single sample: it is parallelized and takes advantage of multiple CPU cores via the multiprocessing module. svtyper-sso can offer a speedup of 2x or more when genotyping a single sample, depending on how many CPU cores are used.

NOTE: svtyper-sso is not yet stable. There are minor logging differences between the two programs, and svtyper-sso may exit prematurely with an error when processing CRAM files.
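
The general pattern of splitting work into batches and handing the batches to a pool of worker processes can be sketched as follows. This is a minimal illustration of the multiprocessing approach, not the code used by svtyper-sso: `make_batches`, `count_records`, and `process_in_parallel` are hypothetical names, and the worker here only counts records instead of genotyping them.

```python
from multiprocessing import Pool


def make_batches(records, batch_size):
    """Split a list of SV records into consecutive batches of at most batch_size."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def count_records(batch):
    """Placeholder worker: a real worker would genotype every SV in the batch."""
    return len(batch)


def process_in_parallel(records, cores=2, batch_size=1000, worker=count_records):
    """Run the worker over all batches using a pool of `cores` processes."""
    batches = make_batches(records, batch_size)
    pool = Pool(processes=cores)
    try:
        # map() preserves batch order, so results line up with the input.
        return pool.map(worker, batches)
    finally:
        pool.close()
        pool.join()


batches = make_batches(list(range(2500)), batch_size=1000)
print([len(b) for b in batches])  # [1000, 1000, 500]
```

When trying `process_in_parallel`, call it from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on every platform.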

Example Usage

svtyper

As a Command Line Python Script

svtyper \
    -i sv.vcf \
    -B sample.bam \
    -l sample.bam.json \
    > sv.gt.vcf
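
When the command has to be launched from a Python pipeline instead of a shell, one option is to wrap it with the standard `subprocess` module. The sketch below uses only the `-i`, `-B`, and `-l` flags shown in the command above; the helper names `build_svtyper_command` and `run_svtyper` are hypothetical, and the wrapper assumes that `svtyper` is on the `PATH`.

```python
import subprocess


def build_svtyper_command(vcf_path, bam_path, lib_json_path):
    """Assemble the svtyper argument list used in the command-line example."""
    return ["svtyper", "-i", vcf_path, "-B", bam_path, "-l", lib_json_path]


def run_svtyper(vcf_path, bam_path, lib_json_path, out_path):
    """Run svtyper and write its standard output (the genotyped VCF) to out_path."""
    command = build_svtyper_command(vcf_path, bam_path, lib_json_path)
    with open(out_path, "w") as out_handle:
        # check_call raises CalledProcessError if svtyper exits non-zero.
        subprocess.check_call(command, stdout=out_handle)


command = build_svtyper_command("sv.vcf", "sample.bam", "sample.bam.json")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.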

As a Python Library

```python
import svtyper.classic as svt

input_vcf = "/path/to/input.vcf"
input_bam = "/path/to/input.bam"
library_info = "/path/to/library_info.json"
output_vcf = "/path/to/output.vcf"

with open(input_vcf, "r") as inf, open(output_vcf, "w") as outf:
    svt.sv_genotype(bam_string=input_bam,
                    vcf_in=inf,
                    vcf_out=outf,
                    min_aligned=20,
                    split_weight=1,
                    disc_weight=1,
                    num_samp=1000000,
                    lib_info_path=library_info,
                    debug=False,
                    alignment_outpath=None,
                    ref_fasta=None,
                    sum_quals=False,
                    max_reads=None)

# Results will be inside the /path/to/output.vcf file
```
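
Once the output VCF has been written, the per-sample genotype calls can be read back with plain Python. The sketch below relies only on the standard VCF column layout (nine fixed columns followed by one column per sample, with the `FORMAT` column naming the per-sample keys). The function names are hypothetical, and the example record is made up for illustration; it is not real SVTyper output.

```python
def sample_genotypes(vcf_line, sample_names):
    """Map each sample name to its GT value for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    gt_index = format_keys.index("GT")
    genotypes = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        genotypes[name] = sample_field.split(":")[gt_index]
    return genotypes


def read_genotypes(vcf_path):
    """Yield (chrom, pos, id, {sample: GT}) for every record in a VCF file."""
    sample_names = []
    with open(vcf_path, "r") as handle:
        for line in handle:
            if not line.strip() or line.startswith("##"):
                continue  # blank lines and meta-information lines
            if line.startswith("#CHROM"):
                sample_names = line.rstrip("\n").split("\t")[9:]
                continue
            fields = line.split("\t")
            yield fields[0], int(fields[1]), fields[2], sample_genotypes(line, sample_names)


# Hypothetical record for illustration only (not real SVTyper output).
example = "1\t1000\tsv1\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=2000\tGT:GQ\t0/1:40"
print(sample_genotypes(example, ["NA12878"]))  # {'NA12878': '0/1'}
```

For anything beyond a quick check, a dedicated VCF parser is usually a better choice than splitting lines by hand.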

svtyper-sso

As a Command Line Python Script

Options specific to svtyper-sso:

- --core: number of CPU cores to use
- --batch_size: number of SVs to process in a single batch (default: 1000)
- --max_reads: skip genotyping if an SV contains more valid reads than this threshold (default: 1000)

svtyper-sso \
    --core 2 \
    --batch_size 1000 \
    --max_reads 1000 \
    -i sv.vcf \
    -B sample.bam \
    -l sample.bam.json \
    > sv.gt.vcf

As a Python Library

```python
import svtyper.singlesample as sso

input_vcf = "/path/to/input.vcf"
input_bam = "/path/to/input.bam"
library_info = "/path/to/library_info.json"
output_vcf = "/path/to/output.vcf"

with open(input_vcf, "r") as inf, open(output_vcf, "w") as outf:
    sso.sso_genotype(bam_string=input_bam,
                     vcf_in=inf,
                     vcf_out=outf,
                     min_aligned=20,
                     split_weight=1,
                     disc_weight=1,
                     num_samp=1000000,
                     lib_info_path=library_info,
                     debug=False,
                     alignment_outpath=None,
                     ref_fasta=None,
                     sum_quals=False,
                     max_reads=1000,
                     cores=2,
                     batch_size=1000)

# Results will be inside the /path/to/output.vcf file
```

Development

Requirements:

- Python 2.7 or newer
- GNU Make
- virtualenv (or conda for anaconda or miniconda users)

Setting Up a Development Environment

Using virtualenv

git clone https://github.com/hall-lab/svtyper.git
cd svtyper
virtualenv myvenv
source myvenv/bin/activate
pip install -e .
<add, edit, or delete code>
make test

# when you're finished with development
git push <remote-name> <branch>
deactivate
cd .. && rm -rf svtyper

Using conda

git clone https://github.com/hall-lab/svtyper.git
cd svtyper
conda create --channel bioconda --name mycenv pysam numpy scipy cytoolz # type 'y' when prompted with "proceed ([y]/n)?"
source activate mycenv
pip install -e .
<add, edit, or delete code>
make test


# when you're finished with development
git push <remote-name> <branch>
source deactivate
cd .. && rm -rf svtyper
conda remove --name mycenv --all

Troubleshooting

Many common issues are related to abnormal insert size distributions in the BAM file. SVTyper provides methods to assess and visualize the characteristics of sequencing libraries.

Running SVTyper with the -l flag creates a JSON file with essential metrics on a BAM file. SVTyper samples the first N reads from the file (1 million by default) to parse the libraries, read groups, and insert size histograms. This can be done in the absence of a VCF file.

svtyper \
    -B my.bam \
    -l my.bam.json

The lib_stats.R script produces insert size histograms from the JSON file:

scripts/lib_stats.R my.bam.json my.bam.json.pdf

[Figure: Insert size histogram]
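
An insert size histogram can also be summarized numerically, which is a quick way to spot an abnormal distribution. The sketch below computes the mean and standard deviation from a mapping of insert size to read count. The exact layout of the JSON file is not described here, so the input format is an assumption: `histogram_stats` is a hypothetical helper, and the histogram would need to be extracted from the JSON file by hand before calling it.

```python
from math import sqrt


def histogram_stats(histogram):
    """Return (mean, standard deviation) of a {insert_size: read_count} mapping."""
    # JSON object keys are strings, so convert sizes to int before use.
    pairs = [(int(size), count) for size, count in histogram.items()]
    total = sum(count for _, count in pairs)
    if total == 0:
        raise ValueError("histogram contains no reads")
    mean = sum(size * count for size, count in pairs) / float(total)
    variance = sum(count * (size - mean) ** 2 for size, count in pairs) / float(total)
    return mean, sqrt(variance)


# Hypothetical histogram for illustration (insert size -> number of read pairs).
example_histogram = {"300": 2, "400": 6, "500": 2}
mean, sd = histogram_stats(example_histogram)
print(mean, sd)
```

A very large standard deviation relative to the mean, or a mean far from the expected fragment length, suggests that the library deserves a closer look.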

Citation

C Chiang, R M Layer, G G Faust, M R Lindberg, D B Rose, E P Garrison, G T Marth, A R Quinlan, and I M Hall. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Meth 12, 966–968 (2015). doi:10.1038/nmeth.3505.

http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3505.html

Owner

  • Name: Ira Hall lab
  • Login: hall-lab
  • Kind: organization
  • Location: St. Louis, Missouri

GitHub Events

Total
  • Watch event: 9
  • Issue comment event: 3
  • Fork event: 1
Last Year
  • Watch event: 9
  • Issue comment event: 3
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 515
  • Total Committers: 10
  • Avg Commits per committer: 51.5
  • Development Distribution Score (DDS): 0.437
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Colby Chiang c****e@v****u 290
indraniel i****l@g****m 91
Indraniel Das i****s@w****u 56
Dave Larson d****n@g****u 54
Colby Chiang c****g@w****u 10
Brent Pedersen b****e@g****m 5
apregier a****r@g****u 4
Dave Larson d****n@w****u 3
Spencer Kelley s****y@g****m 1
chapmanb c****b@5****m 1
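
The all-time figures above are consistent with one another. Assuming DDS is defined as one minus the top committer's share of all commits (a definition not stated on this page), the totals can be recomputed from the per-committer commit counts listed above:

```python
# Commit counts per committer identity, copied from the Top Committers list.
commit_counts = [290, 91, 56, 54, 10, 5, 4, 3, 1, 1]

total_commits = sum(commit_counts)
avg_per_committer = total_commits / float(len(commit_counts))
# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1.0 - max(commit_counts) / float(total_commits)

print(total_commits, avg_per_committer, round(dds, 3))  # 515 51.5 0.437
```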

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 70
  • Total pull requests: 34
  • Average time to close issues: 3 months
  • Average time to close pull requests: 13 days
  • Total issue authors: 57
  • Total pull request authors: 10
  • Average comments per issue: 1.76
  • Average comments per pull request: 1.12
  • Merged pull requests: 27
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • cc2qe (4)
  • brentp (3)
  • verne91 (3)
  • MehmetGoktay (2)
  • ikkextoch (2)
  • ernfrid (2)
  • zeeev (2)
  • fpbarthel (2)
  • pingxinxing (2)
  • aj03 (1)
  • mlinderm (1)
  • tyyiyi (1)
  • moldach (1)
  • 54tuifeimo (1)
  • biozzq (1)
Pull Request Authors
  • ernfrid (12)
  • cc2qe (8)
  • brentp (7)
  • indraniel (1)
  • anju24 (1)
  • apregier (1)
  • mchowdh200 (1)
  • florealcab (1)
  • ikkextoch (1)
  • avakel (1)
Top Labels
Issue Labels
enhancement (2) bug (1)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 123 last-month
  • Total docker downloads: 429
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 3
    (may contain duplicates)
  • Total versions: 21
  • Total maintainers: 3
proxy.golang.org: github.com/hall-lab/svtyper
  • Versions: 17
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.7%
Dependent repos count: 5.8%
Last synced: 6 months ago
pypi.org: svtyper

Bayesian genotyper for structural variants

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 123 Last month
  • Docker Downloads: 429
Rankings
Docker downloads count: 1.2%
Forks count: 5.7%
Stargazers count: 6.6%
Average: 7.8%
Dependent repos count: 9.0%
Dependent packages count: 10.0%
Downloads: 14.1%
Maintainers (3)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • cytoolz *
  • numpy *
  • pysam *
  • scipy *
setup.py pypi
  • cytoolz >=0.8.2
  • numpy *
  • pysam >=0.15.0
  • scipy *