vcfstats

Powerful statistics for VCF files

https://github.com/pwwang/vcfstats

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.5%) to scientific vocabulary

Keywords

vcf vcf-files vcfstats
Last synced: 6 months ago

Repository

Powerful statistics for VCF files

Basic Info
Statistics
  • Stars: 70
  • Watchers: 5
  • Forks: 15
  • Open Issues: 1
  • Releases: 10
Topics
vcf vcf-files vcfstats
Created over 6 years ago · Last pushed 8 months ago
Metadata Files
Readme License

README.md

vcfstats - powerful statistics for VCF files


Documentation | CHANGELOG

Motivation

There are a couple of tools that can plot some statistics of VCF files, including bcftools and jvarkit. However, none of them can:

  1. plot specific metrics
  2. customize the plots
  3. focus on variants with certain filters

The R package vcfR can do some of the above. However, it has to load the entire VCF file into memory, which is impractical for large VCF files.

Installation

```shell
pip install -U vcfstats
```

Or run with docker:

```shell
docker run \
    -w /vcfstats/workdir \
    -v $(pwd):/vcfstats/workdir \
    --rm justold/vcfstats:latest \
    vcfstats \
    --vcf myfile.vcf \
    -o outputs \
    --formula 'COUNT(1) ~ CONTIG' \
    --title 'Number of variants on each chromosome'
```

Gallery

Number of variants on each chromosome

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1) ~ CONTIG' \
    --title 'Number of variants on each chromosome' \
    --config examples/config.toml
```

Number of variants on each chromosome
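The formula `COUNT(1) ~ CONTIG` counts one per variant and groups the counts by chromosome. The sketch below shows that aggregation with only the standard library, streaming line by line so memory does not grow with file size. It assumes a plain-text (uncompressed) VCF and uses made-up records; it is not vcfstats' own code, which lists cyvcf2 as its VCF reader.

```python
from collections import Counter

# A tiny, made-up VCF (hypothetical data, tab-separated, no sample columns).
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t100\t.\tA\tG\t50\tPASS\t.
1\t200\t.\tC\tT\t50\tPASS\t.
2\t300\t.\tG\tA\t50\tq10\t.
X\t400\t.\tT\tTA\t50\tPASS\t.
"""


def count_per_contig(vcf_text: str) -> Counter:
    """Count data lines per CHROM value, one line at a time."""
    counts: Counter = Counter()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom = line.split("\t", 1)[0]  # CHROM is the first column
        counts[chrom] += 1
    return counts


print(count_per_contig(VCF_TEXT))  # Counter({'1': 2, '2': 1, 'X': 1})
```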

Changing labels and ticks

vcfstats uses plotnine for plotting; see the plotnine documentation to learn how to write the `--ggs` expressions that modify the plots.

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1) ~ CONTIG' \
    --title 'Number of variants on each chromosome (modified)' \
    --config examples/config.toml \
    --ggs 'scale_x_discrete(name ="Chromosome", \
        limits=["1","2","3","4","5","6","7","8","9","10","X"]); \
        ylab("# Variants")'
```

Number of variants on each chromosome (modified)

Number of variants on the first 5 chromosomes

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1) ~ CONTIG[1,2,3,4,5]' \
    --title 'Number of variants on each chromosome (first 5)' \
    --config examples/config.toml

# or
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1) ~ CONTIG[1-5]' \
    --title 'Number of variants on each chromosome (first 5)' \
    --config examples/config.toml

# or (requires the VCF file to be tabix-indexed)
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1) ~ CONTIG' \
    --title 'Number of variants on each chromosome (first 5)' \
    --config examples/config.toml \
    -r 1 2 3 4 5
```

Number of variants on each chromosome (first 5)
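The selectors `CONTIG[1,2,3,4,5]` and `CONTIG[1-5]` select the same chromosomes. The sketch below shows one way such a selector could be expanded into a list of contig names. `expand_selector` is a hypothetical helper, and it assumes that `a-b` means the integer-named contigs `a` through `b` inclusive while anything else is taken literally; vcfstats' actual parsing rules may differ.

```python
from typing import List


def expand_selector(spec: str) -> List[str]:
    """Expand a selector such as '1,2,3' or '1-5' into contig names.

    Hypothetical helper: a part 'a-b' with integer ends becomes a..b
    (inclusive); any other part is kept as a literal contig name.
    """
    names: List[str] = []
    for part in spec.split(","):
        part = part.strip()
        start, sep, end = part.partition("-")
        if sep and start.isdigit() and end.isdigit():
            names.extend(str(i) for i in range(int(start), int(end) + 1))
        else:
            names.append(part)
    return names


print(expand_selector("1-3,X"))  # ['1', '2', '3', 'X']
```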

Number of substitutions of SNPs

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1, VARTYPE[snp]) ~ SUBST[A>T,A>G,A>C,T>A,T>G,T>C,G>A,G>T,G>C,C>A,C>T,C>G]' \
    --title 'Number of substitutions of SNPs' \
    --config examples/config.toml
```

Number of substitutions of SNPs
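The formula above counts SNPs (`VARTYPE[snp]`) grouped by their `REF>ALT` substitution (`SUBST`). A minimal sketch of that grouping on hypothetical `(REF, ALT)` pairs follows. It assumes a SNP is a single-base REF and a single-base ALT from `A`, `C`, `G`, `T`, so indels and multi-allelic records are skipped; this simplification is an assumption, not vcfstats' exact rule.

```python
from collections import Counter
from typing import List, Tuple

BASES = set("ACGT")

# Hypothetical (REF, ALT) pairs taken from a VCF.
RECORDS: List[Tuple[str, str]] = [
    ("A", "G"), ("C", "T"), ("A", "G"), ("T", "TA"), ("G", "C,T"),
]


def substitution_counts(records) -> Counter:
    """Count 'REF>ALT' labels for biallelic single-base substitutions."""
    counts: Counter = Counter()
    for ref, alt in records:
        # Assumption: only single-base REF and single-base ALT count as SNPs.
        if ref in BASES and alt in BASES:
            counts[f"{ref}>{alt}"] += 1
    return counts


print(substitution_counts(RECORDS))  # Counter({'A>G': 2, 'C>T': 1})
```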

Only SNPs that PASS all filters

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1, VARTYPE[snp]) ~ SUBST[A>T,A>G,A>C,T>A,T>G,T>C,G>A,G>T,G>C,C>A,C>T,C>G]' \
    --title 'Number of substitutions of SNPs (passed)' \
    --config examples/config.toml \
    --passed
```

Number of substitutions of SNPs (passed)
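The `--passed` option restricts the plot to variants that pass all filters. The sketch below shows such a check on hypothetical FILTER column values. It assumes that both `PASS` and the missing value `.` count as passing (cyvcf2, which vcfstats depends on, reports both the same way); check the vcfstats documentation for the exact rule it applies.

```python
from typing import List

# Hypothetical FILTER column values, one per variant.
FILTERS: List[str] = ["PASS", "q10", ".", "PASS", "LowQual;q10"]


def passed(filter_value: str) -> bool:
    # Assumption: PASS and the missing value "." are both treated as passing.
    return filter_value in ("PASS", ".")


kept = [f for f in FILTERS if passed(f)]
print(len(kept))  # 3
```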

Alternative allele frequency on each chromosome

```shell
# using a dark theme
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'AAF ~ CONTIG' \
    --title 'Allele frequency on each chromosome' \
    --config examples/config.toml \
    --ggs 'theme_dark()'
```

Allele frequency on each chromosome
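`AAF` is the alternative allele frequency of each variant across the samples. A common definition is the fraction of called alleles that are non-reference, sketched below from GT strings. `alt_allele_frequency` is a hypothetical helper; it assumes missing alleles (`.`) are ignored, and vcfstats may compute AAF differently (for example, via cyvcf2).

```python
from typing import List, Optional


def alt_allele_frequency(genotypes: List[str]) -> Optional[float]:
    """Fraction of called alleles that are non-reference.

    Hypothetical helper: parses GT strings such as '0/1' or '1|1';
    missing alleles ('.') are skipped. Returns None if nothing is called.
    """
    called = alt = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else None


# Three called genotypes give 6 alleles, 3 of them non-reference.
print(alt_allele_frequency(["0/1", "1/1", "0/0", "./."]))  # 0.5
```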

Using a boxplot

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'AAF ~ CONTIG' \
    --title 'Allele frequency on each chromosome (boxplot)' \
    --config examples/config.toml \
    --figtype boxplot
```

Allele frequency on each chromosome

Using a density plot or histogram to investigate the distribution

You can plot the distribution using a density plot or a histogram:

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'AAF ~ CONTIG[1,2]' \
    --title 'Allele frequency on chromosome 1,2' \
    --config examples/config.toml \
    --figtype density
```

Allele frequency on chromosome 1,2

Overall distribution of allele frequency

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'AAF ~ 1' \
    --title 'Overall allele frequency distribution' \
    --config examples/config.toml
```

Overall allele frequency distribution

Excluding some low/high frequency variants

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'AAF[0.05, 0.95] ~ 1' \
    --title 'Overall allele frequency distribution (0.05-0.95)' \
    --config examples/config.toml
```

Overall allele frequency distribution
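`AAF[0.05, 0.95]` keeps only variants whose allele frequency lies between 0.05 and 0.95. The README does not say whether the bounds are inclusive, so the sketch below makes that an explicit parameter; the frequencies are hypothetical and `in_range` is an illustrative helper, not part of vcfstats.

```python
from typing import List

# Hypothetical per-variant alternative allele frequencies.
AAFS: List[float] = [0.01, 0.05, 0.3, 0.5, 0.95, 0.99]


def in_range(values, low=0.05, high=0.95, inclusive=True):
    """Keep values between low and high (bound handling is an assumption)."""
    if inclusive:
        return [v for v in values if low <= v <= high]
    return [v for v in values if low < v < high]


print(in_range(AAFS))                   # [0.05, 0.3, 0.5, 0.95]
print(in_range(AAFS, inclusive=False))  # [0.3, 0.5]
```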

Counting types of variants on each chromosome

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1, group=VARTYPE) ~ CHROM' \
    --title 'Types of variants on each chromosome' \
    --config examples/config.toml
# or simply use: --formula 'VARTYPE ~ CHROM'
```

Types of variants on each chromosome
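`COUNT(1, group=VARTYPE) ~ CHROM` counts variants per chromosome, split by variant type. The sketch below classifies hypothetical `(CHROM, REF, ALT)` records and counts them per `(chromosome, type)` pair. The categories (`snp`, `mnp`, `indel`, `other`) are a simplified assumption for illustration, not vcfstats' exact classification.

```python
from collections import Counter
from typing import List, Tuple


def variant_type(ref: str, alt: str) -> str:
    """Simplified classification (assumption): snp, mnp, indel or other."""
    alts = alt.split(",")
    if any(a.startswith("<") or a in (".", "*") for a in alts):
        return "other"  # symbolic or missing alleles
    if all(len(a) == len(ref) for a in alts):
        return "snp" if len(ref) == 1 else "mnp"
    return "indel"


# Hypothetical (CHROM, REF, ALT) records.
RECORDS: List[Tuple[str, str, str]] = [
    ("1", "A", "G"), ("1", "AT", "A"), ("1", "AC", "GT"), ("2", "C", "T"),
]

counts = Counter((chrom, variant_type(ref, alt)) for chrom, ref, alt in RECORDS)
print(counts[("1", "snp")], counts[("1", "indel")], counts[("1", "mnp")])  # 1 1 1
```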

Using a pie chart if there is only one chromosome

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1, group=VARTYPE) ~ CHROM[1]' \
    --title 'Types of variants on chromosome 1' \
    --config examples/config.toml \
    --figtype pie
# or simply use: --formula 'VARTYPE ~ CHROM[1]'
```

Types of variants on chromosome 1

Counting variant types on the whole genome

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1, group=VARTYPE) ~ 1' \
    --title 'Types of variants on whole genome' \
    --config examples/config.toml
# or simply use: --formula 'VARTYPE ~ 1'
```

Types of variants on whole genome

Counting types of mutant genotypes (HET, HOM_ALT) for sample 1 on each chromosome

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'COUNT(1, group=GTTYPEs[HET,HOM_ALT]{0}) ~ CHROM' \
    --title 'Mutant genotypes on each chromosome (sample 1)' \
    --config examples/config.toml
# or simply use: --formula 'GTTYPEs[HET,HOM_ALT]{0} ~ CHROM'
```

Mutant genotypes on each chromosome
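In `GTTYPEs[HET,HOM_ALT]{0}`, the `{0}` selects sample 1 (so the index is 0-based) and the brackets keep only heterozygous and homozygous-alternative calls. The sketch below reproduces that grouping on hypothetical records. `genotype_type` is an illustrative helper that assumes diploid GT strings; it is not vcfstats' implementation.

```python
from collections import Counter
from typing import List, Tuple


def genotype_type(gt: str) -> str:
    """Classify a GT string (hypothetical helper, diploid calls assumed)."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "UNKNOWN"
    if all(a == "0" for a in alleles):
        return "HOM_REF"
    if len(set(alleles)) == 1:
        return "HOM_ALT"
    return "HET"


# Hypothetical records: (CHROM, [GT of sample 1, GT of sample 2]).
RECORDS: List[Tuple[str, List[str]]] = [
    ("1", ["0/1", "0/0"]), ("1", ["1/1", "0/1"]),
    ("1", ["0/0", "1/1"]), ("2", ["0|1", "./."]),
]

wanted = {"HET", "HOM_ALT"}  # mirrors GTTYPEs[HET,HOM_ALT]
counts: Counter = Counter()
for chrom, gts in RECORDS:
    kind = genotype_type(gts[0])  # {0}: first sample, 0-based index
    if kind in wanted:
        counts[(chrom, kind)] += 1

print(sorted(counts.items()))
```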

Exploration of mean(genotype quality) and mean(depth) on each chromosome for sample 1

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'MEAN(GQs{0}) ~ MEAN(DEPTHs{0}, group=CHROM)' \
    --title 'GQ vs depth (sample 1)' \
    --config examples/config.toml
```

GQ vs depth (sample 1)
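With `group=CHROM`, both means are taken per chromosome, so each chromosome becomes one point. The sketch below computes those points from hypothetical `(CHROM, GQ, DP)` values for sample 1. It assumes, as in the earlier formulas, that the left-hand side (`MEAN(GQs{0})`) is plotted on the y axis and the right-hand side on the x axis; `per_chrom_means` is an illustrative helper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical per-variant values for sample 1: (CHROM, GQ, DP).
RECORDS: List[Tuple[str, int, int]] = [
    ("1", 30, 10), ("1", 50, 20), ("2", 99, 40), ("2", 60, 30),
]


def per_chrom_means(records) -> Dict[str, Tuple[float, float]]:
    """Return {chrom: (mean DP, mean GQ)}, i.e. one (x, y) point per chromosome."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, gq, dp in records:
        groups[chrom].append((gq, dp))
    return {
        chrom: (mean(dp for _, dp in vals), mean(gq for gq, _ in vals))
        for chrom, vals in groups.items()
    }


print(per_chrom_means(RECORDS))  # {'1': (15, 40), '2': (35, 79.5)}
```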

Exploration of depths for samples 1 and 2

```shell
vcfstats --vcf examples/sample.vcf \
    --outdir examples/ \
    --formula 'DEPTHs{0} ~ DEPTHs{1}' \
    --title 'Depths between sample 1 and 2' \
    --config examples/config.toml
```

Depths between sample 1 and 2

See more examples:

https://github.com/pwwang/vcfstats/issues/15#issuecomment-1029367903

Owner

  • Login: pwwang
  • Kind: user

GitHub Events

Total
  • Issues event: 5
  • Watch event: 2
  • Issue comment event: 7
  • Push event: 15
  • Fork event: 1
Last Year
  • Issues event: 5
  • Watch event: 2
  • Issue comment event: 7
  • Push event: 15
  • Fork event: 1

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 50
  • Total Committers: 2
  • Avg Commits per committer: 25.0
  • Development Distribution Score (DDS): 0.02
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
pwwang p****g@p****m 49
gnxsf 1****f 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 29
  • Total pull requests: 12
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 7 days
  • Total issue authors: 19
  • Total pull request authors: 4
  • Average comments per issue: 5.97
  • Average comments per pull request: 0.17
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • DenisGoryunov (4)
  • kkmll (3)
  • gnxsf (3)
  • plrlhb12 (2)
  • gk7279 (2)
  • ghost (2)
  • mbaddoo (1)
  • FriederikeHanssen (1)
  • hojin9218 (1)
  • Jokendo-collab (1)
  • mglgc (1)
  • pwwang (1)
  • aliakeefe (1)
  • miguellarraz (1)
  • dtabb73 (1)
Pull Request Authors
  • pwwang (9)
  • dependabot[bot] (1)
  • cipherome-minkim (1)
Top Labels
Issue Labels
question (4) enhancement (3) documentation (1) bug (1) good first issue (1)
Pull Request Labels
dependencies (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 167 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 16
  • Total maintainers: 1
pypi.org: vcfstats

Powerful statistics for VCF files

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 167 Last month
Rankings
Forks count: 9.1%
Downloads: 9.3%
Stargazers count: 9.5%
Dependent packages count: 10.0%
Average: 11.9%
Dependent repos count: 21.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • mkapi *
  • mkdocs-material *
  • pymdown-extensions *
poetry.lock pypi
  • atomicwrites 1.4.0 develop
  • attrs 21.4.0 develop
  • cmdy 0.5.0 develop
  • coverage 6.4.1 develop
  • curio 1.5 develop
  • iniconfig 1.1.1 develop
  • pluggy 1.0.0 develop
  • pytest 7.1.2 develop
  • pytest-cov 3.0.0 develop
  • asttokens 2.0.5
  • click 8.1.3
  • colorama 0.4.5
  • coloredlogs 15.0.1
  • commonmark 0.9.1
  • cycler 0.11.0
  • cyvcf2 0.30.15
  • datar 0.8.5
  • descartes 1.1.0
  • diot 0.1.6
  • executing 0.8.3
  • fonttools 4.33.3
  • humanfriendly 10.0
  • inflection 0.5.1
  • kiwisolver 1.4.3
  • lark-parser 0.12.0
  • matplotlib 3.5.2
  • mizani 0.7.4
  • numpy 1.23.0
  • packaging 21.3
  • palettable 3.3.0
  • pandas 1.4.3
  • patsy 0.5.2
  • pillow 9.1.1
  • pipda 0.6.0
  • plotnine 0.8.0
  • plotnine-prism 0.0.0
  • pure-eval 0.2.2
  • py 1.11.0
  • pygments 2.12.0
  • pyparam 0.5.3
  • pyparsing 3.0.9
  • pyreadline3 3.4.1
  • python-dateutil 2.8.2
  • python-simpleconf 0.5.6
  • python-slugify 6.1.2
  • pytz 2022.1
  • rich 12.4.4
  • rtoml 0.8.0
  • scipy 1.6.1
  • setuptools-scm 7.0.2
  • six 1.16.0
  • statsmodels 0.13.2
  • text-unidecode 1.3
  • toml 0.10.2
  • tomli 2.0.1
  • typing-extensions 4.2.0
  • varname 0.8.3
pyproject.toml pypi
  • cmdy ^0.5 develop
  • pytest ^7 develop
  • pytest-cov ^3 develop
  • cyvcf2 0.*
  • datar ^0.8
  • lark-parser ^0.12
  • numpy ^1.22
  • plotnine ^0.8
  • plotnine-prism ^0.0
  • py ^1.10
  • pyparam ^0.5
  • python ^3.8
  • python-slugify ^6
  • rich ^12
.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • actions/checkout v2 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • codacy/codacy-coverage-reporter-action master composite
  • docker/build-push-action v3 composite
  • docker/login-action v2 composite
  • docker/setup-buildx-action v2 composite
  • docker/setup-qemu-action v2 composite
.github/workflows/docs.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • ad-m/github-push-action master composite
Dockerfile docker
  • python 3.9.12-slim-buster build
setup.py pypi