Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.5%) to scientific vocabulary
Repository
Powerful statistics for VCF files
Basic Info
- Host: GitHub
- Owner: pwwang
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://pwwang.github.io/vcfstats/
- Size: 2.03 MB
Statistics
- Stars: 70
- Watchers: 5
- Forks: 15
- Open Issues: 1
- Releases: 10
Metadata Files
README.md
vcfstats - powerful statistics for VCF files
Motivation
Several tools can plot statistics of VCF files, including bcftools and jvarkit. However, none of them can:
- plot specific metrics
- customize the plots
- focus on variants with certain filters
The R package vcfR can do some of the above. However, it has to load the entire VCF into memory, which makes it impractical for large VCF files.
Installation
```shell
pip install -U vcfstats
```
Or run with Docker:
```shell
docker run \
  -w /vcfstats/workdir \
  -v "$(pwd)":/vcfstats/workdir \
  --rm justold/vcfstats:latest \
  vcfstats \
  --vcf myfile.vcf \
  -o outputs \
  --formula 'COUNT(1) ~ CONTIG' \
  --title 'Number of variants on each chromosome'
```
Gallery
Number of variants on each chromosome
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1) ~ CONTIG' \
  --title 'Number of variants on each chromosome' \
  --config examples/config.toml
```
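
To make the formula concrete: `COUNT(1) ~ CONTIG` counts one per record and groups the counts by contig. The sketch below reproduces that tally with awk on a tiny made-up VCF, which can serve as a sanity check for the plotted numbers. The file contents and the `count_per_contig` helper are hypothetical and not part of vcfstats.

```shell
# Build a hypothetical three-record VCF, used only for this illustration.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  '#CHROM' POS ID REF ALT QUAL FILTER INFO \
  1 100 . A T 50 PASS . \
  1 200 . G C 50 PASS . \
  2 150 . C T 50 q10 . >> "$vcf"

# Hypothetical helper: skip header lines (starting with '#') and
# count the remaining records per value of the CHROM column.
count_per_contig() {
  awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}

count_per_contig "$vcf"
```

Each output line holds a contig name and its record count, separated by a tab.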

Changing labels and ticks
vcfstats uses plotnine for plotting. See the plotnine documentation to learn which expressions you can pass to --ggs to modify the plots.
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1) ~ CONTIG' \
  --title 'Number of variants on each chromosome (modified)' \
  --config examples/config.toml \
  --ggs 'scale_x_discrete(name ="Chromosome", \
    limits=["1","2","3","4","5","6","7","8","9","10","X"]); \
    ylab("# Variants")'
```

Number of variants on the first 5 chromosomes
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1) ~ CONTIG[1,2,3,4,5]' \
  --title 'Number of variants on each chromosome (first 5)' \
  --config examples/config.toml
# or
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1) ~ CONTIG[1-5]' \
  --title 'Number of variants on each chromosome (first 5)' \
  --config examples/config.toml
# or
# (requires the VCF file to be tabix-indexed)
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1) ~ CONTIG' \
  --title 'Number of variants on each chromosome (first 5)' \
  --config examples/config.toml -r 1 2 3 4 5
```
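
The region option above needs a tabix index. One common way to create it is sketched below, assuming the usual htslib workflow: compress the VCF with bgzip, then index it with tabix. The `index_vcf` helper is hypothetical and not part of vcfstats, and it assumes both tools are installed and the VCF is sorted by contig and position.

```shell
# index_vcf is a hypothetical helper, not part of vcfstats.
# It assumes bgzip and tabix (both shipped with htslib) are on PATH
# and that the input VCF is sorted by contig and position.
index_vcf() {
  if ! command -v bgzip >/dev/null 2>&1 || ! command -v tabix >/dev/null 2>&1; then
    echo "bgzip and tabix are required" >&2
    return 127
  fi
  # Write a bgzip-compressed copy next to the original file,
  # then build the tabix index for it ("$1.gz.tbi").
  bgzip -c "$1" > "$1.gz" && tabix -p vcf "$1.gz"
}

# Example call (not run here): index_vcf examples/sample.vcf
```

The compressed file would then presumably be the one passed to `--vcf` when using `-r`.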

Number of substitutions of SNPs
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1, VARTYPE[snp]) ~ SUBST[A>T,A>G,A>C,T>A,T>G,T>C,G>A,G>T,G>C,C>A,C>T,C>G]' \
  --title 'Number of substitutions of SNPs' \
  --config examples/config.toml
```
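
As a rough cross-check of what this formula tallies, the sketch below counts REF>ALT pairs for single-base records in a tiny made-up VCF. It is a simplification: the `count_substitutions` helper is hypothetical, and it treats any record with a one-character REF and ALT as a SNP, so it does not handle multi-allelic sites or symbolic alleles the way vcfstats may.

```shell
# Build a hypothetical VCF with three SNPs and one deletion.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  '#CHROM' POS ID REF ALT QUAL FILTER INFO \
  1 100 . A T 50 PASS . \
  1 200 . A T 50 PASS . \
  2 150 . G C 50 PASS . \
  2 300 . AT A 50 PASS . >> "$vcf"

# Hypothetical helper: keep records whose REF and ALT are one base each
# and count them per "REF>ALT" label.
count_substitutions() {
  awk -F'\t' '
    !/^#/ && length($4) == 1 && length($5) == 1 { n[$4 ">" $5]++ }
    END { for (s in n) print s "\t" n[s] }
  ' "$1" | sort
}

count_substitutions "$vcf"
```

The deletion (AT to A) is skipped because its REF is longer than one base.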

Only SNPs that pass all filters
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1, VARTYPE[snp]) ~ SUBST[A>T,A>G,A>C,T>A,T>G,T>C,G>A,G>T,G>C,C>A,C>T,C>G]' \
  --title 'Number of substitutions of SNPs (passed)' \
  --config examples/config.toml \
  --passed
```

Alternative allele frequency on each chromosome
```shell
# using a dark theme
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'AAF ~ CONTIG' \
  --title 'Allele frequency on each chromosome' \
  --config examples/config.toml --ggs 'theme_dark()'
```

Using boxplot
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'AAF ~ CONTIG' \
  --title 'Allele frequency on each chromosome (boxplot)' \
  --config examples/config.toml \
  --figtype boxplot
```

Using a density plot or histogram to investigate the distribution
You can plot the distribution as a density plot or a histogram; this example uses --figtype density.
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'AAF ~ CONTIG[1,2]' \
  --title 'Allele frequency on chromosome 1,2' \
  --config examples/config.toml \
  --figtype density
```

Overall distribution of allele frequency
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'AAF ~ 1' \
  --title 'Overall allele frequency distribution' \
  --config examples/config.toml
```

Excluding some low/high frequency variants
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'AAF[0.05, 0.95] ~ 1' \
  --title 'Overall allele frequency distribution (0.05-0.95)' \
  --config examples/config.toml
```
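
The range filter can be pictured as follows. This is only a conceptual sketch under stated assumptions: it computes the alternative allele frequency as non-reference alleles divided by called alleles across the sample GT fields, and it treats both bounds as inclusive. Neither detail is confirmed here as vcfstats behaviour, and the `aaf_in_range` helper and the data are hypothetical.

```shell
# Build a hypothetical two-sample VCF whose only FORMAT field is GT.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  '#CHROM' POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2 \
  1 100 . A T 50 PASS . GT 0/1 0/0 \
  1 200 . G C 50 PASS . GT 1/1 1/1 \
  2 150 . C T 50 PASS . GT 0/0 0/0 >> "$vcf"

# Hypothetical helper: $1 = VCF path, $2 = lower bound, $3 = upper bound.
# Prints "CHROM:POS<TAB>AAF" for records whose AAF lies within the bounds.
aaf_in_range() {
  awk -F'\t' -v lo="$2" -v hi="$3" '
    !/^#/ {
      alt = 0; total = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")               # GT is assumed to be the first subfield
        k = split(f[1], a, "[/|]")      # alleles are separated by / or |
        for (j = 1; j <= k; j++) {
          if (a[j] == ".") continue     # skip missing calls
          total++
          if (a[j] != "0") alt++
        }
      }
      if (total > 0) {
        aaf = alt / total
        if (aaf >= lo && aaf <= hi) printf("%s:%s\t%.2f\n", $1, $2, aaf)
      }
    }
  ' "$1"
}

aaf_in_range "$vcf" 0.05 0.95
```

With these bounds only the first record is kept; the other two have frequencies of 1 and 0 and fall outside the range.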

Counting types of variants on each chromosome
```shell
# The formula can also be written simply as 'VARTYPE ~ CHROM'.
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1, group=VARTYPE) ~ CHROM' \
  --title 'Types of variants on each chromosome' \
  --config examples/config.toml
```

Using a pie chart if there is only one chromosome
```shell
# The formula can also be written simply as 'VARTYPE ~ CHROM[1]'.
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1, group=VARTYPE) ~ CHROM[1]' \
  --title 'Types of variants on chromosome 1' \
  --config examples/config.toml \
  --figtype pie
```

Counting variant types on whole genome
```shell
# The formula can also be written simply as 'VARTYPE ~ 1'.
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1, group=VARTYPE) ~ 1' \
  --title 'Types of variants on whole genome' \
  --config examples/config.toml
```

Counting types of mutant genotypes (HET, HOM_ALT) for sample 1 on each chromosome
```shell
# The formula can also be written simply as 'GTTYPEs[HET,HOM_ALT]{0} ~ CHROM'.
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'COUNT(1, group=GTTYPEs[HET,HOM_ALT]{0}) ~ CHROM' \
  --title 'Mutant genotypes on each chromosome (sample 1)' \
  --config examples/config.toml
```

Exploration of mean(genotype quality) and mean(depth) on each chromosome for sample 1
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'MEAN(GQs{0}) ~ MEAN(DEPTHs{0}, group=CHROM)' \
  --title 'GQ vs depth (sample 1)' \
  --config examples/config.toml
```
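
To illustrate the aggregation this formula describes, the sketch below computes, per chromosome, the mean depth and mean genotype quality of the first sample (index 0) in a tiny made-up VCF. It assumes depth and quality come from the FORMAT fields DP and GQ, which may differ from how vcfstats derives `DEPTHs` and `GQs`. The `mean_dp_gq_by_chrom` helper is hypothetical.

```shell
# Build a hypothetical single-sample VCF with GT, DP and GQ per record.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  '#CHROM' POS ID REF ALT QUAL FILTER INFO FORMAT S1 \
  1 100 . A T 50 PASS . GT:DP:GQ 0/1:10:30 \
  1 200 . G C 50 PASS . GT:DP:GQ 1/1:20:50 \
  2 150 . C T 50 PASS . GT:DP:GQ 0/1:30:60 >> "$vcf"

# Hypothetical helper: prints "CHROM<TAB>mean DP<TAB>mean GQ" for the
# first sample column, locating DP and GQ through the FORMAT column.
mean_dp_gq_by_chrom() {
  awk -F'\t' '
    !/^#/ {
      nk = split($9, key, ":"); split($10, val, ":")
      for (i = 1; i <= nk; i++) {
        if (key[i] == "DP") dp[$1] += val[i]
        if (key[i] == "GQ") gq[$1] += val[i]
      }
      n[$1]++
    }
    END { for (c in n) printf("%s\t%.1f\t%.1f\n", c, dp[c] / n[c], gq[c] / n[c]) }
  ' "$1" | sort
}

mean_dp_gq_by_chrom "$vcf"
```

Each output row corresponds to one point of the plot: mean depth on the x axis and mean genotype quality on the y axis, one point per chromosome.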

Exploration of depths for samples 1 and 2
```shell
vcfstats --vcf examples/sample.vcf \
  --outdir examples/ \
  --formula 'DEPTHs{0} ~ DEPTHs{1}' \
  --title 'Depths between sample 1 and 2' \
  --config examples/config.toml
```

See more examples:
https://github.com/pwwang/vcfstats/issues/15#issuecomment-1029367903
Owner
- Login: pwwang
- Kind: user
- Repositories: 108
- Profile: https://github.com/pwwang
GitHub Events
Total
- Issues event: 5
- Watch event: 2
- Issue comment event: 7
- Push event: 15
- Fork event: 1
Last Year
- Issues event: 5
- Watch event: 2
- Issue comment event: 7
- Push event: 15
- Fork event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 29
- Total pull requests: 12
- Average time to close issues: about 1 month
- Average time to close pull requests: 7 days
- Total issue authors: 19
- Total pull request authors: 4
- Average comments per issue: 5.97
- Average comments per pull request: 0.17
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- DenisGoryunov (4)
- kkmll (3)
- gnxsf (3)
- plrlhb12 (2)
- gk7279 (2)
- ghost (2)
- mbaddoo (1)
- FriederikeHanssen (1)
- hojin9218 (1)
- Jokendo-collab (1)
- mglgc (1)
- pwwang (1)
- aliakeefe (1)
- miguellarraz (1)
- dtabb73 (1)
Pull Request Authors
- pwwang (9)
- dependabot[bot] (1)
- cipherome-minkim (1)
Packages
- Total packages: 1
- Total downloads: 167 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 16
- Total maintainers: 1
pypi.org: vcfstats
Powerful statistics for VCF files
- Homepage: https://github.com/pwwang/vcfstats
- Documentation: https://vcfstats.readthedocs.io/
- License: MIT
- Latest release: 0.7.0 (published 8 months ago)
Dependencies
- mkapi *
- mkdocs-material *
- pymdown-extensions *
- atomicwrites 1.4.0 develop
- attrs 21.4.0 develop
- cmdy 0.5.0 develop
- coverage 6.4.1 develop
- curio 1.5 develop
- iniconfig 1.1.1 develop
- pluggy 1.0.0 develop
- pytest 7.1.2 develop
- pytest-cov 3.0.0 develop
- asttokens 2.0.5
- click 8.1.3
- colorama 0.4.5
- coloredlogs 15.0.1
- commonmark 0.9.1
- cycler 0.11.0
- cyvcf2 0.30.15
- datar 0.8.5
- descartes 1.1.0
- diot 0.1.6
- executing 0.8.3
- fonttools 4.33.3
- humanfriendly 10.0
- inflection 0.5.1
- kiwisolver 1.4.3
- lark-parser 0.12.0
- matplotlib 3.5.2
- mizani 0.7.4
- numpy 1.23.0
- packaging 21.3
- palettable 3.3.0
- pandas 1.4.3
- patsy 0.5.2
- pillow 9.1.1
- pipda 0.6.0
- plotnine 0.8.0
- plotnine-prism 0.0.0
- pure-eval 0.2.2
- py 1.11.0
- pygments 2.12.0
- pyparam 0.5.3
- pyparsing 3.0.9
- pyreadline3 3.4.1
- python-dateutil 2.8.2
- python-simpleconf 0.5.6
- python-slugify 6.1.2
- pytz 2022.1
- rich 12.4.4
- rtoml 0.8.0
- scipy 1.6.1
- setuptools-scm 7.0.2
- six 1.16.0
- statsmodels 0.13.2
- text-unidecode 1.3
- toml 0.10.2
- tomli 2.0.1
- typing-extensions 4.2.0
- varname 0.8.3
- cmdy ^0.5 develop
- pytest ^7 develop
- pytest-cov ^3 develop
- cyvcf2 0.*
- datar ^0.8
- lark-parser ^0.12
- numpy ^1.22
- plotnine ^0.8
- plotnine-prism ^0.0
- py ^1.10
- pyparam ^0.5
- python ^3.8
- python-slugify ^6
- rich ^12
- actions/checkout v3 composite
- actions/checkout v2 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- codacy/codacy-coverage-reporter-action master composite
- docker/build-push-action v3 composite
- docker/login-action v2 composite
- docker/setup-buildx-action v2 composite
- docker/setup-qemu-action v2 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- ad-m/github-push-action master composite
- python 3.9.12-slim-buster build