Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 11 of 43 committers (25.6%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.3%) to scientific vocabulary
Repository
cython + htslib == fast VCF and BCF processing
Statistics
- Stars: 411
- Watchers: 6
- Forks: 75
- Open Issues: 47
- Releases: 0
Metadata Files
README.md
cyvcf2
Note: cyvcf2 versions < 0.20.0 require htslib < 1.10. cyvcf2 versions >= 0.20.0 require htslib >= 1.10
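The version pairing in the note above can be written as a small check. This is a minimal sketch and not part of cyvcf2: the names `parse_version` and `is_compatible` are hypothetical, and the parsing assumes plain dotted numeric versions such as `0.31.1` or `1.12`.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.31.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(cyvcf2_version, htslib_version):
    """Apply the rule from the note: cyvcf2 < 0.20.0 needs htslib < 1.10,
    and cyvcf2 >= 0.20.0 needs htslib >= 1.10."""
    new_cyvcf2 = parse_version(cyvcf2_version) >= (0, 20, 0)
    new_htslib = parse_version(htslib_version) >= (1, 10)
    # The pair is compatible when both fall on the same side of the cutoff.
    return new_cyvcf2 == new_htslib


print(is_compatible("0.31.1", "1.12"))  # recent cyvcf2, recent htslib
print(is_compatible("0.11.7", "1.9"))   # old cyvcf2, old htslib
print(is_compatible("0.31.1", "1.9"))   # recent cyvcf2, old htslib
```

Comparing integer tuples avoids the string-comparison trap where `"1.9"` sorts after `"1.10"`.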
The latest documentation for cyvcf2 can be found at https://cyvcf2.readthedocs.io/
If you use cyvcf2, please cite the paper
Fast python (2 and 3) parsing of VCF and BCF including region-queries.
cyvcf2 is a cython wrapper around htslib built for fast parsing of Variant Call Format (VCF) files.
Attributes like variant.gt_ref_depths work for diploid samples and return a numpy array directly, so they are immediately ready for downstream use.
Note that the array is backed by the underlying C data, so once variant goes out of scope, the array will contain nonsense.
To persist a copy, use cpy = np.array(variant.gt_ref_depths) instead of just arr = variant.gt_ref_depths.
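The difference between holding a view and holding a copy can be seen with plain numpy. This is only an illustration: `backing` is a hypothetical stand-in for the C buffer that cyvcf2 manages internally, and overwriting it simulates the reader moving on to another record.

```python
import numpy as np

# Hypothetical stand-in for the per-record C data behind variant.gt_ref_depths.
backing = np.array([10, 20, 30], dtype=np.int32)

view = backing[:]      # shares memory, like arr = variant.gt_ref_depths
cpy = np.array(view)   # owns its data, like cpy = np.array(variant.gt_ref_depths)

# Simulate the underlying buffer being reused for the next record.
backing[:] = [7, 8, 9]

print("view:", view.tolist())  # follows the buffer, so the old values are gone
print("copy:", cpy.tolist())   # still holds the values from the first record
```

In real cyvcf2 use the stale view may hold arbitrary values rather than the next record's data, so copy whenever an array has to outlive its variant.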
Example
The example below shows much of the use of cyvcf2.
```Python
from cyvcf2 import VCF

vcf = VCF('some.vcf.gz')  # or VCF('some.bcf')
n_samples = len(vcf.samples)  # number of samples in the file

for variant in vcf:
    variant.REF, variant.ALT # e.g. REF='A', ALT=['C', 'T']

    variant.CHROM, variant.start, variant.end, variant.ID, \
                variant.FILTER, variant.QUAL

    # numpy arrays of specific things we pull from the sample fields.
    # gt_types is array of 0,1,2,3==HOM_REF, HET, UNKNOWN, HOM_ALT
    variant.gt_types, variant.gt_ref_depths, variant.gt_alt_depths # numpy arrays
    variant.gt_phases, variant.gt_quals, variant.gt_bases # numpy array

    ## INFO Field.
    ## extract from the info field by its name:
    variant.INFO.get('DP') # int
    variant.INFO.get('FS') # float
    variant.INFO.get('AC') # float

    # convert back to a string.
    str(variant)

    ## sample info...

    # Get a numpy array of the depth per sample:
    dp = variant.format('DP')
    # or of any other format field:
    sb = variant.format('SB')
    assert sb.shape == (n_samples, 4) # 4-values per sample

# to do a region-query:

vcf = VCF('some.vcf.gz')
for v in vcf('11:435345-556565'):
    if v.INFO["AF"] > 0.1: continue
    print(str(v))
```
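The genotype codes in the comment above (0, 1, 2, 3 for HOM_REF, HET, UNKNOWN, HOM_ALT) are enough to compute simple per-variant summaries. The sketch below assumes diploid samples and that encoding. The helper `alt_allele_frequency` is hypothetical, and the input array is made-up data standing in for `variant.gt_types`.

```python
import numpy as np

# Genotype codes as listed in the example comment above.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def alt_allele_frequency(gt_types):
    """Alternate-allele frequency among called diploid genotypes."""
    gt_types = np.asarray(gt_types)
    n_called = int((gt_types != UNKNOWN).sum())
    if n_called == 0:
        return float("nan")  # no called genotypes at this site
    n_het = int((gt_types == HET).sum())
    n_hom_alt = int((gt_types == HOM_ALT).sum())
    # Each HET carries one alternate allele, each HOM_ALT carries two.
    return (n_het + 2 * n_hom_alt) / (2 * n_called)


# Made-up values standing in for variant.gt_types from a single record.
example_gt_types = np.array([HOM_REF, HET, HOM_ALT, UNKNOWN, HOM_REF])
print(alt_allele_frequency(example_gt_types))
```

Samples with UNKNOWN genotypes are excluded from both the numerator and the denominator.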
Installation
pip with bundled htslib
pip install cyvcf2
pip with system htslib
Assuming you have already built and installed htslib version 1.12 or higher.
CYVCF2_HTSLIB_MODE=EXTERNAL pip install --no-binary cyvcf2 cyvcf2
Windows (experimental, only tested on MSYS2)
Assuming you have already built and installed htslib.
SETUPTOOLS_USE_DISTUTILS=stdlib pip install cyvcf2
github (building htslib and cyvcf2 from source)
```
git clone --recursive https://github.com/brentp/cyvcf2
cd cyvcf2
pip install -r requirements.txt
# sometimes it can be required to remove old files:
# python setup.py clean_ext
CYVCF2_HTSLIB_MODE=BUILTIN CYTHONIZE=1 python setup.py install
# or to use a system htslib.so
CYVCF2_HTSLIB_MODE=EXTERNAL python setup.py install
```
On OSX, using brew, you may have to set the following as indicated by the brew install:
```
For compilers to find openssl you may need to set:
  export LDFLAGS="-L/usr/local/opt/openssl/lib"
  export CPPFLAGS="-I/usr/local/opt/openssl/include"

For pkg-config to find openssl you may need to set:
  export PKG_CONFIG_PATH="/usr/local/opt/openssl/lib/pkgconfig"
```
Testing
Install pytest, then tests can be run with:
pytest
CLI
Run with cyvcf2 path_to_vcf
```
$ cyvcf2 --help
Usage: cyvcf2 [OPTIONS]

  fast vcf parsing with cython + htslib

Options:
  -c, --chrom TEXT                Specify what chromosome to include.
  -s, --start INTEGER             Specify the start of region.
  -e, --end INTEGER               Specify the end of the region.
  --include TEXT                  Specify what info field to include.
  --exclude TEXT                  Specify what info field to exclude.
  --loglevel [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Set the level of log output.  [default:
                                  INFO]
  --silent                        Skip printing of vcf.
  --help                          Show this message and exit.
```
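The options in the help text can be combined into a region-restricted invocation. The helper below only assembles the argument list and does not run anything. `build_cyvcf2_command` is a hypothetical name, and placing the VCF path after the options is an assumption based on the `cyvcf2 path_to_vcf` usage above.

```python
import shlex


def build_cyvcf2_command(vcf_path, chrom=None, start=None, end=None, silent=False):
    """Assemble a cyvcf2 CLI invocation from options listed in --help."""
    args = ["cyvcf2"]
    if chrom is not None:
        args += ["--chrom", str(chrom)]
    if start is not None:
        args += ["--start", str(start)]
    if end is not None:
        args += ["--end", str(end)]
    if silent:
        args.append("--silent")
    args.append(vcf_path)  # assumed position: the path comes last
    return args


cmd = build_cyvcf2_command("some.vcf.gz", chrom="11", start=435345, end=556565)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` once cyvcf2 is installed.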
See Also
Pysam also has a cython wrapper to htslib, and one block of code here is taken directly from that library. However, the optimizations that we want for gemini are very specific, so we have chosen to create a separate project.
Performance
For the performance comparison in the paper, we used thousand genomes chromosome 22, with the full comparison runner here.
Owner
- Name: Brent Pedersen
- Login: brentp
- Kind: user
- Location: Oregon, USA
- Twitter: brent_p
- Repositories: 220
- Profile: https://github.com/brentp
Doing genomics
GitHub Events
Total
- Issues event: 5
- Watch event: 33
- Issue comment event: 10
- Fork event: 3
Last Year
- Issues event: 5
- Watch event: 33
- Issue comment event: 10
- Fork event: 3
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Brent Pedersen | b****e@g****m | 344 |
| Tom White | t****e@g****m | 21 |
| graphenn | g****n@g****m | 15 |
| indraniel | i****l@g****m | 12 |
| Arya Massarat | 2****m | 7 |
| Jerome Kelleher | jk@w****k | 7 |
| Liang-Bo Wang | l****g@w****u | 5 |
| Måns Magnusson | m****n@s****e | 5 |
| Dave Lawrence | d****w@g****m | 4 |
| Marcel Martin | m****n@s****e | 4 |
| Sam Lichtenberg | s****e@g****m | 4 |
| Nils Homer | n****3 | 4 |
| Graham Gower | g****r@g****m | 3 |
| Michael Hall | m****l@m****h | 3 |
| Tim Millar | t****r | 2 |
| Elias Kuthe | e****e@t****e | 2 |
| LiterallyUniqueLogin | j****e@g****m | 2 |
| Sander Bollen | a****n@l****l | 2 |
| Wouter De Coster | d****r@g****m | 2 |
| arq5x | a****x@v****u | 2 |
| cclauss | c****s@b****h | 2 |
| chapmanb | c****b@5****m | 2 |
| Ben Jeffery | b****y@b****k | 1 |
| Ben Jeffery | b****y@g****m | 1 |
| Danilo Horta | d****a@p****e | 1 |
| Derek Croote | d****e | 1 |
| Dave Larson | d****n@g****u | 1 |
| Gehring, Julian | j****g@i****m | 1 |
| Gilad Mishne | g****d@m****g | 1 |
| James Eapen | j****n@g****m | 1 |
| and 13 more... | | |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 105
- Total pull requests: 52
- Average time to close issues: 6 months
- Average time to close pull requests: about 1 month
- Total issue authors: 71
- Total pull request authors: 22
- Average comments per issue: 3.96
- Average comments per pull request: 2.56
- Merged pull requests: 45
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 1
- Average time to close issues: 15 days
- Average time to close pull requests: 3 days
- Issue authors: 5
- Pull request authors: 1
- Average comments per issue: 2.4
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tomwhite (15)
- davmlaw (5)
- grahamgower (3)
- vsbuffalo (3)
- hammer (3)
- jeromekelleher (2)
- quattro (2)
- nh13 (2)
- CholoTook (2)
- alanwilter (2)
- brentp (2)
- gnxsf (2)
- graphenn (2)
- shubhamsaini (1)
- adrienlemeur (1)
Pull Request Authors
- tomwhite (20)
- graphenn (9)
- grahamgower (3)
- benjeffery (2)
- esrice (2)
- nakib103 (2)
- EQt (2)
- jeromekelleher (2)
- Hoeze (2)
- horta (1)
- kullrich (1)
- CholoTook (1)
- davmlaw (1)
- stefanor (1)
- brentp (1)
Packages
- Total packages: 3
- Total downloads: pypi 82,567 last-month
- Total docker downloads: 1,802
- Total dependent packages: 45 (may contain duplicates)
- Total dependent repositories: 99 (may contain duplicates)
- Total versions: 197
- Total maintainers: 2
pypi.org: cyvcf2
fast vcf parsing with cython + htslib
- Homepage: https://github.com/brentp/cyvcf2/
- Documentation: https://cyvcf2.readthedocs.io/
- License: MIT
- Latest release: 0.31.1 (published over 1 year ago)
Rankings
Maintainers (1)
proxy.golang.org: github.com/brentp/cyvcf2
- Documentation: https://pkg.go.dev/github.com/brentp/cyvcf2#section-documentation
- License: mit
- Latest release: v0.31.1 (published over 1 year ago)
Rankings
spack.io: py-cyvcf2
fast vcf parsing with cython + htslib
- Homepage: https://github.com/brentp/cyvcf2
- License: []
- Latest release: 0.11.7 (published almost 4 years ago)
Rankings
Maintainers (1)
Dependencies
- click *
- coloredlogs *
- cython >=0.23.3
- numpy *
- click *
- coloredlogs *
- numpy *
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- actions/download-artifact v2 composite
- actions/setup-python v2 composite
- actions/upload-artifact v2 composite
- mxschmitt/action-tmate v3 composite
- pypa/gh-action-pypi-publish release/v1 composite