seqminer
Query sequence data (VCF/BCF1/BCF2, Tabix, BGEN, PLINK) in R
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 5 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 8 committers (12.5%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: zhanxw
- License: other
- Language: C
- Default Branch: master
- Homepage: http://zhanxw.github.io/seqminer/
- Size: 7.99 MB
Statistics
- Stars: 30
- Watchers: 3
- Forks: 12
- Open Issues: 18
- Releases: 1
README.md
SEQMINER2
Introduction
Seqminer is a highly efficient R package for retrieving sequence variants from biobank-scale datasets of millions of individuals and billions of genetic variants. It supports all variant types, including multi-allelic variants and imputation dosages. It takes VCF, BCF, BGEN, or PLINK files as input, indexes them, queries them using a variant-based index, and loads the results as R data types such as a list or matrix.
Download
Install the development version (devtools package is required):
devtools::install_github("zhanxw/seqminer")
Showcase
Here are some examples of how to use seqminer to index and query files in real-life scenarios.
Index VCF/BCF files
library(seqminer)
bcf.ref.file <- "input.bcf"
bcf.idx.file <- "input.bcf.scIdx"
out <- seqminer::createSingleChromosomeBCFIndex(bcf.ref.file, bcf.idx.file)
or
vcf.ref.file <- "input.vcf.gz"
vcf.idx.file <- "input.vcf.gz.scIdx"
out <- seqminer::createSingleChromosomeVCFIndex(vcf.ref.file, vcf.idx.file)
This generates a variant-based index that works with commonly used sequence variant file formats such as VCF and BCF.
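The two indexing calls above differ only in the function name, so a small wrapper can choose between them from the file extension and skip files that already have an index. The following is a minimal sketch, not part of seqminer: `index_path()` and `ensure_index()` are hypothetical helpers, and the `.scIdx` suffix simply follows the naming used in the examples above.

```r
# Hypothetical helpers (not part of seqminer).

# Derive the index file name by appending ".scIdx", as in the examples above.
index_path <- function(ref.file) {
  paste0(ref.file, ".scIdx")
}

# Create the index only if it does not exist yet; returns the index path.
ensure_index <- function(ref.file, idx.file = index_path(ref.file)) {
  stopifnot(file.exists(ref.file))
  if (!file.exists(idx.file)) {
    if (grepl("\\.bcf$", ref.file, ignore.case = TRUE)) {
      seqminer::createSingleChromosomeBCFIndex(ref.file, idx.file)
    } else {
      # Assumption: any file that does not end in .bcf is a VCF file.
      seqminer::createSingleChromosomeVCFIndex(ref.file, idx.file)
    }
  }
  idx.file
}
```

With this in place, `ensure_index("input.vcf.gz")` could be called before every query without rebuilding an index that is already on disk.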
Query VCF/BCF files
Query VCF file:
vcf.ref.file <- "input.vcf.gz"
vcf.idx.file <- "input.vcf.gz.scIdx"
tabix.range <- "1:123-1234"
geno <- seqminer::readSingleChromosomeVCFToMatrixByRange(vcf.ref.file, tabix.range, vcf.idx.file)
Query BCF file:
bcf.ref.file <- "input.bcf"
bcf.idx.file <- "input.bcf.scIdx"
tabix.range <- "1:123-1234"
geno <- seqminer::readSingleChromosomeBCFToMatrixByRange(bcf.ref.file, tabix.range, bcf.idx.file)
Querying multiple regions is also possible: specify the regions separated by commas, e.g. "1:123-124,1:1234-1235".
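When the regions come from a table rather than being typed by hand, the comma-separated range string can be assembled programmatically. This is a small sketch in base R; `make_range()` is a hypothetical helper, not a seqminer function.

```r
# Hypothetical helper: build a "chrom:start-end,chrom:start-end" range string.
make_range <- function(chrom, start, end) {
  stopifnot(length(chrom) == length(start), length(start) == length(end))
  stopifnot(all(start <= end))
  # as.integer() avoids scientific notation such as "1e+05" in the output.
  regions <- sprintf("%s:%d-%d", chrom, as.integer(start), as.integer(end))
  paste(regions, collapse = ",")
}

tabix.range <- make_range(c("1", "1"), c(123, 1234), c(124, 1235))
tabix.range
# [1] "1:123-124,1:1234-1235"
```

The resulting string can be passed as the range argument in the query calls shown above.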
Output example (columns represent variants, rows represent individuals):

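Given that orientation (individuals in rows, variants in columns), per-variant summaries are column operations. The sketch below uses a made-up toy matrix in place of a real query result, and it assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `NA` for missing calls; that coding is an assumption, so check it against your own output before relying on it.

```r
# Toy matrix standing in for a query result: 3 individuals x 2 variants.
# Assumption: values are alternate-allele counts (0/1/2), NA = missing.
geno <- matrix(c(0, 1, 2,    # first variant (first column)
                 0, NA, 1),  # second variant (second column)
               nrow = 3,
               dimnames = list(c("id1", "id2", "id3"),
                               c("1:123", "1:1234")))

# Alternate-allele frequency per variant, ignoring missing genotypes.
alt.freq <- colMeans(geno, na.rm = TRUE) / 2

# Fraction of individuals with a non-missing genotype per variant.
call.rate <- colMeans(!is.na(geno))
```

If a result turns out to have variants in rows instead, `t(geno)` restores the layout assumed here.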
Query BGEN/PLINK files
Query BGEN file:
bg.ref.file <- "input.bgen"
bg.range <- "1:123-1234"
geno.mat <- seqminer::readBGENToMatrixByRange(bg.ref.file, bg.range)
geno.list <- seqminer::readBGENToListByRange(bg.ref.file, bg.range)
Make sure the BGEN file has an index file (*.bgi) in the same folder.
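The index requirement can be checked up front so that a missing index produces a clear error message. This is a minimal sketch: `bgen_index_path()` and `read_bgen_checked()` are hypothetical helpers, and the sketch assumes the index is named by appending `.bgi` to the BGEN file name (e.g. `input.bgen.bgi`), which the text above does not spell out.

```r
# Hypothetical helpers (not part of seqminer).

# Assumption: the index sits next to the BGEN file as "<file>.bgi".
bgen_index_path <- function(bgen.file) {
  paste0(bgen.file, ".bgi")
}

# Fail early with a readable message if the data or index file is missing.
read_bgen_checked <- function(bgen.file, range) {
  if (!file.exists(bgen.file)) {
    stop("BGEN file not found: ", bgen.file)
  }
  if (!file.exists(bgen_index_path(bgen.file))) {
    stop("BGEN index not found: ", bgen_index_path(bgen.file))
  }
  seqminer::readBGENToMatrixByRange(bgen.file, range)
}
```

`read_bgen_checked("input.bgen", "1:123-1234")` would then behave like the direct call above once both files are present.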
Query PLINK file:
plink.ref.file <- "input"
geno <- seqminer::readPlinkToMatrixByIndex(plink.ref.file, sampleIndex=1:20000, markerIndex=1:100)
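Because the PLINK query selects samples and markers by position, it can help to know how many of each the fileset holds before choosing index ranges. The sketch below assumes that `"input"` is the prefix of a standard PLINK binary fileset (`input.bed`, `input.bim`, `input.fam`) and counts lines in the `.fam` and `.bim` files; `plink_dims()` is a hypothetical helper, not a seqminer function.

```r
# Hypothetical helper: count samples and markers in a PLINK binary fileset.
plink_dims <- function(prefix) {
  files <- paste0(prefix, c(".bed", ".bim", ".fam"))
  missing <- files[!file.exists(files)]
  if (length(missing) > 0) {
    stop("Missing PLINK file(s): ", paste(missing, collapse = ", "))
  }
  # .fam has one line per sample, .bim has one line per marker.
  c(samples = length(readLines(files[3])),
    markers = length(readLines(files[2])))
}

# Usage sketch (commented out because it needs real files):
# dims <- plink_dims(plink.ref.file)
# geno <- seqminer::readPlinkToMatrixByIndex(
#   plink.ref.file,
#   sampleIndex = seq_len(min(20000, dims[["samples"]])),
#   markerIndex = seq_len(min(100, dims[["markers"]])))
```

Capping the index ranges this way keeps the request within the bounds of the file.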
Command line interface
We also developed a seqminer command line interface:
./queryVCFIndex.intel input.vcf.gz input.vcf.gz.scIdx 1:123-1234
Citation:
Owner
- Name: zhanxw
- Login: zhanxw
- Kind: user
- Website: http://zhanxw.com
- Repositories: 26
- Profile: https://github.com/zhanxw
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| zhanxw | z****w@g****m | 148 |
| zhanxw | z****w | 19 |
| Lina | 1****a | 19 |
| David Trudgian | d****n@u****u | 2 |
| Bitdeli Chef | c****f@b****m | 1 |
| Renan Sauteraud | r****d@g****m | 1 |
| David Trudgian | d****e@t****t | 1 |
| timoast | 4****t | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 23
- Total pull requests: 6
- Average time to close issues: 9 days
- Average time to close pull requests: about 1 month
- Total issue authors: 15
- Total pull request authors: 6
- Average comments per issue: 1.43
- Average comments per pull request: 0.67
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- garyzhubc (4)
- bschilder (4)
- zx8754 (3)
- yulijia (1)
- xueweic (1)
- jielab (1)
- kollienne (1)
- mmoisse (1)
- isinaltinkaya (1)
- Asppagh (1)
- sariya (1)
- joshuahoffman39 (1)
- Balthasar-eu (1)
- jenzopr (1)
- WenjianBI (1)
Pull Request Authors
- SRenan (1)
- dtrudg (1)
- pekkarr (1)
- timoast (1)
- bitdeli-chef (1)
- yang-lina (1)
Packages
- Total packages: 1
- Total downloads: 771 last month (cran)
- Total docker downloads: 318
- Total dependent packages: 7
- Total dependent repositories: 10
- Total versions: 42
- Total maintainers: 1
cran.r-project.org: seqminer
Efficiently Read Sequence Data (VCF Format, BCF Format, METAL Format and BGEN Format) into R
- Homepage: http://zhanxw.github.io/seqminer/
- Documentation: http://cran.r-project.org/web/packages/seqminer/seqminer.pdf
- License: GPL-2 | GPL-3 | file LICENSE [expanded from: GPL | file LICENSE]
- Latest release: 1.9.1 (published over 12 years ago)
Dependencies
- SKAT * suggests
- testthat * suggests
- actions/checkout v2 composite
- actions/upload-artifact main composite
- r-lib/actions/check-r-package v1 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite
- r-lib/actions/setup-r-dependencies v1 composite