seqminer

Query sequence data (VCF/BCF1/BCF2, Tabix, BGEN, PLINK) in R

https://github.com/zhanxw/seqminer

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 8 committers (12.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary

Keywords

annotation bcf bgen meta-analysis next-generation-sequencing plink sequencing tabix vcf workflow

Keywords from Contributors

bioinformatics genomics usegalaxy
Last synced: 6 months ago

Repository

Statistics
  • Stars: 30
  • Watchers: 3
  • Forks: 12
  • Open Issues: 18
  • Releases: 1
Created over 12 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License

README.md

SEQMINER2

Introduction

Seqminer is a highly efficient R package for retrieving sequence variants from biobank-scale datasets of millions of individuals and billions of genetic variants. It supports all variant types, including multi-allelic variants and imputation dosages. It takes VCF, BCF, BGEN, or PLINK files as input, indexes them, queries them through a variant-based index, and loads the results into R data types such as lists or matrices.

Download

Install the development version (devtools package is required):

devtools::install_github("zhanxw/seqminer")

Showcase

Here are some examples of how to use seqminer to index and query files in real-life scenarios.

Index VCF/BCF files

library(seqminer)
bcf.ref.file <- "input.bcf"
bcf.idx.file <- "input.bcf.scIdx"
out <- seqminer::createSingleChromosomeBCFIndex(bcf.ref.file, bcf.idx.file)

or

vcf.ref.file <- "input.vcf.gz"
vcf.idx.file <- "input.vcf.gz.scIdx"
out <- seqminer::createSingleChromosomeVCFIndex(vcf.ref.file, vcf.idx.file)

This generates a variant-based index that works with commonly used sequence variant file formats such as VCF and BCF.
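
Because indexing a large file can take a while, it may be convenient to build the index only when it is missing. The following is a minimal sketch, not part of seqminer: `ensure_index` is a hypothetical helper, the `.scIdx` suffix simply follows the file names used above, and `fake_indexer` is a stand-in so the example runs without real data.

```r
# Hypothetical helper: call `indexer` only if the index file does not exist yet.
# `indexer` is any function taking (ref.file, idx.file), for example
# seqminer::createSingleChromosomeVCFIndex as used in the README.
ensure_index <- function(ref.file, indexer, idx.file = paste0(ref.file, ".scIdx")) {
  if (!file.exists(idx.file)) {
    indexer(ref.file, idx.file)
  }
  idx.file
}

# Stand-in indexer for demonstration only; it just creates an empty file
# and counts how often it was called.
n.calls <- 0
fake_indexer <- function(ref.file, idx.file) {
  n.calls <<- n.calls + 1
  file.create(idx.file)
}

ref <- tempfile(fileext = ".vcf.gz")
file.create(ref)
idx1 <- ensure_index(ref, fake_indexer)
idx2 <- ensure_index(ref, fake_indexer)  # index already exists, so no second call
```

With real data, the call would presumably look like `ensure_index("input.vcf.gz", seqminer::createSingleChromosomeVCFIndex)`.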

Query VCF/BCF files

Query VCF file:

vcf.ref.file <-  "input.vcf.gz"
vcf.idx.file <-  "input.vcf.gz.scIdx"
tabix.range <- "1:123-1234"
geno <- seqminer::readSingleChromosomeVCFToMatrixByRange(vcf.ref.file, tabix.range, vcf.idx.file)

Query BCF file:

bcf.ref.file <- "input.bcf"
bcf.idx.file <- "input.bcf.scIdx"
tabix.range <- "1:123-1234"
geno <- seqminer::readSingleChromosomeBCFToMatrixByRange(bcf.ref.file, tabix.range, bcf.idx.file)

Multiple regions can be queried in one call: separate them with commas, e.g. "1:123-124,1:1234-1235".
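
When the regions come from a table rather than being typed by hand, the comma-separated string can be assembled in base R. This is a minimal sketch; `make_range` is a hypothetical helper name, not a seqminer function.

```r
# Hypothetical helper: build a range string such as "1:123-124,1:1234-1235"
# from parallel vectors of chromosome, start, and end positions.
make_range <- function(chrom, start, end) {
  stopifnot(length(chrom) == length(start), length(start) == length(end))
  stopifnot(all(start <= end))
  regions <- sprintf("%s:%d-%d", chrom, as.integer(start), as.integer(end))
  paste(regions, collapse = ",")
}

ranges <- make_range(c("1", "1"), c(123, 1234), c(124, 1235))
print(ranges)
```

The resulting string can then be passed as the range argument of the query functions shown above.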

In the returned matrix, columns represent variants and rows represent individuals.
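
To illustrate that orientation, here is a minimal sketch on a toy matrix built by hand (it is not seqminer output). It assumes genotypes are coded as alternate-allele counts 0/1/2 and that missing calls are `NA`; both are assumptions for this example, so check the coding of your own data.

```r
# Toy genotype matrix: 4 individuals (rows) by 2 variants (columns).
geno <- matrix(c(0, 1, 2, NA,
                 0, 0, 1, 1),
               nrow = 4, ncol = 2,
               dimnames = list(paste0("sample", 1:4), c("1:123", "1:456")))

# Per-variant alternate allele frequency: mean dosage over individuals / 2,
# ignoring missing genotypes.
alt.freq <- colMeans(geno, na.rm = TRUE) / 2
print(alt.freq)
```

Because variants are in columns, column-wise summaries such as `colMeans` give per-variant statistics, while row-wise summaries give per-individual statistics.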

Query BGEN/PLINK files

Query BGEN file:

bg.ref.file <- "input.bgen"
bg.range <- "1:123-1234"
geno.mat <- seqminer::readBGENToMatrixByRange(bg.ref.file, bg.range)
geno.list <- seqminer::readBGENToListByRange(bg.ref.file, bg.range)

Make sure the BGEN file has a matching index file (*.bgi) in the same folder.
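
A quick pre-flight check can catch a missing index before the query is attempted. This is a minimal sketch; `has_bgen_index` is a hypothetical helper, and it assumes the index is named by appending `.bgi` to the BGEN file name (e.g. `input.bgen.bgi`), which is an assumption about naming rather than something stated above.

```r
# Hypothetical helper: TRUE only if both the BGEN file and "<file>.bgi" exist.
has_bgen_index <- function(bgen.file) {
  file.exists(bgen.file) && file.exists(paste0(bgen.file, ".bgi"))
}

# Demonstration with empty placeholder files in a temporary folder.
tmp.dir <- tempfile("bgen_demo_")
dir.create(tmp.dir)
bgen.path <- file.path(tmp.dir, "input.bgen")
file.create(bgen.path)
before <- has_bgen_index(bgen.path)      # index not created yet
file.create(paste0(bgen.path, ".bgi"))
after <- has_bgen_index(bgen.path)       # index now present
```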

Query PLINK file:

plink.ref.file <- "input"
geno <- seqminer::readPlinkToMatrixByIndex(plink.ref.file, sampleIndex=1:20000, markerIndex=1:100)
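
The index vectors must stay within the number of samples and markers in the fileset. A minimal sketch for looking these up, assuming the prefix `input` refers to a standard PLINK binary fileset (`input.bed`, `input.bim`, `input.fam`) in which the `.fam` file has one line per sample and the `.bim` file one line per marker; `plink_dims` is a hypothetical helper, and the toy files below are written only for demonstration.

```r
# Hypothetical helper: count samples and markers from the .fam and .bim files.
plink_dims <- function(prefix) {
  fam <- paste0(prefix, ".fam")
  bim <- paste0(prefix, ".bim")
  stopifnot(file.exists(fam), file.exists(bim))
  c(samples = length(readLines(fam)), markers = length(readLines(bim)))
}

# Toy fileset with 3 samples and 2 markers (no .bed file is needed for counting).
prefix <- tempfile("plink_demo_")
writeLines(c("F1 S1 0 0 1 -9", "F2 S2 0 0 2 -9", "F3 S3 0 0 1 -9"),
           paste0(prefix, ".fam"))
writeLines(c("1 rs1 0 123 A G", "1 rs2 0 456 C T"),
           paste0(prefix, ".bim"))

dims <- plink_dims(prefix)
sample.index <- seq_len(dims[["samples"]])  # valid values for sampleIndex
marker.index <- seq_len(dims[["markers"]])  # valid values for markerIndex
```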

Command line interface

We also developed a seqminer command line interface:

./queryVCFIndex.intel input.vcf.gz input.vcf.gz.scIdx 1:123-1234

Citation:

Yang, L., Jiang, S., Jiang, B., Liu, D. J., & Zhan, X. (2020). Seqminer2: An Efficient Tool to Query and Retrieve Genotypes for Statistical Genetics Analyses from Biobank Scale Sequence Dataset. Bioinformatics

Zhan, X. and Liu, D. J. (2015), SEQMINER: An R-Package to Facilitate the Functional Interpretation of Sequence-Based Associations. Genet. Epidemiol., 39: 619–623. doi:10.1002/gepi.21918

Owner

  • Name: zhanxw
  • Login: zhanxw
  • Kind: user

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 192
  • Total Committers: 8
  • Avg Commits per committer: 24.0
  • Development Distribution Score (DDS): 0.229
Past Year
  • Commits: 16
  • Committers: 2
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.375
Top Committers
Name             Email           Commits
zhanxw           z****w@g****m   148
zhanxw           z****w          19
Lina             1****a          19
David Trudgian   d****n@u****u   2
Bitdeli Chef     c****f@b****m   1
Renan Sauteraud  r****d@g****m   1
David Trudgian   d****e@t****t   1
timoast          4****t          1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 23
  • Total pull requests: 6
  • Average time to close issues: 9 days
  • Average time to close pull requests: about 1 month
  • Total issue authors: 15
  • Total pull request authors: 6
  • Average comments per issue: 1.43
  • Average comments per pull request: 0.67
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • garyzhubc (4)
  • bschilder (4)
  • zx8754 (3)
  • yulijia (1)
  • xueweic (1)
  • jielab (1)
  • kollienne (1)
  • mmoisse (1)
  • isinaltinkaya (1)
  • Asppagh (1)
  • sariya (1)
  • joshuahoffman39 (1)
  • Balthasar-eu (1)
  • jenzopr (1)
  • WenjianBI (1)
Pull Request Authors
  • SRenan (1)
  • dtrudg (1)
  • pekkarr (1)
  • timoast (1)
  • bitdeli-chef (1)
  • yang-lina (1)
Top Labels
Issue Labels
enhancement (1) bug (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 771 last-month
  • Total docker downloads: 318
  • Total dependent packages: 7
  • Total dependent repositories: 10
  • Total versions: 42
  • Total maintainers: 1
cran.r-project.org: seqminer

Efficiently Read Sequence Data (VCF Format, BCF Format, METAL Format and BGEN Format) into R

  • Versions: 42
  • Dependent Packages: 7
  • Dependent Repositories: 10
  • Downloads: 771 Last month
  • Docker Downloads: 318
Rankings
Forks count: 5.8%
Dependent packages count: 6.6%
Dependent repos count: 9.2%
Stargazers count: 9.7%
Average: 11.7%
Downloads: 13.1%
Docker downloads count: 25.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • SKAT * suggests
  • testthat * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/check-r-package v1 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
  • r-lib/actions/setup-r-dependencies v1 composite