SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development version only)

https://github.com/zhengxwen/snprelate

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary

Keywords

bioinformatics gds-format pca r simd snp
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 108
  • Watchers: 12
  • Forks: 25
  • Open Issues: 43
  • Releases: 7
Created over 11 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog

README.md

SNPRelate: Parallel computing toolset for relatedness and principal component analysis of SNP data

License: GNU General Public License, GPLv3

Features

Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed SNPRelate, an R package for multi-core symmetric multiprocessing computer architectures, to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent (IBD) measures. The kernels of our algorithms are written in C/C++ and highly optimized.
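As a rough illustration of the two computations named above, the sketch below runs PCA and a method-of-moments IBD estimate on the small example GDS file bundled with the package. It assumes SNPRelate is already installed; the thread count, the number of eigenvectors kept, and the MAF and missing-rate thresholds are arbitrary choices for illustration, not recommendations from the authors.

```R
library(SNPRelate)

# Open the example SNP GDS file shipped with the package
genofile <- snpgdsOpen(snpgdsExampleFileName())

# PCA on autosomal SNPs with two threads; keep the first four eigenvectors
# (num.thread and eigen.cnt are illustrative values)
pca <- snpgdsPCA(genofile, num.thread = 2L, eigen.cnt = 4L, verbose = FALSE)
pc_percent <- pca$varprop[1:4] * 100   # percent of variance per component

# Relatedness: identity-by-descent estimates by the method of moments
# (the 0.05 thresholds are assumptions chosen for this sketch)
ibd <- snpgdsIBDMoM(genofile, maf = 0.05, missing.rate = 0.05,
                    num.thread = 2L, verbose = FALSE)
ibd_pairs <- snpgdsIBDSelection(ibd)   # one row per sample pair

# Always close the GDS file when finished
snpgdsClose(genofile)
```

Here pca$eigenvect holds one row per sample and one column per retained component, and ibd_pairs lists k0, k1 and the kinship coefficient for each pair of samples.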

The GDS format offers efficient operations designed specifically for two-bit integers, since a SNP genotype can be stored in only two bits. The SNP GDS format in this package is also used by the GWASTools package, which adds support for S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variation (SNV), insertion/deletion polymorphism (indel) and structural variation calls. For large-scale whole-exome and whole-genome sequencing variant data, using SeqArray instead of SNPRelate is strongly suggested.
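To see why two bits per SNP genotype is enough, here is a minimal base-R sketch that packs four genotypes into each byte and unpacks them again. It assumes genotypes are coded 0, 1 or 2, with 3 standing for a missing value. The helper names pack2bit and unpack2bit are hypothetical, and the bit order within a byte is an arbitrary choice for illustration, not a description of the actual on-disk GDS layout.

```R
# Hypothetical helper: pack genotype codes (0, 1, 2, or 3 = missing)
# into raw bytes, four genotypes per byte
pack2bit <- function(geno) {
  stopifnot(all(geno %in% 0:3))
  # pad to a multiple of four with the missing code
  pad <- (4L - length(geno) %% 4L) %% 4L
  g <- matrix(c(as.integer(geno), rep(3L, pad)), nrow = 4L)
  # each column becomes one byte; the first genotype sits in the lowest two bits
  as.raw(colSums(g * c(1L, 4L, 16L, 64L)))
}

# Hypothetical helper: recover the first n genotype codes from packed bytes
unpack2bit <- function(bytes, n) {
  b <- as.integer(bytes)
  g <- rbind(b %% 4L, (b %/% 4L) %% 4L, (b %/% 16L) %% 4L, b %/% 64L)
  as.vector(g)[seq_len(n)]
}

geno <- c(0, 1, 2, 3, 2, 2, 0)
packed <- pack2bit(geno)               # seven genotypes fit in two bytes
restored <- unpack2bit(packed, length(geno))
```

Storing each genotype as a 32-bit R integer would take 28 bytes for this vector, so the packed form is smaller by a factor of up to 16.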

Bioconductor

Release Version: v1.42.1

http://www.bioconductor.org/packages/SNPRelate

News

Tutorials

http://www.bioconductor.org/packages/release/bioc/vignettes/SNPRelate/inst/doc/SNPRelate.html

Citations

Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.

Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.

Installation

  • Bioconductor repository:

    ```R
    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("SNPRelate")
    ```

  • Development version from GitHub (for developers/testers only):

    ```R
    library("devtools")
    install_github("zhengxwen/gdsfmt")
    install_github("zhengxwen/SNPRelate")
    ```

    The install_github() approach requires that you build from source, i.e. make and compilers must be installed on your system; see the R FAQ for your operating system. You may also need to install dependencies manually.

Implementation with Intel Intrinsics

| Functions             | No SIMD | SSE2 | AVX | AVX2 | AVX-512 |
|:----------------------|:-------:|:----:|:---:|:----:|:-------:|
| snpgdsDiss            |    X    |      |     |      |         |
| snpgdsEIGMIX          |    X    |  X   |  X  |      |         |
| snpgdsGRM             |    X    |  X   |  X  |  .   |         |
| snpgdsIBDKING         |    X    |  X   |     |  X   |         |
| snpgdsIBDMoM          |    X    |      |     |      |         |
| snpgdsIBS             |    X    |  X   |     |      |         |
| snpgdsIBSNum          |    X    |  X   |     |      |         |
| snpgdsIndivBeta       |    X    |  X   |  P  |  X   |         |
| snpgdsPCA             |    X    |  X   |  X  |      |         |
| snpgdsPCACorr         |    X    |      |     |      |         |
| snpgdsPCASampLoading  |    X    |      |     |      |         |
| snpgdsPCASNPLoading   |    X    |      |     |      |         |
| ...                   |         |      |     |      |         |

X: fully supported; .: partially supported; P: POPCNT instruction.

Install the package from the source code with the support of Intel SIMD Intrinsics:

You have to customize the package compilation; see "Customizing package compilation" in the CRAN manual R Installation and Administration.

Change ~/.R/Makevars to the following, assuming the GNU compilers (gcc/g++) or the Clang compiler (clang++) are installed:

```sh
## for C code
CFLAGS=-g -O3 -march=native -mtune=native
## for C++ code
CXXFLAGS=-g -O3 -march=native -mtune=native
```

Owner

  • Name: Xiuwen Zheng
  • Login: zhengxwen
  • Kind: user
  • Location: Chicago

GitHub Events

Total
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 2
  • Push event: 5
Last Year
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 2
  • Push event: 5

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 375
  • Total Committers: 9
  • Avg Commits per committer: 41.667
  • Development Distribution Score (DDS): 0.035
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
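| Name                        | Email         | Commits |
|:----------------------------|:--------------|:-------:|
| Xiuwen Zheng                | z****n@g****m |   362   |
| Xiuwen Zheng                | x****g@a****m |    3    |
| Bioconductor Git-SVN Bridge | b****c@b****g |    3    |
| Kevin Murray                | k****1        |    2    |
| Stephanie M. Gogarten       | s****n@g****m |    1    |
| NikNakk                     | n****k@n****m |    1    |
| Dr. K. D. Murray            | 1****9        |    1    |
| Dan Bolser                  | 5****k        |    1    |
| Billsfriend                 | 4****d        |    1    |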
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 184
  • Total pull requests: 12
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 20 days
  • Total issue authors: 78
  • Total pull request authors: 4
  • Average comments per issue: 2.11
  • Average comments per pull request: 0.67
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 4 months
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • zhengxwen (4)
  • thierrygosselin (4)
  • nottwy (3)
  • kforner (3)
  • AAvalos82 (3)
  • Tman3 (2)
  • elizeng (2)
  • yangli-ai (2)
  • smgogarten (2)
  • jgx65 (2)
  • evigorito (2)
  • kroluk (2)
  • rafalcode (1)
  • jane-edgeloe (1)
  • linsson (1)
Pull Request Authors
  • kdm9 (2)
  • CholoTook (2)
  • Billsfriend (1)
  • smgogarten (1)
  • NikNakk (1)
Top Labels
Issue Labels
bug (5) feature required (4) enhancement (4) help wanted (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 218,346 total
  • Total dependent packages: 11
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
bioconductor.org: SNPRelate

Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

  • Versions: 7
  • Dependent Packages: 11
  • Dependent Repositories: 0
  • Downloads: 218,346 Total
Rankings
Dependent repos count: 0.0%
Forks count: 1.8%
Stargazers count: 2.1%
Average: 3.2%
Dependent packages count: 5.0%
Downloads: 7.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 2.15 depends
  • gdsfmt >= 1.8.3 depends
  • SeqArray >= 1.12.0 enhances
  • methods * imports
  • BiocGenerics * suggests
  • MASS * suggests
  • Matrix * suggests
  • RUnit * suggests
  • knitr * suggests
  • markdown * suggests
  • parallel * suggests
  • rmarkdown * suggests
.github/workflows/r.yml actions
  • actions/checkout v3 composite
  • r-lib/actions/setup-r f57f1301a053485946083d7a45022b278929a78a composite