bigsnpr

R package for the analysis of massive SNP arrays.

https://github.com/privefl/bigsnpr

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 32 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 13 committers (7.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.3%) to scientific vocabulary

Keywords

big-data bioinformatics memory-mapped-file parallel-computing polygenic-scores population-structure-inference r r-package snp-data statistical-methods
Last synced: 6 months ago

Repository

Statistics
  • Stars: 210
  • Watchers: 8
  • Forks: 45
  • Open Issues: 18
  • Releases: 0
Created over 9 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog

README.md

bigsnpr

{bigsnpr} is an R package for the analysis of massive SNP arrays, primarily designed for human genetics. It enhances the features of package {bigstatsr} for the purpose of analyzing genotype data.

To get you started:

Installation

In R, run

```r
install.packages("remotes")
remotes::install_github("privefl/bigsnpr")
```

or for the CRAN version

```r
install.packages("bigsnpr")
```

Input formats

This package reads bed/bim/fam files (PLINK's preferred format) using functions `snp_readBed()` and `snp_readBed2()`. Before reading into this package's special format, quality control and conversion can be done using PLINK, which can be called directly from R using `snp_plinkQC()` and `snp_plinkKINGQC()`.

This package can also read UK Biobank BGEN files using function `snp_readBGEN()`. This function takes around 40 minutes to read 1M variants for 400K individuals using 15 cores.
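
As a minimal sketch of the bed-reading workflow described above, the following converts a bed file to the package's format and attaches the result. It assumes the small `example.bed` dataset that ships with the package; writing the backing files to `tempfile()` is a choice made for this illustration only.

```r
library(bigsnpr)

# Path to a small example bed file shipped with the package
# (the matching .bim and .fam files sit next to it).
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

# Convert bed/bim/fam to the package's on-disk format.
# snp_readBed() returns the path to the resulting .rds file.
rdsfile <- snp_readBed(bedfile, backingfile = tempfile())

# Attach the bigSNP object in the current R session.
obj.bigSNP <- snp_attach(rdsfile)
str(obj.bigSNP, max.level = 2, strict.width = "cut")
```

For real data, you would typically pass a persistent `backingfile` path so that the conversion is done once and later sessions only call `snp_attach()`.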

This package uses a class called `bigSNP` for representing SNP data. A `bigSNP` object is a list with the following elements:

  • `$genotypes`: An `FBM.code256`. Rows are samples and columns are variants. This stores genotype calls or dosages (rounded to 2 decimal places).
  • `$fam`: A `data.frame` with some information on the individuals.
  • `$map`: A `data.frame` with some information on the variants.
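
To make the layout of a `bigSNP` object concrete, here is a short sketch that inspects its three elements. It assumes the example dataset loaded by `snp_attachExtdata()`, which is used here purely for illustration.

```r
library(bigsnpr)

# Attach the example dataset shipped with the package (copied to a temp location).
obj.bigSNP <- snp_attachExtdata()

G <- obj.bigSNP$genotypes   # genotype matrix stored on disk
dim(G)                      # samples x variants
G[1:5, 1:8]                 # subsetting reads only the requested block into memory

head(obj.bigSNP$fam)        # one row per individual
head(obj.bigSNP$map)        # one row per variant
```

The number of rows of `$fam` should match the rows of `$genotypes`, and the number of rows of `$map` should match its columns.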

Note that most of the algorithms of this package don't handle missing values. You can use `snp_fastImpute()` (taking a few hours for a chip of 15K x 300K) and `snp_fastImputeSimple()` (taking a few minutes only) to impute missing values of genotyped variants.

Package {bigsnpr} also provides functions that directly work on bed files with a few missing values (the `bed_*()` functions). See paper "Efficient toolkit implementing..".
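
A minimal sketch of the simple imputation route, assuming the `example-missing.bed` dataset that ships with the package. The choice of `method = "mode"` is only for illustration.

```r
library(bigsnpr)

# Example dataset that contains missing genotypes.
bigsnp <- snp_attachExtdata("example-missing.bed")
G <- bigsnp$genotypes

# The example is small, so reading everything into memory is fine here.
had_missing <- anyNA(G[])

# Impute each variant with its most frequent genotype.
G2 <- snp_fastImputeSimple(G, method = "mode")

# Use the returned object for downstream analyses: the original G
# keeps decoding the imputed entries as missing.
still_missing <- anyNA(G2[])
c(before = had_missing, after = still_missing)
```

The accessor `G[]` loads the full matrix, which is only reasonable for small data. For large datasets, check missingness on a subset of columns instead.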

Polygenic scores

Polygenic scores are one of the main focuses of this package. There are 3 main methods currently available:

  • Penalized regressions with individual-level data (see paper and tutorial).

  • Clumping and Thresholding (C+T) and Stacked C+T (SCT) with summary statistics and individual-level data (see paper and tutorial).

  • LDpred2 with summary statistics (see paper and tutorial), and lassosum2.

Possible upcoming features

  • Multiple imputation for GWAS (https://doi.org/10.1371/journal.pgen.1006091).

  • More interactive (visual) QC.

You can request a feature by opening an issue.

Bug report / Support

How to make a great R reproducible example?

Please open an issue if you find a bug.

If you want help using {bigstatsr} (the `big_*()` functions), please open an issue on {bigstatsr}'s repo, or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.

Owner

  • Name: Florian Privé
  • Login: privefl
  • Kind: user
  • Location: Aarhus, Denmark // Lyon, France
  • Company: National Center for Register-based Research (NCRR)

Senior Researcher (2022-) • Postdoc (2019-2021) • PhD student (2016-2019) in predictive human genetics • ENSIMAG (2013-2016)

GitHub Events

Total
  • Issues event: 71
  • Watch event: 22
  • Issue comment event: 144
  • Push event: 8
  • Fork event: 1
Last Year
  • Issues event: 71
  • Watch event: 22
  • Issue comment event: 144
  • Push event: 8
  • Fork event: 1

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 1,047
  • Total Committers: 13
  • Avg Commits per committer: 80.538
  • Development Distribution Score (DDS): 0.064
Past Year
  • Commits: 26
  • Committers: 1
  • Avg Commits per committer: 26.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Florian Privé | f****1@g****m | 980 |
| Florian Privé | f****e@i****g | 47 |
| privef | p****f@t****r | 9 |
| Florian Prive | p****f@t****r | 2 |
| Alice MacQueen | a****n@g****m | 1 |
| Florian Franck Privé | a****3@u****k | 1 |
| mblumuga | m****m@u****r | 1 |
| monsanto-pinheiro | m****o@g****m | 1 |
| privef | p****f@k****r | 1 |
| Alice MacQueen | 3****n | 1 |
| Antoine Bichat | 3****t | 1 |
| Jim Hester | j****r@g****m | 1 |
| timo-cpr | 6****r | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 221
  • Total pull requests: 4
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 10 months
  • Total issue authors: 134
  • Total pull request authors: 4
  • Average comments per issue: 5.83
  • Average comments per pull request: 7.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 30
  • Pull requests: 0
  • Average time to close issues: 2 months
  • Average time to close pull requests: N/A
  • Issue authors: 19
  • Pull request authors: 0
  • Average comments per issue: 3.53
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • privefl (17)
  • garyzhubc (17)
  • Mahantesh-Biradar (6)
  • koujiaodahan (5)
  • alek0991 (5)
  • scienception (4)
  • oalavijeh (3)
  • JasperHof (3)
  • JNajar (3)
  • AlisaBIG (3)
  • jianvhuang (3)
  • alhannae (3)
  • JuanJoMV (3)
  • aepacker (3)
  • Sabor117 (3)
Pull Request Authors
  • Hugolyu (2)
  • dramanica (1)
  • jean997 (1)
  • timo-cpr (1)
  • privefl (1)
Top Labels
Issue Labels
  • enhancement (7)
  • good for first PR (6)
  • feature request (5)
  • bug (2)
  • help wanted (1)
  • question (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 1,895 last-month
  • Total docker downloads: 1,597
  • Total dependent packages: 3
  • Total dependent repositories: 4
  • Total versions: 18
  • Total maintainers: 1
cran.r-project.org: bigsnpr

Analysis of Massive SNP Arrays

  • Versions: 18
  • Dependent Packages: 3
  • Dependent Repositories: 4
  • Downloads: 1,895 Last month
  • Docker Downloads: 1,597
Rankings
Forks count: 1.8%
Stargazers count: 2.5%
Downloads: 9.1%
Average: 9.2%
Dependent packages count: 10.9%
Dependent repos count: 14.5%
Docker downloads count: 16.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.3 depends
  • bigstatsr >= 1.5.6 depends
  • Matrix * imports
  • Rcpp * imports
  • bigassertr >= 0.1.3 imports
  • bigparallelr * imports
  • bigreadr * imports
  • bigsparser >= 0.6 imports
  • bigutilsr >= 0.3.3 imports
  • data.table >= 1.12.4 imports
  • doRNG * imports
  • foreach * imports
  • ggplot2 * imports
  • magrittr * imports
  • methods * imports
  • stats * imports
  • vctrs * imports
  • Hmisc * suggests
  • R.utils * suggests
  • RSQLite * suggests
  • RSpectra * suggests
  • RhpcBLASctl * suggests
  • bindata * suggests
  • covr * suggests
  • dbplyr >= 1.4 suggests
  • dplyr * suggests
  • gaston * suggests
  • glue * suggests
  • pcadapt >= 4.1 suggests
  • quadprog * suggests
  • rmutil * suggests
  • runonce * suggests
  • spelling * suggests
  • testthat * suggests
  • tibble * suggests
  • xgboost * suggests
.github/workflows/check-standard.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite