bigsnpr

R package for the analysis of massive SNP arrays.

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 32 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
✓ Committers with academic emails: 1 of 13 committers (7.7%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.3%) to scientific vocabulary

Keywords

big-data bioinformatics memory-mapped-file parallel-computing polygenic-scores population-structure-inference r r-package snp-data statistical-methods

Last synced: 6 months ago

Repository

R package for the analysis of massive SNP arrays.

Basic Info

Host: GitHub
Owner: privefl
Language: R
Default Branch: master
Homepage: https://privefl.github.io/bigsnpr/
Size: 109 MB

Statistics

Stars: 210
Watchers: 8
Forks: 45
Open Issues: 18
Releases: 0

Topics

big-data bioinformatics memory-mapped-file parallel-computing polygenic-scores population-structure-inference r r-package snp-data statistical-methods

Created over 9 years ago · Last pushed 6 months ago

Metadata Files

Readme Changelog

bigsnpr

{bigsnpr} is an R package for the analysis of massive SNP arrays, primarily designed for human genetics. It enhances the features of package {bigstatsr} for the purpose of analyzing genotype data.

To get you started:

Quick demo
List of functions from bigsnpr and from bigstatsr
Extended documentation with more examples + course recording

Installation

In R, run

```r
install.packages("remotes")
remotes::install_github("privefl/bigsnpr")
```

or for the CRAN version

```r
install.packages("bigsnpr")
```

Input formats

This package reads bed/bim/fam files (PLINK preferred format) using functions snp_readBed() and snp_readBed2(). Before reading into this package's special format, quality control and conversion can be done using PLINK, which can be called directly from R using snp_plinkQC() and snp_plinkKINGQC().
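As a minimal sketch of this reading step, the following uses the small example bed file shipped with the package. Writing the backing files to `tempfile()` is an assumption made for illustration; in practice you would likely choose a permanent location.

```r
library(bigsnpr)

# Path to the example bed file bundled with the package
# (the matching .bim and .fam files sit next to it)
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

# Convert to the package's own format; backing files are written to a
# temporary location here (illustrative choice).
# snp_readBed() returns the path to the resulting .rds file.
rdsfile <- snp_readBed(bedfile, backingfile = tempfile())

# Attach the bigSNP object in the current R session
obj.bigSNP <- snp_attach(rdsfile)
str(obj.bigSNP, max.level = 2, strict.width = "cut")
```

The conversion only needs to be done once; later sessions can call `snp_attach()` on the same `.rds` file.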

This package can also read UK Biobank BGEN files using function snp_readBGEN(). This function takes around 40 minutes to read 1M variants for 400K individuals using 15 cores.

This package uses a class called bigSNP for representing SNP data. A bigSNP object is a list with some elements:

$genotypes: An FBM.code256. Rows are samples and columns are variants. This stores genotype calls or dosages (rounded to 2 decimal places).
$fam: A data.frame with some information on the individuals.
$map: A data.frame with some information on the variants.
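To make the structure concrete, here is a short sketch that inspects the three elements using the example dataset bundled with the package (attached with `snp_attachExtdata()`). The variable names are illustrative only.

```r
library(bigsnpr)

# Attach the example bigSNP object shipped with the package
obj.bigSNP <- snp_attachExtdata()

# Genotype matrix: samples in rows, variants in columns
G <- obj.bigSNP$genotypes
dim(G)
G[1:5, 1:8]  # standard matrix-like accessor

# Per-sample and per-variant information
head(obj.bigSNP$fam)
head(obj.bigSNP$map)
```

The number of rows of `$fam` should match the number of rows of `$genotypes`, and the number of rows of `$map` should match its number of columns.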

Note that most of the algorithms of this package don't handle missing values. You can use snp_fastImpute() (taking a few hours for a chip of 15K x 300K) and snp_fastImputeSimple() (taking a few minutes only) to impute missing values of genotyped variants.
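A possible way to try the fast variant is sketched below, using the bundled example file that contains missing values. The choice of `method = "mode"` and the subset of columns shown are assumptions for illustration.

```r
library(bigsnpr)

# Example dataset with missing genotypes, shipped with the package
bigsnp <- snp_attachExtdata("example-missing.bed")
G <- bigsnp$genotypes

# Genotype counts (0, 1, 2, NA) for the first few variants, before imputation
big_counts(G, ind.col = 1:5)

# Simple imputation; this returns a new FBM.code256 that decodes the
# imputed values, so use the returned object (G2) downstream
G2 <- snp_fastImputeSimple(G, method = "mode")
big_counts(G2, ind.col = 1:5)
```

Note that the returned object should be used for subsequent analyses, since the original accessor `G` still decodes imputed entries as missing.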

Package {bigsnpr} also provides functions that directly work on bed files with a few missing values (the bed_*() functions). See paper "Efficient toolkit implementing..".
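As a small sketch of working directly on a bed file, the following maps the bundled example file with missing values and computes allele frequencies with `bed_MAF()`, without first converting to the bigSNP format.

```r
library(bigsnpr)

# Example bed file with some missing values, shipped with the package
bedfile <- system.file("extdata", "example-missing.bed", package = "bigsnpr")

# Map the bed file (no conversion to the bigSNP format is done here)
obj.bed <- bed(bedfile)
dim(obj.bed)

# Per-variant allele counts and frequencies, computed from the bed file
maf <- bed_MAF(obj.bed, ncores = 1)
head(maf)
```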

Polygenic scores

Polygenic scores are one of the main focuses of this package. Three main methods are currently available:

Penalized regressions with individual-level data (see paper and tutorial).
Clumping and Thresholding (C+T) and Stacked C+T (SCT) with summary statistics and individual-level data (see paper and tutorial).
LDpred2 and lassosum2 with summary statistics (see paper and tutorial).
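To give a rough feel for the C+T idea, here is a toy sketch on the bundled example data. It is not the workflow from the paper or tutorial: the "summary statistics" are computed from a training subset of the same small dataset, and the train/test split, the r2 threshold of 0.2, and the p-value threshold of 0.05 are all arbitrary assumptions. In practice the effect sizes would come from an external GWAS and the thresholds would be tuned.

```r
library(bigsnpr)

obj.bigSNP <- snp_attachExtdata()
G   <- obj.bigSNP$genotypes
CHR <- obj.bigSNP$map$chromosome
POS <- obj.bigSNP$map$physical.pos
y01 <- obj.bigSNP$fam$affection - 1  # recode case/control status as 0/1

# Arbitrary train/test split (illustrative only)
set.seed(1)
ind.train <- sort(sample(nrow(G), 400))
ind.test  <- setdiff(rows_along(G), ind.train)

# Toy "summary statistics": a GWAS on the training set
gwas <- big_univLogReg(G, y01.train = y01[ind.train], ind.train = ind.train)
pval <- predict(gwas, log10 = FALSE)

# Clumping: keep the most significant variant among correlated ones
ind.keep <- snp_clumping(G, infos.chr = CHR, ind.row = ind.train,
                         S = abs(gwas$score), thr.r2 = 0.2,
                         infos.pos = POS)

# Thresholding: keep clumped variants below a p-value threshold
ind.sel <- ind.keep[pval[ind.keep] < 0.05]

# Score = sum of effect sizes times genotypes, evaluated on the test set
prs <- big_prodVec(G, gwas$estim[ind.sel],
                   ind.row = ind.test, ind.col = ind.sel)
AUC(prs, y01[ind.test])
```

The sketch fixes a single pair of thresholds for brevity; choosing them on a separate validation set is what makes C+T usable in practice.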

Possible upcoming features

Multiple imputation for GWAS (https://doi.org/10.1371/journal.pgen.1006091).
More interactive (visual) QC.

You can request a feature by opening an issue.

Bug report / Support

How to make a great R reproducible example?

Please open an issue if you find a bug.

If you want help using {bigstatsr} (the big_*() functions), please open an issue on {bigstatsr}'s repo, or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.

References

Privé, Florian, et al. "Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr." Bioinformatics 34.16 (2018): 2781-2787.
Privé, Florian, et al. "Efficient implementation of penalized regression for genetic risk prediction." Genetics 212.1 (2019): 65-74.
Privé, Florian, et al. "Making the most of Clumping and Thresholding for polygenic scores." The American Journal of Human Genetics 105.6 (2019): 1213-1221.
Privé, Florian, et al. "Efficient toolkit implementing best practices for principal component analysis of population genetic data." Bioinformatics 36.16 (2020): 4449-4457.
Privé, Florian, et al. "LDpred2: better, faster, stronger." Bioinformatics 36.22-23 (2020): 5424-5431.
Privé, Florian. "Optimal linkage disequilibrium splitting." Bioinformatics 38.1 (2022): 255-256.
Privé, Florian. "Using the UK Biobank as a global reference of worldwide populations: application to measuring ancestry diversity from GWAS summary statistics." Bioinformatics 38.13 (2022): 3477-3480.
Privé, Florian, et al. "Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores." Human Genetics and Genomics Advances 3.4 (2022).
Privé, Florian, et al. "Inferring disease architecture and predictive ability with LDpred2-auto." The American Journal of Human Genetics 110.12 (2023): 2042-2055.

Owner

Name: Florian Privé
Login: privefl
Kind: user
Location: Aarhus, Denmark // Lyon, France
Company: National Center for Register-based Research (NCRR)

Website: https://privefl.github.io/
Twitter: privefl
Repositories: 104
Profile: https://github.com/privefl

Senior Researcher (2022-) • Postdoc (2019-2021) • PhD student (2016-2019) in predictive human genetics • ENSIMAG (2013-2016)

GitHub Events

Total

Issues event: 71
Watch event: 22
Issue comment event: 144
Push event: 8
Fork event: 1

Last Year

Issues event: 71
Watch event: 22
Issue comment event: 144
Push event: 8
Fork event: 1

Committers

Last synced: about 2 years ago

All Time

Total Commits: 1,047
Total Committers: 13
Avg Commits per committer: 80.538
Development Distribution Score (DDS): 0.064

Past Year

Commits: 26
Committers: 1
Avg Commits per committer: 26.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Florian Privé	f**1@g**m	980
Florian Privé	f**e@i**g	47
privef	p**f@t**r	9
Florian Prive	p**f@t**r	2
Alice MacQueen	a**n@g**m	1
Florian Franck Privé	a**3@u**k	1
mblumuga	m**m@u**r	1
monsanto-pinheiro	m**o@g**m	1
privef	p**f@k**r	1
Alice MacQueen	3****n	1
Antoine Bichat	3****t	1
Jim Hester	j**r@g**m	1
timo-cpr	6****r	1

Committer Domains (Top 20 + Academic)

krakenator.imag.fr: 1 univ-grenoble-alpes.fr: 1 uni.au.dk: 1 timc-bcm-30.imag.fr: 1 timc-bcm-13.imag.fr: 1 inp-grenoble.org: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 221
Total pull requests: 4
Average time to close issues: about 2 months
Average time to close pull requests: 10 months
Total issue authors: 134
Total pull request authors: 4
Average comments per issue: 5.83
Average comments per pull request: 7.5
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 30
Pull requests: 0
Average time to close issues: 2 months
Average time to close pull requests: N/A
Issue authors: 19
Pull request authors: 0
Average comments per issue: 3.53
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

privefl (17)
garyzhubc (17)
Mahantesh-Biradar (6)
koujiaodahan (5)
alek0991 (5)
scienception (4)
oalavijeh (3)
JasperHof (3)
JNajar (3)
AlisaBIG (3)
jianvhuang (3)
alhannae (3)
JuanJoMV (3)
aepacker (3)
Sabor117 (3)

Pull Request Authors

Hugolyu (2)
dramanica (1)
jean997 (1)
timo-cpr (1)
privefl (1)

Top Labels

Issue Labels

enhancement (7) good for first PR (6) feature request (5) bug (2) help wanted (1) question (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 1,895 last-month
Total docker downloads: 1,597

Total dependent packages: 3
Total dependent repositories: 4
Total versions: 18
Total maintainers: 1

cran.r-project.org: bigsnpr

Analysis of Massive SNP Arrays

Homepage: https://privefl.github.io/bigsnpr/
Documentation: http://cran.r-project.org/web/packages/bigsnpr/bigsnpr.pdf
License: GPL-3
Latest release: 1.12.21
published 6 months ago

Versions: 18
Dependent Packages: 3
Dependent Repositories: 4
Downloads: 1,895 Last month
Docker Downloads: 1,597

Rankings

Forks count: 1.8%

Stargazers count: 2.5%

Downloads: 9.1%

Average: 9.2%

Dependent packages count: 10.9%

Dependent repos count: 14.5%

Docker downloads count: 16.5%

Maintainers (1)

florian.prive.21@gmail.com

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.3 depends
bigstatsr >= 1.5.6 depends
Matrix * imports
Rcpp * imports
bigassertr >= 0.1.3 imports
bigparallelr * imports
bigreadr * imports
bigsparser >= 0.6 imports
bigutilsr >= 0.3.3 imports
data.table >= 1.12.4 imports
doRNG * imports
foreach * imports
ggplot2 * imports
magrittr * imports
methods * imports
stats * imports
vctrs * imports
Hmisc * suggests
R.utils * suggests
RSQLite * suggests
RSpectra * suggests
RhpcBLASctl * suggests
bindata * suggests
covr * suggests
dbplyr >= 1.4 suggests
dplyr * suggests
gaston * suggests
glue * suggests
pcadapt >= 4.1 suggests
quadprog * suggests
rmutil * suggests
runonce * suggests
spelling * suggests
testthat * suggests
tibble * suggests
xgboost * suggests

.github/workflows/check-standard.yaml actions

actions/checkout v3 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite
