GWASinlps

R package for Non-local Prior Based Iterative Variable Selection for Genome-Wide Association Studies, or Other High-Dimensional Data

https://github.com/nilotpalsanyal/gwasinlps

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.4%) to scientific vocabulary

Keywords

gwas high-dimensional non-local-prior variable-selection

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 3 years ago · Last pushed about 3 years ago
Metadata Files
Readme

README.md

GWASinlps: Non-local Prior Based Iterative Variable Selection Tool for Genome-Wide Association Studies


GWASinlps performs Bayesian non-local prior based iterative variable selection for data from genome-wide association studies (GWAS), or other high-dimensional data with continuous, binary or survival outcomes (see References below).

Installation

Install from CRAN

`install.packages("GWASinlps")`

Install from GitHub

``` r
install.packages("devtools")
devtools::install_github("nilotpalsanyal/GWASinlps")
```

The main function:

GWASinlps() is the main function. It accepts continuous or binary data (such as phenotype data) and a matrix with the independent variable values (SNP genotypes). The function also needs, as input, values for the scaling parameter of the selected non-local prior and for the tuning parameters. These should be fixed based on an exploratory study and/or subject-specific heuristics. For example, in GWAS analysis, as the GWAS effect sizes are generally very small (the typical effect size of a SNP is around 0.05% of the total phenotypic variance for quantitative traits), the scaling parameter can be chosen such that the non-local prior allows at least a 1% chance of a standardized effect size being 0.05 or less in absolute value. Such estimates of the scaling parameter for the MOM and iMOM priors are 0.022 and 0.008, respectively.
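The calibration described above can be sketched numerically. The snippet below is a rough illustration, not code from the package: it assumes the first-order MOM and iMOM priors with unit residual variance, and the helper names `prob_small_mom`, `prob_small_imom` and `solve_tau` are hypothetical. It solves for the scaling parameter at which the prior mass on |effect| <= 0.05 equals 1%.

``` r
# Sketch only: assumes first-order MOM and iMOM priors with residual variance 1.
# All helper names below are hypothetical, not GWASinlps functions.

# P(|beta| <= b) under the MOM density (beta^2 / tau) * N(beta; 0, tau)
prob_small_mom <- function(tau, b = 0.05) {
  a <- b / sqrt(tau)
  2 * (pnorm(a) - 0.5 - a * dnorm(a))
}

# P(|beta| <= b) under the iMOM density sqrt(tau / pi) * beta^(-2) * exp(-tau / beta^2)
prob_small_imom <- function(tau, b = 0.05) {
  2 * pnorm(sqrt(2 * tau) / b, lower.tail = FALSE)
}

# Find tau such that the prior mass on [-b, b] equals the target probability
solve_tau <- function(prob_fun, target = 0.01, b = 0.05) {
  uniroot(function(tau) prob_fun(tau, b) - target,
          interval = c(1e-4, 1), tol = 1e-10)$root
}

tau_mom  <- solve_tau(prob_small_mom)
tau_imom <- solve_tau(prob_small_imom)
round(c(MOM = tau_mom, iMOM = tau_imom), 3)
```

Under these assumptions the two roots should round to the values quoted above (0.022 and 0.008). A different target probability or effect-size threshold can be passed through `target` and `b`.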

Here is a simple illustration of the use of the GWASinlps() function for both continuous and binary phenotypes.

GWASinlps analysis with continuous data/phenotypes

``` r
library(GWASinlps)
#> Loading required package: mombf
#> Loading required package: mvtnorm
#> Loading required package: ncvreg
#> Loading required package: mgcv
#> Loading required package: nlme
#> This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
#>
#>  Welcome to GWASinlps! Select well.
#>
#>  Website: https://nilotpalsanyal.github.io/GWASinlps/
#>  Bug report: https://github.com/nilotpalsanyal/GWASinlps/issues

# Generate design matrix (genotype matrix)
n = 200        #number of subjects
p = 10000      #number of variables/SNPs
m = 10         #number of true variables/causal SNPs
set.seed(1)
f = runif( p, .1, .2 )    #simulate minor allele frequency
x = matrix(nrow = n, ncol = p)
for(j in 1:p)
  x[,j] = rbinom(n, 2, f[j])    #simulate genotypes
colnames(x) = 1:p

# Generate true effect sizes
causalsnps = sample(1:p, m)
beta = rep(0, p)
beta[causalsnps] = rnorm(m, mean = 0, sd = 2 )

# Generate continuous (phenotype) data
y = x %*% beta + rnorm(n, 0, 1)

# GWASinlps analysis
inlps <- GWASinlps(y=y, x=x, family="normal", prior="mom", tau=0.2,
                   k0=1, m=50, rxx=0.2)
#> =================================
#> Number of selected variables: 9
#> Time taken: 0.04 min
#> =================================

# LASSO analysis
library(glmnet)
#> Loading required package: Matrix
#> Loaded glmnet 4.1-4

fit.cvlasso = cv.glmnet( x, y, alpha = 1 )
l.min = fit.cvlasso$lambda.min    # lambda that gives minimum cvm
l.1se = fit.cvlasso$lambda.1se    # largest lambda such that error is
                                  # within 1 se of the minimum

lassomin = which( as.vector( coef( fit.cvlasso, s = l.min ) )[-1] != 0 )
lasso1se = which( as.vector( coef( fit.cvlasso, s = l.1se ) )[-1] != 0 )

# Compare results
library(kableExtra)

res = matrix(nrow=3,ncol=3)
res[1,] = c(length(inlps$selected),
            length(intersect(inlps$selected, causalsnps)),
            length(setdiff(causalsnps, inlps$selected)) )
res[2,] = c(length(lassomin),
            length(intersect(lassomin, causalsnps)),
            length(setdiff(causalsnps, lassomin)))
res[3,] = c(length(lasso1se),
            length(intersect(lasso1se, causalsnps)),
            length(setdiff(causalsnps, lasso1se)))
colnames(res) = c("#Selected SNPs","#True positive","#False negative")
rownames(res) = c("GWASinlps", "LASSO min", "LASSO 1se")

kableExtra::kable(res, format="html", table.attr= "style='width:60%;'",
    caption=paste("Variable selection from", p, "SNPs with", m,
                  "causal SNPs for continuous phenotypes from", n, "subjects"),
    escape=FALSE) %>%
  kableExtra::kable_styling()
```

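Variable selection from 10000 SNPs with 10 causal SNPs for continuous phenotypes from 200 subjects

|           | #Selected SNPs | #True positive | #False negative |
|-----------|----------------|----------------|-----------------|
| GWASinlps | 9              | 8              | 2               |
| LASSO min | 190            | 8              | 2               |
| LASSO 1se | 44             | 8              | 2               |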
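Each row of the comparison is built from the same three set operations, so they can be wrapped in a small helper. The function below is a hypothetical convenience (`selection_summary` is not part of GWASinlps); it also reports false positives, which the table leaves implicit. It assumes the selected indices contain no duplicates, and the indices in the example are made up.

``` r
# Hypothetical helper, not part of GWASinlps: compare a selected set
# with the known causal set from a simulation.
selection_summary <- function(selected, causal) {
  # Compare as character so that column names ("1", "2", ...) and
  # integer indices are treated alike
  selected <- as.character(selected)
  causal   <- as.character(causal)
  true_pos <- length(intersect(selected, causal))
  c("#Selected SNPs"  = length(selected),
    "#True positive"  = true_pos,
    "#False positive" = length(selected) - true_pos,
    "#False negative" = length(setdiff(causal, selected)))
}

# Toy example with made-up indices
toy <- selection_summary(selected = c(2, 5, 9), causal = c(5, 9, 11, 14))
toy
```

With the objects from the simulation, a call such as `selection_summary(inlps$selected, causalsnps)` would give the GWASinlps row, and `rbind()` over several calls would give the full table.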

GWASinlps analysis with binary data/phenotypes

``` r
library(GWASinlps)
library(fastglm)
#> Loading required package: bigmemory

# Generate design matrix (genotype matrix)
n = 500        #number of subjects
p = 2000       #number of variables/SNPs
m = 10         #number of true variables/SNPs
set.seed(1)
f = runif( p, .1, .2 )    #simulate minor allele frequency
x = matrix(nrow = n, ncol = p)
for(j in 1:p)
  x[,j] = rbinom(n, 2, f[j])    #simulate genotypes
colnames(x) = 1:p

# Generate true effect sizes
causalsnps = sample(1:p, m)
beta = rep(0, p)
beta[causalsnps] = rnorm(m, mean = 0, sd = 2 )

# Generate binary (phenotype) data
prob = exp(x %*% beta)/(1 + exp(x %*% beta))
y = sapply(1:n, function(i)rbinom(1,1,prob[i]) )

# GWASinlps analysis
mode(x) = "double"    #needed for fastglm() function below
# pre-compute MMLEs of betas as it takes time
mmle_xy = apply( x, 2, function(z) coef( fastglm(y=y,
    x=cbind(1,matrix(z)), family = binomial(link = "logit")) )[2] )

inlpsrigorous <- GWASinlps(y=y, x=x, family="binomial", method="rigorous",
    mmle_xy=mmle_xy, prior="mom", tau=0.2, k0=1, m=50, rxx=0.2)
#> =================================
#> Number of selected variables: 4
#> Time taken: 0.33 min
#> =================================

inlpsquick <- GWASinlps(y=y, x=x, family="binomial", method="quick",
    mmle_xy=mmle_xy, prior="mom", tau=0.2, k0=1, m=50, rxx=0.2)
#> =================================
#> Number of selected variables: 8
#> Time taken: 0 min
#> =================================

# LASSO analysis
library(glmnet)
fit.cvlasso = cv.glmnet( x, y, family = "binomial", alpha = 1 )
l.min = fit.cvlasso$lambda.min    # lambda that gives minimum cvm
l.1se = fit.cvlasso$lambda.1se    # largest lambda such that error is
                                  # within 1 se of the minimum

lassomin = which( as.vector( coef( fit.cvlasso, s = l.min ) )[-1] != 0 )
lasso1se = which( as.vector( coef( fit.cvlasso, s = l.1se ) )[-1] != 0 )

# Compare results
library(kableExtra)
res = matrix(nrow=4,ncol=3)
res[1,] = c(length(inlpsrigorous$selected),
            length(intersect(inlpsrigorous$selected, causalsnps)),
            length(setdiff(causalsnps, inlpsrigorous$selected)) )
res[2,] = c(length(inlpsquick$selected),
            length(intersect(inlpsquick$selected, causalsnps)),
            length(setdiff(causalsnps, inlpsquick$selected)) )
res[3,] = c(length(lassomin),
            length(intersect(lassomin, causalsnps)),
            length(setdiff(causalsnps, lassomin)))
res[4,] = c(length(lasso1se),
            length(intersect(lasso1se, causalsnps)),
            length(setdiff(causalsnps, lasso1se)))
colnames(res) = c("#Selected SNPs","#True positive","#False negative")
rownames(res) = c("GWASinlps rigorous", "GWASinlps quick", "LASSO min", "LASSO 1se")

kableExtra::kable(res, format="html", table.attr= "style='width:60%;'",
    caption=paste("Variable selection from", p, "SNPs with", m,
                  "causal SNPs for binary phenotypes from", n, "subjects"),
    escape=FALSE) %>%
  kableExtra::kable_styling()
```

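Variable selection from 2000 SNPs with 10 causal SNPs for binary phenotypes from 500 subjects

|                    | #Selected SNPs | #True positive | #False negative |
|--------------------|----------------|----------------|-----------------|
| GWASinlps rigorous | 4              | 4              | 6               |
| GWASinlps quick    | 8              | 4              | 6               |
| LASSO min          | 20             | 5              | 5               |
| LASSO 1se          | 6              | 4              | 6               |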
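The binary phenotypes above are simulated by passing the linear predictor through the logistic function and drawing one Bernoulli value per subject. As a side note, base R's `plogis()` evaluates the same function and stays finite for extreme linear predictors, and `rbinom()` is vectorised over `prob`. The sketch below uses made-up linear predictors (`eta`) purely for illustration; it is an optional alternative, not how the package or the example above is written.

``` r
set.seed(1)
eta <- c(-800, -2, 0, 2, 800)    # toy linear predictors, including extreme values

naive  <- exp(eta) / (1 + exp(eta))    # exp(800) overflows to Inf, so Inf/Inf gives NaN
stable <- plogis(eta)                  # same logistic function, evaluated stably

# One Bernoulli draw per element, without an explicit loop
y_toy <- rbinom(length(eta), size = 1, prob = stable)
```

In the simulation above the linear predictors are moderate, so both forms should give the same probabilities there; the difference only matters when the effects or the genotype coding make them very large.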

References:

Nilotpal Sanyal, Min-Tzu Lo, Karolina Kauppi, Srdjan Djurovic, Ole A. Andreassen, Valen E. Johnson, and Chi-Hua Chen. “GWASinlps: non-local prior based iterative SNP selection tool for genome-wide association studies.” Bioinformatics 35, no. 1 (2019): 1-11. https://doi.org/10.1093/bioinformatics/bty472

Nilotpal Sanyal. “Iterative variable selection for high-dimensional data with binary outcomes.” arXiv preprint arXiv:2211.03190 (2022). https://arxiv.org/pdf/2211.03190.pdf

Owner

  • Name: Nilotpal Sanyal
  • Login: nilotpalsanyal
  • Kind: user

I'm an Assistant Professor at the Department of Mathematical Sciences at the University of Texas at El Paso. Find here a few R packages I've written.

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 21
  • Total Committers: 1
  • Avg Commits per committer: 21.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
GWASinlps n****l@g****m 21

Packages

  • Total packages: 1
  • Total downloads:
    • cran 265 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
cran.r-project.org: GWASinlps

Non-Local Prior Based Iterative Variable Selection Tool for Genome-Wide Association Studies

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 265 Last month
Rankings
Forks count: 21.9%
Dependent packages count: 29.8%
Average: 33.2%
Stargazers count: 35.2%
Dependent repos count: 35.5%
Downloads: 43.5%
Maintainers (1)

Dependencies

DESCRIPTION cran
  • fastglm * depends
  • mombf * depends
  • Rcpp >= 1.0.9 imports
  • RcppArmadillo * imports
  • horseshoe * imports
  • glmnet * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/check-release.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite