https://github.com/animesh/bgwas
R package to perform Bayesian Genome-Wide Association Studies
Fork of n-mounier/bGWAS
Created over 3 years ago
· Last pushed over 3 years ago
https://github.com/animesh/bGWAS/blob/master/
# bGWAS

:arrow\_right: ESHG poster is available [here](doc/P17.051.A_NinonMounier.pdf).

:information\_source: `bGWAS` has been updated to version 1.0.2.

:warning: 28/10/2019: The variance of the prior effects has been modified. If you used a previous version of the package, please re-run your analysis using this new version to get more accurate results. Check the [NEWS](NEWS.md) to learn more about what has been modified\!

:warning: If you downloaded the Z-Matrix files before 20/08/2019, they are now obsolete and you will not be able to use them with the newest version of the package.
Note: some prior GWASs have been removed; you can find more details [here](doc/ZMatrices.md).

## Overview

bGWAS is an R package to perform a Bayesian GWAS (Genome-Wide Association Study), using summary statistics from a conventional GWAS as input. The aim of the approach is to increase power by leveraging information from related traits and by comparing the observed Z-scores from the focal phenotype (provided as input) to prior effects. These prior effects are directly estimated from publicly available GWASs (currently a set of 38 studies, last updated 20-08-2019; hereinafter referred to as prior GWASs or risk factors). Only prior GWASs having a significant causal effect on the focal phenotype, identified using a multivariable Mendelian Randomization (MR) approach, are used to calculate the prior effects. Causal effects are estimated masking the focal chromosome to ensure independence, and the prior effects are estimated as described in the figure below.
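The chromosome-masking idea can be illustrated on simulated data. The following is a minimal sketch, not the package's implementation: the toy genome, the two risk factors `rf1`/`rf2`, the effect sizes, and the choice of a no-intercept linear regression on all SNPs (rather than on pruned MR instruments) are assumptions made purely for illustration.

```r
set.seed(1)
n_snps <- 400
chrom  <- rep(1:4, each = n_snps / 4)   # toy genome with 4 chromosomes (assumption)

# Z-scores of two hypothetical prior GWASs (risk factors)
Z_prior <- cbind(rf1 = rnorm(n_snps), rf2 = rnorm(n_snps))

# Focal-trait Z-scores simulated with known causal effects 0.5 and -0.3
z_focal <- drop(Z_prior %*% c(0.5, -0.3)) + rnorm(n_snps)

prior_mean <- numeric(n_snps)
for (chr in unique(chrom)) {
  train <- chrom != chr                 # mask the focal chromosome
  # Multivariable regression without intercept on the remaining chromosomes
  fit <- lm(z_focal[train] ~ 0 + Z_prior[train, ])
  # Prior effect for SNPs on the masked chromosome only
  prior_mean[!train] <- drop(Z_prior[!train, , drop = FALSE] %*% coef(fit))
}

# Out-of-sample agreement between prior and observed Z-scores
cor(prior_mean, z_focal)
```

Because each SNP's prior is computed from coefficients fitted without its own chromosome, the prior and the observed Z-score of that SNP are not estimated from the same data.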
Observed and prior effects are compared using Bayes Factors (BFs). Significance is assessed by calculating the probability of observing a value larger than the observed BF (P-value) given the prior distribution. This is done by decomposing the analytical form of the BFs and using an approximation for most BFs to make the computation faster. Prior, posterior and direct effects, alongside BFs and p-values, are returned. Note that prior, posterior and direct effects are estimated on the Z-score scale, but are automatically rescaled to the beta scale if possible.

The principal functions available are:

- **`bGWAS()`**: main function that calculates prior effects from prior GWASs, compares them to observed Z-scores and returns an object of class *bGWAS*
- **`list_priorGWASs()`**: directly returns information about the prior GWASs that can be used to calculate prior effects
- **`select_priorGWASs()`**: allows a quick selection of prior GWASs (to include/exclude specific studies when calculating prior effects)
- **`extract_results_bGWAS()`**: returns results (prior, posterior and direct estimate / standard-error + p-value from BF for SNPs) from an object of class *bGWAS*
- **`manhattan_plot_bGWAS()`**: creates a Manhattan Plot from an object of class *bGWAS*
- **`extract_MRcoeffs_bGWAS()`**: returns multivariable MR coefficients (1 estimate using all chromosomes + 22 estimates with 1 chromosome masked) from an object of class *bGWAS*
- **`coefficients_plot_bGWAS()`**: creates a Coefficients Plot (causal effect of each prior GWAS on the focal phenotype) from an object of class *bGWAS*
- **`heatmap_bGWAS()`**: creates a heatmap to represent, for each significant SNP, the contribution of each prior GWAS to the estimated prior effect from an object of class *bGWAS*

All the functions available and more details about their usage can be found in the [manual](doc/bGWAS-manual.pdf).
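To make the Bayes Factor comparison concrete, here is a small sketch for a single SNP. The model is an assumption for illustration (on the Z-score scale, `z ~ N(mu, sigma^2 + 1)` under the prior and `z ~ N(0, 1)` under the null), the helper `bayes_factor()` and the numbers are hypothetical, and the p-value is obtained by brute-force simulation under the null rather than by the analytical decomposition and approximation the package describes.

```r
# Bayes Factor: "effect drawn from the prior" versus "no effect".
# Assumed model: z | prior ~ N(mu, sigma^2 + 1), z | null ~ N(0, 1).
bayes_factor <- function(z, mu, sigma) {
  dnorm(z, mean = mu, sd = sqrt(sigma^2 + 1)) / dnorm(z, mean = 0, sd = 1)
}

# Hypothetical observed Z-score, prior mean and prior standard error
z_obs <- 4.2
mu    <- 1.5
sigma <- 0.8
bf_obs <- bayes_factor(z_obs, mu, sigma)

# Empirical p-value: how often does a null Z-score give a BF at least as large?
set.seed(42)
z_null <- rnorm(1e6)
p_emp  <- mean(bayes_factor(z_null, mu, sigma) >= bf_obs)
```

Simulation is only practical for moderate significance levels; reaching genome-wide thresholds such as 5e-08 this way would need far more draws, which is presumably why an analytical approach is used instead.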
## Installation

You can install the current version of `bGWAS` with:

``` r
# Directly install the package from github
# install.packages("remotes")
remotes::install_github("n-mounier/bGWAS")
library(bGWAS)
```

    ## Warning: package 'dplyr' was built under R version 3.6.2

## Usage

To run the analysis with `bGWAS` two inputs are needed:

#### 1\. The *GWAS* results to be tested

The input can be a regular (space/tab/comma-separated) file, a gzipped file (.gz) or a `data.frame`. It must contain the following columns, which can have alternative names:
- SNP-identifier: `rs` or `rsid`, `snp`, `snpid`, `rnpid`
- Alternate (effect) allele: `a1` or `alt`, `alts`
- Reference allele: `a2` or `a0`, `ref`
- Z-statistics: `z` or `Z`, `zscore`

- Effect-size: `b` or `beta`, `beta1`
- Standard error: `se` or `std`
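As an illustration of the expected layout, the sketch below builds a small `data.frame` using column names from the lists above and derives a Z-statistic as effect size divided by standard error. The SNP identifiers and values are made up, and whether `bGWAS` requires the `z` column to be precomputed is not stated here; the division is simply the standard relationship between the three quantities.

```r
# Toy summary statistics (made-up values) with accepted column names
gwas <- data.frame(
  rsid = c("rs1", "rs2", "rs3"),   # SNP identifiers (hypothetical)
  alt  = c("A", "C", "G"),         # alternate (effect) allele
  ref  = c("G", "T", "A"),         # reference allele
  beta = c(0.12, -0.05, 0.30),     # effect sizes
  se   = c(0.03, 0.02, 0.10),      # standard errors
  stringsAsFactors = FALSE
)

# Z-statistic = effect size / standard error
gwas$z <- gwas$beta / gwas$se
gwas
```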

- *ZMatrix\_MR.csv.gz*: Z-scores (strong instruments only) used for multivariable MR,
- *ZMatrix\_Full.csv.gz*: Z-scores (all SNPs) used to calculate the prior Z-scores,
- *AvailableStudies.tsv*: A file containing information about the prior GWASs available.
Show log
```
## <<< Preparation of analysis >>>
##
## > Checking parameters
## The name of your analysis is: "Test_UsingSmallDataFrame".
## The Z-Matrix files are stored in "/Users/nmounier/ZMatrices".
## # Preparation of the data...
## The conventional GWAS used as input the object: "GWAS".
## SNPID column, ok - ALT column, ok - REF column, ok - BETA column, ok - SE column, ok
## Posterior effects will be rescaled using BETA and SE.
## The analysis will be run in the folder: "/Users/nmounier/Documents/SGG/Projects/Packaging/bGWAS".
## The p-value threshold used for selecting MR instruments is: 1e-06.
## The minimum number instruments required for each trait is: 3.
## The distance used for pruning MR instruments is: 500Kb.
## Distance-based pruning will be used for MR instruments.
## No shrinkage applied before performing MR.
## The p-value threshold used for stepwise selection is 0.05.
## Using MR_shrinkage as default for prior_shrinkage:
## No shrinkage applied before performing calculating the prior.
## The p-value threshold used for stepwise selection is 0.05.
## Significant SNPs will be identified according to p-value. The threshold used is :5e-08.
## The distance used for pruning results is: 500Kb.
## Distance-based pruning will be used for results.
##
##
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## <<< Identification of significant prior GWASs for MR >>>
##
## > Creating the Z-Matrix of strong instruments
## # Loading the ZMatrix...
## Selecting studies :
## 6 studies
## 209,840 SNPs
## # Adding data from the conventional GWAS : "GWAS"
## Done!
## 8,813 SNPs in common between prior studies and the conventional GWAS
## # Thresholding...
## 767 SNPs left after thresholding
## 6 studies left after thresholding
## Pruning MR instruments...
## distance : 500Kb
## 159 SNPs left after pruning
## 6 studies left after thresholding+pruning
##
## > Performing MR
## #Preparation of the MR analyses to identify significant studies...
## Conventionnal GWAS of interest : GWAS
## # Univariable regressions for each trait...
## Number of trait-specific instruments per univariable regression:
## . Body Mass Index (GIANT) : 59
## . Coronary Artery Disease (CARDIoGRAM) : 6
## . Years of Schooling (SSGAC) : 83
## . Systolic Blood Pressure (ICBP) : 8
## . Diastolic Blood Pressure (ICBP) : 8
## . College Completion (SSGAC) : 4
## Done!
## # Stepwise selection (all traits)...
## Studies tested (reaching p<0.05 in univariable models) : Years of Schooling (SSGAC) Body Mass Index (GIANT) Coronary Artery Disease (CARDIoGRAM) Systolic Blood Pressure (ICBP) Diastolic Blood Pressure (ICBP)
## Adding the first study :Years of Schooling (SSGAC)
## #Test if any study can be added with p<0.05
## Adding one study :Systolic Blood Pressure (ICBP)
## Done!
## #Test if any study has p>0.05 now
## #Test if any study can be added with p<0.05
## Adding one study :Body Mass Index (GIANT)
## Done!
## #Test if any study has p>0.05 now
## #Test if any study can be added with p<0.05
## Adding one study :Coronary Artery Disease (CARDIoGRAM)
## Done!
## #Test if any study has p>0.05 now
## #Test if any study can be added with p<0.05
## Adding one study :Diastolic Blood Pressure (ICBP)
## Done!
## #Test if any study has p>0.05 now
## Excluding one study :Systolic Blood Pressure (ICBP)
## Done!
## #Test if any study can be added with p<0.05
## #Test if any study has p>0.05 now
## It converged!
## # Final regression...
## The studies used are:
## - Years of Schooling (SSGAC)- Body Mass Index (GIANT)- Coronary Artery Disease (CARDIoGRAM)- Diastolic Blood Pressure (ICBP)
##
## Estimating adjusted R-squared:
## - in-sample adjusted R-squared for the all-chromosomes multivariable regression is 0.5534
## - out-of-sample R-squared (masking one chromosome at a time), for the multivariable regression will be estimated when calculating the prior.
##
##
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## <<< Estimation of the prior >>>
##
## > Creating the full Z-Matrix
## # Loading the ZMatrix...
## Selecting studies :
## 4 studies
## 6,811,310 SNPs
## # Adding data from the conventional GWAS : "GWAS"
## Done!
## 286,807 SNPs in common between prior studies and the conventional GWAS
##
## > Computing prior
## # Calculating the prior chromosome by chromosome...
## Chromosome 1
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 2
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 3
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 4
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 5
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 6
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 7
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 8
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 9
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 10
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 11
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 12
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 13
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 14
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 15
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 16
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 17
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 18
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 19
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 20
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 21
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
## Chromosome 22
## Running regression,
## Calculating prior estimates for SNPs on this chromosome,
## Calculating prior standard errors for SNPs on this chromosome,
##
## Out-of-sample R-squared for MR instruments across all chromosomes is 0.5206
##
## Out-of-sample squared correlation for MR instruments across all chromosome is 0.5271
##
## Correlation between prior and observed effects for all SNPs is 0.1855
##
## Correlation between prior and observed effects for SNPs with GWAS p-value < 0.001 is 0.5944
## Done!
##
##
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## <<< Calculation of Bayes Factors and p-values >>>
##
## > Calculating them for all SNPs
## # Computing observed Bayes Factor for all SNPs...
## Done!
## # Computing BF p-values...
## using a distribution approach:
## ... getting approximated p-values using non-linear quantiles
## ... checking p-values near significance threshold
## everything is ok!
## # Estimating p-values for posterior effects...
## Done!
## # Estimating p-values for direct effects...
## Done!
## > Pruning and identifying significant SNPs
## Identification based on BFs
## Starting with 286,807 SNPs
## # Selecting significant SNPs according to p-values...
## 30 SNPs left
## Done!
## # Pruning significant SNPs...
## distance : 500Kb
## 14 SNPs left
## Done!
## Identification based on posterior effects
## Starting with 286,807 SNPs
## # Selecting significant SNPs according to p-values...
## 44 SNPs left
## Done!
## # Pruning significant SNPs...
## distance : 500Kb
## 17 SNPs left
## Done!
## Identification based on direct effects
## Starting with 286,807 SNPs
## # Selecting significant SNPs according to p-values...
## 4 SNPs left
## Done!
## # Pruning significant SNPs...
## distance : 500Kb
## 2 SNPs left
## Done!
##
##
## <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
## Time of the analysis: 2 minute(s) and 16 second(s).
```
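The log reports distance-based pruning with a 500Kb window, both for MR instruments and for significant SNPs. One greedy way to do this kind of pruning is sketched below; it is an illustration under assumptions, not the package's code. The helper `prune_by_distance()`, the column names `chr`, `pos` and `p`, and the toy SNPs are all hypothetical.

```r
# Greedy distance-based pruning: keep the most significant SNP, then drop
# any SNP on the same chromosome lying within `distance` base pairs of a kept SNP.
prune_by_distance <- function(df, distance = 500e3) {
  df   <- df[order(df$p), ]              # most significant first
  keep <- logical(nrow(df))
  for (i in seq_len(nrow(df))) {
    kept  <- df[keep, , drop = FALSE]    # SNPs retained so far
    close <- kept$chr == df$chr[i] & abs(kept$pos - df$pos[i]) < distance
    keep[i] <- !any(close)
  }
  df[keep, ]
}

# Toy example (made-up positions and p-values)
snps <- data.frame(
  rsid = c("rs1", "rs2", "rs3", "rs4"),
  chr  = c(1, 1, 1, 2),
  pos  = c(100000, 300000, 900000, 150000),
  p    = c(1e-10, 1e-12, 1e-9, 1e-8),
  stringsAsFactors = FALSE
)
pruned <- prune_by_distance(snps)
pruned$rsid
```

In this toy input `rs1` lies 200Kb from the more significant `rs2` and is dropped, while `rs3` (600Kb away) and `rs4` (on another chromosome) are kept.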
``` r
# Manhattan plot using BFs p-values
manhattan_plot_bGWAS(A)
```
``` r
# Manhattan plot using posterior p-values
manhattan_plot_bGWAS(A, results="posterior")
```
##### Additionally, if `save_files=TRUE`, several files are created in the folder `./
Owner
- Name: Ani
- Login: animesh
- Kind: user
- Location: Norway
- Company: Norwegian University of Science and Technology
- Website: https://www.fuzzylife.org
- Twitter: animesh1977
- Repositories: 749
- Profile: https://github.com/animesh
A medical graduate from Delhi University with post-graduation in bioinformatics from Jawaharlal Nehru University, India.