pcsstools

Tools for regression using pre-computed summary statistics

https://github.com/jackmwolf/pcsstools

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 12 DOI reference(s) in README
○ Academic publication links
✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (19.9%) to scientific vocabulary

Keywords

gwas r statistical-genetics

Last synced: 7 months ago

Repository

Tools for regression using pre-computed summary statistics

Basic Info

Host: GitHub
Owner: jackmwolf
License: gpl-3.0
Language: R
Default Branch: master
Homepage:
Size: 1.32 MB

Statistics

Stars: 5
Watchers: 1
Forks: 0
Open Issues: 2
Releases: 1

Topics

gwas r statistical-genetics

Created over 6 years ago · Last pushed almost 2 years ago

Metadata Files

Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# pcsstools  


[![CRAN status](https://www.r-pkg.org/badges/version/pcsstools)](https://CRAN.R-project.org/package=pcsstools)
![CRAN Downloads](https://cranlogs.r-pkg.org/badges/grand-total/pcsstools)
[![R-CMD-check](https://github.com/jackmwolf/pcsstools/workflows/R-CMD-check/badge.svg)](https://github.com/jackmwolf/pcsstools/actions)


## Overview 
pcsstools is an R package to describe various regression models using only pre-computed summary statistics (PCSS) from genome-wide association studies (GWASs) and PCSS repositories such as [GeneAtlas](http://geneatlas.roslin.ed.ac.uk/).
This eliminates the logistical, privacy, and access concerns that accompany the use of individual patient-level data (IPD).


The following figure highlights the information typically needed to perform regression analysis on a set of $m$ phenotypes with $p$ covariates when IPD is available, and the PCSS that are commonly needed to approximate this same model in pcsstools.

![Data needed for analysis using IPD compared to that when using PCSS](./man/figures/IPDvsPCSS.png)

Currently, pcsstools supports the linear modeling of complex phenotypes defined via functions of other phenotypes.
Supported functions include:

* linear combinations (e.g. $\phi_1y_1 + \phi_2y_2$)
* products (e.g. $y_1\circ y_2$)
* logical combinations (e.g. $y_1\wedge y_2$ or $y_1\vee y_2$)
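
As a rough illustration of why linear combinations are the easiest case, the base R sketch below checks that the least-squares coefficients for $\phi_1y_1 + \phi_2y_2$ equal the same linear combination of the coefficients from the separate models. The simulated data and the names `phi`, `x_sim`, `y1_sim`, and `y2_sim` are assumptions made for this example only; it does not call pcsstools and is not meant to show how the package is implemented.

```r
# Simulated data for illustration only (not part of pcsstools).
set.seed(1)
n_sim  <- 200
x_sim  <- rnorm(n_sim)
y1_sim <-  1 + 2.0 * x_sim + rnorm(n_sim)
y2_sim <- -1 + 0.5 * x_sim + rnorm(n_sim)

# Hypothetical weights for the linear combination phi_1 * y1 + phi_2 * y2.
phi <- c(0.3, 0.7)

# Separate models for each phenotype.
fit_y1 <- lm(y1_sim ~ x_sim)
fit_y2 <- lm(y2_sim ~ x_sim)

# Model for the combined phenotype, fit directly on individual-level data.
y_combo   <- phi[1] * y1_sim + phi[2] * y2_sim
fit_combo <- lm(y_combo ~ x_sim)

# Least squares is linear in the response, so the coefficients combine
# with the same weights.
coef_from_parts <- phi[1] * coef(fit_y1) + phi[2] * coef(fit_y2)
all.equal(coef(fit_combo), coef_from_parts)
```

The standard errors do not combine this simply, because they also depend on the covariance between the phenotypes.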


## Installation

You can install pcsstools from CRAN with

``` r
install.packages("pcsstools")
```



### Development Version
You can install the in-development version of pcsstools from [GitHub](https://github.com/) with

``` r
# install.packages("devtools")
devtools::install_github("jackmwolf/pcsstools")
```
## Examples

We will walk through two examples using pcsstools to model combinations of phenotypes using PCSS and then compare our results to those found using IPD.

```{r}
library(pcsstools)
```


### Principal Component Analysis

Let's model the first principal component score of three phenotypes using PCSS.

First, we'll load in some data. We have minor allele counts for three SNPs (`g1`, `g2`, and `g3`), a continuous covariate (`x1`), and three continuous phenotypes (`y1`, `y2`, and `y3`).

```{r}
dat <- pcsstools_example[c("g1", "g2", "g3", "x1", "y1", "y2", "y3")]
head(dat)
```

Next, we need our assumed summary statistics: the means, the full covariance matrix, and the sample size.

```{r}
pcss <- list(
  means = colMeans(dat),
  covs  = cov(dat),
  n     = nrow(dat)
)
```
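
To see why means, covariances, and a sample size can be enough for a linear model, here is a minimal base R sketch that recovers ordinary least-squares estimates from those three quantities alone. It uses the built-in `mtcars` data as a stand-in for IPD, and the object names (`dat_demo`, `covs_demo`, and so on) are made up for this example. It illustrates the general idea, not pcsstools' internal code.

```r
# Built-in mtcars stands in for individual-level data (illustrative choice).
dat_demo <- mtcars[c("mpg", "wt", "hp")]

# The only inputs used for the summary-statistic fit below.
means_demo <- colMeans(dat_demo)
covs_demo  <- cov(dat_demo)
n_demo     <- nrow(dat_demo)

x_names <- c("wt", "hp")

# Slopes solve S_xx %*% beta = s_xy.
slopes <- solve(covs_demo[x_names, x_names], covs_demo[x_names, "mpg"])
names(slopes) <- x_names

# The intercept makes the fitted plane pass through the vector of means.
intercept <- means_demo[["mpg"]] - sum(means_demo[x_names] * slopes)
coef_pcss <- c("(Intercept)" = intercept, slopes)

# Residual standard error from RSS = (n - 1) * (s_yy - s_xy' beta).
rss_pcss   <- (n_demo - 1) *
  (covs_demo["mpg", "mpg"] - sum(covs_demo[x_names, "mpg"] * slopes))
sigma_pcss <- sqrt(rss_pcss / (n_demo - length(x_names) - 1))

# Compare with the usual fit on individual-level data.
fit_ipd   <- lm(mpg ~ wt + hp, data = dat_demo)
coef_ipd  <- coef(fit_ipd)
sigma_ipd <- summary(fit_ipd)$sigma

all.equal(coef_pcss, coef_ipd)
all.equal(sigma_pcss, sigma_ipd)
```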

Then, we can calculate the linear model by using `pcsslm()`.
Our `formula` lists all phenotypes as one sum joined by `+` operators, and we request the first principal component score by setting `comp = 1`.
We also want to center and standardize `y1`, `y2`, and `y3` before computing principal component scores; we will do so by setting `center = TRUE` and `standardize = TRUE`.

```{r}
model_pcss <- pcsslm(y1 + y2 + y3 ~ g1 + g2 + g3 + x1, pcss = pcss, comp = 1,
                     center = TRUE, standardize = TRUE)
model_pcss
```

Here's the same model using individual patient data. 

```{r}
pc_1 <- prcomp(x = dat[c("y1", "y2", "y3")], center = TRUE, scale. = TRUE)$x[, "PC1"]

model_ipd <- lm(pc_1 ~ g1 + g2 + g3 + x1, data = dat)
summary(model_ipd)
```

In this case, our coefficient estimates are off by a factor of -1; this is because we picked the opposite vector of principal component weights to `prcomp`.
This distinction in sign is arbitrary (see the note in `?prcomp`).
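
The effect of that sign choice can be checked with base R alone. The sketch below, which uses the built-in `mtcars` data and example-only names (`pheno_demo`, `pc1_demo`), flips the sign of a first principal component score and refits the model: the coefficients change sign while the fit itself is unchanged.

```r
# Three mtcars columns play the role of phenotypes (illustrative choice).
pheno_demo <- mtcars[c("mpg", "disp", "qsec")]

# First principal component score after centering and scaling.
pc1_demo     <- prcomp(pheno_demo, center = TRUE, scale. = TRUE)$x[, "PC1"]
pc1_demo_neg <- -pc1_demo

# Same predictors, response with opposite sign.
fit_pos <- lm(pc1_demo ~ wt + hp, data = mtcars)
fit_neg <- lm(pc1_demo_neg ~ wt + hp, data = mtcars)

# Coefficients are mirrored ...
all.equal(coef(fit_neg), -coef(fit_pos))

# ... while R-squared and the magnitudes of the t statistics agree.
all.equal(summary(fit_neg)$r.squared, summary(fit_pos)$r.squared)
all.equal(abs(coef(summary(fit_neg))[, "t value"]),
          abs(coef(summary(fit_pos))[, "t value"]))
```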

We can also compare this model to a smaller model using `anova` and find the same results when using both PCSS and IPD.

```{r}
model_pcss_reduced <- update(model_pcss, . ~ . - g1 - g2 - g3)
anova(model_pcss_reduced, model_pcss)

model_ipd_reduced <- update(model_ipd, . ~ . - g1 - g2 - g3)
anova(model_ipd_reduced, model_ipd)
```


### Logical Combination

In this example we will approximate a linear model where our response is the logical combination "$y_4$ or $y_5$" ($y_4\vee y_5$).

First we need data with binary phenotypes.

```{r}
dat <- pcsstools_example[c("g1", "g2", "x1", "y4", "y5")]
head(dat)
```

Once again we will organize our assumed PCSS.
In addition to the summary statistics we needed for the previous example, we also need to describe the distribution of each of our predictors through objects of class `predictor`. 
(See `?new_predictor`.)
`pcsstools` has shortcut functions to create `predictor` objects for common types of variables, which we will use to create a list of `predictor`s.

```{r}
pcss <- list(
 means = colMeans(dat),
 covs = cov(dat),
 n = nrow(dat),
 predictors = list(
   g1 = new_predictor_snp(maf = mean(dat$g1) / 2),
   g2 = new_predictor_snp(maf = mean(dat$g2) / 2),
   x1 = new_predictor_normal(mean = mean(dat$x1), sd = sd(dat$x1))
 )
)

class(pcss$predictors[[1]])
```
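
The `maf = mean(g) / 2` expressions above rely on each person carrying two alleles, so the mean minor allele count is twice the minor allele frequency. The base R sketch below illustrates this with simulated genotypes. It assumes Hardy-Weinberg equilibrium, which is an assumption of this example rather than a documented property of `new_predictor_snp()`, and all names (`g_sim`, `maf_hat`, `hwe_probs`) are made up for illustration.

```r
# Simulate minor allele counts (0, 1, or 2) under Hardy-Weinberg equilibrium.
# This is an assumption of the example, not a statement about pcsstools.
set.seed(3)
true_maf <- 0.2
g_sim    <- rbinom(1000, size = 2, prob = true_maf)

# Each person has two alleles, so the mean count is 2 * MAF.
maf_hat <- mean(g_sim) / 2

# Genotype probabilities implied by the estimated MAF under HWE.
hwe_probs <- c("0" = (1 - maf_hat)^2,
               "1" = 2 * maf_hat * (1 - maf_hat),
               "2" = maf_hat^2)

# Observed genotype proportions for comparison.
observed_props <- prop.table(table(factor(g_sim, levels = 0:2)))

round(rbind(implied = hwe_probs, observed = as.numeric(observed_props)), 3)
```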

Then we can approximate the linear model using `pcsslm()`.

```{r}
model_pcss <- pcsslm(y4 | y5 ~ g1 + g2 + x1, pcss = pcss) 
model_pcss
```

And here's the result we would get using IPD:

```{r}
model_ipd <- lm(y4 | y5 ~ g1 + g2 + x1, data = dat)
summary(model_ipd)
```
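
One way to relate logical combinations to the other supported functions is a standard identity for 0/1 variables: "and" is a product, and "or" is the sum minus the product. The base R sketch below checks this on simulated binary phenotypes. The names (`ya`, `yb`, `x_bin`) are hypothetical, and the README does not say whether pcsstools takes exactly this route internally.

```r
# Simulated binary phenotypes for illustration only.
set.seed(2)
n_bin <- 500
x_bin <- rnorm(n_bin)
ya    <- rbinom(n_bin, 1, plogis(-0.5 + 0.8 * x_bin))
yb    <- rbinom(n_bin, 1, plogis( 0.2 - 0.4 * x_bin))

# Logical definitions.
or_logical  <- as.numeric(ya | yb)
and_logical <- as.numeric(ya & yb)

# Arithmetic equivalents for 0/1 variables.
and_numeric <- ya * yb
or_numeric  <- ya + yb - ya * yb

all(and_numeric == and_logical)
all(or_numeric == or_logical)

# Either form of the response gives the same linear model.
fit_or_logical <- lm(or_logical ~ x_bin)
fit_or_numeric <- lm(or_numeric ~ x_bin)
all.equal(coef(fit_or_logical), coef(fit_or_numeric))
```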

## Future Work

* Support function notation for linear combinations of phenotypes (e.g. `y1 - y2 + 0.5 * y3 ~ 1 + g + x`) instead of requiring a separate vector of weights

* Support formulas that use `.` and `-` on the right-hand side (e.g. `y1 ~ .`, `y1 ~ . -x`)

* Write a vignette


## References
The following are the key references for the functions in this package:

* Wolf, J.M., Westra, J., and Tintle, N. (2021). Using summary statistics to 
  model multiplicative combinations of initially analyzed phenotypes with a 
  flexible choice of covariates. *Frontiers in Genetics*, 25, 1962.
  [https://doi.org/10.3389/fgene.2021.745901](https://doi.org/10.3389/fgene.2021.745901).

* Wolf, J.M., Barnard, M., Xueting, X., Ryder, N., Westra, J., and Tintle, N. 
  (2020). Computationally efficient, exact, covariate-adjusted genetic principal
  component analysis by leveraging individual marker summary statistics from 
  large biobanks. *Pacific Symposium on Biocomputing*, 25, 719-730. 
  [https://doi.org/10.1142/9789811215636_0063](https://doi.org/10.1142/9789811215636_0063).
  
* Gasdaska, A., Friend, D., Chen, R., Westra, J., Zawistowski, M., Lindsey, W., 
  and Tintle, N. (2019). Leveraging summary statistics to make inferences about 
  complex phenotypes in large biobanks. *Pacific Symposium on Biocomputing*, 24, 
  391-402.
  [https://doi.org/10.1142/9789813279827_0036](https://doi.org/10.1142/9789813279827_0036).

Owner

Name: Jack M Wolf
Login: jackmwolf
Kind: user
Company: University of Minnesota Biostatistics

Website: jackmwolf.rbind.io
Twitter: _jackmwolf
Repositories: 4
Profile: https://github.com/jackmwolf

he/him \\ Biostatistics PhD Student

Committers

Last synced: about 3 years ago

All Time

Total Commits: 173
Total Committers: 3
Avg Commits per committer: 57.667
Development Distribution Score (DDS): 0.127

Top Committers

Name	Email	Commits
Jack M Wolf	3**f@u**m	151
Jack Wolf	=	17
Jack Wolf	w**5@s**u	5

Committer Domains (Top 20 + Academic)

stolaf.edu: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 6
Total pull requests: 3
Average time to close issues: 3 months
Average time to close pull requests: about 1 hour
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.17
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

jackmwolf (5)

Pull Request Authors

jackmwolf (3)

Packages

Total packages: 1
Total downloads:
- cran 185 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2
Total maintainers: 1

cran.r-project.org: pcsstools

Tools for Regression Using Pre-Computed Summary Statistics

Homepage: https://github.com/jackmwolf/pcsstools/
Documentation: http://cran.r-project.org/web/packages/pcsstools/pcsstools.pdf
License: GPL (≥ 3)
Latest release: 0.1.2
published over 2 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 185 Last month

Rankings

Stargazers count: 24.2%

Forks count: 28.8%

Dependent packages count: 29.8%

Dependent repos count: 35.5%

Average: 41.3%

Downloads: 88.1%

Maintainers (1)

jackwolf910@gmail.com

Last synced: 7 months ago

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
Rdpack * imports
gtools * imports
stats * imports
knitr * suggests
rmarkdown * suggests
spelling * suggests
testthat * suggests

.github/workflows/check-standard.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/upload-artifact main composite
r-lib/actions/setup-pandoc v1 composite
r-lib/actions/setup-r v1 composite
