qsvaR

Quality Surrogate Variable Analysis for Degradation Correction

https://github.com/lieberinstitute/qsvar

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 4 DOI reference(s) in README
✓
Academic publication links
Links to: zenodo.org
✓
Committers with academic emails
1 of 7 committers (14.3%) from academic institutions
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (14.1%) to scientific vocabulary

Keywords

bioconductor brain degradation human qsva rstats

Keywords from Contributors

bioconductor-package derfinder rnaseq annotation-agnostic count exon gene illumina junction mouse

Last synced: 6 months ago · JSON representation

Repository

Quality Surrogate Variable Analysis for Degradation Correction

Basic Info

Host: GitHub
Owner: LieberInstitute
Language: R
Default Branch: devel
Homepage: http://research.libd.org/qsvaR/
Size: 9.62 MB

Statistics

Stars: 0
Watchers: 5
Forks: 2
Open Issues: 15
Releases: 1

Topics

bioconductor brain degradation human qsva rstats

Created over 4 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog Contributing Code of conduct Support

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    out.width = "100%"
)
```

# qsvaR


[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![Bioc release status](http://www.bioconductor.org/shields/build/release/bioc/qsvaR.svg)](https://bioconductor.org/checkResults/release/bioc-LATEST/qsvaR)
[![Bioc devel status](http://www.bioconductor.org/shields/build/devel/bioc/qsvaR.svg)](https://bioconductor.org/checkResults/devel/bioc-LATEST/qsvaR)
[![Bioc downloads rank](https://bioconductor.org/shields/downloads/release/qsvaR.svg)](http://bioconductor.org/packages/stats/bioc/qsvaR/)
[![Bioc support](https://bioconductor.org/shields/posts/qsvaR.svg)](https://support.bioconductor.org/tag/qsvaR)
[![Bioc history](https://bioconductor.org/shields/years-in-bioc/qsvaR.svg)](https://bioconductor.org/packages/release/bioc/html/qsvaR.html#since)
[![Bioc last commit](https://bioconductor.org/shields/lastcommit/devel/bioc/qsvaR.svg)](http://bioconductor.org/checkResults/devel/bioc-LATEST/qsvaR/)
[![Bioc dependencies](https://bioconductor.org/shields/dependencies/release/qsvaR.svg)](https://bioconductor.org/packages/release/bioc/html/qsvaR.html#since)
[![Codecov test coverage](https://codecov.io/gh/LieberInstitute/qsvaR/branch/devel/graph/badge.svg)](https://codecov.io/gh/LieberInstitute/qsvaR?branch=devel)
[![R build status](https://github.com/LieberInstitute/qsvaR/actions/workflows/check-bioc.yml/badge.svg)](https://github.com/LieberInstitute/qsvaR/actions/workflows/check-bioc.yml)
[![GitHub issues](https://img.shields.io/github/issues/LieberInstitute/qsvaR)](https://github.com/LieberInstitute/qsvaR/issues)
[![GitHub pulls](https://img.shields.io/github/issues-pr/LieberInstitute/qsvaR)](https://github.com/LieberInstitute/qsvaR/pulls)
[![DOI](https://zenodo.org/badge/421556636.svg)](https://zenodo.org/badge/latestdoi/421556636)


Differential expressions analysis requires the ability to normalize complex datasets. In the case of postmortem brain tissue we are tasked with removing the effects of bench degradation. The `qsvaR` package combines an established method for removing the effects of degradation from RNA-seq data with easy to use functions. It is the second iteration of the qSVA framework ([Jaffe et al, PNAS, 2017](https://doi.org/10.1073/pnas.1617384114)).

The first step in the `qsvaR` workflow is to create an [`RangedSummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) object with the transcripts identified in our qSVA experiment. If you already have a [`RangedSummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) of transcripts we can do this with the `getDegTx()` function as shown below.If not this can be generated with the [`SPEAQeasy`](http://research.libd.org/SPEAQeasy/index.html) (a RNA-seq pipeline maintained by our lab) pipeline using the `--qsva` flag. If you already have a [`RangedSummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) object with transcripts then you do not need to run [`SPEAQeasy`](http://research.libd.org/SPEAQeasy/index.html). This flag requires a full path to a text file, containing one Ensembl transcript ID per line for each transcript desired in the final transcripts R output object (called `rse_tx`). The `sig_transcripts` argument in this package should contain the same Ensembl transcript IDs as the text file for the `--qsva` flag.The goal of `qsvaR` is to provide software that can remove the effects of bench degradation from RNA-seq data.

## Installation Instructions

Get the latest stable R release from CRAN. Then install `qsvaR` using from Bioconductor the following code:

```{r "install pkg", eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("qsvaR")
```

And the development version from GitHub with:

```{r "install qsva devel", eval=FALSE}
BiocManager::install("LieberInstitute/qsvaR")
```


## Example

This is a basic example which shows how to obtain the quality surrogate variables (qSVs) for the brainseq [phase II dataset](http://eqtl.brainseq.org/phase2). qSVs are essentially principal components from an rna-seq experiment designed to model bench degradation. For more on principal components you can read and introductory article [here](https://towardsdatascience.com/tidying-up-with-pca-an-introduction-to-principal-components-analysis-f876599af383#:~:text=The%20goal%20of%20PCA%20is,eliminate%20ones%20that%20do%20not). At the start of this script we will have an [`RangedSummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) and a list of all the transcripts found in our degradation study. At the end we will have a table with differential expression results that is adjusted for qSVs.

```{r "load pkgs", message = FALSE, warning = FALSE}
## R packages we'll use
library("qsvaR")
library("limma")
```


```{r example_qsvs}
library("qsvaR")

## We'll download example data from the BrainSeq Phase II project
## described at http://eqtl.brainseq.org/phase2/.
##
## We'll use BiocFileCache to cache these files so you don't have to download
## them again for other examples.
bfc <- BiocFileCache::BiocFileCache()
rse_file <- BiocFileCache::bfcrpath(
    "https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata",
    x = bfc
)

## Now that we have the data in our computer, we can load it.
load(rse_file, verbose = TRUE)
```


In this next step, we subset to the transcripts associated with degradation.
`qsvaR` provides significant transcripts determined in four different linear
models of transcript expression against degradation time, brain region, and
potentially cell-type proportions:

1. `exp ~ DegradationTime + Region`
2. `exp ~ DegradationTime * Region`
3. `exp ~ DegradationTime + Region + CellTypeProp`
4. `exp ~ DegradationTime * Region + CellTypeProp`

`select_transcripts()` returns degradation-associated transcripts and supports
two parameters. First, `top_n` controls how many significant transcripts to
extract from each model. When `cell_component = TRUE`, all four models are used;
otherwise, just the first two are used. The union of significant transcripts
from all used models is returned.

As an example, we'll subset our `RangedSummarizedExperiment` to the union of
the top 1000 significant transcripts derived from each of the four models.

```{r select_transcripts}
#   Subset 'rse_tx' to the top 1000 significant transcripts from the four
#   degradation models
DegTx <- getDegTx(
    rse_tx,
    sig_transcripts = select_transcripts(top_n = 1000, cell_component = TRUE)
)

## Now we can compute the Principal Components (PCs) of the degraded
## transcripts
pcTx <- getPCs(DegTx, "tpm")
```

Next we use the `k_qsvs()` function to calculate how many PCs we will need to account for the variation. A model matrix accounting for relevant variables should be used. Common variables such as Age, Sex, Race and Region are often included in the model. Again we are using our `RangedSummarizedExperiment` `DegTx` as the `rse_tx` option. Next we specify the `mod` with our `model.matrix()`. `model.matrix()` creates a design (or model) matrix, e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly. For more information on creating a design matrix for your experiment see the documentation [here](http://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html). Again we use the `assayname` option to specify the we are using the `tpm` assay, where TPM stands for _transcripts per million_.

```{r select_k}
## Using a simple statistical model we determine the number of PCs needed (k)
mod <- model.matrix(~ Dx + Age + Sex + Race + Region,
    data = colData(rse_tx)
)
k <- k_qsvs(DegTx, mod, "tpm")
print(k)
```

Now that we have our PCs and the number we need we can generate our qSVs.

```{r example_get_qSVs}
## Obtain the k qSVs
qsvs <- get_qsvs(pcTx, k)
dim(qsvs)
```

This can be done in one step with our wrapper function `qSVA` which just combinds all the previous mentioned functions.

```{r "wrapper function"}
## Example use of the wrapper function qSVA()
qsvs_wrapper <- qSVA(
    rse_tx = rse_tx,
    sig_transcripts = select_transcripts(top_n = 1000, cell_component = TRUE),
    mod = mod,
    assayname = "tpm"
)
dim(qsvs_wrapper)
```

## Differential Expression

Next we can use a standard `limma` package approach to do differential expression on the data. The key here is that we add our qSVs to the statistical model we use through `model.matrix()`. 
Here we input our [`Ranged SummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) object and our `model.matrix` with qSVs. Note here that the `Ranged SummarizedExperiment` object is the original object loaded with the full list of transcripts, not the the one we subsetted for qSVs. This is because while PCs can be generated from a subset of genes, differential expression is best done on the full dataset. The expected output is a `sigTx` object that shows the results of differential expression.

```{r "perform DE", warning=FALSE}
library("limma")

## Add the qSVs to our statistical model
mod_qSVA <- cbind(
    mod,
    qsvs
)

## Extract the transcript expression values and put them in the
## log2(TPM + 1) scale
txExprs <- log2(assays(rse_tx)$tpm + 1)

## Run the standard linear model for differential expression
fitTx <- lmFit(txExprs, mod_qSVA)
eBTx <- eBayes(fitTx)

## Extract the differential expression results
sigTx <- topTable(eBTx,
    coef = 2,
    p.value = 1, number = nrow(rse_tx)
)

## Explore the top results
head(sigTx)
```

Finally, you can compare the resulting t-statistics from your differential expression model against the degradation time t-statistics adjusting for the six different brain regions. This type of plot is called `DEqual` plot and was shown in the initial qSVA framework paper ([Jaffe et al, PNAS, 2017](https://doi.org/10.1073/pnas.1617384114)). We are really looking for two patterns exemplified here in Figure 1 (cartoon shown earlier). A direct positive correlation with degradation shown in Figure 1 on the right tells us that there is signal in the data associated with qSVs. An example of nonconfounded data or data that has been modeled can be seen in Figure 1 on the right with its lack of relationship between the x and y variables.

```{r DEqualCartoon,fig.cap="Cartoon showing patterns in DEqual plots", echo = FALSE}
knitr::include_graphics("./man/figures/DEqual_example.png")
```

```{r "DEqual",fig.cap="Result of Differential Expression with qSVA normalization."}
## Generate a DEqual() plot using the model results with qSVs
DEqual(sigTx)
```

For comparison, here is the `DEqual()` plot for the model without qSVs.

```{r "DEqual-no-qSVs",fig.cap="Result of Differential Expression without qSVA normalization.", warning=FALSE}
## Generate a DEqual() plot using the model results without qSVs
DEqual(topTable(eBayes(lmFit(txExprs, mod)), coef = 2, p.value = 1, number = nrow(rse_tx)))
```

In these two DEqual plots we can see that the first is much better. With a correlation of -0.014 we can effectively conclude that we have removed the effects of degradation from the data. In the second plot after modeling for several common variables we still have a correlation of 0.5 with the degradation experiment. This high correlation shows we still have a large amount of signal from degradation in our data potentially confounding our case-control (SCZD vs neurotypical controls) differential expression results.

Owner

Name: Lieber Institute for Brain Development
Login: LieberInstitute
Kind: organization
Email: info@libd.org
Location: Baltimore, MD

Website: www.libd.org
Repositories: 40
Profile: https://github.com/LieberInstitute

GitHub Events

Total

Push event: 14

Last Year

Push event: 14

Committers

Last synced: 8 months ago

All Time

Total Commits: 382
Total Committers: 7
Avg Commits per committer: 54.571
Development Distribution Score (DDS): 0.573

Past Year

Commits: 41
Committers: 4
Avg Commits per committer: 10.25
Development Distribution Score (DDS): 0.537

Top Committers

Name	Email	Commits
joshstolz	j**z@l**g	163
HediaTnani	h**0@g**m	121
Leonardo Collado Torres	l**r@g**m	50
Geo Pertea	g**a@g**m	19
Nick-Eagles	n**s@l**g	17
J Wokaty	j**y@s**u	10
Nitesh Turaga	n**a@g**m	2

Committer Domains (Top 20 + Academic)

libd.org: 2 sph.cuny.edu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 44
Total pull requests: 1
Average time to close issues: 30 days
Average time to close pull requests: 2 days
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 0.8
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: 2 days
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

lcolladotor (32)
joshstolz (11)

Pull Request Authors

gpertea (2)

Top Labels

Issue Labels

enhancement (14)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- bioconductor 7,836 total

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 5
Total maintainers: 1

bioconductor.org: qsvaR

Generate Quality Surrogate Variable Analysis for Degradation Correction

Homepage: https://github.com/LieberInstitute/qsvaR
Documentation: https://bioconductor.org/packages/release/bioc/vignettes/qsvaR/inst/doc/qsvaR.pdf
License: Artistic-2.0
Latest release: 1.12.0
published 10 months ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 7,836 Total

Rankings

Dependent repos count: 0.0%

Dependent packages count: 0.0%

Average: 31.2%

Downloads: 93.5%

Maintainers (1)

hediatnani0@gmail.com

Last synced: 6 months ago

Dependencies

.github/workflows/check-bioc.yml actions

JamesIves/github-pages-deploy-action releases/v4 composite
actions/cache v3 composite
actions/checkout v3 composite
actions/upload-artifact master composite
docker/build-push-action v4 composite
docker/login-action v2 composite
docker/setup-buildx-action v2 composite
docker/setup-qemu-action v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite

DESCRIPTION cran

R >= 4.2 depends
SummarizedExperiment * depends
ggplot2 * imports
methods * imports
stats * imports
sva * imports
BiocFileCache * suggests
BiocStyle * suggests
RefManageR * suggests
covr * suggests
knitr * suggests
limma * suggests
rmarkdown * suggests
sessioninfo * suggests
testthat >= 3.0.0 suggests

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

qsvaR

Science Score: 59.0%

Keywords

Keywords from Contributors

Repository

Basic Info

Statistics

Topics

Metadata Files

README.Rmd

Owner

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

bioconductor.org: qsvaR

Rankings

Maintainers (1)

Dependencies