qsvaR
Quality Surrogate Variable Analysis for Degradation Correction
Science Score: 59.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 4 DOI reference(s) in README -
✓Academic publication links
Links to: zenodo.org -
✓Committers with academic emails
1 of 7 committers (14.3%) from academic institutions -
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (14.1%) to scientific vocabulary
Keywords
bioconductor
brain
degradation
human
qsva
rstats
Keywords from Contributors
bioconductor-package
derfinder
rnaseq
annotation-agnostic
count
exon
gene
illumina
junction
mouse
Last synced: 6 months ago
·
JSON representation
Repository
Quality Surrogate Variable Analysis for Degradation Correction
Basic Info
- Host: GitHub
- Owner: LieberInstitute
- Language: R
- Default Branch: devel
- Homepage: http://research.libd.org/qsvaR/
- Size: 9.62 MB
Statistics
- Stars: 0
- Watchers: 5
- Forks: 2
- Open Issues: 15
- Releases: 1
Topics
bioconductor
brain
degradation
human
qsva
rstats
Created over 4 years ago
· Last pushed about 1 year ago
Metadata Files
Readme
Changelog
Contributing
Code of conduct
Support
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# qsvaR
[](https://lifecycle.r-lib.org/articles/stages.html#stable)
[](https://bioconductor.org/checkResults/release/bioc-LATEST/qsvaR)
[](https://bioconductor.org/checkResults/devel/bioc-LATEST/qsvaR)
[](http://bioconductor.org/packages/stats/bioc/qsvaR/)
[](https://support.bioconductor.org/tag/qsvaR)
[](https://bioconductor.org/packages/release/bioc/html/qsvaR.html#since)
[](http://bioconductor.org/checkResults/devel/bioc-LATEST/qsvaR/)
[](https://bioconductor.org/packages/release/bioc/html/qsvaR.html#since)
[](https://codecov.io/gh/LieberInstitute/qsvaR?branch=devel)
[](https://github.com/LieberInstitute/qsvaR/actions/workflows/check-bioc.yml)
[](https://github.com/LieberInstitute/qsvaR/issues)
[](https://github.com/LieberInstitute/qsvaR/pulls)
[](https://zenodo.org/badge/latestdoi/421556636)
Differential expressions analysis requires the ability to normalize complex datasets. In the case of postmortem brain tissue we are tasked with removing the effects of bench degradation. The `qsvaR` package combines an established method for removing the effects of degradation from RNA-seq data with easy to use functions. It is the second iteration of the qSVA framework ([Jaffe et al, PNAS, 2017](https://doi.org/10.1073/pnas.1617384114)).
The first step in the `qsvaR` workflow is to create an [`RangedSummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) object with the transcripts identified in our qSVA experiment. If you already have a [`RangedSummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) of transcripts we can do this with the `getDegTx()` function as shown below.If not this can be generated with the [`SPEAQeasy`](http://research.libd.org/SPEAQeasy/index.html) (a RNA-seq pipeline maintained by our lab) pipeline using the `--qsva` flag. If you already have a [`RangedSummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) object with transcripts then you do not need to run [`SPEAQeasy`](http://research.libd.org/SPEAQeasy/index.html). This flag requires a full path to a text file, containing one Ensembl transcript ID per line for each transcript desired in the final transcripts R output object (called `rse_tx`). The `sig_transcripts` argument in this package should contain the same Ensembl transcript IDs as the text file for the `--qsva` flag.The goal of `qsvaR` is to provide software that can remove the effects of bench degradation from RNA-seq data.
## Installation Instructions
Get the latest stable R release from CRAN. Then install `qsvaR` using from Bioconductor the following code:
```{r "install pkg", eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("qsvaR")
```
And the development version from GitHub with:
```{r "install qsva devel", eval=FALSE}
BiocManager::install("LieberInstitute/qsvaR")
```
## Example
This is a basic example which shows how to obtain the quality surrogate variables (qSVs) for the brainseq [phase II dataset](http://eqtl.brainseq.org/phase2). qSVs are essentially principal components from an rna-seq experiment designed to model bench degradation. For more on principal components you can read and introductory article [here](https://towardsdatascience.com/tidying-up-with-pca-an-introduction-to-principal-components-analysis-f876599af383#:~:text=The%20goal%20of%20PCA%20is,eliminate%20ones%20that%20do%20not). At the start of this script we will have an [`RangedSummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) and a list of all the transcripts found in our degradation study. At the end we will have a table with differential expression results that is adjusted for qSVs.
```{r "load pkgs", message = FALSE, warning = FALSE}
## R packages we'll use
library("qsvaR")
library("limma")
```
```{r example_qsvs}
library("qsvaR")
## We'll download example data from the BrainSeq Phase II project
## described at http://eqtl.brainseq.org/phase2/.
##
## We'll use BiocFileCache to cache these files so you don't have to download
## them again for other examples.
bfc <- BiocFileCache::BiocFileCache()
rse_file <- BiocFileCache::bfcrpath(
"https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata",
x = bfc
)
## Now that we have the data in our computer, we can load it.
load(rse_file, verbose = TRUE)
```
In this next step, we subset to the transcripts associated with degradation.
`qsvaR` provides significant transcripts determined in four different linear
models of transcript expression against degradation time, brain region, and
potentially cell-type proportions:
1. `exp ~ DegradationTime + Region`
2. `exp ~ DegradationTime * Region`
3. `exp ~ DegradationTime + Region + CellTypeProp`
4. `exp ~ DegradationTime * Region + CellTypeProp`
`select_transcripts()` returns degradation-associated transcripts and supports
two parameters. First, `top_n` controls how many significant transcripts to
extract from each model. When `cell_component = TRUE`, all four models are used;
otherwise, just the first two are used. The union of significant transcripts
from all used models is returned.
As an example, we'll subset our `RangedSummarizedExperiment` to the union of
the top 1000 significant transcripts derived from each of the four models.
```{r select_transcripts}
# Subset 'rse_tx' to the top 1000 significant transcripts from the four
# degradation models
DegTx <- getDegTx(
rse_tx,
sig_transcripts = select_transcripts(top_n = 1000, cell_component = TRUE)
)
## Now we can compute the Principal Components (PCs) of the degraded
## transcripts
pcTx <- getPCs(DegTx, "tpm")
```
Next we use the `k_qsvs()` function to calculate how many PCs we will need to account for the variation. A model matrix accounting for relevant variables should be used. Common variables such as Age, Sex, Race and Region are often included in the model. Again we are using our `RangedSummarizedExperiment` `DegTx` as the `rse_tx` option. Next we specify the `mod` with our `model.matrix()`. `model.matrix()` creates a design (or model) matrix, e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly. For more information on creating a design matrix for your experiment see the documentation [here](http://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html). Again we use the `assayname` option to specify the we are using the `tpm` assay, where TPM stands for _transcripts per million_.
```{r select_k}
## Using a simple statistical model we determine the number of PCs needed (k)
mod <- model.matrix(~ Dx + Age + Sex + Race + Region,
data = colData(rse_tx)
)
k <- k_qsvs(DegTx, mod, "tpm")
print(k)
```
Now that we have our PCs and the number we need we can generate our qSVs.
```{r example_get_qSVs}
## Obtain the k qSVs
qsvs <- get_qsvs(pcTx, k)
dim(qsvs)
```
This can be done in one step with our wrapper function `qSVA` which just combinds all the previous mentioned functions.
```{r "wrapper function"}
## Example use of the wrapper function qSVA()
qsvs_wrapper <- qSVA(
rse_tx = rse_tx,
sig_transcripts = select_transcripts(top_n = 1000, cell_component = TRUE),
mod = mod,
assayname = "tpm"
)
dim(qsvs_wrapper)
```
## Differential Expression
Next we can use a standard `limma` package approach to do differential expression on the data. The key here is that we add our qSVs to the statistical model we use through `model.matrix()`.
Here we input our [`Ranged SummarizedExperiment`](https://www.rdocumentation.org/packages/SummarizedExperiment/versions/1.2.3/topics/RangedSummarizedExperiment-class) object and our `model.matrix` with qSVs. Note here that the `Ranged SummarizedExperiment` object is the original object loaded with the full list of transcripts, not the the one we subsetted for qSVs. This is because while PCs can be generated from a subset of genes, differential expression is best done on the full dataset. The expected output is a `sigTx` object that shows the results of differential expression.
```{r "perform DE", warning=FALSE}
library("limma")
## Add the qSVs to our statistical model
mod_qSVA <- cbind(
mod,
qsvs
)
## Extract the transcript expression values and put them in the
## log2(TPM + 1) scale
txExprs <- log2(assays(rse_tx)$tpm + 1)
## Run the standard linear model for differential expression
fitTx <- lmFit(txExprs, mod_qSVA)
eBTx <- eBayes(fitTx)
## Extract the differential expression results
sigTx <- topTable(eBTx,
coef = 2,
p.value = 1, number = nrow(rse_tx)
)
## Explore the top results
head(sigTx)
```
Finally, you can compare the resulting t-statistics from your differential expression model against the degradation time t-statistics adjusting for the six different brain regions. This type of plot is called `DEqual` plot and was shown in the initial qSVA framework paper ([Jaffe et al, PNAS, 2017](https://doi.org/10.1073/pnas.1617384114)). We are really looking for two patterns exemplified here in Figure 1 (cartoon shown earlier). A direct positive correlation with degradation shown in Figure 1 on the right tells us that there is signal in the data associated with qSVs. An example of nonconfounded data or data that has been modeled can be seen in Figure 1 on the right with its lack of relationship between the x and y variables.
```{r DEqualCartoon,fig.cap="Cartoon showing patterns in DEqual plots", echo = FALSE}
knitr::include_graphics("./man/figures/DEqual_example.png")
```
```{r "DEqual",fig.cap="Result of Differential Expression with qSVA normalization."}
## Generate a DEqual() plot using the model results with qSVs
DEqual(sigTx)
```
For comparison, here is the `DEqual()` plot for the model without qSVs.
```{r "DEqual-no-qSVs",fig.cap="Result of Differential Expression without qSVA normalization.", warning=FALSE}
## Generate a DEqual() plot using the model results without qSVs
DEqual(topTable(eBayes(lmFit(txExprs, mod)), coef = 2, p.value = 1, number = nrow(rse_tx)))
```
In these two DEqual plots we can see that the first is much better. With a correlation of -0.014 we can effectively conclude that we have removed the effects of degradation from the data. In the second plot after modeling for several common variables we still have a correlation of 0.5 with the degradation experiment. This high correlation shows we still have a large amount of signal from degradation in our data potentially confounding our case-control (SCZD vs neurotypical controls) differential expression results.
Owner
- Name: Lieber Institute for Brain Development
- Login: LieberInstitute
- Kind: organization
- Email: info@libd.org
- Location: Baltimore, MD
- Website: www.libd.org
- Repositories: 40
- Profile: https://github.com/LieberInstitute
GitHub Events
Total
- Push event: 14
Last Year
- Push event: 14
Committers
Last synced: 8 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| joshstolz | j****z@l****g | 163 |
| HediaTnani | h****0@g****m | 121 |
| Leonardo Collado Torres | l****r@g****m | 50 |
| Geo Pertea | g****a@g****m | 19 |
| Nick-Eagles | n****s@l****g | 17 |
| J Wokaty | j****y@s****u | 10 |
| Nitesh Turaga | n****a@g****m | 2 |
Committer Domains (Top 20 + Academic)
libd.org: 2
sph.cuny.edu: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 44
- Total pull requests: 1
- Average time to close issues: 30 days
- Average time to close pull requests: 2 days
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.8
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 2 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- lcolladotor (32)
- joshstolz (11)
Pull Request Authors
- gpertea (2)
Top Labels
Issue Labels
enhancement (14)
Pull Request Labels
Packages
- Total packages: 1
-
Total downloads:
- bioconductor 7,836 total
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
bioconductor.org: qsvaR
Generate Quality Surrogate Variable Analysis for Degradation Correction
- Homepage: https://github.com/LieberInstitute/qsvaR
- Documentation: https://bioconductor.org/packages/release/bioc/vignettes/qsvaR/inst/doc/qsvaR.pdf
- License: Artistic-2.0
-
Latest release: 1.12.0
published 10 months ago
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 31.2%
Downloads: 93.5%
Maintainers (1)
Last synced:
6 months ago
Dependencies
.github/workflows/check-bioc.yml
actions
- JamesIves/github-pages-deploy-action releases/v4 composite
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/upload-artifact master composite
- docker/build-push-action v4 composite
- docker/login-action v2 composite
- docker/setup-buildx-action v2 composite
- docker/setup-qemu-action v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
DESCRIPTION
cran
- R >= 4.2 depends
- SummarizedExperiment * depends
- ggplot2 * imports
- methods * imports
- stats * imports
- sva * imports
- BiocFileCache * suggests
- BiocStyle * suggests
- RefManageR * suggests
- covr * suggests
- knitr * suggests
- limma * suggests
- rmarkdown * suggests
- sessioninfo * suggests
- testthat >= 3.0.0 suggests