Varistran

Varistran: Anscombe's variance stabilizing transformation for RNA-seq gene expression data - Published in JOSS (2017)

https://github.com/monashbioinformaticsplatform/varistran

Science Score: 96.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 13 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
    Organization monashbioinformaticsplatform has institutional domain (platforms.monash.edu)
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Biochemistry, Genetics and Molecular Biology Life Sciences - 40% confidence
Last synced: 6 months ago

Repository

R package providing Variance Stabilizing Transformations appropriate for RNA-Seq data

Basic Info
  • Host: GitHub
  • Owner: MonashBioinformaticsPlatform
  • License: lgpl-2.1
  • Language: R
  • Default Branch: master
  • Size: 1.05 MB
Statistics
  • Stars: 21
  • Watchers: 15
  • Forks: 3
  • Open Issues: 0
  • Releases: 4
Created over 10 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog License Codemeta

README.md

Varistran

Varistran is an R package providing a Variance Stabilizing Transformation appropriate for RNA-Seq data, and a variety of diagnostic plots based on such a transformation. As of 2025, it also contains a new method of transformation and library size adjustment that I am calling "samesum", which aims to be a better cousin of the Center Log Ratio transformation.

Varistran is developed by Paul Harrison (paul.harrison@monash.edu, @paulfharrison.bsky.social) for the Monash Genomics and Bioinformatics platform.

Install

Varistran is most easily installed from GitHub using BiocManager. This will install needed dependencies, including the edgeR package from Bioconductor as well as various CRAN packages.

```
install.packages("BiocManager")

BiocManager::install("MonashBioinformaticsPlatform/varistran")
```

Refer to the DESCRIPTION file for the full list of dependencies.

Example usage

I intend to keep adding features to Varistran. Therefore, I recommend using its functions via varistran:: rather than attaching its namespace with library(varistran).

The examples below use the Bottomly dataset available from ReCount.

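```
library(Biobase)

# Download the Bottomly ExpressionSet; the saved file name must match the one loaded below
download.file("http://bowtie-bio.sourceforge.net/recount/ExpressionSets/bottomly_eset.RData", "bottomly_eset.RData")

load("bottomly_eset.RData")

# Count matrix: genes in rows, samples in columns
counts <- exprs(bottomly.eset)

# Design matrix with strain and experiment number as factors
strain <- phenoData(bottomly.eset)$strain
experiment.number <- factor( phenoData(bottomly.eset)$experiment.number )
design <- model.matrix(~ strain + experiment.number)
```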

Variance stabilizing transformation

Say you have a count matrix counts and a design matrix design. To perform a variance stabilizing transformation:

```
y <- varistran::vst(counts, design=design)
```

By default, Anscombe's (1948) variance stabilizing transformation for the negative binomial distribution is used. This behaves like log2 for large counts (log2 Counts-Per-Million if cpm=TRUE is given).
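
To make the "behaves like log2 for large counts" claim concrete, here is a minimal base-R sketch. It uses a simplified asinh form of a negative binomial transformation, shifted and scaled so that it approaches log2 for large counts. The function name `simple_nb_vst` and the dispersion value of 0.1 are hypothetical, and this is not necessarily the exact formula that `varistran::vst` applies.

```r
# Simplified asinh-based transformation for negative binomial counts.
# Hypothetical helper for illustration; not part of Varistran's API.
simple_nb_vst <- function(x, dispersion) {
  # asinh(z) is close to log(2z) for large z, so subtracting
  # 2 + log2(dispersion) after rescaling makes the result approach log2(x).
  2 * asinh(sqrt(dispersion * x)) / log(2) - 2 - log2(dispersion)
}

x <- c(0, 1, 10, 100, 1e4, 1e6)
comparison <- data.frame(
  count = x,
  transformed = simple_nb_vst(x, dispersion = 0.1),
  log2_count = log2(x)
)
print(comparison)
```

Unlike log2, the transformed value at a count of zero is finite, while the two columns agree closely for large counts.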

An appropriate dispersion is estimated with the aid of the design matrix. If the design matrix is omitted, it defaults to a column of ones, giving a blind estimate of the dispersion, which might slightly over-estimate it. A third possibility is to estimate the dispersion with edgeR.
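
Why a blind design can over-estimate variability can be seen in a small ordinary-least-squares analogy, sketched below in base R. The toy data, the group effect of 3, and the helper `residual_variance` are all hypothetical; this is a Gaussian illustration of the idea, not Varistran's dispersion estimator.

```r
set.seed(1)

# Hypothetical toy data: one gene, two groups of five samples,
# with a true group difference of 3 and unit noise.
group <- factor(rep(c("a", "b"), each = 5))
y <- rnorm(10) + ifelse(group == "b", 3, 0)

# Blind design: a single column of ones (intercept only).
blind_design <- matrix(1, nrow = length(y), ncol = 1)
# Informed design: intercept plus group effect.
full_design <- model.matrix(~ group)

# Residual variance after fitting a linear model with each design.
residual_variance <- function(design, y) {
  fit <- lm.fit(design, y)
  sum(fit$residuals^2) / fit$df.residual
}

var_blind <- residual_variance(blind_design, y)
var_full <- residual_variance(full_design, y)
print(c(blind = var_blind, full = var_full))
```

With the blind design, the real difference between groups is counted as noise, so the residual variance comes out larger than with the informed design.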

Samesum transformation

As of 2025, I am also experimenting with a new transformation method I call "samesum".
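
For context, the introduction above describes samesum as a cousin of the Center Log Ratio (CLR) transformation. The base-R sketch below shows a plain CLR only, not samesum itself. The helper name `clr`, the pseudocount of 0.5, and the toy counts are assumptions made for illustration.

```r
# Centered log ratio (CLR) of a count matrix with genes in rows and
# samples in columns. The pseudocount of 0.5 is an arbitrary choice
# made here to avoid log(0); it is not taken from Varistran.
clr <- function(counts, pseudocount = 0.5) {
  log_counts <- log2(counts + pseudocount)
  # Subtract each sample's mean log count from that sample's column.
  sweep(log_counts, 2, colMeans(log_counts))
}

# Hypothetical toy counts: 4 genes by 2 samples.
counts <- matrix(
  c(10, 0, 250, 40,
    20, 1, 510, 75),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("sample1", "sample2"))
)
clr_values <- clr(counts)
print(round(clr_values, 2))
```

After CLR, every sample's values sum to zero, which removes the effect of library size on the log scale.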

Diagnostic plots

plot_stability allows assessment of how well the variance has been stabilized. Ideally this will produce a horizontal line, but counts below 5 will always show a drop-off in variance.

```
varistran::plot_stability(y, counts, design=design)
```
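
What "stabilized" means can be checked with a small simulation, sketched here in base R without Varistran. It draws negative binomial counts at several means and compares the standard deviation before and after a simplified asinh transformation. The helper `simple_nb_vst`, the dispersion of 0.1, and the chosen means are hypothetical, and the transformation is a stand-in rather than Varistran's implementation.

```r
set.seed(42)

dispersion <- 0.1
means <- c(50, 500, 5000)

# Simplified asinh-based transformation (hypothetical helper, not
# Varistran's implementation).
simple_nb_vst <- function(x, dispersion) {
  2 * asinh(sqrt(dispersion * x)) / log(2) - 2 - log2(dispersion)
}

# Simulate negative binomial counts at each mean and compare the
# standard deviation before and after transformation.
stability <- t(sapply(means, function(mu) {
  x <- rnbinom(20000, size = 1 / dispersion, mu = mu)
  c(mean = mu,
    sd_raw = sd(x),
    sd_transformed = sd(simple_nb_vst(x, dispersion)))
}))
print(stability)
```

The raw standard deviation grows with the mean, while the transformed standard deviation stays roughly constant, near the first-order approximation `sqrt(dispersion) / log(2)`. A flat line in a stability plot expresses the same property.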

plot_biplot provides a two-dimensional overview of your samples and genes using Principal Components Analysis (similar concept to plotMDS in limma):

```
varistran::plot_biplot(y)
```
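
The quantities a biplot displays can be sketched with base R's `prcomp`. The matrix below is simulated and merely stands in for transformed expression values; how `plot_biplot` centers, scales, or selects genes internally is not described here, so treat this as an illustration of the concept only.

```r
set.seed(7)

# Hypothetical transformed expression matrix: 100 genes (rows) by
# 6 samples (columns), standing in for the output of a VST.
y <- matrix(rnorm(600, mean = 8), nrow = 100,
            dimnames = list(paste0("gene", 1:100), paste0("sample", 1:6)))
# Add a group effect to the first 10 genes in the last three samples.
y[1:10, 4:6] <- y[1:10, 4:6] + 3

# prcomp expects observations in rows, so transpose to samples x genes.
pca <- prcomp(t(y), center = TRUE, scale. = FALSE)

sample_scores <- pca$x[, 1:2]        # one point per sample
gene_loadings <- pca$rotation[, 1:2] # one direction per gene
print(round(sample_scores, 2))
```

A biplot overlays the sample scores and the gene loadings on the same pair of axes, so that genes pointing toward a cluster of samples tend to be higher in those samples.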

plot_heatmap draws a heatmap.

```
varistran::plot_heatmap(y, n=50)
```

Shiny report

Varistran's various diagnostic plots are also available as a Shiny app, which can be launched with:

```
varistran::shiny_report(y, counts)
```

If y is not given it is calculated with varistran::vst(counts).

```
varistran::shiny_report(counts=counts)
```

This also includes the plotMDS MDS plot from the limma Bioconductor package (Ritchie et al, 2015).

Test suite

Download the source code, and ensure that the Bioconductor packages in the "Suggests:" field of the DESCRIPTION file are installed. Then a suite of tests can be run with:

```
make test
```

Outputs are placed in a directory called test_output.

Sources of data used in these tests are:

  • The Bottomly dataset from ReCount (Frazee, Langmead and Leek, 2011).

  • The "arab" dataset provided in the NBPSeq package (Di et al, 2011).

  • Simulated data following negative binomial distributions.

Dispersion estimates are compared to those calculated by the edgeR Bioconductor package's estimateGLMCommonDisp function (McCarthy, Chen and Smyth, 2012) and by the DESeq2 Bioconductor package's DESeq function (Love, Huber and Anders, 2014).
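
The idea of checking a dispersion estimate against simulated negative binomial data can be sketched in base R as follows. This is not the test suite's code: the true dispersion, the mean, the sample size, and the method-of-moments estimator are all assumptions chosen to keep the example free of package dependencies.

```r
set.seed(123)

true_dispersion <- 0.1
mu <- 100

# rnbinom's size parameter is the reciprocal of the dispersion.
x <- rnbinom(50000, size = 1 / true_dispersion, mu = mu)

# For a negative binomial, variance = mean + dispersion * mean^2,
# so a method-of-moments estimate follows by rearranging.
estimated_dispersion <- (var(x) - mean(x)) / mean(x)^2
print(c(true = true_dispersion, estimated = estimated_dispersion))
```

With a known true value, any estimator (moment-based, edgeR's, or DESeq2's) can be judged by how close it lands to the value used for simulation.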

Supporting/contributing

Please email questions about using this software to the author, paul.harrison@monash.edu.

Please submit bug reports and feature requests through the GitHub issue tracker, or by contacting the author.

Pull requests gratefully considered.

Links

Citing Varistran

To cite this R package, use:

Harrison, Paul F. 2017. "Varistran: Anscombe's variance stabilizing transformation for RNA-seq gene expression data." The Journal of Open Source Software 2 (16). doi:10.21105/joss.00257

References

Anscombe, Francis J. 1948. "The Transformation of Poisson, Binomial and Negative-Binomial Data." Biometrika 35 (3/4): 246–54.

Di, Yanming, Daniel W. Schafer, Jason S. Cumbie and Jeff H. Chang. 2011. "The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq." Statistical Applications in Genetics and Molecular Biology 10 (1). doi:10.2202/1544-6115.1637

Frazee, Alyssa C., Ben Langmead and Jeffrey T. Leek. 2011. "ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets." BMC Bioinformatics 12: 449. doi:10.1186/1471-2105-12-449

Love, Michael I., Wolfgang Huber and Simon Anders. 2014. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome Biology 15 (12): 550. doi:10.1186/s13059-014-0550-8

McCarthy, Davis J., Yunshun Chen and Gordon K. Smyth. 2012. "Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation." Nucleic Acids Research 40 (10): 4288-4297. doi:10.1093/nar/gks042

Ritchie, Matthew E., Belinda Phipson, Di Wu, Yifang Hu, Charity W. Law, Wei Shi and Gordon K. Smyth. 2015. "limma powers differential expression analyses for RNA-sequencing and microarray studies." Nucleic Acids Research 43 (7): e47. doi:10.1093/nar/gkv007

Robinson, Mark D. and Alicia Oshlack. 2010. "A scaling normalization method for differential expression analysis of RNA-seq data." Genome Biology 11 (3): R25. doi:10.1186/gb-2010-11-3-r25

Owner

  • Name: Monash Bioinformatics Platform
  • Login: MonashBioinformaticsPlatform
  • Kind: organization
  • Email: bioinformatics.platform@monash.edu
  • Location: Monash University, Australia

JOSS Publication

Varistran: Anscombe's variance stabilizing transformation for RNA-seq gene expression data
Published
August 27, 2017
Volume 2, Issue 16, Page 257
Authors
Paul Francis Harrison ORCID
Monash Bioinformatics Platform, Monash University
Editor
Roman Valls Guimera ORCID
Tags
RNA-seq gene expression variance stabilizing transformation R package

CodeMeta (codemeta.json)

{
  "@context": "https://raw.githubusercontent.com/mbjones/codemeta/master/codemeta.jsonld",
  "@type": "Code",
  "author": [
    {
      "@id": "0000-0002-3980-268X",
      "@type": "Person",
      "email": "paul.harrison@monash.edu",
      "name": "Paul Harrison",
      "affiliation": "Monash Bioinformatics Platform, Monash University"
    }
  ],
  "identifier": "",
  "codeRepository": "https://github.com/MonashBioinformaticsPlatform/varistran",
  "datePublished": "2017-03-13",
  "dateModified": "2017-03-13",
  "dateCreated": "2017-03-13",
  "description": "Anscombe's variance stabilizing transformation for RNA-seq gene expression data.",
  "keywords": "RNA-seq, gene expression, variance stabilizing transformation, R package",
  "license": "LGPL v2.1",
  "title": "Varistran",
  "version": "v1.0.1"
}

GitHub Events

Total
  • Create event: 2
  • Issues event: 1
  • Release event: 1
  • Watch event: 2
  • Push event: 17
Last Year
  • Create event: 2
  • Issues event: 1
  • Release event: 1
  • Watch event: 2
  • Push event: 17

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 116
  • Total Committers: 2
  • Avg Commits per committer: 58.0
  • Development Distribution Score (DDS): 0.009
Past Year
  • Commits: 17
  • Committers: 1
  • Avg Commits per committer: 17.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Paul Harrison p****h@l****t 115
roryk r****r@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 1
  • Average time to close issues: over 1 year
  • Average time to close pull requests: 15 minutes
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 3.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • smazrouee (1)
  • roryk (1)
Pull Request Authors
  • roryk (1)

Dependencies

DESCRIPTION cran
  • grid * depends
  • MASS * imports
  • ggplot2 * imports
  • gridBase * imports
  • miniUI * imports
  • seriation * imports
  • shiny * imports
  • DESeq2 * suggests
  • NBPSeq * suggests
  • biomaRt * suggests
  • edgeR * suggests
  • limma * suggests