ggfastman

Fast Manhattan plots using ggplot2

https://github.com/roman-tremmel/ggfastman

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

fast ggplot2 gwas manhattan manhattan-plot plot plotting-pvalues pvalues qqplot r snp

Repository

Basic Info
  • Host: GitHub
  • Owner: roman-tremmel
  • License: gpl-3.0
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 2.54 MB
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 2
Created over 5 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

Introduction

This is a very fast and easy-to-customize plotting function for GWAS results, e.g. p-values. Since I use ggplot2 a lot, I adopted the idea from a very nice project and combined it with the super fast plotting approach of the scattermore project.

A Manhattan plot displays chromosomal positions against the results (mostly -log10-transformed p-values) of genome-wide association studies between single nucleotide variants (SNVs) or polymorphisms (SNPs) and an endpoint, e.g. expression, enzyme activity or case-control data.
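To make the two axes concrete, here is a minimal base R sketch of how such coordinates are commonly derived. The toy data frame, the column names `y` and `x_cum`, and the offset logic are illustrative assumptions and are not taken from the internals of ggfastman.

```r
# Toy GWAS results (made-up values, for illustration only)
toy <- data.frame(
  chr    = c("chr1", "chr1", "chr2", "chr2"),
  pos    = c(100, 500, 50, 300),
  pvalue = c(0.2, 1e-8, 0.03, 1e-3)
)

# y-axis: -log10 transformed p-values, so small p-values become tall points
toy$y <- -log10(toy$pvalue)

# x-axis: shift each chromosome by the summed length of all previous ones.
# tapply() sorts the chromosome names alphabetically, which is fine for this
# toy example; real data would need a natural ordering (chr2 before chr10).
chr_len <- tapply(toy$pos, toy$chr, max)            # largest position per chromosome
offset  <- c(0, cumsum(chr_len)[-length(chr_len)])  # start of each chromosome
names(offset) <- names(chr_len)
toy$x_cum <- toy$pos + offset[toy$chr]

toy
```

Plotting `x_cum` against `y` gives the familiar skyline shape, with one block of points per chromosome.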

One of the first R packages offering Manhattan as well as QQ plots was qqman by Stephen Turner, and nowadays many different packages and approaches are available for R and Python. But a very fast one, which stays fast when plotting billions of data points, was missing.

The ggfastman package tries to fill this gap.

Installation

So far the package has been tested on Windows and macOS. It is not on CRAN, so you need to install it from GitHub:

devtools::install_github("roman-tremmel/ggfastman", upgrade = "never")

The package depends on ggplot2, scattermore and some additional packages. If there are problems, try installing at least scattermore manually using:

devtools::install_github('exaexa/scattermore', dependencies = FALSE, force = TRUE, upgrade = "never")

Citation

You can cite the package using

DOI

Usage

The normal one

As an example, you can load some data included in the package and run the following code. More information on the data set is provided here.

```{r}
library(ggfastman)
data("gwas_data")
head(gwas_data)
```

The data must contain the following three required columns:

  1. chr
  2. pos
  3. pvalue

The chr column should be in the format c("chr1", "chr2", "chr3", "chrX"...), the pos column must be a numeric vector of base pair positions, and the pvalue column contains the p-values.
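If your own results use different column names, they have to be brought into this layout first. The following sketch shows one way to do that in base R. Note that `check_gwas_input()` is a hypothetical helper written for this example and is not part of ggfastman, the input column names `CHR`, `BP` and `P` are assumed, and the range check on the p-values is an assumption as well.

```r
# Hypothetical helper (not part of ggfastman): checks the three required columns
check_gwas_input <- function(df) {
  required <- c("chr", "pos", "pvalue")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(
    all(grepl("^chr", df$chr)),  # chromosomes look like "chr1", "chrX", ...
    is.numeric(df$pos),          # base pair positions
    is.numeric(df$pvalue),
    all(df$pvalue > 0 & df$pvalue <= 1, na.rm = TRUE)  # assumed: raw p-values
  )
  invisible(TRUE)
}

# Made-up results table with bare chromosome labels and other column names
raw <- data.frame(CHR = c(1, 2, "X"), BP = c(1000, 2500, 400), P = c(0.5, 1e-6, 0.04))

# Rename the columns and add the "chr" prefix
my_gwas <- data.frame(
  chr    = paste0("chr", raw$CHR),
  pos    = as.numeric(raw$BP),
  pvalue = raw$P
)

check_gwas_input(my_gwas)
```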

We can plot the manhattan figure with the speed option "slow" using only ggplot2 functions as follows.

```{r}
fast_manhattan(gwas_data, build='hg18', speed = "slow")
```

The fast way

Depending on your system, this takes a while, particularly when plotting p-values of more than 1,000,000 SNVs. Therefore, we replace the geom_point() function with scattermore::geom_scattermore() by calling the Manhattan function with the "fast" option.

```{r}
fast_manhattan(gwas_data, build='hg18', speed = "fast")
# or
fast_manhattan(gwas_data, build='hg18', speed = "f")
```

Zooooom, that was fast, right? How does it work? For the explanation I refer to the `scattermore` package. In short, the speed is reached with some C code, rasterization and some magic.

Of course you can increase the point size and the resolution at the cost of some speed.

```{r}
fast_manhattan(gwas_data, build='hg18', speed = "fast", pointsize = 3, pixels = c(1000, 1000))
```

The insane way

The fastest option is speed = "ultrafast". The price is that the data is plotted only in pure black, but I think it is worth it. Benchmarks are analysed below.

```{r}
# some big data file with >10^6 rows
big_gwas_data <- do.call(rbind, replicate(15, gwas_data, simplify = FALSE))
fast_manhattan(big_gwas_data, build='hg18', speed = "ultrafast")

# compare with
fast_manhattan(big_gwas_data, build='hg18', speed = "fast")

# do not compare with, unless you want to wait some minutes
fast_manhattan(big_gwas_data, build='hg18', speed = "slow")
```

Individualization

Of course you can customize the plot using standard ggplot2 functions.

  • xy-scales

```{r}
fast_manhattan(gwas_data, build='hg18', speed = "fast", y_scale = FALSE) + ylim(2, 10)
# Of note, set y_scale = FALSE to avoid the error of a second y-scale.

# distinct chromosomes on x-axis
fast_manhattan(gwas_data[gwas_data$chr %in% c("chr1", "chr10", "chr22"),], build='hg18', speed = "fast")
```

  • color

Add color globally or highlight only individual SNPs. Of note, this also works for shape in "slow" mode.

```{r}
gwas_data2 <- gwas_data
gwas_data2$color <- as.character(factor(gwas_data$chr, labels = 1:22))
fast_manhattan(gwas_data2, build = "hg18", speed = "fast")
```

man 1

Highlight only some SNPs

```{r}
gwas_data2$color <- NA
gwas_data2[gwas_data2$pvalue < 1e-5, ]$color <- "red"
fast_manhattan(gwas_data2, build = "hg18", speed = "fast")
```

man 2

  • add significance line(s) and snp annotation(s)

```{r}
library(tidyverse)
library(ggrepel)
fast_manhattan(gwas_data, build='hg18', speed = "fast", color1 = "pink", color2 = "turquoise", pointsize = 3, pixels = c(1000, 500)) +
  geom_hline(yintercept = -log10(5e-08), linetype = 2, color = "darkgrey") + # genomewide significance line
  geom_hline(yintercept = -log10(1e-5), linetype = 2, color = "grey") + # suggestive significance line
  ggrepel::geom_text_repel(data = . %>% group_by(chr) %>% # ggrepel to avoid overplotting
                             top_n(1, -pvalue) %>% # extract highest y values
                             slice(1) %>% # if there are ties, choose the first one
                             filter(pvalue <= 5e-08), # filter for significant ones
                           aes(label = rsid), color = 1) # add top rsid
```

Resulting manhattan plot

  • Faceting

```{r}
library(tidyverse)
gwas_data %>%
  mutate(gr = "Study 1") %>% # rbind a second study
  bind_rows(., mutate(., gr = "Study 2", pvalue = runif(n()))) %>%
  fast_manhattan(., build = "hg18", speed = "fast", pointsize = 2.1, pixels = c(1000,500)) +
  geom_hline(yintercept = -log10(5e-08), linetype = 2, color = "deeppink") +
  geom_hline(yintercept = -log10(1e-5), linetype = 2, color = "grey") +
  facet_wrap(~gr, nrow = 2, scales = "free_y") +
  theme_bw(base_size = 16) +
  theme(panel.grid.minor.y = element_blank(),
        panel.grid.minor.x = element_blank())
```

Resulting manhattan plot

```{r}
fast_manhattan(gwas_data, build = "hg18", speed = "fast", pointsize = 3.2, pixels = c(1000,500)) +
  geom_hline(yintercept = -log10(5e-08), linetype = 2, color = "deeppink") +
  geom_hline(yintercept = -log10(1e-5), linetype = 2, color = "grey") +
  ggforce::facet_zoom(x = chr == "chr9", zoom.size = 1)
```

Resulting manhattan plot

  • Locus plots with linkage data. Of note, you have to register here and get your token.

```{r}
fast_locusplot(gwas_data, token = "replace with your token", show_MAF = TRUE, show_regulom = TRUE)
```

locus plot

In addition, the package also includes a fast way to create QQ plots.

```{r}
fast_qq(pvalue = runif(10^6), speed = "fast")
```

Resulting QQ plot
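For readers unfamiliar with QQ plots of p-values, the following base R sketch shows how the two axes are usually computed. It is a generic illustration under the assumption of uniformly distributed p-values under the null hypothesis, and it does not claim to reproduce what `fast_qq()` does internally.

```r
set.seed(1)                     # reproducible toy p-values
p <- runif(1000)

n        <- length(p)
observed <- -log10(sort(p))     # smallest p-value first
expected <- -log10(ppoints(n))  # quantiles expected under a uniform distribution

# Under the null hypothesis the points follow the diagonal
qq <- data.frame(expected, observed)
head(qq)
```

Points that rise above the diagonal at the right end of such a plot indicate p-values smaller than expected by chance.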

Benchmarks

The benchmark analysis covers all operations of plot generation, including the code evaluation, the plotting and the saving of a .png file using png() for base R plots and ggsave() for ggplot figures. For a better comparison, the same parameters were chosen for both approaches, e.g. width = 270, height = 100 and units = "mm" as well as res = 300 and dpi = 300, respectively. We compared the three speed options included in this package with the fastman::fastman() and qqman::manhattan() functions using bench::mark() with a minimum of 10 iterations. The complete code can be found here: benchmark_plot

The first comparison was performed using the example GWAS data of approx. 80k p-values/rows. As illustrated below, all three speed options were significantly faster than the two base R functions, although in terms of user experience the "slow" option performed rather similarly to the base R functions.

```{r}
gwas_data$chrom <- as.numeric(gsub("chr", "", gwas_data$chr))
res_small_manhattan <- bench_plot(gwas_data)
plot_bench(res_small_manhattan)
```

speed1

In the next step, we created Manhattan plots of really big data with more than nine million data points by replicating the example data 120 times. The benchmark ran on a system with an i7-9700 CPU (3 GHz) and 32 GB RAM.

```{r}
big_gwas_data <- do.call(rbind, replicate(120, gwas_data, simplify = FALSE))
nrow(big_gwas_data)
# 9495360
res_big_manhattan <- bench_plot(big_gwas_data)
```

There were again significant differences between the three analysed methods. Interestingly, the fastman function performed very well. Its speed is achieved by cropping the data in the non-significant p-value ranges, e.g. using only 20k p-values for p > 0.1, 0.01 < p < 0.1, ... Nevertheless, the experienced performance in the RStudio plotting window is even slower than that of the "fast" version. But if you want to stick to base R, the fastman package seems to be the choice for fast plotting of >9x10^6 p-values.

speed2
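The data cropping idea mentioned above can be illustrated with a short base R sketch. This is only an assumption about how such thinning might look, not the code used by fastman: `thin_pvalues()`, its arguments and the bin borders are hypothetical and chosen for this example.

```r
# Hypothetical helper (illustration only, not taken from fastman):
# keep every p-value below `keep_below` and at most `max_per_bin`
# randomly chosen points in each of the remaining p-value bins.
thin_pvalues <- function(df, keep_below = 0.01, breaks = c(0.01, 0.1, 1),
                         max_per_bin = 20000) {
  always <- which(df$pvalue < keep_below)   # interesting points, never dropped
  rest   <- which(df$pvalue >= keep_below)  # candidates for thinning
  bin    <- cut(df$pvalue[rest], breaks = breaks, include.lowest = TRUE)
  sampled <- unlist(lapply(split(rest, bin), function(idx) {
    if (length(idx) > max_per_bin) idx[sample.int(length(idx), max_per_bin)] else idx
  }), use.names = FALSE)
  df[sort(c(always, sampled)), , drop = FALSE]
}

set.seed(42)
toy     <- data.frame(pvalue = runif(1e5))  # made-up p-values
thinned <- thin_pvalues(toy, max_per_bin = 500)

nrow(toy)
nrow(thinned)
```

Because the dense, non-significant part of the plot is heavily overplotted anyway, dropping points there changes the picture very little while reducing the number of points to draw.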

Questions and Bugs

Please report bugs by opening a GitHub issue here.

Owner

  • Name: Roman
  • Login: roman-tremmel
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this R package, please cite it as below."
authors:
- family-names: "Tremmel"
  given-names: "Roman"
  orcid: "https://orcid.org/0000-0003-1564-0433"
title: "ggfastman"
version: 1.2.0
doi: 10.5281/zenodo.1234
date-released: 2021-06-18
url: "https://github.com/roman-tremmel/ggfastman"

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • GenomicRanges * imports
  • Homo.sapiens * imports
  • LDlinkR * imports
  • ggbio * imports
  • ggplot2 >= 2.2.1 imports
  • ggrepel * imports
  • scattermore * imports