pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks

https://github.com/egeulgen/pathfindr

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.9%) to scientific vocabulary

Keywords

active-subnetworks enrichment pathway pathway-enrichment-analysis r subnetwork
Last synced: 6 months ago

Repository

pathfindR: Enrichment Analysis Utilizing Active Subnetworks

Basic Info
Statistics
  • Stars: 192
  • Watchers: 5
  • Forks: 27
  • Open Issues: 0
  • Releases: 30
Created about 8 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct

README.Rmd

---
output: github_document
---



```{r, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE,
                      comment="#>",
                      fig.path="inst/extdata/",
                      out.width="100%")
suppressPackageStartupMessages(library(pathfindR))
```

#  pathfindR: An R Package for Enrichment Analysis Utilizing Active Subnetworks


[![R-CMD-check](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml)
[![codecov](https://codecov.io/gh/egeulgen/pathfindR/graph/badge.svg?token=8m9aPaXzNr)](https://codecov.io/gh/egeulgen/pathfindR)
[![CRAN version](https://www.r-pkg.org/badges/version/pathfindR)](https://cran.r-project.org/package=pathfindR)
[![CRAN total downloads](https://cranlogs.r-pkg.org/badges/grand-total/pathfindR)](https://cran.r-project.org/package=pathfindR)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/r-pathfindr/README.html)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit)



# Overview

`pathfindR` is an R package for enrichment analysis via active subnetworks. The package also offers functionality to cluster the enriched terms and identify representative terms in each cluster, score the enriched terms per sample, and visualize analysis results. As of the latest version, the package also allows comparison of two pathfindR results.

The functionality suite of pathfindR is described in detail in _Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. [https://doi.org/10.3389/fgene.2019.00858](https://doi.org/10.3389/fgene.2019.00858)_

For detailed documentation, see [pathfindR's website](https://egeulgen.github.io/pathfindR/).

# Installation

- You can install the released version of pathfindR from CRAN via:

```{r installation1, eval=FALSE}
install.packages("pathfindR")
```

- Since version 2.1.0, you may also install `pathfindR` via conda:

```{bash conda, eval=FALSE}
conda install -c bioconda r-pathfindr
```

- Via [pak](https://pak.r-lib.org/) (this might be preferable given `pathfindR`'s Bioconductor dependencies):

```{r installation2, eval=FALSE}
install.packages("pak") # if you have not installed "pak"
pak::pkg_install("pathfindR")
```

- And the development version from GitHub via `devtools`:

```{r installation3, eval=FALSE}
install.packages("devtools") # if you have not installed "devtools"
devtools::install_github("egeulgen/pathfindR")
```

> **IMPORTANT NOTE**
> For the active subnetwork search component to work, [Java (>= 8.0)](https://www.java.com/en/download/) must be installed, and the directory containing the `java` executable must be included in the `PATH` environment variable.

We also have docker images available on [Docker Hub](https://hub.docker.com/repository/docker/egeulgen/pathfindr) and [GitHub](https://github.com/egeulgen/pathfindR/packages):

```{bash docker, eval=FALSE}
# pull image for the latest release
docker pull egeulgen/pathfindr:latest

# pull image for a specific version (e.g., 1.4.1)
docker pull egeulgen/pathfindr:1.4.1
```

Online app on superbio.ai: [https://app.superbio.ai/apps/111/](https://app.superbio.ai/apps/111/)

# Enrichment Analysis with pathfindR

![pathfindR Enrichment Workflow](https://github.com/egeulgen/pathfindR/blob/master/vignettes/pathfindr.png?raw=true "pathfindr Enrichment Workflow")

This workflow takes in a data frame consisting of "gene symbols", "change values" (optional), and "associated p-values":

```{r example_input, echo=FALSE}
tmp <- example_pathfindR_input[1:4, ]
tmp$logFC <- round(tmp$logFC,2)
tmp$adj.P.Val <- format(tmp$adj.P.Val, digits=2)
colnames(tmp) <- c("Gene_symbol", "logFC", "FDR_p")
knitr::kable(tmp, align=c("l", "c", "c"))
```
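
As a rough illustration of that input shape, the sketch below builds a small data frame by hand and runs a few sanity checks on it. The gene names and values are made up, and `check_input()` is a hypothetical helper written for this example, not a pathfindR function:

```r
# Made-up input: gene symbols, change values (optional), adjusted p-values
input_df <- data.frame(
  Gene_symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  logFC = c(-0.70, 1.40, 1.50, 1.00),
  FDR_p = c(1e-06, 1e-05, 5e-05, 2e-04),
  stringsAsFactors = FALSE
)

# Hypothetical helper: basic checks one might run before an analysis
check_input <- function(df) {
  stopifnot(
    is.data.frame(df),
    ncol(df) %in% c(2, 3),                  # change values are optional
    !anyNA(df),                             # no missing entries
    !anyDuplicated(df[[1]]),                # one row per gene symbol
    is.numeric(df[[ncol(df)]]),             # p-values are in the last column
    all(df[[ncol(df)]] >= 0 & df[[ncol(df)]] <= 1)
  )
  invisible(TRUE)
}

check_input(input_df)
```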

After input testing, any gene symbol that is not in the chosen protein-protein interaction network (PIN) is converted to an alias symbol if there is an alias that is found in the PIN. After mapping the input genes with the associated p-values onto the PIN, active subnetwork search is performed. The resulting active subnetworks are then filtered based on their scores and the number of significant genes they contain. 

> An active subnetwork can be defined as a group of interconnected genes in a protein-protein interaction network (PIN) that predominantly consists of significantly altered genes. In other words, active subnetworks define distinct disease-associated sets of interacting genes, whether discovered through the original analysis or discovered because of being in interaction with a significant gene.
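
The alias conversion step described above can be pictured with the following base R sketch. The PIN nodes, the alias table and the helper `map_to_pin()` are all hypothetical and only illustrate the idea; pathfindR's own input processing may differ in its details:

```r
pin_genes <- c("GENE1", "GENE2", "GENE3")        # hypothetical PIN nodes
alias_map <- c(OLD2 = "GENE2", OLD9 = "GENE9")   # hypothetical symbol -> alias table
input_genes <- c("GENE1", "OLD2", "OLD9", "UNKNOWN")

# Hypothetical helper: keep symbols found in the PIN, otherwise try an alias
map_to_pin <- function(genes, pin_genes, alias_map) {
  mapped <- genes
  missing <- !(genes %in% pin_genes)
  candidate <- unname(alias_map[genes[missing]])  # NA when no alias is known
  usable <- !is.na(candidate) & candidate %in% pin_genes
  candidate[!usable] <- NA_character_             # alias exists but is not in the PIN
  mapped[missing] <- candidate
  mapped
}

mapped_genes <- map_to_pin(input_genes, pin_genes, alias_map)
```

In this sketch, symbols that cannot be placed on the PIN end up as `NA` and would be excluded from the subnetwork search.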

These filtered lists of active subnetworks are then used for enrichment analyses, i.e., using the genes in each of the active subnetworks, the significantly enriched terms (pathways/gene sets) are identified. Enriched terms with adjusted p-values larger than the given threshold are discarded, and the lowest adjusted p-value (among all active subnetworks) for each term is kept. This process of `active subnetwork search + enrichment analyses` is repeated for a selected number of iterations, performed in parallel. Over all iterations, the lowest and the highest adjusted p-values, and the number of occurrences among all iterations are reported for each significantly enriched term.
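
The summary over iterations can be sketched in base R as follows. The per-iteration table, its column names and the helper `summarise_iterations()` are assumptions made for illustration and are not taken from the package internals:

```r
# Hypothetical per-iteration results: one row per (iteration, enriched term)
iter_res <- data.frame(
  iteration = c(1, 1, 2, 2, 3),
  ID = c("term_A", "term_B", "term_A", "term_C", "term_A"),
  adj_p = c(0.010, 0.040, 0.002, 0.030, 0.020),
  stringsAsFactors = FALSE
)

# Hypothetical helper: occurrence count plus lowest/highest adjusted p per term
summarise_iterations <- function(df) {
  lowest <- tapply(df$adj_p, df$ID, min)
  highest <- tapply(df$adj_p, df$ID, max)
  occurrence <- tapply(df$adj_p, df$ID, length)
  out <- data.frame(
    ID = names(lowest),
    occurrence = as.integer(occurrence),
    lowest_p = as.numeric(lowest),
    highest_p = as.numeric(highest),
    stringsAsFactors = FALSE
  )
  out[order(out$lowest_p), ]  # most significant terms first
}

summary_df <- summarise_iterations(iter_res)
```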

This workflow can be run using the function `run_pathfindR()`:

```{r basic_usage, eval=FALSE}
library(pathfindR)
output_df <- run_pathfindR(input_df)
```

This wrapper function performs the active-subnetwork-oriented enrichment analysis, and returns a data frame of enriched terms:

![pathfindR Enrichment Chart](https://github.com/egeulgen/pathfindR/blob/master/vignettes/enrichment_chart.png?raw=true "Enrichment Chart")

Some useful arguments are:

```{r useful_args, eval=FALSE}
# set an output directory for saving active subnetworks and creating an HTML report 
# (default=NULL, sets a temporary directory)
output_df <- run_pathfindR(input_df, output_dir="/top/secret/results")

# change the gene sets used for analysis (default="KEGG")
output_df <- run_pathfindR(input_df, gene_sets="GO-MF")

# change the PIN for active subnetwork search (default="Biogrid")
output_df <- run_pathfindR(input_df, pin_name_path="IntAct")
# or use an external PIN of your choice
output_df <- run_pathfindR(input_df, pin_name_path="/path/to/my/PIN.sif")

# change the number of iterations (default=10)
output_df <- run_pathfindR(input_df, iterations=25) 

# report the non-significant active subnetwork genes (for later analyses)
output_df <- run_pathfindR(input_df, list_active_snw_genes=TRUE)
```

The available PINs are "Biogrid", "STRING", "GeneMania", "IntAct", "KEGG" and "mmu_STRING". The available gene sets are "KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC", "GO-MF", and "mmu_KEGG". You can also use a custom PIN (see `?return_pin_path`) or a custom gene set (see `?fetch_gene_set`).

> As of the latest development version, pathfindR offers utility functions for obtaining organism-specific PIN data (for now, only BioGRID PINs) and organism-specific gene sets (KEGG and Reactome) data via `get_pin_file()` and `get_gene_sets_list()`, respectively.

# Clustering of the Enriched Terms

![Enriched Terms Clustering Workflow](https://github.com/egeulgen/pathfindR/blob/master/vignettes/term_clustering.png?raw=true "Enriched Terms Clustering Workflow")

The wrapper function for this workflow is `cluster_enriched_terms()`.

This workflow first calculates the pairwise kappa statistics between the enriched terms. The function then performs hierarchical clustering (by default), automatically determines the optimal number of clusters by maximizing the average silhouette width and returns a data frame with cluster assignments.
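
To make the kappa-based clustering idea concrete, here is a minimal base R sketch. The toy gene sets and the helper `cohen_kappa()` are hypothetical, using `1 - kappa` as the distance is an assumption of this example, and the number of clusters is fixed at 2 rather than chosen by average silhouette width:

```r
# Toy enriched terms and their member genes (made up)
term_genes <- list(
  term_A = c("g1", "g2", "g3", "g4"),
  term_B = c("g1", "g2", "g3", "g5"),
  term_C = c("g6", "g7", "g8")
)
all_genes <- sort(unique(unlist(term_genes)))

# Cohen's kappa between two terms, based on gene membership over a universe
cohen_kappa <- function(genes_x, genes_y, universe) {
  x <- universe %in% genes_x
  y <- universe %in% genes_y
  observed <- mean(x == y)                              # observed agreement
  expected <- mean(x) * mean(y) + mean(!x) * mean(!y)   # chance agreement
  (observed - expected) / (1 - expected)
}

terms <- names(term_genes)
kappa_mat <- sapply(terms, function(a) {
  sapply(terms, function(b) cohen_kappa(term_genes[[a]], term_genes[[b]], all_genes))
})

# Average-linkage hierarchical clustering on 1 - kappa, cut into 2 clusters
hc <- hclust(as.dist(1 - kappa_mat), method = "average")
clusters <- cutree(hc, k = 2)
```

Here `term_A` and `term_B` share three of their four genes and fall into the same cluster, while `term_C` forms its own.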

```{r clustering_h, eval=FALSE}
# default settings
clustered_df <- cluster_enriched_terms(output_df)

# display the heatmap of hierarchical clustering
clustered_df <- cluster_enriched_terms(output_df, plot_hmap=TRUE)

# display the dendrogram and automatically-determined clusters
clustered_df <- cluster_enriched_terms(output_df, plot_dend=TRUE)

# change agglomeration method (default="average") for hierarchical clustering
clustered_df <- cluster_enriched_terms(output_df, clu_method="centroid")
```

Alternatively, the `fuzzy` clustering method (as described in Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.) can be used:

```{r clustering_f, eval=FALSE}
clustered_df_fuzzy <- cluster_enriched_terms(output_df, method="fuzzy")
```

# Visualization of Enrichment Results

## Enriched Term Diagrams

For H.sapiens KEGG enrichment analyses, `visualize_terms()` can be used to generate KEGG pathway diagrams as `ggraph` (inherits from `ggplot`) objects (using [`ggkegg`](https://github.com/noriakis/ggkegg)):

```{r KEGG_vis, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = TRUE
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "hsa04911_diagram.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,         # what to plot
  width = 5,                # adjust width
  height = 5                # adjust height
)
```

![KEGG Pathway Diagram](https://github.com/egeulgen/pathfindR/blob/master/vignettes/example_kegg_pathway_diagram.png?raw=true)

Alternatively (i.e., for other types of (non-KEGG) enrichment analyses), an interaction diagram per enriched term can be generated again via `visualize_terms()`. These diagrams are also returned as `ggraph` objects:

```{r nonKEGG_viss, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = FALSE,
  pin_name_path = "Biogrid"
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "diabetic_cardiomyopathy_interactions.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,                             # what to plot
  width = 10,                                   # adjust width
  height = 6                                    # adjust height
)
```

![Interaction Diagram](https://github.com/egeulgen/pathfindR/blob/master/vignettes/example_interaction_vis.png?raw=true)

## Term-Gene Heatmap

The function `term_gene_heatmap()` draws a heatmap of the enriched terms against the input genes involved in them. This heatmap allows visual identification of the input genes involved in the enriched terms, and of the genes that are common to or distinct between terms. If the input data frame (same as in `run_pathfindR()`) is supplied, the tile colors indicate the change values.

![Term-Gene Heatmap](https://github.com/egeulgen/pathfindR/blob/master/vignettes/hmap.png?raw=true "Term-Gene Heatmap")

## Term-Gene Graph

The function `term_gene_graph()` (adapted from the Gene-Concept network visualization by the R package `enrichplot`) can be utilized to visualize which significant genes are involved in the enriched terms. The function creates the term-gene graph, displaying the connections between genes and biological terms (enriched pathways or gene sets). This allows for the investigation of multiple terms to which significant genes are related. The graph also enables the determination of the degree of overlap between the enriched terms by identifying shared and/or distinct significant genes.

![Term-Gene Graph](https://github.com/egeulgen/pathfindR/blob/master/vignettes/term_gene.png?raw=true "Term-Gene Graph")

## UpSet Plot

An UpSet plot displays the intersections of sets as a matrix. The function `UpSet_plot()` creates a ggplot object in which the x-axis shows the intersections of enriched terms. By default (i.e., `method="heatmap"`), the main plot is a heatmap of genes at the corresponding intersections, colored by up-/down-regulation (if `genes_df` is provided, colored by change values). If `method="barplot"`, the main plot is a bar plot of the number of genes at the corresponding intersections. Finally, if `method="boxplot"` and `genes_df` is provided, the main plot displays boxplots of the genes' change values at the corresponding intersections.

![UpSet plot](https://github.com/egeulgen/pathfindR/blob/master/vignettes/upset.png?raw=true "UpSet Plot")

# Per Sample Enriched Term Scores

![Agglomerated Scores for all Enriched Terms per Sample](https://github.com/egeulgen/pathfindR/blob/master/vignettes/score_hmap.png?raw=true "Scoring per Sample")

The function `score_terms()` can be used to calculate the agglomerated z score of each enriched term per sample. This allows the user to examine the scores individually and infer how a term is overall altered (activated or repressed) in a given sample or a group of samples.
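
One way to picture such a per-sample score is sketched below in base R: each gene is standardized across samples, and the z-scores of a term's genes are averaged within each sample. The expression matrix and gene set are made up, and this is an illustration under those assumptions rather than the exact computation in `score_terms()`:

```r
# Made-up expression matrix: genes in rows, samples in columns
expr <- matrix(
  c(1, 2, 5, 6,
    2, 2, 4, 4,
    5, 5, 5, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("g1", "g2", "g3"),
    c("control_1", "control_2", "case_1", "case_2")
  )
)
term_members <- c("g1", "g2")  # genes of one hypothetical enriched term

# z-score each gene across samples (scale() works column-wise, hence t())
gene_z <- t(scale(t(expr)))

# Per-sample term score: mean z-score over the term's genes
term_score <- colMeans(gene_z[term_members, , drop = FALSE])
```

A positive score suggests the term's genes are, on average, above their cross-sample mean in that sample, and a negative score suggests the opposite.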

# Comparison of 2 pathfindR Results

The function `combine_pathfindR_results()` allows combining two pathfindR analysis results for investigating common and distinct terms between the groups. Below is an example for comparing two different results using rheumatoid arthritis-related data.

```{r compare2res, eval=FALSE}
combined_df <- combine_pathfindR_results(
  result_A=an_output_df, 
  result_B=another_output_df
)
```

By default, `combine_pathfindR_results()` plots the term-gene graph for the common terms in the combined results. The function `combined_results_graph()` can be used to create this graph (using only selected terms etc.) later on.

```{r compare_graph, eval=FALSE}
combined_results_graph(combined_df, selected_terms=c("hsa04144", "hsa04141", "hsa04140"))
```

![Combined Results Graph](https://github.com/egeulgen/pathfindR/blob/master/vignettes/combined_graph.png?raw=true "Combined Results Graph")

Owner

  • Name: Ege Ulgen
  • Login: egeulgen
  • Kind: user
  • Location: London
  • Company: @genomicsengland

MD, PhD - Bioinformatics Engineer

GitHub Events

Total
  • Create event: 6
  • Release event: 2
  • Issues event: 15
  • Watch event: 12
  • Delete event: 6
  • Issue comment event: 25
  • Push event: 44
  • Pull request review comment event: 1
  • Pull request review event: 1
  • Pull request event: 10
  • Fork event: 2
Last Year
  • Create event: 6
  • Release event: 2
  • Issues event: 15
  • Watch event: 12
  • Delete event: 6
  • Issue comment event: 25
  • Push event: 44
  • Pull request review comment event: 1
  • Pull request review event: 1
  • Pull request event: 10
  • Fork event: 2

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 991
  • Total Committers: 5
  • Avg Commits per committer: 198.2
  • Development Distribution Score (DDS): 0.016
Past Year
  • Commits: 122
  • Committers: 2
  • Avg Commits per committer: 61.0
  • Development Distribution Score (DDS): 0.016
Top Committers
Name Email Commits
Ege Ulgen e****n@g****m 975
Ozan Özışık o****u@g****m 8
Richard Meitern r****n@g****m 3
Roy Lardenoije 3****e 3
egeulgen e****n@g****k 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 134
  • Total pull requests: 32
  • Average time to close issues: 16 days
  • Average time to close pull requests: 4 days
  • Total issue authors: 99
  • Total pull request authors: 5
  • Average comments per issue: 3.02
  • Average comments per pull request: 0.16
  • Merged pull requests: 26
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 9
  • Pull requests: 12
  • Average time to close issues: 1 day
  • Average time to close pull requests: 10 days
  • Issue authors: 7
  • Pull request authors: 2
  • Average comments per issue: 1.11
  • Average comments per pull request: 0.25
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • egeulgen (16)
  • choijamtsmunkhzul (5)
  • Rohit-Satyam (4)
  • apelin20 (2)
  • boseb (2)
  • ManarRashad (2)
  • cristanchoa (2)
  • zinagood (2)
  • powerhorse1986 (2)
  • safrikDut (2)
  • teunbrand (2)
  • dalhoomist (2)
  • AliSaadatV (2)
  • evofish (2)
  • richardcoca (2)
Pull Request Authors
  • egeulgen (38)
  • mustafapir (2)
  • lardenoije (2)
  • ozanozisik (1)
  • rix133 (1)
Top Labels
Issue Labels
bug (12) dependency issue (2) enhancement (1) wontfix (1) question (1)
Pull Request Labels
bug (9) enhancement (6) dependency issue (5)

Packages

  • Total packages: 3
  • Total downloads:
    • cran 921 last-month
  • Total docker downloads: 22,201
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 3
    (may contain duplicates)
  • Total versions: 72
  • Total maintainers: 1
proxy.golang.org: github.com/egeulgen/pathfindR
  • Versions: 22
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
proxy.golang.org: github.com/egeulgen/pathfindr
  • Versions: 22
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
cran.r-project.org: pathfindR

Enrichment Analysis Utilizing Active Subnetworks

  • Versions: 28
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 921 Last month
  • Docker Downloads: 22,201
Rankings
Docker downloads count: 0.6%
Stargazers count: 2.7%
Forks count: 3.2%
Average: 10.8%
Downloads: 13.0%
Dependent repos count: 16.4%
Dependent packages count: 28.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.0 depends
  • pathfindR.data * depends
  • AnnotationDbi * imports
  • DBI * imports
  • KEGGREST * imports
  • KEGGgraph * imports
  • R.utils * imports
  • doParallel * imports
  • foreach * imports
  • fpc * imports
  • ggplot2 * imports
  • ggraph * imports
  • ggupset * imports
  • grDevices * imports
  • igraph * imports
  • knitr * imports
  • magick * imports
  • msigdbr * imports
  • org.Hs.eg.db * imports
  • rmarkdown * imports
  • covr * suggests
  • testthat >= 2.3.2 suggests