dea_seurat

A Snakemake workflow and MrBiomics module for performing differential expression analyses (DEA) on (multimodal) sc/snRNA-seq data powered by the R package Seurat.

https://github.com/epigen/dea_seurat

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 18 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

bioinformatics biomedical-data-science differential-expression-analysis scrna-seq single-cell snakemake snrna-seq visualization volcano-plot workflow
Last synced: 4 months ago

Repository

Basic Info
Statistics
  • Stars: 22
  • Watchers: 4
  • Forks: 11
  • Open Issues: 3
  • Releases: 5
Created over 3 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

Single-cell RNA sequencing (scRNA-seq) Differential Expression Analysis & Visualization Workflow

A Snakemake 8 workflow for performing differential expression analyses (DEA) of processed (multimodal) scRNA-seq data powered by the R package Seurat's functions FindMarkers and FindAllMarkers.

[!NOTE]
This workflow adheres to the module specifications of MrBiomics, an effort to augment research by modularizing (biomedical) data science. For more details, instructions, and modules check out the project's repository.

⭐️ Star and share modules you find valuable 📤 - help others discover them, and guide our future work!

[!IMPORTANT]
If you use this workflow in a publication, please don't forget to give credit to the authors by citing it using this DOI 10.5281/zenodo.10689139.

Workflow Rulegraph

🖋️ Authors

💿 Software

This project wouldn't be possible without the following software and their dependencies:

| Software | Reference (DOI) |
| :------------: | :-----------------------------------------------: |
| data.table | https://r-datatable.com |
| EnhancedVolcano | https://doi.org/10.18129/B9.bioc.EnhancedVolcano |
| future | https://doi.org/10.32614/RJ-2021-048 |
| ggplot2 | https://ggplot2.tidyverse.org/ |
| pheatmap | https://cran.r-project.org/package=pheatmap |
| Seurat | https://doi.org/10.1016/j.cell.2021.04.048 |
| Snakemake | https://doi.org/10.12688/f1000research.29032.2 |

🔬 Methods

This is a template for the Methods section of a scientific publication and is intended to serve as a starting point. Only retain paragraphs relevant to your analysis. References [ref] to the respective publications are curated in the software table above. Versions (ver) must be taken from the respective conda environment specifications (workflow/envs/*.yaml files) or, post-execution, from the result directory (dea_seurat/envs/*.yaml). Parameters that have to be adapted depending on the data or workflow configuration are denoted in square brackets, e.g., [X].

The outlined analyses were performed using the R package Seurat (ver) [ref] unless stated otherwise.

Differential Expression Analysis (DEA). DEA was performed on the assay [X] and data slot [X] with Seurat's [FindMarkers|FindAllMarkers] function using the statistical test [X] with the parameters log2(fold change) threshold of [X] and minimal percentage of expression [X]. The results were filtered for relevant features by adjusted p-value of [X], absolute log2(fold change) of [X] and minimum percentage of expression [X].
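The filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the workflow's own R code: the column names `p_val_adj`, `avg_log2FC`, `pct.1` and `pct.2` follow Seurat's `FindMarkers` output, while the `feature` column, the function name `filter_results`, the default thresholds and the inclusive comparisons are assumptions made for the example.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def filter_results(
    rows: List[Row],
    adj_pval: float = 0.05,   # hypothetical default, stands in for [X]
    lfc: float = 0.25,        # hypothetical default, stands in for [X]
    min_pct: float = 0.1,     # hypothetical default, stands in for [X]
) -> List[Row]:
    """Keep rows that pass all three thresholds.

    Assumption: comparisons are inclusive and the expression filter is met
    when either comparison group reaches the minimum percentage.
    """
    kept = []
    for row in rows:
        significant = float(row["p_val_adj"]) <= adj_pval
        strong_effect = abs(float(row["avg_log2FC"])) >= lfc
        expressed = max(float(row["pct.1"]), float(row["pct.2"])) >= min_pct
        if significant and strong_effect and expressed:
            kept.append(row)
    return kept


# Made-up rows for illustration only.
example = [
    {"feature": "GeneA", "p_val_adj": 0.001, "avg_log2FC": 1.5, "pct.1": 0.6, "pct.2": 0.2},
    {"feature": "GeneB", "p_val_adj": 0.2, "avg_log2FC": 2.0, "pct.1": 0.5, "pct.2": 0.4},
    {"feature": "GeneC", "p_val_adj": 0.01, "avg_log2FC": 0.1, "pct.1": 0.5, "pct.2": 0.5},
    {"feature": "GeneD", "p_val_adj": 0.03, "avg_log2FC": -0.8, "pct.1": 0.05, "pct.2": 0.3},
    {"feature": "GeneE", "p_val_adj": 0.001, "avg_log2FC": 1.0, "pct.1": 0.02, "pct.2": 0.05},
]
filtered = filter_results(example)
kept_features = [row["feature"] for row in filtered]
```

With these made-up rows, GeneB fails on significance, GeneC on effect size and GeneE on expression, so only GeneA and GeneD remain.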

Visualization. All filtered result statistics, i.e., number of statistically significant results split by positive (up) and negative (down) effect-sizes, were separately visualized with stacked bar plots using ggplot2 (ver) [ref]. To visually summarize results of the same analysis, the filtered log2(fold change) values of features that were statistically significantly differentially expressed in at least one comparison were visualized in a hierarchically clustered heatmap using pheatmap (ver) [ref]. Volcano plots were generated for each analysis using EnhancedVolcano (ver) [ref] with adjusted p-value threshold of [X] and log2(fold change) threshold of [X] as visual cut-offs for the y- and x-axis, respectively.
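To make the role of the two visual cut-offs concrete, the following Python sketch computes the coordinates of a single volcano plot point and assigns it a highlight category. It is a simplified illustration under stated assumptions, not EnhancedVolcano's implementation: the function name `volcano_point`, the category labels, the default cut-off values and the strict comparisons are all hypothetical.

```python
import math
from typing import Tuple


def volcano_point(
    lfc: float,
    pval: float,
    p_cutoff: float = 0.05,   # hypothetical value for the y-axis cut-off
    fc_cutoff: float = 1.0,   # hypothetical value for the x-axis cut-off
) -> Tuple[float, float, str]:
    """Return (x, y, category) for one feature.

    x is the log2(fold change), y is -log10(p-value). The p-value is
    clamped to avoid log10(0) for values that underflow to zero.
    """
    y = -math.log10(max(pval, 1e-300))
    passes_p = pval < p_cutoff
    passes_fc = abs(lfc) > fc_cutoff
    if passes_p and passes_fc:
        category = "p-value and log2FC"
    elif passes_p:
        category = "p-value only"
    elif passes_fc:
        category = "log2FC only"
    else:
        category = "not significant"
    return lfc, y, category


# Illustrative points, one per category.
strong = volcano_point(2.0, 0.001)
small_effect = volcano_point(0.3, 0.001)
noisy = volcano_point(-1.8, 0.4)
flat = volcano_point(0.1, 0.9)
```

Only points in the first category pass both cut-offs, which corresponds to the features highlighted in the upper left and upper right regions of the plot.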

The analysis and visualizations described here were performed using a publicly available Snakemake (ver) [ref] workflow [10.5281/zenodo.10689139].

🚀 Features

The workflow performs the following steps to produce the outlined results (dea_seurat/{analysis}/).
- Differential Expression Analysis (DEA)
  - using Seurat's FindMarkers or FindAllMarkers depending on the configuration (results.csv). This step is parallelized using the R package future.
  - feature list per comparison group and direction (up/down) for downstream analysis (e.g., enrichment analysis) (feature_lists/{group}_{up/down}_features.txt)
  - (optional) feature score tables (with two columns: "feature" and "score") per comparison group using {score_formula} for downstream analyses (e.g., preranked enrichment analysis) (feature_lists/{group}_featureScores.csv).
  - results filtered according to the configured thresholds:
    - statistical significance (adjusted p-value).
    - effect-size (log2 fold change).
    - expression (minimum percentage of expression) in one of the comparison groups.
  - filtered result statistics: number of statistically significant results split by positive (up) and negative (down) change (stats.csv).
- Visualizations (dea_seurat/{analysis}/plots)
  - filtered result statistics: number of features and direction as bar plot (stats.png).
  - volcano plots per comparison group with effect size on the x-axis and raw p-value (*_rawp)/adjusted p-value (*_adjp) on the y-axis (volcano/{feature_list}/{group}.png).
    - highlighting features according to configured cut-offs for statistical significance (pCutoff) and effect size (FCcutoff).
    - (optional) highlighting features according to configured feature lists.
  - hierarchically clustered heatmap of effect sizes (LFC) per comparison (features x comparisons) indicating statistical significance with a star '*' (heatmap/{feature_list}.png).
    - using all filtered features (FILTERED)
    - (optional) using configured feature lists
    - in case of more than 100 features the row labels and significance indicators (*) are removed
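The feature lists, result statistics and feature scores listed above can be illustrated with a short Python sketch. It assumes filtered rows with hypothetical `group` and `feature` columns alongside Seurat's `avg_log2FC` and `p_val`. The score shown, a signed -log10(p-value), is just one possible choice and is not necessarily what the workflow's configurable {score_formula} evaluates to. All function names are made up for the example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def split_by_direction(rows: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Group feature names by (comparison group, direction of change)."""
    lists: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for row in rows:
        direction = "up" if row["avg_log2FC"] > 0 else "down"
        lists[(row["group"], direction)].append(row["feature"])
    return dict(lists)


def count_stats(feature_lists: Dict[Tuple[str, str], List[str]]) -> Dict[Tuple[str, str], int]:
    """Number of features per group and direction."""
    return {key: len(features) for key, features in feature_lists.items()}


def feature_score(row: dict) -> float:
    """Hypothetical score: -log10(p-value) carrying the sign of the fold change."""
    magnitude = -math.log10(max(row["p_val"], 1e-300))
    return math.copysign(magnitude, row["avg_log2FC"])


# Made-up filtered results for two comparison groups.
rows = [
    {"group": "groupA", "feature": "Gene1", "avg_log2FC": 1.2, "p_val": 0.001},
    {"group": "groupA", "feature": "Gene2", "avg_log2FC": 0.7, "p_val": 0.01},
    {"group": "groupA", "feature": "Gene3", "avg_log2FC": -1.0, "p_val": 0.01},
    {"group": "groupB", "feature": "Gene3", "avg_log2FC": 2.1, "p_val": 0.0001},
]
feature_lists = split_by_direction(rows)
stats = count_stats(feature_lists)
scores = {(row["group"], row["feature"]): feature_score(row) for row in rows}
```

Each key of `feature_lists` would map to one file following the feature_lists/{group}_{up/down}_features.txt pattern, and `stats` holds the per-direction counts that the bar plot summarizes.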

🛠️ Usage

Here are some tips for the usage of this workflow:
- Perform your first run with loose filtering options/cut-offs, and use the same values for filtering and plotting, to see if further filtering is even necessary or useful.
- Try one small/simple analysis first before running all desired analyses.

⚙️ Configuration

Detailed specifications can be found here ./config/README.md

📖 Examples

Explore detailed examples showcasing module usage in our comprehensive end-to-end MrBiomics Recipes, including data, configuration, annotation and results:
- scRNA-seq Analysis Recipe
- scCRISPR-seq Analysis Recipe

Furthermore, we selected a scRNA-seq data set consisting of 15 CRC samples from Lee et al. (2020) Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nature Genetics. The data was downloaded from the Weizmann Institute - Curated Cancer Cell Atlas (3CA) - Colorectal Cancer section.
- samples/patients: 15
- cells: 21657
- features (genes): 22276
- preprocessed using the compatible MrBiomics module for scRNA-seq data processing & visualization
- We performed a 1 vs rest analysis using the cell type annotation ("ALL").
- total runtime on HPC w/ SLURM (32GB RAM; only DEA with 8 cores, otherwise 1 core): <25 minutes for 17 jobs in total

A comparison of the cell type marker expression split by cell types visualized as a dot plot with the DEA results as hierarchically clustered heatmap of the effect sizes.

| data source/authors | Workflow Output |
| :-------------------------: | :-------------------------: |
| Cell Type Marker Dot plot | Cell Type Marker Dot plot |

We provide metadata, annotation and configuration files for this data set in ./test. The processed and prepared Seurat RDS object has to be downloaded from Zenodo by following the instructions below.

```console
# download Zenodo records using zenodo_get

# install zenodo_get v1.3.4
conda install -c conda-forge zenodo_get=1.3.4

# download the prepared Seurat RDS object
zenodo_get --record 10688824 --output-dir=test/data/Lee2020NatGenet/
```

🔗 Links

📚 Resources

  • Recommended compatible MrBiomics modules for:
    • upstream analysis
    • scRNA-seq Data Processing & Visualization for processing (multimodal) single-cell transcriptome data.
    • downstream analyses
    • Unsupervised Analysis to understand and visualize similarities and variations between cells/samples, including dimensionality reduction and cluster analysis. Useful for all tabular data including single-cell and bulk sequencing data.
    • Enrichment Analysis for biomedical interpretation of (differential) analysis results using prior knowledge.

📑 Publications

The following publications successfully used this module for their analyses.
- FirstAuthors et al. (202X) Journal Name - Paper Title.
- ...

⭐ Star History

Star History Chart

Owner

  • Name: Computational Epigenetics
  • Login: epigen
  • Kind: organization
  • Location: Vienna, Austria

Computational Epigenetics Research and Software

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Single-cell RNA sequencing (scRNA-seq) Differential
  Expression Analysis & Visualization Workflow
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Stephan
    family-names: Reichl
    orcid: 'https://orcid.org/0000-0001-8555-7198'
    affiliation: CeMM Research Center for Molecular Medicine
  - given-names: Christoph
    family-names: Bock
    orcid: 'https://orcid.org/0000-0001-6091-3088'
    affiliation: CeMM Research Center for Molecular Medicine
identifiers:
  - type: doi
    value: 10.5281/zenodo.10689139
    description: >-
      This DOI represents all versions, and will always
      resolve to the latest one.
repository-code: 'https://github.com/epigen/dea_seurat/'
url: 'https://epigen.github.io/dea_seurat/'
abstract: >-
  A Snakemake workflow for performing differential
  expression analyses (DEA) on (multimodal) scRNA-seq data
  powered by the R package Seurat.
keywords:
  - scRNA-seq
  - Differential Expression Analysis
  - Visualization
  - Bioinformatics
  - Workflow
  - Snakemake
license: MIT

GitHub Events

Total
  • Create event: 2
  • Release event: 2
  • Issues event: 3
  • Watch event: 9
  • Issue comment event: 1
  • Push event: 5
  • Fork event: 1
Last Year
  • Create event: 2
  • Release event: 2
  • Issues event: 3
  • Watch event: 9
  • Issue comment event: 1
  • Push event: 5
  • Fork event: 1

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 39
  • Total Committers: 1
  • Avg Commits per committer: 39.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 8
  • Committers: 1
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
sreichl r****n@g****m 39

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 18
  • Total pull requests: 0
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 0.28
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: 27 days
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • sreichl (15)
  • roblehmann (2)
Pull Request Authors
Top Labels
Issue Labels
enhancement (9) bug (2)
Pull Request Labels