gemmaker

A workflow for constructing Gene Expression count Matrices (GEMs). Useful for Differential Gene Expression (DGE) analysis and Gene Co-Expression Network (GCN) construction.

https://github.com/systemsgenetics/gemmaker

Science Score: 41.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Repository

Basic Info
Statistics
  • Stars: 34
  • Watchers: 5
  • Forks: 16
  • Open Issues: 12
  • Releases: 6
Created over 8 years ago · Last pushed over 1 year ago
Metadata Files
Readme, Changelog, Contributing, License, Code of conduct, Citation

README.md

GEMmaker

GEMmaker is a Nextflow workflow for large-scale gene expression sample processing, expression-level quantification and Gene Expression Matrix (GEM) construction. Results from GEMmaker are useful for differential gene expression (DGE) and gene co-expression network (GCN) analyses. The GEMmaker workflow currently supports Illumina RNA-seq datasets.
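To make the idea of a GEM concrete, here is a minimal sketch of combining per-sample expression values into a single gene-by-sample matrix. It is an illustration only, not GEMmaker's implementation: the function names `build_gem` and `to_tsv`, the sample and gene identifiers, and the choice to fill missing values with `0.0` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def build_gem(
    sample_counts: Dict[str, Dict[str, float]],
    missing: float = 0.0,
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Combine per-sample {gene: value} tables into one matrix.

    Rows are genes and columns are samples. A gene absent from a
    sample gets the `missing` value (an assumption of this sketch).
    """
    samples = sorted(sample_counts)
    genes = sorted({g for counts in sample_counts.values() for g in counts})
    matrix = [
        [sample_counts[s].get(g, missing) for s in samples]
        for g in genes
    ]
    return genes, samples, matrix


def to_tsv(genes: List[str], samples: List[str], matrix: List[List[float]]) -> str:
    """Render the matrix as tab-separated text with a header row."""
    lines = ["\t".join(["gene"] + samples)]
    for gene, row in zip(genes, matrix):
        lines.append("\t".join([gene] + [str(v) for v in row]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical per-sample quantification results.
    counts = {
        "sample_A": {"gene1": 12.0, "gene2": 3.5},
        "sample_B": {"gene2": 4.0, "gene3": 8.25},
    }
    genes, samples, matrix = build_gem(counts)
    print(to_tsv(genes, samples, matrix))
```

A matrix in this shape (genes as rows, samples as columns) is the typical input for downstream DGE and GCN tools.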

How to Use

Please see the GEMmaker documentation for in-depth instructions for running GEMmaker.

Introduction

GEMmaker (i.e. systemsgenetics/gemmaker) is a pipeline for quantification of Illumina RNA-seq data. Users can choose from Hisat2, STAR, Kallisto or Salmon. It can process locally stored FASTQ files or automatically retrieve them from NCBI's SRA. The pipeline is built using Nextflow, a workflow tool that runs tasks portably across multiple compute infrastructures. It comes with Docker containers, which make installation trivial and results highly reproducible.

nf-core Compatibility

GEMmaker is an nf-core compatible workflow, but it is not an official nf-core workflow. This is because nf-core already offers nf-core/rnaseq, an excellent workflow for RNA-seq analysis that provides similar functionality to GEMmaker. GEMmaker differs in that it can scale to thousands of samples without exceeding local storage resources by running samples in batches and removing intermediate files. It can do the same for smaller sample sets on machines with fewer computational resources. This ability to scale is a unique adaptation that Nextflow does not currently provide. When Nextflow does support batching and scaling, nf-core/rnaseq will be updated and GEMmaker will probably be retired in favor of the nf-core workflow. Until then, if you are limited by storage, GEMmaker can help!
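The batching idea described above can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not how GEMmaker or Nextflow implement it: `process_in_batches`, `batches`, and the `quantify` callback are hypothetical names. The point is that only one batch's intermediate files exist on disk at any time, and only the small final result for each sample is kept.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_in_batches(
    sample_ids: Sequence[str],
    batch_size: int,
    quantify: Callable[[str, Path], Path],
    results_dir: Path,
) -> List[Path]:
    """Process samples batch by batch, deleting intermediates after each batch.

    `quantify(sample_id, workdir)` is a hypothetical callback that may write
    large intermediate files into `workdir` and returns the path of the
    small result file to keep.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    kept: List[Path] = []
    for batch in batches(sample_ids, batch_size):
        # One scratch directory per batch bounds peak storage use.
        batch_dir = Path(tempfile.mkdtemp(prefix="batch_"))
        try:
            for sample_id in batch:
                sample_dir = batch_dir / sample_id
                sample_dir.mkdir()
                result = quantify(sample_id, sample_dir)
                final = results_dir / result.name
                shutil.move(str(result), str(final))
                kept.append(final)
        finally:
            # Remove all intermediate files before starting the next batch.
            shutil.rmtree(batch_dir, ignore_errors=True)
    return kept
```

With a batch size of `n`, peak scratch usage is roughly the intermediate footprint of `n` samples rather than of the whole dataset, which is the trade-off the paragraph above describes.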

Credits

Please see the list of developers who have contributed to this repository.

Development of GEMmaker was funded by the U.S. National Science Foundation Award #1659300.

If you use GEMmaker in your research, please use this citation:

Hadish, J. A., Biggs, T. D., Shealy, B. T., Bender, M. R., McKnight, C. B., Wytko, C., Smith, M. C., Feltus, F. A., Honaas, L., & Ficklin, S. P. (2022). GEMmaker: process massive RNA-seq datasets on heterogeneous computational infrastructure. BMC Bioinformatics, 23(1), 1–11.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Quick Start

Please follow the instructions in the online documentation.

Owner

  • Name: SystemsGenetics
  • Login: SystemsGenetics
  • Kind: organization

Citation (CITATIONS.md)

# GEMmaker Citations

If you use GEMmaker for your analysis, please cite it using the following:

> John Hadish, Tyler Biggs, Ben Shealy, Connor Wytko, Sai Prudhvi Oruganti, F. Alex Feltus, & Stephen Ficklin. (2020, January 22). SystemsGenetics/GEMmaker: Release v1.1 (Version v1.1). Zenodo. [10.5281/zenodo.3620945](http://doi.org/10.5281/zenodo.3620945)

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

| Tool  | Citation or URL |
| ----- | ------------ |
| Nextflow |  Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. [10.1038/nbt.3820](https://doi.org/10.1038/nbt.3820) |
| SRAtoolkit | [https://github.com/ncbi/sra-tools](https://github.com/ncbi/sra-tools) |
| Aspera | [https://www.ibm.com/products/aspera](https://www.ibm.com/products/aspera) |
| FastQC | [https://www.bioinformatics.babraham.ac.uk/projects/fastqc/](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) |
| Trimmomatic | Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120. [10.1093/bioinformatics/btu170](https://doi.org/10.1093/bioinformatics/btu170) |
| Hisat2 | Kim, D., Paggi, J. M., Park, C., Bennett, C., & Salzberg, S. L. (2019). Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology, 37(8), 907–915. [10.1038/s41587-019-0201-4](https://doi.org/10.1038/s41587-019-0201-4) |
| Kallisto | Bray, N. L., Pimentel, H., Melsted, P., & Pachter, L. (2016). Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology. [10.1038/nbt.3519](https://doi.org/10.1038/nbt.3519) |
| Salmon | Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., & Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4), 417–419. [10.1038/nmeth.4197](https://doi.org/10.1038/nmeth.4197) |
| SAMtools | Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., & Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. [10.1093/bioinformatics/btp352](https://doi.org/10.1093/bioinformatics/btp352) |
| StringTie | Pertea, M., Kim, D., Pertea, G. M., Leek, J. T., & Salzberg, S. L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nature Protocols, 11(9), 1650–1667. [10.1038/nprot.2016.095](https://doi.org/10.1038/nprot.2016.095) |
| MultiQC | Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics. [10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354) |

## Software packaging/containerisation tools

* [Anaconda](https://anaconda.com)
    > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

* [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)
    > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

* [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)
    > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

* [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
    > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Watch event: 2
  • Create event: 1
Last Year
  • Watch event: 2
  • Create event: 1

Dependencies

.github/workflows/awsfulltest.yml actions
  • nf-core/tower-action v2 composite
.github/workflows/awstest.yml actions
  • nf-core/tower-action v2 composite
.github/workflows/branch.yml actions
  • mshick/add-pr-comment v1 composite
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
.github/workflows/linting.yml actions
  • actions/checkout v2 composite
  • actions/checkout v1 composite
  • actions/setup-node v1 composite
  • actions/setup-python v1 composite
  • actions/upload-artifact v2 composite
  • mshick/add-pr-comment v1 composite
.github/workflows/linting_comment.yml actions
  • dawidd6/action-download-artifact v2 composite
  • marocchino/sticky-pull-request-comment v2 composite
Dockerfile docker
  • nfcore/base 1.13.3 build
docs/requirements.txt pypi
environment.yml pypi