https://github.com/biocore/gemelli

Gemelli is a tool box for running Robust Aitchison PCA (RPCA), Joint Robust Aitchison PCA (Joint-RPCA), TEMPoral TEnsor Decomposition (TEMPTED), and Compositional Tensor Factorization (CTF) on sparse compositional omics datasets.

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org
  • Committers with academic emails
    3 of 8 committers (37.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords

compositional-data-analysis ctf joint-rpca microbiome qiime2 rclr rpca tempted

Keywords from Contributors

bioinformatics qiime biplot ordination phylogeny tree-plot
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: biocore
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 136 MB
Statistics
  • Stars: 85
  • Watchers: 6
  • Forks: 19
  • Open Issues: 38
  • Releases: 0
Topics
compositional-data-analysis ctf joint-rpca microbiome qiime2 rclr rpca tempted
Created about 7 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License

README.md

Gemelli

Gemelli is a tool box for running Robust Aitchison PCA (RPCA), Joint Robust Aitchison PCA (Joint-RPCA), TEMPoral TEnsor Decomposition (TEMPTED), and Compositional Tensor Factorization (CTF) on sparse compositional omics datasets.

RPCA can be used on cross-sectional datasets where each subject is sampled only once. CTF can be used on repeated-measures data where each subject is sampled multiple times (e.g. longitudinal sampling). TEMPTED is specifically designed for longitudinal (time series) repeated-measures studies, especially when subjects are sampled at irregular time points. Joint-RPCA allows multiple omics datasets with shared samples to be explored at once. All of these methods are unsupervised and aim to describe sample/subject variation and the biological features that separate them.

The preprocessing transform for both RPCA and CTF is the robust centered log-ratio transform (rclr), which accounts for sparse data (i.e. many missing/zero values). Details on the rclr can be found here and an interactive introduction to the transformation can be found here. In short, the rclr log-transforms the observed (nonzero) values before centering. RPCA and CTF then perform a matrix or tensor factorization on only the observed values after the rclr transformation, similar to Aitchison PCA performed on dense data. If the data also has an associated phylogeny, it can be incorporated through the phylogenetic rclr; details can be found here.
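To make the description of the rclr concrete, here is a minimal sketch for a single sample using only the Python standard library. It is an illustration of the idea (log-transform the nonzero values, then center by the mean of those logs), not gemelli's own implementation; the function name `rclr` and the choice to mark zeros as `None` are assumptions made for this example.

```python
import math


def rclr(counts):
    """Robust centered log-ratio of one sample (illustrative sketch).

    `counts` is a list of non-negative values. Zeros are treated as
    unobserved: they are skipped when computing the centering term
    and come back as None.
    """
    observed_logs = [math.log(x) for x in counts if x > 0]
    if not observed_logs:
        raise ValueError("sample has no nonzero values")
    # Mean of the observed logs, i.e. the log of their geometric mean.
    center = sum(observed_logs) / len(observed_logs)
    return [math.log(x) - center if x > 0 else None for x in counts]


sample = [10, 0, 100, 0, 1000]
transformed = rclr(sample)
print(transformed)
```

The observed entries of the result sum to zero, as expected for a centered log-ratio, while the zero entries stay unobserved rather than being replaced by a pseudocount.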

Installation

To install the most up-to-date version of gemelli, run the following command:

# pip (only supported for QIIME2 >= 2018.8)
pip install gemelli

Note: gemelli is not compatible with Python 2; it requires Python 3.4 or later.

Documentation

Gemelli can be run standalone or through QIIME2, and is available as both a Python API and a CLI.

Cross-sectional / multi-omics study (i.e. one sample per subject) with RPCA

If you have a cross-sectional study design with only one sample per subject, then RPCA is the appropriate method to use in gemelli. The tutorials below give examples of RPCA that explore how the microbiome differs between body sites.

Joint-RPCA allows for the exploration of those features that separate jointly across sample groupings and the potential interactions of those features.

Tutorials

Tutorials with QIIME2

Standalone tutorial outside of QIIME2

Repeated measures study (i.e. multiple samples per subject) with CTF & TEMPTED

If you have a repeated measures study design with multiple samples per subject over time or space, then CTF is the appropriate method to use in gemelli. For optimal results, CTF requires samples for each subject in each time or space measurement. If that is not the case and your study has irregular time sampling, then TEMPTED should be used. TEMPTED also allows for the projection of new data into an existing factorization, which is necessary for machine learning. For examples, explore the tutorials below.

Tutorials

Tutorials with QIIME2

Standalone tutorial outside of QIIME2
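The guidance in this README on choosing between RPCA, CTF, and TEMPTED can be sketched as a small check on the sampling design. The snippet below is only a rule-of-thumb illustration: the helper names `is_complete_design` and `suggest_method` are hypothetical and are not part of gemelli, and the input is assumed to be one `(subject_id, timepoint)` pair per sample.

```python
from collections import defaultdict


def is_complete_design(samples):
    """True if every subject has a sample at every observed time point."""
    seen = defaultdict(set)
    timepoints = set()
    for subject, timepoint in samples:
        seen[subject].add(timepoint)
        timepoints.add(timepoint)
    return all(tps == timepoints for tps in seen.values())


def suggest_method(samples):
    """Map a sampling design to the method this README recommends (hypothetical helper)."""
    samples = list(samples)
    subjects = {subject for subject, _ in samples}
    if len(samples) == len(subjects):
        # One sample per subject: cross-sectional design.
        return "RPCA"
    # Repeated measures: CTF works best when no subject/time cell is missing.
    return "CTF" if is_complete_design(samples) else "TEMPTED"


cross_sectional = [("s1", 0), ("s2", 0)]
regular = [("s1", 0), ("s1", 7), ("s2", 0), ("s2", 7)]
irregular = [("s1", 0), ("s1", 7), ("s2", 3)]

for design in (cross_sectional, regular, irregular):
    print(suggest_method(design))
```

In practice the choice also depends on the scientific question, so a check like this is a starting point rather than a rule.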

Performing parameter optimization and QC on results

For an introduction to these QC methods, see the tutorial here. Examples are also provided in the RPCA tutorials here (RPCA QIIME2 CLI) & here (RPCA Python API & CLI). Users are encouraged to report the QC/CV results for their data.

Citations

If you found this tool useful please cite the method(s) you used:

Citation for CTF

Martino, C. and Shenhav, L. et al. Context-aware dimensionality reduction deconvolutes gut microbial community dynamics. Nat. Biotechnol. (2020) doi:10.1038/s41587-020-0660-7

@article {Martino2020, author = {Martino, Cameron and Shenhav, Liat and Marotz, Clarisse A and Armstrong, George and McDonald, Daniel and V{\'a}zquez-Baeza, Yoshiki and Morton, James T and Jiang, Lingjing and Dominguez-Bello, Maria Gloria and Swafford, Austin D and Halperin, Eran and Knight, Rob}, title = {Context-aware dimensionality reduction deconvolutes gut microbial community dynamics}, year = {2020}, journal = {Nature biotechnology}, }

Citation for RPCA

Martino, C. et al. A Novel Sparse Compositional Technique Reveals Microbial Perturbations. mSystems 4, (2019)

@article {Martino2019, author = {Martino, Cameron and Morton, James T. and Marotz, Clarisse A. and Thompson, Luke R. and Tripathi, Anupriya and Knight, Rob and Zengler, Karsten}, editor = {Neufeld, Josh D.}, title = {A Novel Sparse Compositional Technique Reveals Microbial Perturbations}, volume = {4}, number = {1}, elocation-id = {e00016-19}, year = {2019}, doi = {10.1128/mSystems.00016-19}, publisher = {American Society for Microbiology Journals}, URL = {https://msystems.asm.org/content/4/1/e00016-19}, eprint = {https://msystems.asm.org/content/4/1/e00016-19.full.pdf}, journal = {mSystems} }

Citation for Phylogenetic RPCA

Martino, C. et al. Compositionally Aware Phylogenetic Beta-Diversity Measures Better Resolve Microbiomes Associated with Phenotype. mSystems 7, (2022)

@ARTICLE{Martino2022, author = {Martino, Cameron and McDonald, Daniel and Cantrell, Kalen and Dilmore, Amanda Hazel and Vázquez-Baeza, Yoshiki and Shenhav, Liat and Shaffer, Justin P and Rahman, Gibraan and Armstrong, George and Allaband, Celeste and Song, Se Jin and Knight, Rob}, title = {Compositionally Aware Phylogenetic {Beta-Diversity} Measures Better Resolve Microbiomes Associated with Phenotype}, volume = {7}, number = {3}, elocation-id = {e0005022}, year = {2022}, doi = {10.1128/msystems.00050-22}, publisher = {American Society for Microbiology Journals}, URL = {http://dx.doi.org/10.1128/msystems.00050-22}, journal = {mSystems}, }

Citation for TEMPTED

Shi, P. et al. Time-Informed Dimensionality Reduction for Longitudinal Microbiome Studies. bioRxiv, (2023)

@ARTICLE{Shi2023, author = {Shi, Pixu and Martino, Cameron and Han, Rungang and Janssen, Stefan and Buck, Gregory and Serrano, Myrna and Owzar, Kouros and Knight, Rob and Shenhav, Liat and Zhang, Anru R}, title = {{Time-Informed} Dimensionality Reduction for Longitudinal Microbiome Studies}, year = {2023}, doi = {10.1101/2023.07.26.550749}, URL = {https://www.biorxiv.org/content/10.1101/2023.07.26.550749v1}, journal = {bioRxiv}, }

Other Resources

Owner

  • Name: biocore
  • Login: biocore
  • Kind: organization
  • Location: Cyberspace

Collaboratively developed bioinformatics software.

GitHub Events

Total
  • Issues event: 11
  • Watch event: 9
  • Issue comment event: 26
  • Fork event: 2
Last Year
  • Issues event: 11
  • Watch event: 9
  • Issue comment event: 26
  • Fork event: 2

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 319
  • Total Committers: 8
  • Avg Commits per committer: 39.875
  • Development Distribution Score (DDS): 0.119
Past Year
  • Commits: 25
  • Committers: 2
  • Avg Commits per committer: 12.5
  • Development Distribution Score (DDS): 0.04
Top Committers
Name Email Commits
cameronmartino c****o@g****m 281
gwarmstrong g****g@c****u 18
kcantrel k****l@u****u 11
ahdilmore a****e@g****m 4
Gibraan Rahman g****n@e****u 2
Lisa l****5@g****m 1
Colin J. Brislawn c****l@g****m 1
Cameron Martino c****o@c****l 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 78
  • Total pull requests: 33
  • Average time to close issues: 3 months
  • Average time to close pull requests: 17 days
  • Total issue authors: 32
  • Total pull request authors: 9
  • Average comments per issue: 1.29
  • Average comments per pull request: 1.48
  • Merged pull requests: 30
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 18
  • Pull requests: 1
  • Average time to close issues: 5 days
  • Average time to close pull requests: 5 days
  • Issue authors: 9
  • Pull request authors: 1
  • Average comments per issue: 2.11
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • cameronmartino (25)
  • callaband (6)
  • mestaki (6)
  • magibc (4)
  • gwarmstrong (4)
  • SilasK (3)
  • gibsramen (3)
  • ARW-UBT (2)
  • colinbrislawn (2)
  • ETaSky (2)
  • adswafford (1)
  • samd1993 (1)
  • adamsorbie (1)
  • ivanllampy (1)
  • sjanssen2 (1)
Pull Request Authors
  • cameronmartino (25)
  • kwcantrell (2)
  • gibsramen (2)
  • gwarmstrong (2)
  • colinbrislawn (2)
  • lisa55asil (1)
  • mortonjt (1)
  • ahdilmore (1)
  • ElDeveloper (1)
Top Labels
Issue Labels
bug (8) enhancement (8)
Pull Request Labels
bug (2) enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 208 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 12
  • Total maintainers: 1
pypi.org: gemelli

Robust Aitchison Tensor Decomposition for sparse count data

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 208 Last month
Rankings
Dependent packages count: 10.1%
Average: 17.6%
Downloads: 21.0%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

ci/conda_requirements.txt pypi
  • IPython >4.0.0
  • biom-format *
  • flake8 *
  • matplotlib *
  • nose *
  • notebook *
  • numpy >=1.12.1
  • pandas *
  • pep8 *
  • pip *
  • scikit-bio *
  • scikit-learn *
  • scipy *
  • seaborn *
ci/pip_requirements.txt pypi
  • coveralls *
setup.py pypi
  • biom-format *
  • click *
  • h5py *
  • iow *
  • nose *
  • numpy *
  • pandas *
  • scikit-bio *
  • scikit-learn *
  • scipy *
  • tax2tree *
.github/workflows/main.yml actions
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v2 composite