metagem

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data

https://github.com/franciscozorrilla/metagem

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org, zenodo.org
  • Committers with academic emails
    2 of 8 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary

Keywords

bioinformatics computational-biology flux-balance-analysis genome-scale-metabolic-model gut-microbiome mags metabolic-modeling metabolic-models metabolism metagenome-assembled-genomes metagenomics microbial-ecology microbiome snakemake systems-biology
Last synced: 6 months ago

Repository

Statistics
  • Stars: 241
  • Watchers: 5
  • Forks: 46
  • Open Issues: 13
  • Releases: 6
Created over 7 years ago · Last pushed over 1 year ago
Metadata Files
Readme · License · Code of conduct · Citation

README.md

💎 metaGEM

Note: An easy-to-use workflow for generating context-specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data.

metaGEM is a Snakemake workflow that integrates an array of existing bioinformatics and metabolic modeling tools, for the purpose of predicting metabolic interactions within bacterial communities of microbiomes. From whole metagenome shotgun datasets, metagenome assembled genomes (MAGs) are reconstructed, which are then converted into genome-scale metabolic models (GEMs) for in silico simulations. Additional outputs include abundance estimates, taxonomic assignment, growth rate estimation, pangenome analysis, and eukaryotic MAG identification.

⚙️ Installation

You can start using metaGEM on your cluster with a single command, using the mamba package manager:

mamba create -n metagem -c bioconda metagem

This will create an environment called metagem and start installing dependencies. Please consult the config/README.md page for more detailed setup instructions.

🔧 Usage

Clone this repo:

git clone https://github.com/franciscozorrilla/metaGEM.git && cd metaGEM/workflow

Run metaGEM without any arguments to see usage instructions:

bash metaGEM.sh

```
Usage: bash metaGEM.sh [-t|--task TASK]
                       [-j|--nJobs NUMBER OF JOBS]
                       [-c|--cores NUMBER OF CORES]
                       [-m|--mem GB RAM]
                       [-h|--hours MAX RUNTIME]
                       [-l|--local]

Options:
  -t, --task        Specify task to complete:

                    SETUP
                        createFolders
                        downloadToy
                        organizeData
                        check

                    CORE WORKFLOW
                        fastp 
                        megahit 
                        crossMapSeries
                        kallistoIndex
                        crossMapParallel
                        kallisto2concoct 
                        concoct 
                        metabat
                        maxbin 
                        binRefine 
                        binReassemble 
                        extractProteinBins
                        carveme
                        memote
                        organizeGEMs
                        smetana
                        extractDnaBins
                        gtdbtk
                        abundance

                    BONUS
                        grid
                        prokka
                        roary
                        eukrep
                        eukcc

                    VISUALIZATION (in development)
                        stats
                        qfilterVis
                        assemblyVis
                        binningVis
                        taxonomyVis
                        modelVis
                        interactionVis
                        growthVis

  -j, --nJobs       Specify number of jobs to run in parallel
  -c, --nCores      Specify number of cores per job
  -m, --mem         Specify memory in GB required for job
  -h, --hours       Specify number of hours to allocated to job runtime
  -l, --local       Run jobs on local machine for non-cluster usage
```
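As a rough illustration of how the flags in the usage message combine, the following Python sketch assembles the argument list for a single metaGEM.sh call. The helper name `build_metagem_command`, its defaults, and the example values are hypothetical. Only the short flags and the task names are taken from the usage message, and whether `-l` is meant to be combined with the resource flags is an assumption.

```python
import shlex

# Task names copied from the SETUP and CORE WORKFLOW groups of the usage message.
KNOWN_TASKS = {
    "createFolders", "downloadToy", "organizeData", "check",
    "fastp", "megahit", "crossMapSeries", "kallistoIndex", "crossMapParallel",
    "kallisto2concoct", "concoct", "metabat", "maxbin", "binRefine",
    "binReassemble", "extractProteinBins", "carveme", "memote", "organizeGEMs",
    "smetana", "extractDnaBins", "gtdbtk", "abundance",
}


def build_metagem_command(task, n_jobs=1, cores=1, mem_gb=1, hours=1, local=False):
    """Return the argument list for one metaGEM.sh call (hypothetical helper)."""
    if task not in KNOWN_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    cmd = [
        "bash", "metaGEM.sh",
        "-t", task,           # task to complete
        "-j", str(n_jobs),    # number of jobs to run in parallel
        "-c", str(cores),     # cores per job
        "-m", str(mem_gb),    # memory in GB per job
        "-h", str(hours),     # maximum runtime in hours
    ]
    if local:
        cmd.append("-l")      # run on the local machine instead of a cluster
    return cmd


# Print the command instead of running it, so the sketch works without metaGEM installed.
print(shlex.join(build_metagem_command("fastp", n_jobs=2, cores=4, mem_gb=8, hours=2)))
```

Passing the returned list to `subprocess.run` from inside the `workflow` folder would launch the task. Printing it first makes it easy to check the flags before submitting jobs.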

🧉 Try it now

You can set up and use metaGEM on the cloud by following along with the Google Colab notebook.

Please note that Google Colab does not provide the computational resources necessary to fully run metaGEM on a real dataset. This notebook demonstrates how to set up and use metaGEM by performing the first steps in the workflow on a toy dataset.

💩 Tutorials

metaGEM can be used to explore your own gut microbiome sequencing data from at-home test kit services such as unseen bio. The following tutorial showcases the metaGEM workflow on two unseen bio samples.

For an introductory metabolic modeling tutorial, refer to the resources compiled for the EMBOMicroCom: Metabolite and species dynamics in microbial communities workshop in 2022.

For a more advanced tutorial, check out the resources we put together for the SymbNET: from metagenomics to metabolic interactions course in 2022.

🏛️ Wiki

Refer to the wiki for additional usage tips, frequently asked questions, and implementation details.

📦 Datasets

  • You can access the metaGEM-generated results for the publication here.
      • 🧪 Small communities of gut microbes from lab cultures
      • 💩 Real gut microbiome samples from Swedish diabetes paper
      • 🪴 Plant-associated soil samples from Chinese rhizobiome study
      • 🌏 Bulk-soil samples from Australian biodiversity analysis
      • 🌊 Ocean water samples from global TARA Oceans expeditions
  • Additionally, you can access metaGEM-generated results from a reanalysis of recently published ancient metagenomes here.

🐍 Workflow

Core

  1. Quality filter reads with fastp
  2. Assembly with megahit
  3. Draft bin sets with CONCOCT, MaxBin2, and MetaBAT2
  4. Refine & reassemble bins with metaWRAP
  5. Taxonomic assignment with GTDB-tk
  6. Relative abundances with bwa and samtools
  7. Reconstruct & evaluate genome-scale metabolic models with CarveMe and memote
  8. Species metabolic coupling analysis with SMETANA

Bonus

  1. Growth rate estimation with GRiD, SMEG or CoPTR
  2. Pangenome analysis with roary
  3. Eukaryotic draft bins with EukRep and EukCC
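
The core steps listed above can be lined up with task names from the usage message. The sketch below prints one command per task in step order. The mapping is an assumption based on matching tool names. Helper tasks such as crossMapSeries or extractProteinBins are left out because the list does not say where they belong, so this is not a complete run order.

```python
# Assumed mapping from the numbered core steps to task names in the usage message.
CORE_STEPS = [
    ("Quality filter reads", ["fastp"]),
    ("Assembly", ["megahit"]),
    ("Draft bin sets", ["concoct", "metabat", "maxbin"]),
    ("Refine & reassemble bins", ["binRefine", "binReassemble"]),
    ("Taxonomic assignment", ["gtdbtk"]),
    ("Relative abundances", ["abundance"]),
    ("Reconstruct & evaluate GEMs", ["carveme", "memote"]),
    ("Species metabolic coupling analysis", ["smetana"]),
]


def plan_commands(steps=CORE_STEPS):
    """Return one printable line per task, numbered by workflow step."""
    lines = []
    for number, (description, tasks) in enumerate(steps, start=1):
        for task in tasks:
            lines.append(f"{number}. {description}: bash metaGEM.sh -t {task}")
    return lines


for line in plan_commands():
    print(line)
```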

🏗️ Active Development

If you would like to see additional or alternative tools incorporated into the metaGEM workflow, please raise an issue or create a pull request. Snakemake allows workflows to be very flexible, so adding new rules is as easy as filling out the following template and adding it to the Snakefile:

```
# Rule names must be valid Python identifiers, so use underscores instead of hyphens.
rule package_name:
    input:
        rules.rulename.output
    output:
        f'{config["path"]["root"]}/{config["folder"]["X"]}/{{IDs}}/output.file'
    message:
        """
        Helpful and descriptive message detailing goal of this rule/package.
        """
    shell:
        """
        # Well documented command line instructions go here

        # Load conda environment
        set +u;source activate {config[envs][package]};set -u;

        # Run tool
        package-name -i {input} -o {output}
        """
```
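
The output line of the template mixes two kinds of braces, which is easy to misread. The plain-Python sketch below shows how that f-string evaluates: single braces are filled from `config` right away, while the doubled braces around `IDs` survive as a literal `{IDs}` placeholder for Snakemake to treat as a wildcard. The config values are hypothetical, and `str.format` only stands in for Snakemake's own wildcard substitution.

```python
# Hypothetical config values; a real metaGEM config file will differ.
config = {
    "path": {"root": "/path/to/project"},
    "folder": {"X": "results"},
}

# Same expression as the template's output line. Inside an f-string,
# doubled braces are emitted as literal single braces.
output_pattern = f'{config["path"]["root"]}/{config["folder"]["X"]}/{{IDs}}/output.file'
print(output_pattern)  # /path/to/project/results/{IDs}/output.file

# Imitate wildcard filling for one sample ID (Snakemake does this itself).
print(output_pattern.format(IDs="sample1"))  # /path/to/project/results/sample1/output.file
```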

🖇️ Publications

The metaGEM workflow has been used in multiple studies, including the following non-exhaustive list:

Plastic-degrading potential across the global microbiome correlates with recent pollution trends. J Zrimec, M Kokina, S Jonasson, F Zorrilla, A Zelezniak. mBio, 2021.

Competition-cooperation in the chemoautotrophic ecosystem of Movile Cave: first metagenomic approach on sediments. Chiciudean, I., Russo, G., Bogdan, D.F. et al. Environmental Microbiome, 2022.

The National Ecological Observatory Network’s soil metagenomes: assembly and basic analysis. Werbin ZR, Hackos B, Lopez-Nava J et al. F1000Research, 2022.

Microbial interactions shape cheese flavour formation. Melkonian, C., Zorrilla, F., Kjærbølling, I. et al. Nature Communications, 2023.

🍾 Please cite

metaGEM: reconstruction of genome scale metabolic models directly from metagenomes. Francisco Zorrilla, Filip Buric, Kiran R Patil, Aleksej Zelezniak. Nucleic Acids Research, 2021; gkab815, https://doi.org/10.1093/nar/gkab815

📲 Contact

Please reach out with any comments, concerns, or discussions regarding metaGEM.

Gitter chat Twitter LinkedIn email

Owner

  • Name: Francisco Zorrilla
  • Login: franciscozorrilla
  • Kind: user
  • Location: Cambridge, UK
  • Company: MRC Toxicology, University of Cambridge

PhD Student in the Patil Lab. Interested in metagenomics, metabolic modeling of microbial communities, and machine learning

Citation (CITATION.bib)

@article{10.1093/nar/gkab815,
    author = {Zorrilla, Francisco and Buric, Filip and Patil, Kiran R and Zelezniak, Aleksej},
    title = "{metaGEM: reconstruction of genome scale metabolic models directly from metagenomes}",
    journal = {Nucleic Acids Research},
    volume = {49},
    number = {21},
    pages = {e126-e126},
    year = {2021},
    month = {10},
    abstract = "{Metagenomic analyses of microbial communities have revealed a large degree of interspecies and intraspecies genetic diversity through the reconstruction of metagenome assembled genomes (MAGs). Yet, metabolic modeling efforts mainly rely on reference genomes as the starting point for reconstruction and simulation of genome scale metabolic models (GEMs), neglecting the immense intra- and inter-species diversity present in microbial communities. Here, we present metaGEM (https://github.com/franciscozorrilla/metaGEM), an end-to-end pipeline enabling metabolic modeling of multi-species communities directly from metagenomes. The pipeline automates all steps from the extraction of context-specific prokaryotic GEMs from MAGs to community level flux balance analysis (FBA) simulations. To demonstrate the capabilities of metaGEM, we analyzed 483 samples spanning lab culture, human gut, plant-associated, soil, and ocean metagenomes, reconstructing over 14,000 GEMs. We show that GEMs reconstructed from metagenomes have fully represented metabolism comparable to isolated genomes. We demonstrate that metagenomic GEMs capture intraspecies metabolic diversity and identify potential differences in the progression of type 2 diabetes at the level of gut bacterial metabolic exchanges. Overall, metaGEM enables FBA-ready metabolic model reconstruction directly from metagenomes, provides a resource of metabolic models, and showcases community-level modeling of microbiomes associated with disease conditions allowing generation of mechanistic hypotheses.}",
    issn = {0305-1048},
    doi = {10.1093/nar/gkab815},
    url = {https://doi.org/10.1093/nar/gkab815},
    eprint = {https://academic.oup.com/nar/article-pdf/49/21/e126/41503923/gkab815.pdf},
}

GitHub Events

Total
  • Issues event: 10
  • Watch event: 51
  • Issue comment event: 22
  • Fork event: 7
Last Year
  • Issues event: 10
  • Watch event: 51
  • Issue comment event: 22
  • Fork event: 7

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 624
  • Total Committers: 8
  • Avg Commits per committer: 78.0
  • Development Distribution Score (DDS): 0.361
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Francisco Zorrilla | z****a@c****e | 399 |
| Francisco Zorrilla | f****4@c****k | 205 |
| Filip Buric | 2****c | 10 |
| Zoey Werbin | z****n@g****m | 3 |
| Bastian Seelbinder | s****r@l****e | 3 |
| Bartosz Bartmanski | b****i@g****m | 2 |
| Xentrics | b****r@w****e | 1 |
| Stephen | 3****x | 1 |
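
The committer statistics above are consistent with the Development Distribution Score (DDS) being one minus the top committer's share of all commits. That definition is an assumption here, since this page does not state it. The short Python check below uses the commit counts from the Top Committers list, treating the two Francisco Zorrilla entries as separate committers, as the listing does.

```python
# Commit counts per committer identity, copied from the Top Committers list.
commits = [399, 205, 10, 3, 3, 2, 1, 1]


def development_distribution_score(counts):
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    total = sum(counts)
    return 1 - max(counts) / total if total else 0.0


total = sum(commits)
print(total)                                              # 624
print(total / len(commits))                               # 78.0
print(round(development_distribution_score(commits), 3))  # 0.361
print(development_distribution_score([2]))                # 0.0 (past year: one committer)
```

Under this definition, a score of 0.0 for the past year only means that a single identity made all of that year's commits.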
Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 82
  • Total pull requests: 18
  • Average time to close issues: 4 months
  • Average time to close pull requests: 25 days
  • Total issue authors: 40
  • Total pull request authors: 5
  • Average comments per issue: 3.33
  • Average comments per pull request: 0.78
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 11
  • Pull requests: 2
  • Average time to close issues: 3 months
  • Average time to close pull requests: less than a minute
  • Issue authors: 9
  • Pull request authors: 1
  • Average comments per issue: 3.36
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • franciscozorrilla (18)
  • microbial-cookie (6)
  • kunaljaani (6)
  • zoey-rw (3)
  • White-Shinobi (3)
  • LiZhihua1982 (3)
  • slambrechts (3)
  • jckim012 (2)
  • Poccia (2)
  • Xentrics (2)
  • yhbae6022 (2)
  • ecairns62 (2)
  • paristzou (2)
  • shreyanshumale (2)
  • flefler (1)
Pull Request Authors
  • franciscozorrilla (12)
  • Xentrics (4)
  • BartoszBartmanski (1)
  • zoey-rw (1)
  • codechenx (1)
Top Labels
Issue Labels
  • question (50)
  • enhancement (13)
  • bug (9)
  • documentation (5)
  • method (3)
  • maintenance (2)
  • duplicate (1)
Pull Request Labels
  • enhancement (5)
  • bug (4)
  • maintenance (3)
  • method (3)