https://github.com/clavellab/mimic

Minimal Microbial Consortia creation tools

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: wiley.com, nature.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Minimal Microbial Consortia creation tools

Basic Info
  • Host: GitHub
  • Owner: ClavelLab
  • License: lgpl-3.0
  • Language: R
  • Default Branch: main
  • Size: 21.5 MB
Statistics
  • Stars: 10
  • Watchers: 1
  • Forks: 4
  • Open Issues: 0
  • Releases: 0
Created about 5 years ago · Last pushed about 4 years ago
Metadata Files
  • Readme
  • License

README.md

MiMiC

Minimal Microbial Consortia creation tools

What is MiMiC?

MiMiC proposes minimal microbial consortia from the functional potential of a given metagenomic sample. It does this by producing a binary vector (presence/absence) of all Pfams (protein families) within the sample and comparing this against a pre-calculated genome database. Through an iterative process, the genomes that best match the metagenomic sample's functional repertoire are selected. Once all functions have been accounted for, or adding a further genome accounts for no more Pfams, the process halts and the user is provided with the list of selected genomes and the statistics for the selection.
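The iterative selection described above resembles a greedy set-cover heuristic. The following is a minimal sketch in Python, not MiMiC's own R implementation. It assumes each profile is given as a set of Pfam accessions. At each round it picks the genome covering the most still-unaccounted metagenome Pfams, and it stops when no genome adds anything new. The function name `select_consortium`, the tie-breaking rule and the scoring are illustrative assumptions; the scoring used by MiMiC itself may differ.

```python
def select_consortium(metagenome, genomes, max_iterations=None):
    """Greedily pick genomes until no remaining metagenome Pfam can be covered.

    metagenome: set of Pfam accessions present in the sample.
    genomes: dict mapping genome name -> set of Pfam accessions.
    Returns (list of (genome, newly covered count), set of uncovered Pfams).
    """
    remaining = set(metagenome)
    candidates = dict(genomes)
    selected = []
    while remaining and candidates:
        if max_iterations is not None and len(selected) >= max_iterations:
            break
        # Genome covering the most uncovered Pfams; ties go to the first name alphabetically.
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & remaining))
        gained = candidates[best] & remaining
        if not gained:
            break  # no genome accounts for any further Pfam
        selected.append((best, len(gained)))
        remaining -= gained
        del candidates[best]
    return selected, remaining


# Toy data, made up for illustration only.
metagenome = {"PF00001", "PF00002", "PF00003", "PF00004", "PF00005"}
genomes = {
    "genome_A": {"PF00001", "PF00002", "PF00003"},
    "genome_B": {"PF00003", "PF00004"},
    "genome_C": {"PF00001"},
}
selected, uncovered = select_consortium(metagenome, genomes)
print(selected)
print(sorted(uncovered))
```

In this toy run, `genome_C` is never selected because everything it encodes is already covered by `genome_A`, and `PF00005` stays uncovered because no genome carries it.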

We hope that this tool will lead to the further adoption and use of minimal microbial consortia for next generation probiotics as well as basic research into microbial communities.

Installation

Before running MiMiC, all the dependencies below must be installed:

- PRODIGAL
- HMMSCAN

Additional R packages used by MiMiC that must be installed are:

- dplyr
- plyr
- inflection
- tidyverse
- ggplot2
- taRifx
- readr

Once the dependencies have been installed, the user may test the installation by running the following commands:

TBA

Running MiMiC

MiMiC is currently designed as a series of four scripts which the user must apply to a dataset sequentially (steps 1-4).

Some of these scripts require the user to edit the code to point to their own files (all scripts will accept command-line arguments in future versions).

Step 1: step_1_contig_to_Pfam.sh

Description: Identifies the proteins within your genomes/metagenome and annotates them against the Pfam database.

Usage: bash step_1_contig_to_Pfam.sh -i PathToInputFolder -d PathToPfamDatabase -t optionforProdigal(meta or single)

Options:
  -i  path to the input folder where contigs/assemblies are kept
  -d  path to the hmmpressed Pfam database
  -t  meta for a metagenomic assembly, otherwise single

Output folders:
  output_faa : all faa files are generated in this folder
  output_pfam : all hmmscan output files are generated in this folder
  output_fna : all fna (Prodigal) files are generated in this folder

Step 2: step_2_hmmscan_parsed.pl

Description: Parses the hmmscan output into a format MiMiC can use in Step 3.

Usage: perl step_2_hmmscan_parsed.pl path_to_input_folder

Notes: The input folder should contain only faa files.
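To illustrate the kind of parsing this step performs, here is a minimal sketch in Python. It is not the Perl script itself. It assumes hmmscan was run with tabular output (`--tblout` or `--domtblout`), where comment lines start with `#` and the first two whitespace-separated columns are the target name and the target accession. The function name `pfam_accessions` and the example rows are hypothetical.

```python
def pfam_accessions(table_lines):
    """Collect unique Pfam accessions (version suffix removed) from tabular hmmscan lines."""
    found = set()
    for line in table_lines:
        # Skip blank lines and the '#' comment/header lines of the tabular format.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # Column 2 is the target accession, e.g. PF00001.24 -> PF00001.
        found.add(fields[1].split(".")[0])
    return found


# Made-up rows for illustration; real output has more columns.
example = [
    "# target name  accession  query name ...",
    "7tm_1  PF00001.24  protein_1  -  1.2e-30  100.5  0.1",
    "7tm_1  PF00001.24  protein_7  -  3.0e-12  45.2  0.0",
    "ABC_tran  PF00005.30  protein_2  -  5.0e-40  130.0  0.2",
]
hits = pfam_accessions(example)
print(sorted(hits))
```

Collapsing repeated hits into a set matches the presence/absence view used downstream: a Pfam found in several proteins still counts once.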

Step 3: step_3_pfam_vector.R

Description: This script makes a binary vector of Pfam-based genomic/metagenomic profiles.

Usage: Rscript --vanilla step_3_pfam_vector.R -i pfam.A.clans -p Step2Output -o Step3Output

Options:
  -i  Pfam.A.clans file from the Pfam database
  -p  path to the input folder containing all parsed Pfam files
  -o  name of the output file (default is PfamVector.txt)
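As an illustration of what a binary Pfam vector looks like, the sketch below (Python, not the R script itself) builds a presence/absence table from a reference list of Pfam accessions and the set of Pfams found in each sample. The layout (one row per Pfam, one column per sample), the function name `build_pfam_vectors` and the toy data are assumptions made for this example; the actual output layout of Step 3 may differ.

```python
def build_pfam_vectors(all_pfams, sample_hits):
    """Return a header row plus one 0/1 row per Pfam accession.

    all_pfams: ordered list of every Pfam accession in the reference list.
    sample_hits: dict mapping sample name -> set of Pfam accessions found in it.
    """
    samples = sorted(sample_hits)
    rows = [["Pfam"] + samples]
    for pfam in all_pfams:
        # 1 if the Pfam was detected in the sample, 0 otherwise.
        rows.append([pfam] + [1 if pfam in sample_hits[s] else 0 for s in samples])
    return rows


# Toy data, made up for illustration only.
all_pfams = ["PF00001", "PF00002", "PF00003"]
sample_hits = {
    "genome_A": {"PF00001", "PF00003"},
    "metagenome_1": {"PF00002", "PF00003"},
}
table = build_pfam_vectors(all_pfams, sample_hits)
for row in table:
    print("\t".join(str(value) for value in row))
```

Using the full reference list for the rows keeps every vector the same length, so genome and metagenome profiles can be compared position by position.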

Step 4: step_4_mimic.R

Description: This script calculates the minimal microbial consortium for a metagenome based on the minimum number of species covering the maximum metagenomic functionality.

Usage: Rscript --vanilla step_4_mimic.R -m metagenomefileName -g genomeVectorFileName -i iterationNumber -o mimicOutputName -k kneepointbasedOutputName

Options:
  -m  input metagenome Pfam vector
  -g  input reference genome Pfam vector
  -i  number of iterations to run for each metagenome
  -o  MiMiC output file name (optional; default: MiMiC.txt)
  -k  MiMiC output file name with knee point (optional; default: MiMiC_KneePoint.txt)

Example dataset

The 'example_data' folder contains the initial run of MiMiC on the PiBAC collection to generate the published minimal consortia (https://www.nature.com/articles/s41467-020-19929-w).

These files include:

- PfamdatabasePfam-A.clans30818.tsv, the Pfam list used for the analysis.
- PiBCGenomeBinaryVector111update3oct2019.txt, the Pfam binary vector of 111 bacterial species (PiBAC).
- PiBCVector284updated_3rdOct.txt, the Pfam vector for the metagenomes used in the PiBAC analysis.

Reference data sets

The reference database file (MiMiC_reference_Database.tar.gz) contains four folders:

  • 1.Host specific: this folder has three host-specific species Pfam binary vectors

    • human
    • pig
  • 2.Metagenome used in study

    • MetagenomeVector_937.txt : Pfam vector of the metagenomes used in the MiMiC paper
    • Allmeta937.txt : metadata with the sample name and source of each metagenome
  • 3.Mock community

    • Pfam vector of the mBarc mock community metagenome
  • 4.NCBI RefSeq

    • Pfam vector of the NCBI RefSeq genomes used in the MiMiC study

Citation

We ask that anyone who uses MiMiC cite not only our publication but also the publications listed below, which provide tools and databases that are integral to MiMiC:

Tools

Databases

Owner

  • Name: The Clavel lab
  • Login: ClavelLab
  • Kind: organization
  • Location: Germany

This is the official GitHub account for the research group of Prof. Thomas Clavel.

GitHub Events

Total
  • Issues event: 1
  • Watch event: 2
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 2
  • Fork event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: about 22 hours
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: about 22 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ebi-jlu8 (1)