metaflowx
MetaflowX: A Scalable and Resource-Efficient Workflow for Multi-Strategy Metagenomic Analysis
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.7%) to scientific vocabulary
Repository
Statistics
- Stars: 10
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
MetaflowX User Manual
MetaflowX is a scalable and modular metagenomics analysis pipeline powered by Nextflow. It supports both short-read and contig-based inputs and automates key analyses such as taxonomic profiling, functional annotation, gene catalog construction, and MAG recovery.
1. Pipeline Summary
The MetaflowX pipeline consists of the following steps:
- Quality control (fastp, Trimmomatic, Bowtie2)
- Contig assembly (SPAdes, MEGAHIT)
- Microbial taxonomy and metabolic function analysis (MetaPhlAn, HUMAnN, Kraken2)
- Gene catalog construction (Prodigal, CD-HIT, eggNOG-mapper, antiSMASH, BiG-MAP)
- MAG binning and evaluation (MetaBAT2, CONCOCT, SemiBin2, MaxBin2, MetaBinner, COMEBin, binny, MetaDecoder, Vamb, DAS_Tool, MAGScoT, CheckM2, dRep, Galah, GTDB-Tk, CoverM, Deepurify, COBRA)
- Report generation (Jinja, MultiQC)
For module-level details, see Module Description.
2. Getting Started
2.1 Quick Start
If Nextflow is already installed, you can quickly validate MetaflowX using demo tests:
1. Clone the repository:
```bash
git clone https://github.com/01life/MetaflowX.git
```
2. Run either of the following tests:
1️⃣ Test 1: Full pipeline dry run using stub mode (no Docker or Conda required)
This test runs the full pipeline structure with small input and stubbed commands (logic is tested, but real computation is skipped).
It requires only Nextflow, no Docker or Conda.
```bash
nextflow run MetaflowX -stub -profile test_stub --outdir stub_remote
```
2️⃣ Test 2: Run a single module (nf-core/fastp only, requires Docker)
This test runs the built-in nf-core/fastp module with a demo input. It requires Docker.
```bash
nextflow run MetaflowX -profile test --outdir remote
```
💡 Both tests finish in a few minutes and produce logs and outputs under the specified --outdir.
[!NOTE] If you want to run the Nextflow pipeline in the background, you can add the `-bg` option:
```bash
nextflow run -bg MetaflowX -profile test --outdir remote > remote.out
```
⚠️ These are functional tests only, not for biological analysis.
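The choice between the two demo tests depends only on which tools are installed, so it can be scripted. The helper below is a minimal sketch, not part of MetaflowX: the function name `suggest_demo_test` is hypothetical, and it only prints the matching command from this section rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MetaflowX): print the demo test
# command that matches the tools found on PATH.
suggest_demo_test() {
    # Both demo tests need Nextflow itself.
    if ! command -v nextflow >/dev/null 2>&1; then
        echo "nextflow not found on PATH" >&2
        return 1
    fi
    if command -v docker >/dev/null 2>&1; then
        # Test 2: runs the fastp module for real, requires Docker.
        echo "nextflow run MetaflowX -profile test --outdir remote"
    else
        # Test 1: stub mode, needs neither Docker nor Conda.
        echo "nextflow run MetaflowX -stub -profile test_stub --outdir stub_remote"
    fi
}
```

Calling `suggest_demo_test` prints one command line, which you can review and then paste into your shell.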
2.2 Prerequisites
We recommend preparing all software environments and databases in advance:
- See Environment Guide for setting up Conda and required tools.
- See Database Guide for downloading and configuring necessary reference data.
2.3 Installation
- Clone the repository:
```bash
git clone https://github.com/01life/MetaflowX.git
```
2.4 Environment & Database Check
[!NOTE] If you are new to Nextflow and nf-core, check the Nextflow installation guide. Ensure your setup passes the `-profile test` run before processing real data.
After installation, validate your full environment using built-in paired-end test data under test/data/:
- Sample1_R1.fq.gz, Sample1_R2.fq.gz
- Sample2_R1.fq.gz, Sample2_R2.fq.gz
First, prepare an input file samplesheet.csv (see Basic Usage), then run:
```bash
nextflow run MetaflowX \
    -profile <docker/singularity/conda/.../institute> \
    --input samplesheet.csv \
    --outdir full_test
```
This run will:
- Check tool availability
- Verify database paths
- Execute major pipeline steps
[!IMPORTANT] ⚠️ Before running, ensure that the following configuration files are properly set:
- `nextflow.config`: general defaults, database paths
- `conf/modules.config`: tool environments and options
- `conf/base.config`: compute resources (CPU, memory)

✅ Match the `-profile` flag to your local compute environment.
💡 You can use `-profile slurm`, `-profile docker`, `-profile local`, etc.
⏱️ This test may take several minutes depending on system specs.
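Before editing the three configuration files named above, it can help to confirm that they exist in your clone. The following is a minimal sketch: the function name `check_metaflowx_configs` is hypothetical, and it checks file presence only, not whether the settings inside are correct.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MetaflowX): report whether the
# configuration files listed in this section exist in a pipeline clone.
check_metaflowx_configs() {
    local pipeline_dir="$1" missing=0 f
    for f in nextflow.config conf/modules.config conf/base.config; do
        if [ -f "$pipeline_dir/$f" ]; then
            printf '%-8s %s\n' "ok" "$f"
        else
            printf '%-8s %s\n' "MISSING" "$f"
            missing=1
        fi
    done
    # Non-zero exit status if at least one file is absent.
    return "$missing"
}
```

For example, `check_metaflowx_configs ./MetaflowX` prints one status line per file and returns a non-zero status if any file is missing.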
3. How to run
3.1 Basic usage
- Prepare a samplesheet `samplesheet.csv` with your input data that looks as follows:
```csv
id,raw_reads1,raw_reads2
S1,/path/to/Sample1_R1.fastq.gz,/path/to/Sample1_R2.fastq.gz
S2,/path/to/Sample2_R1.fastq.gz,/path/to/Sample2_R2.fastq.gz
```
- Now, you can run the pipeline using:
```bash
nextflow run MetaflowX \
    -profile <docker/singularity/conda/.../institute> \
    --input samplesheet.csv \
    --outdir <OUTDIR>
```
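With many samples, writing the samplesheet by hand is error-prone. The sketch below builds one from a directory of paired-end files. It is not part of MetaflowX: the function name `make_samplesheet` is hypothetical, and it assumes files are named `<sample>_R1.<suffix>` and `<sample>_R2.<suffix>`, with the suffix defaulting to `fastq.gz`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MetaflowX): write a samplesheet with
# the id,raw_reads1,raw_reads2 header from paired-end FASTQ files.
# Assumption: mates are named <sample>_R1.<suffix> and <sample>_R2.<suffix>.
make_samplesheet() {
    local fastq_dir="$1" out_csv="$2" suffix="${3:-fastq.gz}"
    local r1 r2 sample
    # Use absolute paths so the samplesheet works from any launch directory.
    fastq_dir="$(cd "$fastq_dir" && pwd)" || return 1
    echo "id,raw_reads1,raw_reads2" > "$out_csv"
    for r1 in "$fastq_dir"/*_R1."$suffix"; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        r2="${r1%_R1."$suffix"}_R2.$suffix"
        if [ ! -e "$r2" ]; then
            echo "warning: no R2 mate for $r1, skipping" >&2
            continue
        fi
        sample="$(basename "$r1" "_R1.$suffix")"
        echo "${sample},${r1},${r2}" >> "$out_csv"
    done
}
```

For the bundled test data, which uses the `.fq.gz` extension, the call would be `make_samplesheet test/data samplesheet.csv fq.gz`. Samples without an R2 mate are skipped with a warning, because this sketch does not cover single-end input.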
[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration except for parameters; see docs.
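As an illustration of the `-params-file` route, the sketch below writes a YAML file holding the same two parameters that the command above passes as `--input` and `--outdir`. The function name `write_params_file` and the default file name `params.yaml` are hypothetical; only `input` and `outdir` are taken from this document, and any further keys would need to match the parameters listed by `--help`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MetaflowX): write a YAML params file
# with the two parameters shown in this section.
write_params_file() {
    local input_csv="$1" outdir="$2" params_file="${3:-params.yaml}"
    printf 'input: "%s"\noutdir: "%s"\n' "$input_csv" "$outdir" > "$params_file"
}

# Example usage (not executed here; pick the profile that fits your system):
#   write_params_file samplesheet.csv results params.yaml
#   nextflow run MetaflowX -profile docker -params-file params.yaml
```

Keeping parameters in a file makes a run easier to repeat and to review than a long command line.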
For more details and further functionality, please refer to the usage documentation. You can use the following command to see all the parameters of the pipeline.
```bash
nextflow run MetaflowX --help
```
[!NOTE] MetaflowX relies on many tools and their databases. For detailed installation and configuration instructions, please refer to the dependencies guide, database guide and version documentation.
3.2 Advanced Usage
MetaflowX supports:
- Single-end and paired-end reads
- Selective module execution using `--mode` and `--skip` parameters
- Custom database paths and tool options
📖 For a full overview of available parameters and advanced configuration, see the Usage Guide.
📘 For practical examples of common execution modes and corresponding commands, refer to the Execution Guide.
4. Output
The results generated by MetaflowX include the following sections:
✤ Quality control
- 01.CleanData/
✤ Contig assembly
- 02.Contig
✤ Microbial taxonomy and metabolic function analysis
- 101.MetaPhlAn
- 102.HUMAnN
✤ Gene catalog construction
- 03.Geneset
- 04.GenesetProfile
✤ Automated binning analysis
- 05.BinSet
- 06.BinsetProfile
✤ Report generation
- 07.MultiQC
- MetaflowXReport*.html
✤ Pipeline information
- pipeline_info
See Output Documentation for details.
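After a run, a quick listing shows which of the result sections above were produced. The sketch below is not part of MetaflowX: the function name `list_metaflowx_outputs` is hypothetical, and it assumes each section is a directory directly under `--outdir`. Which directories appear will depend on the modules that were executed, so an absent entry is not necessarily an error.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MetaflowX): report which result
# directories named in this section exist under an output directory.
# Assumption: each section is a top-level directory of --outdir.
list_metaflowx_outputs() {
    local outdir="$1" d
    for d in 01.CleanData 02.Contig 101.MetaPhlAn 102.HUMAnN 03.Geneset \
             04.GenesetProfile 05.BinSet 06.BinsetProfile 07.MultiQC pipeline_info; do
        if [ -d "$outdir/$d" ]; then
            printf '%-8s %s\n' "present" "$d"
        else
            printf '%-8s %s\n' "absent" "$d"
        fi
    done
}
```

For example, `list_metaflowx_outputs full_test` prints one line per expected directory.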
5. Support
- Visit the MetaflowX tutorial for examples and explanations.
- Check the Changelog for version history.
- Report issues via the GitHub Issues page.
6. Credits
❤️ MetaflowX was developed with support from 01Life.
MetaflowX is developed by:
👩‍💻 Yang Ying
👩‍💻 Liang Lifeng
With contributions and feedback from:
👨 Xie Hailiang
👨‍💻 Long Shibin
7. Citations
If you use MetaflowX in your research, please cite:
MetaflowX: A Scalable and Resource-Efficient Workflow for Multi-Strategy Metagenomic Analysis
For all third-party tools used, refer to CITATIONS.md.
Owner
- Name: 01life
- Login: 01life
- Kind: organization
- Repositories: 1
- Profile: https://github.com/01life
Citation (CITATIONS.md)
# nf-core/metaflowx: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [fastp](https://doi.org/10.1093/bioinformatics/bty560)
- [Trimmomatic](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590)
- [SPAdes](https://doi.org/10.1002/cpbi.102)
- [MEGAHIT](https://pubmed.ncbi.nlm.nih.gov/25609793/)
- [MetaPhlAn](https://www.nature.com/articles/s41587-023-01688-w)
- [HUMAnN](https://elifesciences.org/articles/65088)
- [Kraken2](https://dx.doi.org/10.1186/s13059-019-1891-0)
- [Prodigal](https://doi.org/10.1186/1471-2105-11-119)
- [CD-HIT](https://www.bioinformatics.org/cd-hit/)
- [eggNOG-mapper](https://doi.org/10.1093/molbev/msab293)
- [samtools](https://doi.org/10.1093/gigascience/giab008)
- [DIAMOND](https://doi.org/10.1038/s41592-021-01101-x)
- [RGI](https://www.ncbi.nlm.nih.gov/pubmed/36263822)
- [antiSMASH](https://doi.org/10.1093/nar/gkab335)
- [BiG-MAP](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8547482/)
- [MetaBAT2](https://doi.org/10.7717/peerj.7359)
- [CONCOCT](https://doi.org/10.1038/nmeth.3103)
- [MetaBinner](https://doi.org/10.1186/s13059-022-02832-6)
- [MaxBin2](https://doi.org/10.1093/bioinformatics/btv638)
- [MetaDecoder](https://doi.org/10.1186/s40168-022-01237-8)
- [Vamb](https://doi.org/10.1038/s41587-020-00777-4)
- [SemiBin2](https://doi.org/10.1093/bioinformatics/btad209)
- [COMEBin](https://doi.org/10.1038/s41467-023-44290-z)
- [binny](https://doi.org/10.1101/2021.12.22.473795)
- [MAGScoT](https://doi.org/10.1101/2022.05.17.492251)
- [DAS Tool](https://doi.org/10.1038/s41564-018-0171-1)
- [CheckM2](https://doi.org/10.1038/s41592-023-01940-w)
- [QUAST](https://doi.org/10.1093/bioinformatics/bty266)
- [dRep](https://doi.org/10.1101/108142)
- [galah](https://doi.org/10.1038/s41587-020-00777-4)
- [GTDB-Tk](https://doi.org/10.1093/bioinformatics/btac672)
- [CoverM](https://doi.org/10.1093/bioinformatics/btaf147)
- [Deepurify](https://www.nature.com/articles/s42256-024-00908-5)
- [BWA](https://doi.org/10.48550/arXiv.1303.3997)
- [Mash](https://doi.org/10.1186/s13059-019-1841-x)
- [Bowtie2](https://www.nature.com/articles/nmeth.1923)
- [MultiQC](https://doi.org/10.1093/bioinformatics/btw354)

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Release event: 3
- Watch event: 12
- Public event: 2
- Push event: 40
- Pull request event: 25
- Create event: 5
Last Year
- Release event: 3
- Watch event: 12
- Public event: 2
- Push event: 40
- Pull request event: 25
- Create event: 5