metaflowx

MetaflowX: A Scalable and Resource-Efficient Workflow for Multi-Strategy Metagenomic Analysis

https://github.com/01life/metaflowx

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: 01life
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 19.6 MB
Statistics
  • Stars: 10
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md

MetaflowX User Manual

MetaflowX is a scalable and modular metagenomics analysis pipeline powered by Nextflow. It supports both short-read and contig-based inputs and automates key analyses such as taxonomic profiling, functional annotation, gene catalog construction, and MAG recovery.

nf-core-metassembly workflow overview

1. Pipeline Summary

The MetaflowX pipeline consists of the following steps:

  1. Quality control (fastp, Trimmomatic, Bowtie2)
  2. Contig assembly (SPAdes, MEGAHIT)
  3. Microbial taxonomy and metabolic function analysis (MetaPhlAn, HUMAnN, Kraken2)
  4. Gene catalog construction (Prodigal, CD-HIT, eggNOG-mapper, antiSMASH, BiG-MAP)
  5. MAG binning and evaluation (MetaBAT2, CONCOCT, SemiBin2, MaxBin2, MetaBinner, COMEBin, binny, MetaDecoder, Vamb, DAS_Tool, MAGScoT, CheckM2, dRep, Galah, GTDB-Tk, CoverM, Deepurify, COBRA)
  6. Report generation (Jinja, MultiQC)

For module-level details, see Module Description.

2. Getting Started

2.1 Quick Start

If Nextflow is already installed, you can quickly validate MetaflowX using demo tests:

1. Clone the repository:

```bash
git clone https://github.com/01life/MetaflowX.git
```

2. Run either of the following tests:

1️⃣ Test 1: Full pipeline dry run using stub mode (no Docker or Conda required)

This test runs the full pipeline structure on a small input with stubbed commands: the workflow logic is exercised, but no real computation takes place. It requires only Nextflow.

```bash
nextflow run MetaflowX -stub -profile test_stub --outdir stub_remote
```

2️⃣ Test 2: Run a single module (nf-core/fastp only, requires Docker)

This test runs the built-in nf-core/fastp module with a demo input. It requires Docker.

```bash
nextflow run MetaflowX -profile test --outdir remote
```

💡 Both tests finish in a few minutes and produce logs and outputs under the specified --outdir.

[!NOTE] If you want to run the Nextflow pipeline in the background, you can add the -bg option:

```bash
nextflow run -bg MetaflowX -profile test --outdir remote > remote.out
```

⚠️ These are functional tests only, not for biological analysis.
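When the demo tests are launched from a script (for example in CI), the two commands above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section (`-stub`, `-profile`, `--outdir`, `-bg`); the helper name `build_test_command` is hypothetical and not part of MetaflowX.

```python
import shlex


def build_test_command(profile, outdir, stub=False, background=False):
    """Assemble the argv list for one of the demo test runs (hypothetical helper)."""
    cmd = ["nextflow", "run"]
    if background:
        cmd.append("-bg")  # placed as in the background example above
    cmd.append("MetaflowX")
    if stub:
        cmd.append("-stub")  # stub mode: commands are not really executed
    cmd += ["-profile", profile, "--outdir", outdir]
    return cmd


# Test 1: stub run; Test 2: fastp-only run, started in the background
stub_cmd = build_test_command("test_stub", "stub_remote", stub=True)
fastp_cmd = build_test_command("test", "remote", background=True)

print(shlex.join(stub_cmd))
print(shlex.join(fastp_cmd))
```

The returned lists can be passed to `subprocess.run` as they are, which avoids shell quoting issues.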

2.2 Prerequisites

We recommend preparing all software environments and databases in advance.

2.3 Installation

  1. Clone the repository:

```bash
git clone https://github.com/01life/MetaflowX.git
```

2.4 Environment & Database Check

[!NOTE] If you are new to Nextflow and nf-core, check the Nextflow installation guide. Ensure your setup passes the -profile test before processing real data.

After installation, validate your full environment using built-in paired-end test data under test/data/:

  • Sample1_R1.fq.gz, Sample1_R2.fq.gz
  • Sample2_R1.fq.gz, Sample2_R2.fq.gz

First, prepare an input file samplesheet.csv (see Basic Usage), then run:

```bash
nextflow run MetaflowX \
    -profile <docker/singularity/conda/.../institute> \
    --input samplesheet.csv \
    --outdir full_test
```

This run will:

  • Check tool availability
  • Verify database paths
  • Execute major pipeline steps
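The samplesheet for the bundled test data can be written by hand, but for larger directories it may be easier to generate it. The sketch below pairs files by an `_R1`/`_R2` suffix and writes the three columns used in Basic Usage (`id`, `raw_reads1`, `raw_reads2`). The naming pattern, the use of absolute paths, and the helper names `collect_pairs` and `write_samplesheet` are assumptions for illustration, not part of MetaflowX.

```python
import csv
import re
from pathlib import Path

# Assumed naming scheme: <id>_R1.fq.gz / <id>_R2.fq.gz (".fastq.gz" also accepted)
READ_PATTERN = re.compile(r"^(?P<id>.+)_R(?P<mate>[12])\.(fq|fastq)\.gz$")


def collect_pairs(data_dir):
    """Group R1/R2 files by sample id and return rows sorted by id."""
    pairs = {}
    for path in sorted(Path(data_dir).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match:  # files that do not follow the pattern are ignored
            pairs.setdefault(match["id"], {})[match["mate"]] = str(path.resolve())
    rows = []
    for sample_id, mates in sorted(pairs.items()):
        if set(mates) != {"1", "2"}:
            raise ValueError(f"{sample_id}: expected both R1 and R2 files")
        rows.append([sample_id, mates["1"], mates["2"]])
    return rows


def write_samplesheet(rows, out_path="samplesheet.csv"):
    """Write the header and one row per sample."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "raw_reads1", "raw_reads2"])
        writer.writerows(rows)
```

For the test data, calling `write_samplesheet(collect_pairs("test/data"))` from the repository root would produce a two-row samplesheet for Sample1 and Sample2.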

[!IMPORTANT] ⚠️ Before running, ensure that the following configuration files are properly set:

  • nextflow.config: general defaults, database paths
  • conf/modules.config: tool environments and options
  • conf/base.config: compute resources (CPU, memory)

✅ Match the -profile flag to your local compute environment.

💡 You can use -profile slurm, -profile docker, -profile local, etc.

⏱️ This test may take several minutes depending on system specs.

3. How to run

3.1 Basic usage

  1. Prepare a samplesheet, samplesheet.csv, that lists your input data as follows:

```csv
id,raw_reads1,raw_reads2
S1,/path/to/Sample1_R1.fastq.gz,/path/to/Sample1_R2.fastq.gz
S2,/path/to/Sample2_R1.fastq.gz,/path/to/Sample2_R2.fastq.gz
```

  2. Run the pipeline:

```bash
nextflow run MetaflowX \
    -profile <docker/singularity/conda/.../institute> \
    --input samplesheet.csv \
    --outdir <OUTDIR>
```
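A quick pre-flight check of the samplesheet can catch typos and missing files before a long run starts. The sketch below covers only the paired-end layout shown above (the single-end layout is not described in this section); the function name `check_samplesheet` and the specific checks are illustrative assumptions, not the validation that MetaflowX itself performs.

```python
import csv
from pathlib import Path

EXPECTED_HEADER = ["id", "raw_reads1", "raw_reads2"]


def check_samplesheet(path, require_files=True):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            return [f"unexpected header: {header}"]
        seen = set()
        for line_no, row in enumerate(reader, start=2):
            if not row:  # skip blank lines
                continue
            if len(row) != 3:
                problems.append(f"line {line_no}: expected 3 columns, got {len(row)}")
                continue
            sample_id, reads1, reads2 = row
            if sample_id in seen:
                problems.append(f"line {line_no}: duplicate id {sample_id}")
            seen.add(sample_id)
            if require_files:
                for reads in (reads1, reads2):
                    if not Path(reads).is_file():
                        problems.append(f"line {line_no}: missing file {reads}")
    return problems
```

Calling `check_samplesheet("samplesheet.csv")` and printing the returned list gives a short report; passing `require_files=False` checks only the structure.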

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
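As the warning above notes, parameters can be kept in a file and passed with `-params-file`, which Nextflow reads as JSON or YAML. The sketch below writes a JSON params file holding the two parameters shown in Basic Usage (`input` and `outdir`) and builds the matching command. The output directory name `results`, the `docker` profile choice, and the helper names are placeholders chosen for illustration.

```python
import json
import shlex


def write_params_file(params, path="params.json"):
    """Serialize pipeline parameters to a JSON file for -params-file."""
    with open(path, "w") as handle:
        json.dump(params, handle, indent=2)
    return path


def run_command(profile, params_file):
    """Build the nextflow command that reads parameters from a file."""
    return ["nextflow", "run", "MetaflowX",
            "-profile", profile,
            "-params-file", params_file]


# Keys mirror the --input and --outdir options; "results" is a placeholder.
params = {"input": "samplesheet.csv", "outdir": "results"}
print(json.dumps(params, indent=2))
print(shlex.join(run_command("docker", "params.json")))
```

Keeping the params file under version control alongside the samplesheet makes a run easier to reproduce later.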

For more details and further functionality, please refer to the usage documentation. To list all pipeline parameters, run:

```bash
nextflow run MetaflowX --help
```

[!NOTE] MetaflowX relies on many tools and their databases. For detailed installation and configuration instructions, please refer to the dependencies guide, database guide and version documentation.

3.2 Advanced Usage

MetaflowX supports:

  • Single-end and paired-end reads
  • Selective module execution using --mode and --skip parameters
  • Custom database paths and tool options

📖 For a full overview of available parameters and advanced configuration, see the Usage Guide.

📘 For practical examples of common execution modes and corresponding commands, refer to the Execution Guide.

4. Output

The results generated by MetaflowX include the following sections:

Quality control
- 01.CleanData/

Contig assembly
- 02.Contig

Microbial taxonomy and metabolic function analysis
- 101.MetaPhlAn
- 102.HUMAnN

Gene catalog construction
- 03.Geneset
- 04.GenesetProfile

Automated binning analysis
- 05.BinSet
- 06.BinsetProfile

Report generation
- 07.MultiQC
- MetaflowXReport*.html

Pipeline information
- pipeline_info

See Output Documentation for details.
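After a run, it can be handy to see at a glance which of the result sections listed above were produced. The sketch below checks an output directory for those directory names and searches for the HTML report. Which directories actually appear depends on the modules that were run, and the helper name `summarize_outdir` is hypothetical.

```python
from pathlib import Path

# Directory names taken from the output list above
EXPECTED_DIRS = [
    "01.CleanData", "02.Contig", "101.MetaPhlAn", "102.HUMAnN",
    "03.Geneset", "04.GenesetProfile", "05.BinSet", "06.BinsetProfile",
    "07.MultiQC", "pipeline_info",
]


def summarize_outdir(outdir):
    """Report which expected directories exist and where the reports are."""
    root = Path(outdir)
    present = [name for name in EXPECTED_DIRS if (root / name).is_dir()]
    missing = [name for name in EXPECTED_DIRS if name not in present]
    # Search recursively, since the exact report location is not assumed here
    reports = sorted(str(p.relative_to(root))
                     for p in root.rglob("MetaflowXReport*.html"))
    return {"present": present, "missing": missing, "reports": reports}
```

A missing directory is not necessarily an error: skipped modules simply leave no output.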

5. Support

6. Credits

❤️ MetaflowX was developed with support from 01Life.

MetaflowX is developed by:

👩‍💻 Yang Ying

👩‍💻 Liang Lifeng

With contributions and feedback from:

👨 Xie Hailiang

👨‍💻 Long Shibin

7. Citations

If you use MetaflowX in your research, please cite:

MetaflowX: A Scalable and Resource-Efficient Workflow for Multi-Strategy Metagenomic Analysis

For all third-party tools used, refer to CITATIONS.md.

Owner

  • Name: 01life
  • Login: 01life
  • Kind: organization

Citation (CITATIONS.md)

# nf-core/metaflowx: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [fastp](https://doi.org/10.1093/bioinformatics/bty560)
- [Trimmomatic](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590)
- [SPAdes](https://doi.org/10.1002/cpbi.102)
- [MEGAHIT](https://pubmed.ncbi.nlm.nih.gov/25609793/)
- [MetaPhlAn](https://www.nature.com/articles/s41587-023-01688-w)
- [HUMAnN](https://elifesciences.org/articles/65088)
- [Kraken2](https://dx.doi.org/10.1186/s13059-019-1891-0)
- [Prodigal](https://doi.org/10.1186/1471-2105-11-119)
- [CD-HIT](https://www.bioinformatics.org/cd-hit/)
- [eggNOG-mapper](https://doi.org/10.1093/molbev/msab293)
- [samtools](https://doi.org/10.1093/gigascience/giab008)
- [DIAMOND](https://doi.org/10.1038/s41592-021-01101-x)
- [RGI](https://www.ncbi.nlm.nih.gov/pubmed/36263822)
- [antiSMASH](https://doi.org/10.1093/nar/gkab335)
- [BiG-MAP](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8547482/)
- [MetaBAT2](https://doi.org/10.7717/peerj.7359)
- [CONCOCT](https://doi.org/10.1038/nmeth.3103)
- [MetaBinner](https://doi.org/10.1186/s13059-022-02832-6)
- [MaxBin2](https://doi.org/10.1093/bioinformatics/btv638)
- [MetaDecoder](https://doi.org/10.1186/s40168-022-01237-8)
- [Vamb](https://doi.org/10.1038/s41587-020-00777-4)
- [SemiBin2](https://doi.org/10.1093/bioinformatics/btad209)
- [COMEBin](https://doi.org/10.1038/s41467-023-44290-z)
- [binny](https://doi.org/10.1101/2021.12.22.473795)
- [MAGScoT](https://doi.org/10.1101/2022.05.17.492251)
- [DAS Tool](https://doi.org/10.1038/s41564-018-0171-1)
- [CheckM2](https://doi.org/10.1038/s41592-023-01940-w)
- [QUAST](https://doi.org/10.1093/bioinformatics/bty266)
- [dRep](https://doi.org/10.1101/108142)
- [galah](https://doi.org/10.1038/s41587-020-00777-4)
- [GTDB-Tk](https://doi.org/10.1093/bioinformatics/btac672)
- [CoverM](https://doi.org/10.1093/bioinformatics/btaf147)
- [Deepurify](https://www.nature.com/articles/s42256-024-00908-5)
- [BWA](https://doi.org/10.48550/arXiv.1303.3997)
- [Mash](https://doi.org/10.1186/s13059-019-1841-x)
- [Bowtie2](https://www.nature.com/articles/nmeth.1923)
- [MultiQC](https://doi.org/10.1093/bioinformatics/btw354)


## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Release event: 3
  • Watch event: 12
  • Public event: 2
  • Push event: 40
  • Pull request event: 25
  • Create event: 5
Last Year
  • Release event: 3
  • Watch event: 12
  • Public event: 2
  • Push event: 40
  • Pull request event: 25
  • Create event: 5

Dependencies

pyproject.toml pypi