metaflowx

MetaflowX: A Scalable and Resource-Efficient Workflow for Multi-Strategy Metagenomic Analysis

https://github.com/01life/metaflowx

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: 01life
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 19.6 MB

Statistics

Stars: 10
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 2

Created over 1 year ago · Last pushed 11 months ago

Metadata Files

Readme Changelog License Code of conduct Citation

MetaflowX User Manual

MetaflowX is a scalable and modular metagenomics analysis pipeline powered by Nextflow. It supports both short-read and contig-based inputs and automates key analyses such as taxonomic profiling, functional annotation, gene catalog construction, and MAG recovery.

[Figure: nf-core-metassembly workflow overview]

1. Pipeline Summary

The MetaflowX pipeline consists of the following steps:

Quality control (fastp, Trimmomatic, Bowtie2)
Contig assembly (SPAdes, MEGAHIT)
Microbial taxonomy and metabolic function analysis (MetaPhlAn, HUMAnN, Kraken2)
Gene catalog construction (Prodigal, CD-HIT, eggNOG-mapper, antiSMASH, BiG-MAP)
MAG binning and evaluation (MetaBAT2, CONCOCT, SemiBin2, MaxBin2, MetaBinner, COMEBin, binny, MetaDecoder, Vamb, DAS Tool, MAGScoT, CheckM2, dRep, Galah, GTDB-Tk, CoverM, Deepurify, COBRA)
Report generation (Jinja, MultiQC)

For module-level details, see Module Description.

2. Getting Started

2.1 Quick Start

If Nextflow is already installed, you can quickly validate MetaflowX using demo tests:

1. Clone the repository:

    git clone https://github.com/01life/MetaflowX.git

2. Run either of the following tests:

1️⃣ Test 1: Full pipeline dry run using `stub` mode (no Docker or Conda required)

This test exercises the full pipeline structure on a small input with stubbed commands: the workflow logic is tested, but real computation is skipped. It requires only Nextflow; neither Docker nor Conda is needed.

    nextflow run MetaflowX -stub -profile test_stub --outdir stub_remote

2️⃣ Test 2: Run a single module (`nf-core/fastp` only, requires Docker)

This test runs the built-in nf-core/fastp module with a demo input. It requires Docker.

    nextflow run MetaflowX -profile test --outdir remote

💡 Both tests finish in a few minutes and produce logs and outputs under the specified --outdir.

[!NOTE] If you want to run the Nextflow pipeline in the background, you can add the -bg option:

    nextflow run -bg MetaflowX -profile test --outdir remote > remote.out

⚠️ These are functional tests only, not for biological analysis.
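Before launching either test, it can help to confirm that the required executables are on your PATH. The sketch below is only an illustration, not part of MetaflowX: it assumes the executables are named `nextflow` and `docker`, and the `missing_tools` helper and `REQUIREMENTS` table are hypothetical names introduced here.

```python
import shutil


def missing_tools(required):
    """Return the tools from `required` that cannot be found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


# Per the Quick Start: Test 1 (stub mode) needs only Nextflow,
# while Test 2 also needs Docker. Executable names are assumptions.
REQUIREMENTS = {
    "test_stub": ["nextflow"],
    "test": ["nextflow", "docker"],
}

if __name__ == "__main__":
    for profile, tools in REQUIREMENTS.items():
        missing = missing_tools(tools)
        status = "ready" if not missing else "missing: " + ", ".join(missing)
        print(f"-profile {profile}: {status}")
```

A profile reported as "ready" only means the executables were found; it does not check their versions or whether the Docker daemon is running.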

2.2 Prerequisites

We recommend preparing all software environments and databases in advance:

See Environment Guide for setting up Conda and required tools.
See Database Guide for downloading and configuring necessary reference data.

2.3 Installation

Clone the repository:

    git clone https://github.com/01life/MetaflowX.git

2.4 Environment & Database Check

[!NOTE] If you are new to Nextflow and nf-core, check the Nextflow installation guide. Ensure your setup passes the -profile test before processing real data.

After installation, validate your full environment using built-in paired-end test data under test/data/:

Sample1_R1.fq.gz, Sample1_R2.fq.gz
Sample2_R1.fq.gz, Sample2_R2.fq.gz

First, prepare an input file samplesheet.csv (see Basic Usage), then run:

    nextflow run MetaflowX \
        -profile <docker/singularity/conda/.../institute> \
        --input samplesheet.csv \
        --outdir full_test

This run will:

Check tool availability
Verify database paths
Execute major pipeline steps

[!IMPORTANT] ⚠️ Before running, ensure that the following configuration files are properly set:

nextflow.config: general defaults, database paths

conf/modules.config: tool environments and options

conf/base.config: compute resources (CPU, memory)

✅ Match the -profile flag to your local compute environment.

💡 You can use -profile slurm, -profile docker, -profile local, etc.

⏱️ This test may take several minutes depending on system specs.
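The three configuration files listed above can be checked for presence before a run. This is a minimal sketch that assumes the files sit at those relative paths inside the cloned `MetaflowX` directory; `missing_configs` is a hypothetical helper. It only verifies that the files exist, not that the database paths or resource settings inside them are correct.

```python
from pathlib import Path

# Relative paths taken from the list above.
CONFIG_FILES = ["nextflow.config", "conf/modules.config", "conf/base.config"]


def missing_configs(repo_dir):
    """Return the expected config files that are absent under repo_dir."""
    root = Path(repo_dir)
    return [name for name in CONFIG_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "MetaflowX" is the directory created by `git clone` (an assumption
    # about where the script is run from).
    missing = missing_configs("MetaflowX")
    if missing:
        print("Missing configuration files:", ", ".join(missing))
    else:
        print("All expected configuration files are present.")
```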

3. How to Run

3.1 Basic Usage

Prepare a samplesheet, samplesheet.csv, that lists your input data in the following format:

    id,raw_reads1,raw_reads2
    S1,/path/to/Sample1_R1.fastq.gz,/path/to/Sample1_R2.fastq.gz
    S2,/path/to/Sample2_R1.fastq.gz,/path/to/Sample2_R2.fastq.gz
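For many samples, the samplesheet can be generated rather than written by hand. The sketch below assumes paired-end files in one directory that follow the `_R1.fq.gz` / `_R2.fq.gz` naming of the bundled test data; adjust the suffixes (for example to `_R1.fastq.gz`) for your own files. The column names come from the example above. `build_rows` and `write_samplesheet` are hypothetical helpers, and writing absolute paths is a choice made here, not a documented requirement.

```python
import csv
from pathlib import Path

COLUMNS = ["id", "raw_reads1", "raw_reads2"]


def build_rows(fastq_dir, r1_suffix="_R1.fq.gz", r2_suffix="_R2.fq.gz"):
    """Pair R1/R2 files in fastq_dir and return one row dict per sample."""
    directory = Path(fastq_dir)
    rows = []
    for r1 in sorted(directory.glob(f"*{r1_suffix}")):
        sample_id = r1.name[: -len(r1_suffix)]
        r2 = directory / f"{sample_id}{r2_suffix}"
        if not r2.is_file():
            raise FileNotFoundError(f"No mate file found for {r1}")
        rows.append({
            "id": sample_id,
            "raw_reads1": str(r1.resolve()),
            "raw_reads2": str(r2.resolve()),
        })
    return rows


def write_samplesheet(rows, out_path="samplesheet.csv"):
    """Write rows to a CSV file with the id,raw_reads1,raw_reads2 header."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_samplesheet(build_rows("test/data"))` from the repository root would, under these assumptions, produce a samplesheet for the two bundled test samples.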

Now, you can run the pipeline using:

    nextflow run MetaflowX \
        -profile <docker/singularity/conda/.../institute> \
        --input samplesheet.csv \
        --outdir <OUTDIR>
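When launching runs from a script, assembling the command as an argument list avoids shell-quoting mistakes. The sketch below uses only the flags shown in this manual (`-profile`, `--input`, `--outdir`, `-bg`); `build_command` is a hypothetical helper, and the profile name `docker` in the usage line is just one of the options mentioned above.

```python
import shlex


def build_command(profile, samplesheet, outdir,
                  pipeline="MetaflowX", background=False):
    """Return the nextflow invocation as a list of arguments."""
    cmd = ["nextflow", "run"]
    if background:
        # Same placement as the -bg example in the Quick Start.
        cmd.append("-bg")
    cmd += [pipeline, "-profile", profile,
            "--input", samplesheet, "--outdir", outdir]
    return cmd


if __name__ == "__main__":
    # Print the command for inspection instead of executing it.
    print(shlex.join(build_command("docker", "samplesheet.csv", "results")))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` once the printed command looks right.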

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
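As an illustration of the `-params-file` route, the sketch below writes pipeline parameters to a JSON file, one of the formats Nextflow accepts for that option. Only `input` and `outdir` are taken from this manual; the file name `params.json` and the `write_params` helper are assumptions made for the example.

```python
import json


def write_params(path, **params):
    """Write pipeline parameters to a JSON file usable with -params-file."""
    with open(path, "w") as handle:
        json.dump(params, handle, indent=2)
        handle.write("\n")
    return path


if __name__ == "__main__":
    # Keys mirror the --input and --outdir options, without the dashes.
    write_params("params.json", input="samplesheet.csv", outdir="results")
```

The run would then be started with something like `nextflow run MetaflowX -profile docker -params-file params.json`, keeping parameters out of any custom config passed with `-c`.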

For more details and further functionality, please refer to the usage documentation. You can use the following command to see all the parameters of the pipeline.

    nextflow run MetaflowX --help

[!NOTE] MetaflowX relies on many tools and their databases. For detailed installation and configuration instructions, please refer to the dependencies guide, database guide and version documentation.

3.2 Advanced Usage

MetaflowX supports:

Single-end and paired-end reads
Selective module execution using --mode and --skip parameters
Custom database paths and tool options

📖 For a full overview of available parameters and advanced configuration, see the Usage Guide.

📘 For practical examples of common execution modes and corresponding commands, refer to the Execution Guide.

4. Output

The results generated by MetaflowX include the following sections:

✤ Quality control
- 01.CleanData

✤ Contig assembly
- 02.Contig

✤ Microbial taxonomy and metabolic function analysis
- 101.MetaPhlAn
- 102.HUMAnN

✤ Gene catalog construction
- 03.Geneset
- 04.GenesetProfile

✤ Automated binning analysis
- 05.BinSet
- 06.BinsetProfile

✤ Report generation
- 07.MultiQC
- MetaflowXReport*.html

✤ Pipeline information
- pipeline_info

See Output Documentation for details.
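After a run, a quick script can summarize which of the result sections listed above were produced. This sketch assumes the directories sit directly under the chosen `--outdir` and that the HTML report is written at its top level; both are assumptions, and which sections appear depends on the modules that were run. `summarize_outputs` is a hypothetical helper.

```python
from pathlib import Path

# Directory names taken from the output list above.
EXPECTED_DIRS = [
    "01.CleanData", "02.Contig", "03.Geneset", "04.GenesetProfile",
    "05.BinSet", "06.BinsetProfile", "07.MultiQC",
    "101.MetaPhlAn", "102.HUMAnN", "pipeline_info",
]


def summarize_outputs(outdir):
    """Return (presence map of expected directories, report file names)."""
    root = Path(outdir)
    present = {name: (root / name).is_dir() for name in EXPECTED_DIRS}
    reports = sorted(p.name for p in root.glob("MetaflowXReport*.html"))
    return present, reports


if __name__ == "__main__":
    present, reports = summarize_outputs("results")
    for name, found in present.items():
        print(f"{name}: {'found' if found else 'absent'}")
    print("Reports:", ", ".join(reports) if reports else "none")
```

An absent directory is not necessarily an error, since a run restricted with `--mode` or `--skip` would not be expected to produce every section.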

5. Support

Visit the MetaflowX tutorial for examples and explanations.
Check the Changelog for version history.
Report issues via the GitHub Issues page.

6. Credits

❤️ MetaflowX was developed with support from 01Life.

MetaflowX is developed by:

👩‍💻 Yang Ying

👩‍💻 Liang Lifeng

With contributions and feedback from:

👨 Xie Hailiang

👨‍💻 Long Shibin

7. Citations

If you use MetaflowX in your research, please cite:

MetaflowX: A Scalable and Resource-Efficient Workflow for Multi-Strategy Metagenomic Analysis

For all third-party tools used, refer to CITATIONS.md.

Owner

Name: 01life
Login: 01life
Kind: organization

Repositories: 1
Profile: https://github.com/01life

Citation (CITATIONS.md)

# nf-core/metaflowx: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [fastp](https://doi.org/10.1093/bioinformatics/bty560)
- [Trimmomatic](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590)
- [SPAdes](https://doi.org/10.1002/cpbi.102)
- [MEGAHIT](https://pubmed.ncbi.nlm.nih.gov/25609793/)
- [MetaPhlAn](https://www.nature.com/articles/s41587-023-01688-w)
- [HUMAnN](https://elifesciences.org/articles/65088)
- [Kraken2](https://dx.doi.org/10.1186/s13059-019-1891-0)
- [Prodigal](https://doi.org/10.1186/1471-2105-11-119)
- [CD-HIT](https://www.bioinformatics.org/cd-hit/)
- [eggNOG-mapper](https://doi.org/10.1093/molbev/msab293)
- [samtools](https://doi.org/10.1093/gigascience/giab008)
- [DIAMOND](https://doi.org/10.1038/s41592-021-01101-x)
- [RGI](https://www.ncbi.nlm.nih.gov/pubmed/36263822)
- [antiSMASH](https://doi.org/10.1093/nar/gkab335)
- [BiG-MAP](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8547482/)
- [MetaBAT2](https://doi.org/10.7717/peerj.7359)
- [CONCOCT](https://doi.org/10.1038/nmeth.3103)
- [MetaBinner](https://doi.org/10.1186/s13059-022-02832-6)
- [MaxBin2](https://doi.org/10.1093/bioinformatics/btv638)
- [MetaDecoder](https://doi.org/10.1186/s40168-022-01237-8)
- [Vamb](https://doi.org/10.1038/s41587-020-00777-4)
- [SemiBin2](https://doi.org/10.1093/bioinformatics/btad209)
- [COMEBin](https://doi.org/10.1038/s41467-023-44290-z)
- [binny](https://doi.org/10.1101/2021.12.22.473795)
- [MAGScoT](https://doi.org/10.1101/2022.05.17.492251)
- [DAS Tool](https://doi.org/10.1038/s41564-018-0171-1)
- [CheckM2](https://doi.org/10.1038/s41592-023-01940-w)
- [QUAST](https://doi.org/10.1093/bioinformatics/bty266)
- [dRep](https://doi.org/10.1101/108142)
- [galah](https://doi.org/10.1038/s41587-020-00777-4)
- [GTDB-Tk](https://doi.org/10.1093/bioinformatics/btac672)
- [CoverM](https://doi.org/10.1093/bioinformatics/btaf147)
- [Deepurify](https://www.nature.com/articles/s42256-024-00908-5)
- [BWA](https://doi.org/10.48550/arXiv.1303.3997)
- [Mash](https://doi.org/10.1186/s13059-019-1841-x)
- [Bowtie2](https://www.nature.com/articles/nmeth.1923)
- [MultiQC](https://doi.org/10.1093/bioinformatics/btw354)


## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total

Release event: 3
Watch event: 12
Public event: 2
Push event: 40
Pull request event: 25
Create event: 5

Last Year

Release event: 3
Watch event: 12
Public event: 2
Push event: 40
Pull request event: 25
Create event: 5

Science Score: 57.0%
