mess
Snakemake pipeline for simulating shotgun metagenomic samples
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 7 DOI reference(s) in README -
✓Academic publication links
Links to: zenodo.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (9.6%) to scientific vocabulary
Basic Info
- Host: GitHub
- Owner: metagenlab
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://metagenlab.github.io/MeSS
- Size: 5.92 MB
Statistics
- Stars: 23
- Watchers: 5
- Forks: 3
- Open Issues: 6
- Releases: 11
Metadata Files
README.md
Metagenomic Sequence Simulator (MeSS)
The Metagenomic Sequence Simulator (MeSS) is a Snakemake pipeline, implemented using Snaketool, for simulating Illumina, Oxford Nanopore (ONT) and Pacific Biosciences (PacBio) shotgun metagenomic samples.
:mag: Overview
MeSS takes NCBI taxa or local genome assemblies as input and generates either long (PacBio or ONT) or short (Illumina) reads. In addition to reads, MeSS can optionally generate BAM alignment files and taxonomic and sequence abundance profiles in CAMI format.
```mermaid
%%{init: {'theme':'forest'}}%%
flowchart LR
    input["samples.tsv or samples/*.tsv"] --> taxons

    subgraph genome_download["**genome download**"]
        dlchoice{download ?}
        taxons["taxons or accessions"] --> dlchoice
        dlchoice -->|yes| assembly_finder
        dlchoice -->|no| fasta
        assembly_finder --> fasta
    end
    style genome_download color:#15161a

    input --> distchoice

    subgraph community_design["**community design**"]
        distchoice{draw distribution ?}
        distchoice -->|yes| dist["distribution (lognormal, even)"]
        dist --> abundances
        distchoice -->|no| reads
        distchoice -->|no| bases
        distchoice -->|no| abundances
        depth["coverage depth"]
        reads --> depth
        bases --> depth
        abundances["abundances (sequence, taxonomic)"] --> depth
    end
    style community_design color:#15161a

    fasta --> simulator
    depth --> simulator
    simulator["read simulator (art_illumina, pbsim3...)"]
    simulator --> bam
    simulator --> fastq
    simulator --> CAMI-profile

    %% subgraph color fills
    classDef red fill:#faeaea,color:#fff,stroke:#333;
    classDef blue fill:#eaecfa,color:#fff,stroke:#333;
    class genome_download blue
    class community_design red
```
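The "draw distribution" branch of the diagram can be illustrated with a minimal sketch. It is not MeSS's own code: the function name `draw_abundances`, the `mu` and `sigma` defaults, and the seed are assumptions chosen for illustration. It draws one weight per taxon (lognormal or even) and normalises the weights into relative abundances.

```python
import random


def draw_abundances(taxa, distribution="lognormal", mu=1.0, sigma=2.0, seed=42):
    """Return {taxon: relative abundance}, with abundances summing to 1.

    Hypothetical helper: parameter names and defaults are illustrative
    assumptions, not MeSS options.
    """
    rng = random.Random(seed)
    if distribution == "even":
        # Every taxon gets the same weight.
        weights = [1.0] * len(taxa)
    elif distribution == "lognormal":
        # One lognormal draw per taxon.
        weights = [rng.lognormvariate(mu, sigma) for _ in taxa]
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    total = sum(weights)
    return {taxon: weight / total for taxon, weight in zip(taxa, weights)}


if __name__ == "__main__":
    for taxon, abundance in draw_abundances([487, 727, 729]).items():
        print(taxon, round(abundance, 4))
```

Fixing the seed keeps the drawn community reproducible between runs, which matters when simulated samples are used for benchmarking.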
:books: Documentation
More details can be found in the [documentation](https://metagenlab.github.io/MeSS).
:zap: Quick start
:gear: Installation
- Conda (Miniforge)

```sh
conda create -n mess mess
```

- Docker

```sh
docker pull ghcr.io/metagenlab/mess:latest
```

- From source

```sh
git clone https://github.com/metagenlab/MeSS.git
pip install -e MeSS
```
:page_facing_up: Usage
:arrow_right: Input
Let's simulate two metagenomic samples with the following taxa and read counts in samples.tsv:
| sample | taxon | reads |
| --- | --- | --- |
| sample1 | 487 | 174840 |
| sample1 | 727 | 90679 |
| sample1 | 729 | 13129 |
| sample2 | 28132 | 147863 |
| sample2 | 199 | 147545 |
| sample2 | 729 | 131300 |
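If you prefer to generate the input file programmatically, the table above can be written as a tab-separated file. This is a minimal sketch: the helper name `write_samples` is hypothetical, and the only assumption about the format is what the table shows (a header row with `sample`, `taxon` and `reads`).

```python
import csv

# Rows copied from the table above: (sample, taxon, reads).
ROWS = [
    ("sample1", 487, 174840),
    ("sample1", 727, 90679),
    ("sample1", 729, 13129),
    ("sample2", 28132, 147863),
    ("sample2", 199, 147545),
    ("sample2", 729, 131300),
]


def write_samples(path="samples.tsv", rows=ROWS):
    """Write the sample table as a tab-separated file and return its path."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample", "taxon", "reads"])
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    print(write_samples())
```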
:rocket: Command
```sh
mess run -i samples.tsv
```
> [!IMPORTANT]
> Apptainer is the default and recommended dependency deployment method for maximum reproducibility.
> If you would like to use conda, you can specify `--sdm conda`.
:card_index_dividers: Outputs
- Downloaded genomes in `mess_out/assembly_finder/download`

```sh
┣ 📂GCF_000144405.1
┃ ┗ 📜GCF_000144405.1_ASM14440v1_genomic.fna.gz
┣ 📂GCF_001298465.1
┃ ┗ 📜GCF_001298465.1_ASM129846v1_genomic.fna.gz
┣ 📂GCF_016127215.1
┃ ┗ 📜GCF_016127215.1_ASM1612721v1_genomic.fna.gz
┣ 📂GCF_020736045.1
┃ ┗ 📜GCF_020736045.1_ASM2073604v1_genomic.fna.gz
┗ 📂GCF_022869645.1
  ┗ 📜GCF_022869645.1_ASM2286964v1_genomic.fna.gz
```
- Simulated reads in `mess_out/fastq`

```sh
┣ 📜sample1_R1.fq.gz
┣ 📜sample1_R2.fq.gz
┣ 📜sample2_R1.fq.gz
┗ 📜sample2_R2.fq.gz
```
> [!TIP]
> By default, `mess` outputs paired Illumina reads with the Hiseq25k error profile. Other outputs and error profiles are described in the documentation.
:bar_chart: Resources usage
Using samples.tsv, `mess` runs in under 2 min and uses around 1.8 GB of physical RAM.
| task_id | hash | native_id | name | status | exit | submit | duration | realtime | %cpu | peak_rss | peak_vmem | rchar | wchar |
| ------- | --------- | --------- | -------- | --------- | ---- | ----------------------- | -------- | -------- | ------ | -------- | --------- | ------ | ------ |
| 1 | fe/03c2bc | 62286 | MESS (1) | COMPLETED | 0 | 2024-09-04 12:41:15.820 | 1m 50s | 1m 50s | 111.5% | 1.8 GB | 9 GB | 3.5 GB | 2.4 GB |
| 1 | ff/0d03b1 | 73355 | MESS (1) | COMPLETED | 0 | 2024-09-04 12:55:12.903 | 1m 52s | 1m 52s | 112.6% | 1.7 GB | 8.8 GB | 3.5 GB | 2.4 GB |
| 1 | 07/d352bf | 83576 | MESS (1) | COMPLETED | 0 | 2024-09-04 12:57:30.600 | 1m 50s | 1m 50s | 113.2% | 1.7 GB | 8.9 GB | 3.5 GB | 2.4 GB |
> [!NOTE]
> Resource usage was measured 3 times with one CPU (using Nextflow, excluding dependency deployment time).
> More details are available in the resource usage documentation.
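As a quick check on the "under 2 min, around 1.8 GB" summary, the three runs in the table above can be averaged. The sketch below only restates the table's `duration` and peak RSS columns; the variable names are illustrative.

```python
from statistics import mean

# Values copied from the three runs in the table above.
durations_s = [110, 112, 110]  # 1m 50s, 1m 52s, 1m 50s
peak_rss_gb = [1.8, 1.7, 1.7]

mean_duration_s = mean(durations_s)
mean_peak_rss_gb = mean(peak_rss_gb)

print(f"mean duration: {mean_duration_s:.1f} s")
print(f"mean peak RSS: {mean_peak_rss_gb:.2f} GB")
print(f"max peak RSS:  {max(peak_rss_gb):.1f} GB")
```

The mean duration comes out just under 111 s, and the 1.8 GB figure matches the largest peak RSS across the three runs rather than their mean.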
:fire: Features
Using phage.tsv:

| sample | taxon | cov_sim |
| :----- | :----- | :------ |
| phage | 347329 | 200 |
:dna: Multi sequencing technology
- Illumina

```sh
mess run -i phage.tsv --tech illumina -o mess_out/illumina
seqkit stats --all -T -b mess_out/illumina/fastq/*
```

| file | num_seqs | sum_len | avg_len | N50 | Q20(%) | Q30(%) | AvgQual |
| :------------- | :------- | :------ | :------ | :-- | :----- | :----- | :------ |
| phage_R1.fq.gz | 44000 | 6600000 | 150.0 | 150 | 98.01 | 91.67 | 27.81 |
| phage_R2.fq.gz | 44000 | 6600000 | 150.0 | 150 | 97.31 | 89.65 | 26.52 |
- Nanopore

```sh
mess run -i phage.tsv --tech nanopore -o mess_out/nanopore
seqkit stats --all -T -b mess_out/nanopore/fastq/*
```

| file | num_seqs | sum_len | avg_len | N50 | Q20(%) | Q30(%) | AvgQual |
| :---------- | :------- | :------- | :------ | :---- | :----- | :----- | :------ |
| phage.fq.gz | 1486 | 13203006 | 8884.9 | 12329 | 73.99 | 62.65 | 13.60 |
- PacBio HiFi

```sh
mess run -i phage.tsv -o mess_out/pacbio --tech pacbio --error hifi
seqkit stats --all -T -b mess_out/pacbio/fastq/*
```

| file | num_seqs | sum_len | avg_len | N50 | Q20(%) | Q30(%) | AvgQual |
| :---------- | :------- | :------- | :------ | :---- | :----- | :----- | :------ |
| phage.fq.gz | 1430 | 12588621 | 8803.2 | 12666 | 99.92 | 99.78 | 40.51 |
> [!NOTE]
> We use pbsim3 to simulate multi-pass CLR reads, which are then converted to HiFi reads with ccs.
> PacBio HiFi read simulations usually take longer than those using other error profiles.
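The `sum_len` values reported by seqkit can be related back to the requested coverage. The sketch below assumes that `cov_sim` is the average depth over the genome, so that depth = total bases / genome length. The genome length is not stated in this README, so the script derives the length implied by each run instead of checking against a known value.

```python
COV_SIM = 200  # cov_sim column in phage.tsv

# sum_len values from the seqkit tables above (Illumina is R1 + R2).
total_bases = {
    "illumina": 6_600_000 + 6_600_000,
    "nanopore": 13_203_006,
    "pacbio_hifi": 12_588_621,
}


def implied_genome_length(bases, coverage):
    """Genome length implied by depth = total bases / genome length."""
    return bases / coverage


for tech, bases in total_bases.items():
    length = implied_genome_length(bases, COV_SIM)
    print(f"{tech}: {length:,.0f} bp")
```

All three technologies imply a genome of roughly 63 to 66 kb, which suggests the simulators are targeting the same depth despite very different read lengths.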
:o: Circular assemblies
Inspired by readSimulator's approach, mess can shuffle genome start points to get circular genome assemblies.
> [!WARNING]
> All contigs in the fasta will be circularised.
- Linear (default, `--rotate 1`)

```sh
mess run -i phage.tsv -o mess_out/linear
```

- Circular (`--rotate 3`)

```sh
mess run -i phage.tsv --rotate 3 -o mess_out/circular
```
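The general idea of shuffling start points can be sketched as follows. This is an illustration of rotating a circular sequence, not MeSS's implementation: the function names are hypothetical, and treating the `--rotate` value as a number of random start points is an assumption.

```python
import random


def rotate(sequence, start):
    """Re-linearise a circular sequence so that it begins at index `start`."""
    start %= len(sequence)  # assumes a non-empty sequence
    return sequence[start:] + sequence[:start]


def random_rotations(sequence, n, seed=0):
    """Return `n` copies of `sequence`, each rotated to a random start point."""
    rng = random.Random(seed)
    return [rotate(sequence, rng.randrange(len(sequence))) for _ in range(n)]


if __name__ == "__main__":
    genome = "ACGTTGCA"
    for rotated in random_rotations(genome, 3):
        print(rotated)
```

Each rotated copy has the same length and base composition as the original. Reads simulated from the different copies can therefore span what was the artificial start/end junction of the linear record.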
:sos: Help
All command-line options are described in the documentation.
Citation
Please consider citing MeSS if you use it in your work.
Farid Chaabane, Trestan Pillonel, Claire Bertelli, MeSS and assembly_finder: A toolkit for in silico metagenomic sample generation, Bioinformatics, 2024, btae760, https://doi.org/10.1093/bioinformatics/btae760
BibTeX
@article{chaabane_mess_2024,
title = {MeSS and assembly_finder: A toolkit for in silico metagenomic sample generation},
issn = {1367-4811},
url = {https://doi.org/10.1093/bioinformatics/btae760},
doi = {10.1093/bioinformatics/btae760},
journal = {Bioinformatics},
author = {Chaabane, Farid and Pillonel, Trestan and Bertelli, Claire},
month = dec,
year = {2024},
pages = {btae760},
}
Owner
- Name: metagenlab
- Login: metagenlab
- Kind: organization
- Repositories: 10
- Profile: https://github.com/metagenlab
Citation (CITATION.cff)
cff-version: 1.2.0
title: "MeSS: simulate short and long read metagenomic samples"
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Farid
    family-names: Chaabane
    email: farid.chaabane@chuv.ch
    orcid: "https://orcid.org/0009-0007-9322-1281"
    affiliation: >-
      Institute of Microbiology, Lausanne University
      Hospital and University of Lausanne, Lausanne,
      Switzerland
  - given-names: Trestan
    family-names: Pillonel
    email: trestan.pillonel@chuv.ch
    orcid: "https://orcid.org/0000-0002-5725-7929"
    affiliation: >-
      Institute of Microbiology, Lausanne University
      Hospital and University of Lausanne, Lausanne,
      Switzerland
  - given-names: Claire
    family-names: Bertelli
    email: claire.bertelli@chuv.ch
    orcid: "https://orcid.org/0000-0003-0550-8981"
    affiliation: >-
      Institute of Microbiology, Lausanne University
      Hospital and University of Lausanne, Lausanne,
      Switzerland
identifiers:
  - type: doi
    value: 10.5281/zenodo.13365501
    description: zenodo software
repository-code: "https://github.com/metagenlab/MeSS"
url: "https://metagenlab.github.io/MeSS/"
abstract: >-
  Snakemake pipeline for simulating shotgun metagenomic samples
license: MIT
preferred-citation:
  type: article
  authors:
    - given-names: Farid
      family-names: Chaabane
    - given-names: Trestan
      family-names: Pillonel
    - given-names: Claire
      family-names: Bertelli
  doi: "10.1093/bioinformatics/btae760"
  journal: "Bioinformatics"
  title: "MeSS and assembly_finder: A toolkit for in silico metagenomic sample generation"
  year: 2024
  url: "https://doi.org/10.1093/bioinformatics/btae760"
GitHub Events
Total
- Create event: 11
- Release event: 3
- Issues event: 30
- Watch event: 5
- Delete event: 12
- Issue comment event: 24
- Push event: 64
- Pull request review event: 1
- Pull request review comment event: 3
- Pull request event: 29
- Fork event: 4
Last Year
- Create event: 11
- Release event: 3
- Issues event: 30
- Watch event: 5
- Delete event: 12
- Issue comment event: 24
- Push event: 64
- Pull request review event: 1
- Pull request review comment event: 3
- Pull request event: 29
- Fork event: 4
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 15
- Total pull requests: 21
- Average time to close issues: 4 months
- Average time to close pull requests: 3 days
- Total issue authors: 8
- Total pull request authors: 3
- Average comments per issue: 1.87
- Average comments per pull request: 0.29
- Merged pull requests: 16
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 13
- Pull requests: 14
- Average time to close issues: 13 days
- Average time to close pull requests: 3 days
- Issue authors: 7
- Pull request authors: 2
- Average comments per issue: 1.85
- Average comments per pull request: 0.29
- Merged pull requests: 11
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- teojcryan (4)
- farchaab (4)
- matnguyen (2)
- Rohit-Satyam (1)
- inspirewind (1)
- HSecaira (1)
- seanlu96 (1)
- baptwr (1)
Pull Request Authors
- farchaab (19)
- teojcryan (1)
- CarraraAlessia (1)