megaisurv-namaste
Nanopore Metagenomic Antibiotic Resistance and Taxonomy Screening
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.0%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MEGAISurv Namaste :pray:
Namaste: Nanopore Metagenomics antibiotic Resistance and Taxonomy Screening for the project MEGAISurv:
MEtaGenome-informed Antimicrobial resistance Surveillance: Harnessing long-read sequencing for an analytical, indicator and risk assessment framework.
Workflow description
Simple description:
Metagenomic reads are preprocessed using fastplong (version 0.2.2)
High-quality reads are assembled using metaFlye (version 2.9.2)
Antibiotic resistance genes are identified using KMA (version 1.4.2)
Resistance genes are masked using BEDtools (function maskFastaFromBed; version 2.31.1)
Assembled and masked contigs are taxonomically classified using Centrifuger (version 1.0.6)
Microbiota profiling
Taxonomic assignment and quantification
For the taxonomic classification of the metagenomes (also known as microbiota profiling),
we use the metagenomic assemblies generated by Flye and classify them with
Centrifuger. Because Centrifuger expects reads rather than contigs, the relative
abundances it reports have to be adjusted afterwards. To do this, we use the contig length
and depth of coverage reported in Flye's assembly statistics (file assembly_info.txt).
From these we calculate the total number of bases assigned to each taxon and, from
that, the percentage assigned to each species.
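Written as a formula, with $\ell_c$ the length and $d_c$ the depth of coverage of contig $c$ (both from assembly_info.txt) and $C_t$ the set of contigs assigned to taxon $t$, the calculation is roughly as follows. The denominator is assumed here to run over all contigs in the classification output; the project's R script may define it slightly differently.

$$
\mathrm{totalbases}_c = \ell_c \cdot d_c,
\qquad
\mathrm{percentage}_t = 100 \times \frac{\sum_{c \in C_t} \mathrm{totalbases}_c}{\sum_{c} \mathrm{totalbases}_c}
$$

For example, a hypothetical 1,000 bp contig with 10× depth contributes 1,000 × 10 = 10,000 bases to its taxon.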
Centrifuger also does not report taxon names per read/contig. Instead, it provides the tax IDs from the NCBI Taxonomy database. To translate these into species names and complete taxonomic lineages, we use TaxonKit (version 0.18.0) with the NCBI taxdump (ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz) downloaded on 26 February 2025.
The practical implementation of this workflow is described in the
Snakefile and consists of the following steps:
Classify contigs using Centrifuger with default parameters and the 'cfr_hpv+sarscov2' database (which is available on Zenodo)
Attach species and taxon lineage names using TaxonKit
Quantify by combining Centrifuger's output and Flye's assembly statistics in a custom R script: for each contig, multiply its length by its depth to obtain 'totalbases', then calculate percentages per contig and per taxon based on these totalbases
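The quantification step above can be sketched in shell with awk. This is not the project's R script but a minimal illustration under stated assumptions: Flye's tab-separated assembly_info.txt has the contig name in column 1, length in column 2 and depth ("cov.") in column 3, and the Centrifuger table has one header line with the contig ID in column 1 and the NCBI tax ID in column 3. The function name `quantify_taxa` is hypothetical, and a contig with several assignments is counted only once, under its first tax ID.

```bash
# Hypothetical helper: per-taxon share of assembled bases (length x depth).
# Assumes Flye assembly_info.txt columns: 1 = contig, 2 = length, 3 = depth,
# and a Centrifuger table with one header line: 1 = contig ID, 3 = tax ID.
quantify_taxa() {
  local assembly_info="$1" classification="$2"
  awk -F'\t' '
    # First file: totalbases = length * depth for every contig
    FNR == NR { if ($1 !~ /^#/) bases[$1] = $2 * $3; next }
    # Second file: skip its header line
    FNR == 1 { next }
    # Add the totalbases of each contig to its tax ID (first row per contig only)
    ($1 in bases) && !($1 in seen) {
      seen[$1] = 1
      taxbases[$3] += bases[$1]
      total += bases[$1]
    }
    END {
      if (total == 0) exit
      # Print tax ID, totalbases and percentage of all counted bases
      for (t in taxbases)
        printf "%s\t%.0f\t%.2f\n", t, taxbases[t], 100 * taxbases[t] / total
    }
  ' "$assembly_info" "$classification" | sort -t "$(printf '\t')" -k2,2nr
}
```

For example, `quantify_taxa assembly_info.txt centrifuger_output.tsv` (file name hypothetical) would print one line per tax ID with its total bases and percentage, sorted from most to least abundant. The tax IDs could then be passed to TaxonKit to attach names.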
TaxonKit user note
Extra note on using TaxonKit: after downloading the taxdump tarball, it can be useful to check that it is complete by comparing its MD5 checksum before extracting it:
```bash
# Download the taxdump tarball
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
# Download the MD5 checksum
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz.md5
# Check file integrity
md5sum -c taxdump.tar.gz.md5
# Then extract the tarball
tar -xzf taxdump.tar.gz
```
TaxonKit relies on the files names.dmp, nodes.dmp, delnodes.dmp and merged.dmp.
Copy or move them to the .taxonkit directory in your home folder to be able to use TaxonKit
(this directory may already exist after installing TaxonKit; if not, create it first). E.g.:
```bash
mkdir -p ~/.taxonkit
mv *.dmp ~/.taxonkit/
```
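Before running TaxonKit, a quick check like the one below can confirm that all four files are in place and non-empty. This is a sketch; the function name `check_taxonkit_data` is hypothetical. It defaults to ~/.taxonkit but accepts another directory as its first argument.

```bash
# Hypothetical helper: verify that the taxdump files TaxonKit needs are present.
# Returns 0 if all four files exist and are non-empty, 1 otherwise.
check_taxonkit_data() {
  local dir="${1:-$HOME/.taxonkit}" missing=0 f
  for f in names.dmp nodes.dmp delnodes.dmp merged.dmp; do
    # -s: file exists and has a size greater than zero
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_taxonkit_data` without arguments checks ~/.taxonkit and lists any missing file on standard error.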
Future ideas
Summarise the final output in one neat table
Include downstream processing scripts (RMarkdown) for statistical analyses and visualisation
Test alternative contig classification databases and tools
Filter contigs to a minimum length of 2-3x the average read length?
Write/extend documentation of the whole workflow and interesting findings
Calculate per-sample and overall fraction of contigs with ARGs: what is the estimated prevalence of ARGs?
Document the ARG identification process (currently, an R script parses the KMA output and keeps only hits that cover at least 60% of the reference ARG)
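The coverage filter mentioned in the last item could look roughly like this in awk. The workflow itself uses an R script; this sketch assumes KMA's tab-separated .res output, in which column 6 is Template_Coverage (the percentage of the reference gene covered). The function name `filter_kma_hits` and its optional threshold argument are hypothetical.

```bash
# Hypothetical helper: keep KMA hits covering >= min_cov % of the reference ARG.
# Assumes a tab-separated KMA .res file whose column 6 is Template_Coverage.
filter_kma_hits() {
  local res_file="$1" min_cov="${2:-60}"
  # Print the header line, then every hit whose template coverage >= min_cov
  # ($6 + 0 forces a numeric value and ignores KMA's padding spaces)
  awk -F'\t' -v min="$min_cov" 'FNR == 1 || ($6 + 0) >= min' "$res_file"
}
```

Usage: `filter_kma_hits sample.res` applies the 60% threshold, and `filter_kma_hits sample.res 80` would apply a stricter one.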
Project organisation
```bash
.
├── CITATION.cff
├── LICENSE
├── README.md
├── Snakefile       <- Python-based workflow description
├── bin             <- Code and programs used in this project/experiment
├── config          <- Configuration of Snakemake workflow
├── data            <- All project data, divided in subfolders
│   ├── processed   <- Final data, used for visualisation (e.g. tables)
│   ├── raw         <- Raw data, original, should not be modified (e.g. fastq files)
│   └── tmp         <- Intermediate data, derived from the raw data, but not yet ready for visualisation
├── doc             <- Project documentation, notes and experiment records
├── envs            <- Conda environments necessary to run the project/experiment
├── log             <- Log files from programs
└── results         <- Figures or reports generated from processed data
```
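To set up the same folder layout for a new project, a sketch like the following could be used. The function name `make_project_skeleton` is hypothetical; it only creates the empty directories, not files such as the Snakefile, LICENSE or CITATION.cff.

```bash
# Hypothetical helper: create the empty directory layout shown in the tree above.
# The target directory defaults to the current one.
make_project_skeleton() {
  local root="${1:-.}" d
  for d in bin config doc envs log results data/processed data/raw data/tmp; do
    mkdir -p "$root/$d"
  done
}
```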
Licence
This project is licensed under the terms of the New BSD licence.
Citation
Please cite this project as described in the citation file.
Owner
- Name: Utrecht University
- Login: UtrechtUniversity
- Kind: organization
- Email: info.rdm@uu.nl
- Location: Utrecht, The Netherlands
- Website: https://www.uu.nl
- Repositories: 85
- Profile: https://github.com/UtrechtUniversity
The central place for managing code and software for Utrecht University researchers and employees
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: MEGAISurv-Namaste
message: >-
  Workflow for Nanopore Metagenome antibiotic Resistance and
  Taxonomy Screening
type: software
authors:
  - given-names: Sam
    family-names: Nooij
    email: s.nooij@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0001-5892-5637'
  - given-names: Aldert
    family-names: Zomer
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0002-0758-5190'
  - given-names: Agata
    family-names: Dziegiel
    affiliation: 'Quadram Institute'
    orcid: 'https://orcid.org/0000-0002-6148-1771'
identifiers:
  - type: url
    value: 'https://github.com/UtrechtUniversity/MEGAISurv-Namaste'
    description: GitHub repository
repository-code: 'https://github.com/UtrechtUniversity/MEGAISurv-Namaste'
keywords:
  - nanopore-sequencing
  - metagenomics
  - antibiotic-resistance
license: BSD-3-Clause
GitHub Events
Total
- Push event: 1
- Create event: 1
Last Year
- Push event: 1
- Create event: 1