megaisurv-namaste

Nanopore Metagenomic Antibiotic Resistance and Taxonomy Screening

https://github.com/utrechtuniversity/megaisurv-namaste

Keywords

antibiotic-resistance metagenomics nanopore-sequencing snakemake taxonomic-classification

Repository

Basic Info

Host: GitHub
Owner: UtrechtUniversity
License: bsd-3-clause
Language: R
Default Branch: main
Homepage:
Size: 46.9 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

MEGAISurv Namaste :pray:

Namaste: Nanopore Metagenomics antibiotic Resistance and Taxonomy Screening for the project MEGAISurv:

MEtaGenome-informed Antimicrobial resistance Surveillance: Harnessing long-read sequencing for an analytical, indicator and risk assessment framework.

Index

Workflow description
- microbiota profiling
- future ideas
Project (file) organisation
Licence
Citation

Workflow description

Simple description:

Metagenomic reads are preprocessed using fastplong (version 0.2.2)
High-quality reads are assembled using metaFlye (version 2.9.2)
Antibiotic resistance genes are identified using KMA (version 1.4.2)
Resistance genes are masked using BEDtools (function maskFastaFromBed; version 2.31.1)
Assembled and masked contigs are taxonomically classified using Centrifuger (version 1.0.6)

Microbiota profiling

Taxonomic assignment and quantification

For the taxonomic classification of the metagenomes (also known as microbiota profiling), we take the metagenomic assemblies generated by Flye and classify them with Centrifuger. As Centrifuger expects reads rather than contigs, the relative abundances need to be adjusted manually. To do this, we use the contig length and depth of coverage reported in the assembly statistics file provided by Flye (assembly_info.txt). From these we calculate the total number of bases assigned to each taxon, and from that the percentage assigned to each species.

Also, Centrifuger does not report taxon names per read/contig automatically. Instead, it provides the tax IDs as reported in the NCBI taxonomy database. To translate these to species names and complete taxonomic lineages, we use TaxonKit (version 0.18.0) with the NCBI taxdump (ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz) downloaded on 26 February 2025.
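The tax ID lookup described above could be sketched as follows. This is a minimal illustration, not the code used in the Snakefile: the function names `unique_taxids` and `add_lineages` are made up for this example, and it assumes a tab-separated Centrifuger-style table with a header line, the tax ID in the third column, and tax ID 0 meaning "unclassified". `taxonkit lineage` and `taxonkit reformat` are existing TaxonKit subcommands; the rank format string is only one possible choice.

```bash
# Sketch: collect the unique tax IDs from a classification table.
# Assumptions: tab-separated input, one header line, tax ID in column 3,
# and tax ID 0 marks unclassified contigs.
unique_taxids() {
    awk -F'\t' 'NR > 1 && $3 != "" && $3 != 0 { print $3 }' "$1" | sort -n -u
}

# Sketch: translate those tax IDs to lineages (requires TaxonKit and the
# taxdump files in ~/.taxonkit).
add_lineages() {
    # "taxonkit lineage" appends the full lineage to each tax ID;
    # "taxonkit reformat" condenses it to the ranks named in the format string.
    unique_taxids "$1" |
        taxonkit lineage |
        taxonkit reformat -f "{k};{p};{c};{o};{f};{g};{s}"
}
```

Looking up each tax ID once, rather than once per contig, keeps the TaxonKit call small; the resulting table can then be joined back to the contigs by tax ID.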

The practical implementation of this workflow is described in the Snakefile and is as follows:

Classify contigs using Centrifuger with default parameters and the 'cfr_hpv+sarscov2' database (which is available on Zenodo)
Attach species and taxon lineage names using TaxonKit
Quantify by combining Centrifuger's output and Flye's assembly statistics in a custom R script (i.e., for each contig, multiply its length by its depth to obtain 'totalbases', then calculate percentages per contig and per taxon based on these totalbases).
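The quantification step is implemented in a custom R script in this project. Purely as an illustration of the arithmetic, the per-taxon part can be sketched with awk. The function name `quantify_taxa` and the column positions are assumptions for this example: contig name, length and coverage in columns 1 to 3 of Flye's assembly_info.txt, and contig name and tax ID in columns 1 and 3 of the classification table. Check these against your own files before relying on them. Per-contig percentages are left out for brevity.

```bash
# Sketch: per-taxon share of bases, where bases = contig length x depth.
# Assumed inputs (tab-separated):
#   $1: Flye assembly_info.txt (col 1 = contig, col 2 = length, col 3 = coverage)
#   $2: classification table   (col 1 = contig, col 3 = NCBI tax ID)
# Output: tax ID, total bases, percentage of all classified bases.
quantify_taxa() {
    local assembly_info="$1" classification="$2"
    awk -F'\t' '
        # First file: store totalbases per contig, skipping "#" header lines
        NR == FNR {
            if ($1 !~ /^#/) totalbases[$1] = $2 * $3
            next
        }
        # Second file: add each known contig to its taxon, counting it only once
        ($1 in totalbases) && !seen[$1]++ {
            per_taxon[$3] += totalbases[$1]
            grand_total += totalbases[$1]
        }
        END {
            if (grand_total > 0)
                for (taxid in per_taxon)
                    printf "%s\t%.0f\t%.2f\n", taxid, per_taxon[taxid], 100 * per_taxon[taxid] / grand_total
        }
    ' "$assembly_info" "$classification" | sort -k3,3nr
}

# Hypothetical usage:
# quantify_taxa assembly_info.txt centrifuger_output.tsv > taxon_percentages.tsv
```

Weighting by length times depth approximates the number of read bases behind each contig, which is what a read-based classifier would otherwise have counted directly.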

TaxonKit user note

Extra note on using TaxonKit: after downloading the taxdump tarball itself, e.g.:

wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz

it can be useful to check if it is complete by comparing the md5 checksum:

```bash
# Download MD5 checksum
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz.md5

# Check file integrity
md5sum -c taxdump.tar.gz.md5

# Then extract the tarball
tar -xzf taxdump.tar.gz
```

TaxonKit relies on the files names.dmp, nodes.dmp, delnodes.dmp and merged.dmp. Copy or move them to the .taxonkit directory in your home folder (which should be generated automatically when you install TaxonKit) to be able to use TaxonKit. E.g.:

```bash
mv *.dmp ~/.taxonkit/
```
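After moving the files, a quick check that all four are present can save a confusing error later. The helper below is a small sketch; the function name `check_taxdump_files` and its messages are made up for this note and are not part of TaxonKit.

```bash
# Sketch: confirm that the four files TaxonKit relies on are in place.
# Takes the directory to check as an optional argument (default: ~/.taxonkit).
check_taxdump_files() {
    local dir="${1:-$HOME/.taxonkit}"
    local missing=0
    local f
    for f in names.dmp nodes.dmp delnodes.dmp merged.dmp; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical usage:
# check_taxdump_files && echo "taxdump files look complete"
```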

Future ideas

Summarise the final output in one neat table
Include downstream processing scripts (RMarkdown) for statistical analyses and visualisation
Test alternative contig classification databases and tools
Filter contigs to minimum length of 2-3x average read length?
Write/extend documentation of the whole workflow and interesting findings
Calculate per-sample and overall fraction of contigs with ARGs: what is the estimated prevalence of ARGs?
Document ARG identification process (currently includes R script to parse KMA and include only hits that cover >=60% of the reference ARG)
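Until the ARG identification is documented properly, the coverage filter mentioned in the last item can be illustrated with a short awk sketch. The project itself uses an R script for this; the version below is only an assumption of what the filter amounts to. It assumes a tab-separated KMA result table with a header line containing a `Template_Coverage` column (in percent), and the function name `filter_kma_hits` is hypothetical.

```bash
# Sketch: keep only KMA hits that cover at least a minimum share of the
# reference (template) gene.
# Assumptions: tab-separated table, header on line 1, a column named
# "Template_Coverage" holding percentages.
filter_kma_hits() {
    local res_file="$1" min_cov="${2:-60}"
    awk -F'\t' -v min_cov="$min_cov" '
        NR == 1 {
            # Locate the coverage column by name instead of by position
            for (i = 1; i <= NF; i++) {
                h = $i
                gsub(/ /, "", h)
                if (h == "Template_Coverage") col = i
            }
            if (!col) {
                print "no Template_Coverage column found" > "/dev/stderr"
                exit 1
            }
            print
            next
        }
        # "+ 0" forces a numeric comparison even if values are space-padded
        $col + 0 >= min_cov + 0
    ' "$res_file"
}

# Hypothetical usage:
# filter_kma_hits sample.res 60 > sample.filtered.res
```

Finding the column by its header name keeps the sketch working if the column order differs between KMA versions or output files.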

Project organisation

```bash
.
├── CITATION.cff
├── LICENSE
├── README.md
├── Snakefile          <- Python-based workflow description
├── bin                <- Code and programs used in this project/experiment
├── config             <- Configuration of Snakemake workflow
├── data               <- All project data, divided in subfolders
│   ├── processed      <- Final data, used for visualisation (e.g. tables)
│   ├── raw            <- Raw data, original, should not be modified (e.g. fastq files)
│   └── tmp            <- Intermediate data, derived from the raw data, but not yet ready for visualisation
├── doc                <- Project documentation, notes and experiment records
├── envs               <- Conda environments necessary to run the project/experiment
├── log                <- Log files from programs
└── results            <- Figures or reports generated from processed data
```

Licence

This project is licensed under the terms of the New BSD licence.

Citation

Please cite this project as described in the citation file.

Owner

Name: Utrecht University
Login: UtrechtUniversity
Kind: organization
Email: info.rdm@uu.nl
Location: Utrecht, The Netherlands

Website: https://www.uu.nl
Repositories: 85
Profile: https://github.com/UtrechtUniversity

The central place for managing code and software for Utrecht University researchers and employees

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: MEGAISurv-Namaste
message: >-
  Workflow for Nanopore Metagenome antibiotic Resistance and
  Taxonomy Screening
type: software
authors:
  - given-names: Sam
    family-names: Nooij
    email: s.nooij@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0001-5892-5637'
  - given-names: Aldert
    family-names: Zomer
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0002-0758-5190'
  - given-names: Agata
    family-names: Dziegiel
    affiliation: 'Quadram Institute'
    orcid: 'https://orcid.org/0000-0002-6148-1771'
identifiers:
  - type: url
    value: 'https://github.com/UtrechtUniversity/MEGAISurv-Namaste'
    description: GitHub repository
repository-code: 'https://github.com/UtrechtUniversity/MEGAISurv-Namaste'
keywords:
  - nanopore-sequencing
  - metagenomics
  - antibiotic-resistance
license: BSD-3-Clause
