megaisurv-namaste

Nanopore Metagenomic Antibiotic Resistance and Taxonomy Screening

https://github.com/utrechtuniversity/megaisurv-namaste

Keywords

antibiotic-resistance metagenomics nanopore-sequencing snakemake taxonomic-classification

Repository

Basic Info
  • Host: GitHub
  • Owner: UtrechtUniversity
  • License: bsd-3-clause
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 46.9 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 11 months ago

README.md

MEGAISurv Namaste :pray:

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Namaste: Nanopore Metagenomics antibiotic Resistance and Taxonomy Screening for the project MEGAISurv:

MEtaGenome-informed Antimicrobial resistance Surveillance: Harnessing long-read sequencing for an analytical, indicator and risk assessment framework.

Index

  1. Workflow description
  2. Project (file) organisation
  3. Licence
  4. Citation

Workflow description

Simple description:

  1. Metagenomic reads are preprocessed using fastplong (version 0.2.2)

  2. High-quality reads are assembled using metaFlye (version 2.9.2)

  3. Antibiotic resistance genes are identified using KMA (version 1.4.2)

  4. Resistance genes are masked using BEDtools (function maskFastaFromBed; version 2.31.1)

  5. Assembled and masked contigs are taxonomically classified using Centrifuger (version 1.0.6)
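To illustrate what step 4 does conceptually, here is a minimal Python sketch of hard-masking a region of a contig. It is not how BEDtools is implemented, and the function name `mask_intervals` and the example sequence are made up for illustration. It assumes BED-style coordinates (0-based start, exclusive end) and the masking character `N`.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Replace every 0-based, half-open [start, end) interval with mask_char."""
    masked = list(sequence)
    for start, end in intervals:
        # Clip the interval to the sequence boundaries
        start = max(start, 0)
        end = min(end, len(masked))
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


# Hypothetical contig with a resistance gene hit at positions 2-5 (BED: 2, 6)
contig = "ACGTACGTACGT"
print(mask_intervals(contig, [(2, 6)]))  # ACNNNNGTACGT
```

Masking the resistance genes this way keeps the contig length unchanged, so the length-based quantification later in the workflow is not affected.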

Microbiota profiling

Taxonomic assignment and quantification

For the taxonomic classification of the metagenomes (also known as microbiota profiling), we classify the metagenomic assemblies generated by Flye with Centrifuger. As Centrifuger expects reads rather than contigs, the relative abundances need to be adjusted manually. To do this, we use the contig length and depth of coverage reported in Flye's assembly statistics (file assembly_info.txt). From these we calculate the total number of bases assigned to each taxon, and from that the percentage assigned to each species.

Also, Centrifuger does not report taxon names per read/contig automatically. Instead, it provides the tax IDs as reported in the NCBI taxonomy database. To translate these to species names and complete taxonomic lineages, we use TaxonKit (version 0.18.0) with the NCBI taxdump (ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz) downloaded on 26 February 2025.

The practical implementation of this workflow is described in the Snakefile and is as follows:

  1. Classify contigs using Centrifuger with default parameters and the 'cfr_hpv+sarscov2' database (which is available on Zenodo)

  2. Attach species and taxon lineage names using TaxonKit

  3. Quantify by combining Centrifuger's output and Flye's assembly statistics in a custom R script. (I.e., for each contig, multiply its length by its depth of coverage to get 'totalbases', then calculate percentages per contig and per taxon based on these totalbases.)
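The quantification step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the project's R script. The function name `quantify`, the contig names, the numbers and the taxon labels are all made up, and the input is assumed to be already parsed into dictionaries.

```python
from collections import defaultdict


def quantify(contigs, classifications):
    """Compute percentages per contig and per taxon from length x depth.

    contigs: dict of contig name -> (length, depth of coverage)
    classifications: dict of contig name -> taxon name
    """
    # 'totalbases' per contig: length multiplied by depth of coverage
    total_bases = {name: length * depth for name, (length, depth) in contigs.items()}
    grand_total = sum(total_bases.values())

    per_contig = {name: 100 * bases / grand_total for name, bases in total_bases.items()}

    # Sum the total bases of all contigs assigned to the same taxon
    per_taxon = defaultdict(float)
    for name, bases in total_bases.items():
        taxon = classifications.get(name, "unclassified")
        per_taxon[taxon] += 100 * bases / grand_total
    return per_contig, dict(per_taxon)


# Hypothetical example values
contigs = {"contig_1": (1000, 10), "contig_2": (500, 20), "contig_3": (2000, 10)}
classifications = {"contig_1": "taxon A", "contig_2": "taxon A", "contig_3": "taxon B"}

per_contig, per_taxon = quantify(contigs, classifications)
print(per_contig)  # {'contig_1': 25.0, 'contig_2': 25.0, 'contig_3': 50.0}
print(per_taxon)   # {'taxon A': 50.0, 'taxon B': 50.0}
```

Weighting by length times depth means that a short, deeply covered contig can count as much as a long contig with low coverage, which approximates the number of sequenced bases behind each contig.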

TaxonKit user note

Extra note on using TaxonKit: after downloading the taxdump tarball, e.g.:

```bash
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
```

it can be useful to check that the download is complete by comparing the MD5 checksum:

```bash
# Download MD5 checksum
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz.md5

# Check file integrity
md5sum -c taxdump.tar.gz.md5

# Then extract the tarball
tar -xzf taxdump.tar.gz
```

TaxonKit relies on the files names.dmp, nodes.dmp, delnodes.dmp and merged.dmp. Copy or move them to the .taxonkit directory in your home folder (which should be automatically generated when you install TaxonKit) to be able to use taxonkit. E.g.:

```bash
mv *.dmp ~/.taxonkit/
```

Future ideas

  • Summarise the final output in one neat table

  • Include downstream processing scripts (RMarkdown) for statistical analyses and visualisation

  • Test alternative contig classification databases and tools

  • Filter contigs to minimum length of 2-3x average read length?

  • Write/extend documentation of the whole workflow and interesting findings

  • Calculate per-sample and overall fraction of contigs with ARGs: what is the estimated prevalence of ARGs?

  • Document ARG identification process (currently includes R script to parse KMA and include only hits that cover >=60% of the reference ARG)
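As a starting point for documenting the ARG identification, the coverage filter mentioned in the last item could look like the following Python sketch. This is not the project's R script. It assumes a tab-separated KMA result table with the columns `#Template` and `Template_Coverage` (as in KMA's `.res` output, to the best of our knowledge), and the gene names and values in the example are made up.

```python
import csv
import io


def filter_hits(res_text, min_coverage=60.0):
    """Return template names whose reference coverage is >= min_coverage (%)."""
    reader = csv.DictReader(io.StringIO(res_text), delimiter="\t")
    kept = []
    for row in reader:
        # Normalise keys ('#Template' -> 'Template') and strip padding spaces
        row = {key.strip().lstrip("#"): (value or "").strip() for key, value in row.items()}
        if float(row["Template_Coverage"]) >= min_coverage:
            kept.append(row["Template"])
    return kept


# Hypothetical KMA-like table, reduced to the two columns used here
example = (
    "#Template\tTemplate_Coverage\n"
    "geneA\t  95.20\n"
    "geneB\t  41.00\n"
    "geneC\t  60.00\n"
)
print(filter_hits(example))  # ['geneA', 'geneC']
```

The threshold is inclusive, matching the ">=60%" rule described above.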

Project organisation

```bash
.
├── CITATION.cff
├── LICENSE
├── README.md
├── Snakefile          <- Python-based workflow description
├── bin                <- Code and programs used in this project/experiment
├── config             <- Configuration of Snakemake workflow
├── data               <- All project data, divided in subfolders
│   ├── processed      <- Final data, used for visualisation (e.g. tables)
│   ├── raw            <- Raw data, original, should not be modified (e.g. fastq files)
│   └── tmp            <- Intermediate data, derived from the raw data, but not yet ready for visualisation
├── doc                <- Project documentation, notes and experiment records
├── envs               <- Conda environments necessary to run the project/experiment
├── log                <- Log files from programs
└── results            <- Figures or reports generated from processed data
```

Licence

This project is licensed under the terms of the New BSD licence.

Citation

Please cite this project as described in the citation file.

Owner

  • Name: Utrecht University
  • Login: UtrechtUniversity
  • Kind: organization
  • Email: info.rdm@uu.nl
  • Location: Utrecht, The Netherlands

The central place for managing code and software for Utrecht University researchers and employees

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: MEGAISurv-Namaste
message: >-
  Workflow for Nanopore Metagenome antibiotic Resistance and
  Taxonomy Screening
type: software
authors:
  - given-names: Sam
    family-names: Nooij
    email: s.nooij@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0001-5892-5637'
  - given-names: Aldert
    family-names: Zomer
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0002-0758-5190'
  - given-names: Agata
    family-names: Dziegiel
    affiliation: 'Quadram Institute'
    orcid: 'https://orcid.org/0000-0002-6148-1771'
identifiers:
  - type: url
    value: 'https://github.com/UtrechtUniversity/MEGAISurv-Namaste'
    description: GitHub repository
repository-code: 'https://github.com/UtrechtUniversity/MEGAISurv-Namaste'
keywords:
  - nanopore-sequencing
  - metagenomics
  - antibiotic-resistance
license: BSD-3-Clause
