balrog-msr
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 1 DOI reference(s) in README -
✓Academic publication links
Links to: zenodo.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (13.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: edwardbirdlab
- License: other
- Language: Jupyter Notebook
- Default Branch: main
- Size: 988 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
This pipeline will separate the metagenomic short-read portions of BALROG-MON into a standalone pipeline. It is still a work in progress. Ignore the rest of this README: it has not been updated and still describes BALROG-MON.
<!--![MIT License][(https://img.shields.io/badge/style-flat--squared-green.svg?style=flat-square)]-->
BALROG-MON
Bacterial Antimicrobial Resistance annOtation of Genomes - Metagenomic Oxford Nanopore
About BALROG-MON
BALROG-MON (Bacterial Antimicrobial Resistance annOtation of Genomes - Metagenomic Oxford Nanopore) is a comprehensive high throughput Nextflow pipeline built to utilize Q20+ Oxford Nanopore long-reads for the investigation of bacterial antimicrobial resistance (AMR) and its mobility from metagenomic samples. While AMR characterization is the main goal of BALROG-MON, it also provides subworkflows for many related analyses customizable to users' needs, such as assembly-free annotation, pathogen detection, and metagenomic community analysis of bacteria, viruses, and other microorganisms in samples.
[!NOTE] BALROG-MON is updated periodically to improve the pipeline. If you have any requests or recommended changes (e.g. support for other data types), please reach out via email (edwardbirdlab@gmail.com | edwardbird@ksu.edu) or request a feature.
If you experience any trouble or find bugs when running BALROG-MON, please report issues or bugs and they will be addressed as soon as possible.
Not the BALROG pipeline you're looking for?
BALROG-MSR: Bacterial Antimicrobial Resistance annOtation of Genomes - Metagenomic Short Read
BALROG-ISO: Bacterial Antimicrobial Resistance annOtation of Genomes - ISOlate whole genomes
Workflow Overview
*See sections below for details on subworkflows
Table of Contents
- Getting Started
- Running BALROG-MON
- Core Steps of Workflow
- Optional Steps of Workflow
- Citations
- License
- Contact Information
Getting Started
Before you get too far along, read this section to make sure this is the right pipeline for you and that your equipment and samples meet the requirements. (Don't worry, there isn't too much to do.)
1. What Data Do I Need?
BALROG-MON in its current form expects Q20+ Oxford Nanopore long-read metagenomic sequencing data. It can either run in "Assembly-Free" mode or assemble a metagenome using metaFlye, which allows analysis of both low- and high-coverage metagenomes. In its standard configuration, BALROG-MON requires 100 GB of RAM.
[!NOTE] If you would like to run BALROG-MON with older, non-Q20+ Nanopore data, feel free to request a feature.
2. Dependencies
All dependencies are managed via Docker containers hosted on DockerHub. Nextflow and one of the following container runtimes are required:
- Nextflow (>= 23.04.0.5857) - Install Nextflow
- Docker/Singularity/Apptainer - Install Docker - Install Singularity - Install Apptainer
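Before launching, it can help to confirm that the tools listed above are on `PATH`. The following is a minimal sketch and not part of BALROG-MON: `version_ge` is a hypothetical helper, and the script assumes `nextflow -v` prints a line containing `version X.Y.Z`.

```sh
#!/bin/sh
# Hypothetical pre-flight check; not shipped with the pipeline.

# version_ge A B: succeed when dotted version A >= B, comparing fields numerically.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

MIN_NEXTFLOW="23.04.0"   # minimum release from the list above (build number omitted)

if command -v nextflow >/dev/null 2>&1; then
    # Assumes output such as "nextflow version 23.04.0.5857".
    installed=$(nextflow -v 2>/dev/null | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p')
    if version_ge "${installed:-0}" "$MIN_NEXTFLOW"; then
        echo "Nextflow $installed: OK"
    else
        echo "Nextflow $installed is older than $MIN_NEXTFLOW"
    fi
else
    echo "Nextflow not found on PATH"
fi

# Any one container runtime is enough.
runtime=""
for tool in docker singularity apptainer; do
    if command -v "$tool" >/dev/null 2>&1; then
        runtime=$tool
        break
    fi
done
echo "Container runtime: ${runtime:-none found}"
```

The script only reports what it finds and never exits with an error, so it is safe to source from another script.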
3. Installation
Preferred Method - Download Release
sh
wget https://github.com/edwardbirdlab/BALROG-MON/releases/download/v0.0.0/BALROG-0.0.0.tar.gz
tar -xzf BALROG-0.0.0.tar.gz
Method 2 - Clone Repo
sh
git clone https://github.com/edwardbirdlab/BALROG-MON
4. Creating a Sample Sheet
BALROG-MON takes a CSV (comma-separated values) sheet as input. Note that the "sample" column is used as the prefix of all output files for that sample.
Example Format:
sample,path,reference_genome
Sample_Name_1,/absolute/path/to/sample1.fastq.gz,/absolute/path/to/reference_genome_1.fna
Sample_Name_2,/absolute/path/to/sample2.fastq.gz,/absolute/path/to/reference_genome_1.fna
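A malformed sample sheet is easier to catch before the pipeline starts. The sketch below is a hypothetical check and not part of BALROG-MON. It assumes the header must match the example exactly, that read paths must be absolute (as in the example), and that sample names should be unique because they prefix the output files.

```sh
#!/bin/sh
# Hypothetical sample-sheet check; not shipped with the pipeline.
# check_samplesheet FILE: print problems and return non-zero if any are found.
check_samplesheet() {
    awk -F',' '
        NR == 1 {
            if ($0 != "sample,path,reference_genome") {
                print "line 1: unexpected header: " $0; bad = 1
            }
            next
        }
        # Every data row needs exactly three comma-separated fields.
        NF != 3     { print "line " NR ": expected 3 columns, found " NF; bad = 1; next }
        $1 == ""    { print "line " NR ": empty sample name"; bad = 1 }
        # seen[$1]++ is true from the second occurrence of a name onward.
        seen[$1]++  { print "line " NR ": duplicate sample name " $1; bad = 1 }
        $2 !~ /^\// { print "line " NR ": reads path is not absolute: " $2; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Usage: check_samplesheet samples.csv && echo "sample sheet looks consistent"
```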
5. Nextflow Configuration
When creating a Nextflow config, ensure a container runtime is enabled (Singularity/Apptainer/Docker). If you are using Slurm, you can use the included Beocat Slurm config as a template. Most nf-core configs are also supported. If you have never created a Nextflow config, or are having issues, reach out to your local system administrator.
Nextflow Configuration - nf-core configs
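For orientation, a config that enables a container runtime and a Slurm executor might look like the sketch below. This is an illustrative example, not the included Beocat config: the file name `my_cluster.config` is a placeholder, and a real cluster will usually also need site-specific settings such as queue names.

```sh
#!/bin/sh
# Writes a small example Nextflow config; the file name is a placeholder.
CONFIG_FILE=${CONFIG_FILE:-my_cluster.config}

cat > "$CONFIG_FILE" <<'EOF'
// Enable exactly one container runtime (Singularity shown here).
singularity {
    enabled    = true
    autoMounts = true
}

// Submit each process as a Slurm job (assumes a Slurm cluster).
process {
    executor = 'slurm'
}
EOF

echo "Wrote $CONFIG_FILE"
```

The resulting file can then be passed to `nextflow run` with `-c`.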
6. Pipeline Configuration
If you want to change any parameters of BALROG-MON from its default options, they can be changed using the "nextflow.config" file. Configurable parameters will be outlined in the detailed sections below, as well as in the config file.
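As a rough illustration, overriding pipeline parameters could look like the following. The parameter names are the ones this README documents. The value `1000` is only an example, and the file name `custom_params.config` is a placeholder. Whether to keep overrides in a separate file or edit "nextflow.config" directly is a matter of preference.

```sh
#!/bin/sh
# Writes an example params block; the file name and the 1000 bp value are illustrative.
PARAMS_FILE=${PARAMS_FILE:-custom_params.config}

cat > "$PARAMS_FILE" <<'EOF'
params {
    chopper_minlen         = 1000   // example override; README default is 500
    chopper_averagequality = 20     // README default
}
EOF

echo "Wrote $PARAMS_FILE"
```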
Running BALROG-MON
- Running the whole pipeline
sh
nextflow run /path/to/edwardbirdlab/BALROG-MON -c /path/to/config.cfg
- Generate Multi-QC
sh
nextflow run /path/to/edwardbirdlab/BALROG-MON -c /path/to/config.cfg --workflow-opt multiqc
<!-- CORE STEPS OF WORKFLOW -->
Core Steps of Workflow
1. Preprocessing
Trimming & Raw QC
- FastQC : Raw Read
- Porechop
- chopper
Parameters
- params.chopper_minlen (default = 500)
- params.chopper_averagequality (default = 20)
- FastQC : Trimmed Read
Final Read QC
2. Read-Based Identification
Pathogen Detection (Core Step for "Assembly Free" Only)
- Kraken 2 (standard database)
3. Sequence Processing
Assembly
- "Assembly Free"
- Seqtk : Convert fastq to fasta
OR
- "Assembled"
- metaFlye : Metagenomic assembly
- [Kraken 2](https://github.com/DerrickWood/kraken2) (standard database) : Reassign sequence identities
Sequence Processing QC
4. ARG & Mobility Annotation
- Plasmer : Plasmid prediction
Parameters
- params.plasmerminlen (default = 500)
- params.plasmermaxlen (default = 500000)
5. Binning
<!-- OPTIONAL STEPS OF WORKFLOW -->
Optional Steps of Workflow
1. Preprocessing
Standardize Read Names
- Included Python script (useful if you have long read names)
Remove Human DNA
- minimap2 : Mapping to human genome
- SAMtools : Extracting non-human reads names
- Seqtk : Extract non-human reads
Remove Host DNA
- minimap2 : Mapping to host genome
- SAMtools : Extracting non-host reads names
- Seqtk : Extract non-host reads
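The three host-removal steps above can be sketched as a single shell function. This is an assumption about how the tools fit together, not the pipeline's actual commands: `remove_host_reads` and the output file names are hypothetical, and the flags used by BALROG-MON may differ. The same pattern would apply to human-read removal with a human reference.

```sh
#!/bin/sh
# Hypothetical helper; requires minimap2, samtools, seqtk, and gzip on PATH.
# remove_host_reads READS.fastq.gz HOST.fna PREFIX
remove_host_reads() {
    reads=$1; host=$2; prefix=$3

    # 1. Map reads to the host genome (map-ont is minimap2's Nanopore preset).
    # 2. Keep unmapped records (SAM flag 4) and collect their unique read names.
    minimap2 -ax map-ont "$host" "$reads" \
        | samtools view -f 4 - \
        | cut -f 1 \
        | sort -u > "$prefix.nonhost_names.txt"

    # 3. Pull the non-host reads out of the original FASTQ by name.
    seqtk subseq "$reads" "$prefix.nonhost_names.txt" \
        | gzip > "$prefix.nonhost.fastq.gz"
}

# Usage: remove_host_reads sample1.fastq.gz host_genome.fna sample1
```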
2. Read-Based Identification
Pathogen Detection (Optional for "Assembled" Only)
- Kraken 2 (standard database)
Parameters
- --report-minimizer-data
- --minimum-hit-groups 3
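For reference, a standalone Kraken 2 call with the two options listed above might look like the sketch below. The wrapper name `classify_reads`, the output file names, and the database path are hypothetical, and the pipeline's own invocation may pass further options.

```sh
#!/bin/sh
# Hypothetical wrapper; requires kraken2 on PATH and a pre-built database directory.
# classify_reads READS DB_DIR PREFIX
classify_reads() {
    reads=$1; db=$2; prefix=$3

    # Tell Kraken 2 explicitly when the input is gzip-compressed.
    gz_flag=""
    case $reads in
        *.gz) gz_flag="--gzip-compressed" ;;
    esac

    # $gz_flag is intentionally unquoted so that it vanishes when empty.
    kraken2 \
        --db "$db" \
        --report-minimizer-data \
        --minimum-hit-groups 3 \
        --report "$prefix.kraken2.report.txt" \
        --output "$prefix.kraken2.out.txt" \
        $gz_flag \
        "$reads"
}

# Usage: classify_reads sample1.fastq.gz /path/to/kraken2_db sample1
```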
Community Analysis
[!NOTE] BALROG-MON does not create a graphical summary of pathogen detection and community analysis results. However, results are readily compatible for visualization using Pavian.
4. ARG and Mobility Annotation
Multi AMR Annotation
[!NOTE] CARD is run by default; however, this can be switched to include additional ARG databases by setting params.cardonly = TRUE
Citations
As there is currently no paper associated with BALROG-MON, please cite this GitHub page. Also, feel free to contact me (edwardbirdlab@gmail.com | edwardbird@ksu.edu) to let me know!
Many tools are used in this pipeline and its optional steps. See 'CITATION.md' for the list of all tools used.
License
Distributed for the USDA ARS under the Public Domain. See LICENSE for more information.
Contact
Edward Bird - edwardbirdlab@gmail.com | edwardbird@ksu.edu
Owner
- Login: edwardbirdlab
- Kind: user
- Repositories: 1
- Profile: https://github.com/edwardbirdlab
Citation (CITATION.cff)
cff-version: 1.2.0
title: 'BALROG-MON'
message: 'If you use this software, please cite it as below.'
type: software
authors:
  - given-names: Edward
    family-names: Bird
    email: edwardbird@ksu.edu
    affiliation: Kansas State University
    orcid: 'https://orcid.org/0009-0006-3782-9367'
identifiers:
  - type: doi
    value: 10.5281/zenodo.11110897
    description: Zenodo DOI
repository-code: 'https://github.com/edwardbirdlab/BALROG-MON'
abstract: >-
  Bacterial Antimicrobial Resistance annOtation of Genomes - Metagenomic Oxford Nanopore
keywords:
  - nextflow
  - oxford
  - short read
license: PDDL-1.0
GitHub Events
Total
- Push event: 78
- Create event: 2
Last Year
- Push event: 78
- Create event: 2