Repository
Nf-core based workflow for Nanopore metagenomic reads
Basic Info
- Host: GitHub
- Owner: srusher
- License: mit
- Language: Nextflow
- Default Branch: main
- Size: 2.29 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Meta-ONT Workflow Diagram
Summary
Meta-ONT is a bioinformatics workflow that accepts ONT reads as input and runs them through the following processes:
1. _OPTIONAL_: Subsampling (`BBMap`)
2. Long read QC
3. Read Alignment (`minimap2`)
4. Alignment-based Classification (`samtools`) plus custom scripts
   - _OPTIONAL_: Read filtering by taxonomic ID (custom module)
5. K-mer based Classification (`Kraken2`)
   - _OPTIONAL_: Read filtering by taxonomic ID ([`KrakenTools`](https://github.com/jenniferlu717/KrakenTools))
6. _OPTIONAL_: K-mer based Taxonomic Distribution Visualization (`Krona`)
7. _OPTIONAL_: Alignment-based and K-mer Classifier Comparison Barplot (custom module)
8. Assembly with one of 3 tools:
   - [`Flye`](https://github.com/fenderglass/Flye)
   - [`Megahit`](https://github.com/voutcn/megahit)
   - [`Spades`](https://github.com/ablab/spades) - NOTE: the Spades assembler should only be used with long reads when performing a hybrid assembly
9. Assembly QC (`quast`)
10. _OPTIONAL_: Assembly Polishing (`Medaka`)
11. _OPTIONAL_: Polished Assembly QC (`quast`)
12. Binning (`Maxbin2`)
13. Contig Alignment and Identification (`blast`)
14. Contig Translation (Nucl -> AA) and Protein Identification (`blastx`)
15. Workflow Summary (`MultiQC`)
Setup
This workflow uses assets and dependencies native to the CDC's SciComp environment. If you do not have access to the SciComp environment, you can request an account here.
Usage
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
csv
sample,fastq_long
SAMPLE_1,/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/long-reads/sample1-long-read-ont.fastq.gz
SAMPLE_2,/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/long-reads/sample2-long-read-ont.fastq.gz
The top row is the header row ("sample,fastq_long") and should never be altered. Each row below the header represents a fastq file with a unique identifier in the "sample" column (SAMPLE_1 and SAMPLE_2 in the example above). Each fastq file must be gzipped; uncompressed files cause validation errors when the pipeline initializes.
There is an example samplesheet under the assets folder (assets/samplesheet.csv) that you can view and edit yourself. NOTE: If you use this samplesheet, please make a backup copy of it, because it will be overwritten each time you pull an updated version of this repository.
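The samplesheet rules described above (fixed header, unique sample identifiers, gzipped fastq files) can be checked before a run is launched. The following is a minimal sketch in Python, not the pipeline's own validation step. The function name `check_samplesheet`, the error messages, and the `/data/...` paths are assumptions made for illustration.

```python
import csv
import io

EXPECTED_HEADER = ["sample", "fastq_long"]


def check_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (empty if none)."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header row must be exactly 'sample,fastq_long'"]
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, found {len(row)}")
            continue
        sample, fastq = (field.strip() for field in row)
        if not sample:
            problems.append(f"line {line_no}: empty sample ID")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample ID '{sample}'")
        seen.add(sample)
        if not fastq.endswith(".gz"):
            problems.append(f"line {line_no}: '{fastq}' is not gzipped")
    return problems


# Hypothetical samplesheet with a repeated ID and an uncompressed file.
example = """sample,fastq_long
SAMPLE_1,/data/sample1.fastq.gz
SAMPLE_1,/data/sample2.fastq
"""
for problem in check_samplesheet(example):
    print(problem)
```

The sketch only inspects the text of the samplesheet; it does not check that the fastq files exist or that they are valid gzip archives.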
Once the samplesheet has been formatted, we can run the workflow using one of the 3 methods listed below.
Method 1: Cluster Submission:
The qsub method allows you to submit the job to SciComp's high-memory cluster computing nodes for fast performance and load distribution. This is a good "fire and forget" method for new users who aren't as familiar with SciComp's compute environment.
Format:
bash
bash ./run_qsub.sh --input "/path/to/samplesheet" --outdir "/path/to/output/directory" "<additional-parameters>"
Example:
bash
bash ./run_qsub.sh --input "assets/samplesheet.csv" --outdir "results/test" "--skip_subsample false --num_subsamples 1000 --skip_kraken2 false"
Method 2: Local Execution:
The local method may be a better option if you are experiencing technical issues with the qsub method. qsub adds additional layers of complexity to workflow execution, while local simply runs the workflow on your local machine or the host that you're connected to, provided it has sufficient memory/RAM and CPUs to execute the workflow.
Format:
bash
bash ./run_local.sh --input "/path/to/samplesheet" --outdir "/path/to/output/directory" "<additional-parameters>"
Example:
bash
bash ./run_local.sh --input "./assets/samplesheet.csv" --outdir "./results/test" "--skip_subsample false --num_subsamples 1000 --skip_kraken2 false"
Method 3: Native Nextflow Execution:
If you are familiar with Nextflow and SciComp's computing environment, you can invoke the nextflow command straight from the terminal. NOTE: If you are using this method, you will need to load a Nextflow environment via module load or conda.
Format:
bash
nextflow run main.nf -profile singularity,local --input "/path/to/samplesheet" --outdir "/path/to/output/directory" <additional flags>
Example:
bash
nextflow run main.nf -profile singularity,local --input "./assets/samplesheet.csv" --outdir "./results/test" --skip_subsample false --num_subsamples 1000 --skip_kraken2 false
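In the examples above, Methods 1 and 2 take the additional parameters as one quoted string, while Method 3 takes them as separate flags. The sketch below shows one way to turn the quoted-string form into the Method 3 form using Python's standard `shlex` module. The helper name `nextflow_command` is hypothetical and is not part of this repository.

```python
import shlex


def nextflow_command(samplesheet, outdir, extra=""):
    """Build the Method 3 argument list from the pieces used by Methods 1 and 2."""
    cmd = [
        "nextflow", "run", "main.nf",
        "-profile", "singularity,local",
        "--input", samplesheet,
        "--outdir", outdir,
    ]
    # Methods 1 and 2 pass the extra parameters as ONE quoted string;
    # split it into individual tokens for a direct nextflow call.
    cmd.extend(shlex.split(extra))
    return cmd


cmd = nextflow_command(
    "./assets/samplesheet.csv",
    "./results/test",
    "--skip_subsample false --num_subsamples 1000 --skip_kraken2 false",
)
# shlex.join (Python 3.8+) quotes any token that needs it for the shell.
print(shlex.join(cmd))
```

Building the command as a list keeps each flag and value a separate token, so it can be handed to `subprocess.run` without going through a shell.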
Update: Sample barplot from classification comparison module
The following barplot was generated using the new classification comparison module, which plots the top 10 classification hits from alignment-based classification and Kraken2 classification. The data behind this graph came from a 10,000-read subsample of a ZYMO mock community sample.
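The README does not describe how the comparison module computes its top hits. As a rough illustration only, the sketch below tallies per-read taxon labels from two classifiers and tabulates the taxa that appear in either top-N list. The function name `compare_classifiers` and the toy labels are assumptions, and the real module may work differently.

```python
from collections import Counter


def compare_classifiers(alignment_labels, kraken_labels, n=10):
    """Tabulate read counts for every taxon in either classifier's top n."""
    align_counts = Counter(alignment_labels)
    kraken_counts = Counter(kraken_labels)
    taxa = {taxon for taxon, _ in align_counts.most_common(n)}
    taxa |= {taxon for taxon, _ in kraken_counts.most_common(n)}
    # Sort by combined count, then by name, so the order is deterministic.
    ordered = sorted(taxa, key=lambda t: (-(align_counts[t] + kraken_counts[t]), t))
    return [(t, align_counts[t], kraken_counts[t]) for t in ordered]


# Toy per-read labels, made up for this example.
alignment = ["Escherichia coli"] * 3 + ["Bacillus subtilis"] * 2
kraken = ["Escherichia coli"] * 2 + ["Listeria monocytogenes"]

rows = compare_classifiers(alignment, kraken)
for taxon, n_align, n_kraken in rows:
    print(f"{taxon}\talignment={n_align}\tkraken2={n_kraken}")
```

Each row holds a taxon with its read count from both classifiers, which is the shape a grouped barplot needs.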

Parameters
See below for all possible input parameters:
Global Variables:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --metagenomic_sample | boolean | true |
Workflow processes:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --skip_subsample | boolean | true |
| --skip_porechop | boolean | true |
| --skip_metamaps | boolean | true |
| --skip_nonpareil | boolean | true |
| --skip_alignment_based_filtering | boolean | true |
| --skip_fastq_screen | boolean | true |
| --skip_kraken2 | boolean | true |
| --skip_kraken2_parse_reads_by_taxon | boolean | true |
| --skip_kraken2_protein | boolean | true |
| --skip_filter_by_kraken2_protein | boolean | true |
| --skip_assembly | boolean | true |
| --skip_medaka | boolean | true |
| --skip_binning | boolean | true |
| --skip_blast | boolean | true |
BBMap subsampling parameters:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --num_subsamples | integer | 1000 |
Chopper quality trimming parameters:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --chopper_q | integer | 20 |
| --chopper_min_len | integer | 1000 |
| --chopper_max_len | integer | 2147483647 |
Minimap2 parameters:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --minimap2_index | string | '/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/minimap2/index/acastellanineff.mmi' |
| --minimap2_mismatch_penalty | integer | 4 |
| --seq2tax_map | string | "/scicomp/home-pure/rtq0/EMEL-GWA/Projects/LongReadAnalysis/data/kraken-db/bactarchvirfungiamoeba-DB41-mer/seqid2taxid.map" |
| --taxids | string | "./assets/tax_ids_acanth-verm-naegleria.txt" |
| --mapping_quality | integer | 10 |
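The README does not spell out how the sequence-to-taxon map, the taxonomic ID list, and the mapping quality threshold are combined. The sketch below shows one plausible reading, under these assumptions: the map file uses the tab-separated `seqid<TAB>taxid` layout of a Kraken2 `seqid2taxid.map`, each read contributes one alignment, and the threshold is a minimum MAPQ. The function names and the reference and taxon IDs are hypothetical.

```python
from collections import Counter


def load_seqid2taxid(lines):
    """Parse 'seqid<TAB>taxid' lines into a dict."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        seqid, taxid = line.split("\t")[:2]
        mapping[seqid] = taxid
    return mapping


def classify_alignments(alignments, seqid2taxid, min_mapq=10, keep_taxids=None):
    """Count reads per taxid from (read_id, ref_seqid, mapq) tuples."""
    counts = Counter()
    for _read_id, ref_seqid, mapq in alignments:
        if mapq < min_mapq:
            continue  # assumed: alignments below the threshold are dropped
        taxid = seqid2taxid.get(ref_seqid)
        if taxid is None:
            continue  # reference sequence missing from the map
        if keep_taxids is not None and taxid not in keep_taxids:
            continue  # optional filtering by taxonomic ID
        counts[taxid] += 1
    return counts


# Made-up reference names and taxon IDs, for illustration only.
mapping = load_seqid2taxid(["ref_A\t1001", "ref_B\t1002", ""])
alignments = [
    ("read1", "ref_A", 60),
    ("read2", "ref_A", 5),   # below the MAPQ threshold of 10
    ("read3", "ref_B", 30),
    ("read4", "ref_C", 40),  # not present in the map
]
print(dict(classify_alignments(alignments, mapping)))
print(dict(classify_alignments(alignments, mapping, keep_taxids={"1001"})))
```

In a real run the tuples would come from the minimap2 alignments rather than a hand-written list.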
MetaMaps parameters:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --metamaps_db | string | "/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/metamaps/databases/refseqcomplete" |
| --metamapsthreads | integer | 16 |
| --metamaps_mem | integer | 120 |
Kraken2 parameters:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --kraken_db_main | string | "/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/kraken-db/bactarchvirfungiamoeba-DB41-mer" |
| --krakendbprotein | string | "/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/kraken-db/proteinalaninetRNAligase" |
| --kraken_tax_ids | string | "./assets/tax_ids_acanth-verm-naegleria.txt" |
| --kraken_custom_params | string | "" |
FastqScreen parameters:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --fastq_screen_conf | string | "./assets/fastq_screen.conf" |
Assembler parameters:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --assembler | string | 'flye' |
BLAST parameters:
| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --blast_db | string | "/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/blast/arch-bact-fung-hum-amoebarefseq/arch-bact-fung-hum-amoebarefseq" |
| --blast_evalue | string | "1e-10" |
| --blast_perc_identity | string | "95" |
| --blast_target_seqs | string | "5" |
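When many of the parameters above need to be overridden, Nextflow's `-params-file` option (which accepts JSON or YAML) can be easier to manage than a long command line. The README does not mention this option, so treat the sketch below as an assumption about how it could be used with this workflow. The helper `render_params` is hypothetical, and `KNOWN_PARAMS` is only a partial list copied from the tables above.

```python
import json

# Partial list of parameter names copied from the tables above.
KNOWN_PARAMS = {
    "skip_subsample", "skip_kraken2", "skip_assembly", "skip_medaka",
    "skip_binning", "skip_blast", "num_subsamples", "assembler",
    "chopper_q", "chopper_min_len", "mapping_quality", "blast_evalue",
}


def render_params(overrides):
    """Return JSON text for a Nextflow params file, rejecting unknown names."""
    unknown = sorted(set(overrides) - KNOWN_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    return json.dumps(overrides, indent=2)


print(render_params({
    "skip_subsample": False,
    "num_subsamples": 1000,
    "skip_kraken2": False,
    "assembler": "flye",
}))
```

Saving the printed JSON to a file (for example `params.json`, a name chosen here for illustration) and adding `-params-file params.json` to the Method 3 command should apply the overrides. Whether the `run_qsub.sh` and `run_local.sh` wrappers forward this option is not stated in the README.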
Credits
Meta-ONT was originally written by Sam Rusher (rtq0@cdc.gov).
Citations
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Name: Samuel Rusher
- Login: srusher
- Kind: user
- Location: Frankfort, KY
- Company: Bioinformatics Specialist with Leidos
- Website: https://www.linkedin.com/in/samuel-rusher/
- Repositories: 1
- Profile: https://github.com/srusher
Programs with an emphasis on bioinformatics | Experience with C#, Python, Java, SQL, R, HTML, and CSS
Citation (CITATIONS.md)
# scicomp/nfcoreskeleton: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Watch event: 1
- Push event: 11
- Create event: 2
Last Year
- Watch event: 1
- Push event: 11
- Create event: 2
Dependencies
- mshick/add-pr-comment v1 composite
- actions/checkout v3 composite
- nf-core/setup-nextflow v1 composite
- actions/stale v7 composite
- actions/checkout v3 composite
- actions/setup-node v3 composite
- actions/checkout v3 composite
- actions/setup-node v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- mshick/add-pr-comment v1 composite
- nf-core/setup-nextflow v1 composite
- psf/black stable composite
- dawidd6/action-download-artifact v2 composite
- marocchino/sticky-pull-request-comment v2 composite
- actions/setup-python v4 composite
- rzr/fediverse-action master composite
- zentered/bluesky-post-action v0.0.2 composite