nano-rave
Nextflow pipeline designed for rapid onsite QC and variant calling of Oxford Nanopore data (following basecalling and demultiplexing with Guppy).
Pipeline summary

Getting started
Running on a personal computer
Example:
```bash
nextflow run github.com/sanger-pathogens/nano-rave \
    --sequencing_manifest ./test_data/pipeline/inputs/test_manifest.csv \
    --reference_manifest ./test_data/pipeline/inputs/reference_manifest.csv \
    --variant_caller medaka_haploid \
    --min_barcode_dir_size 5 \
    --results_dir my_output
```
See usage for all available pipeline options.
- Once your run has finished, check the output in `results_dir` and clean up any intermediate files. To do this (assuming no other pipelines are running from the current working directory), run:
```bash
rm -rf work .nextflow*
```
Running on the farm (Sanger HPC clusters)
- Add the following line to `~/.bashrc` (if not already present):
```bash
[[ -f /software/pathogen/farm5 ]] && source /software/pathogen/etc/pathogen.profile
```
- Source the updated `.bashrc` file:
```bash
source ~/.bashrc
```
- Load the module:
```bash
module load nano-rave/<version>
```
The pipeline should now be directly available with the command `nano-rave`:
```bash
nano-rave --help
```
Before executing `nano-rave`, it is recommended to set the `$SINGULARITY_CACHEDIR` and `$NXF_SINGULARITY_CACHEDIR` environment variables so that they both point to a folder with enough space. This is the location where the Singularity images supporting the pipeline dependencies will be downloaded. By default, images are downloaded inside your home directory (specifically in `${HOME}/.singularity/cache`), which has space limitations and will rapidly fill up, causing the pipeline to fail. On the Sanger HPC, it is recommended to point to a location on your lustre scratch space.
Start your analysis
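Before launching a run, the two cache variables recommended above could be set along the following lines. This is a minimal sketch, not part of nano-rave: `SCRATCH_DIR` is a hypothetical placeholder for a directory with plenty of free space, and the fallback to a temporary directory is only there so that the snippet runs anywhere.

```bash
# SCRATCH_DIR is a hypothetical placeholder: point it at a directory with enough
# free space (on the Sanger HPC, somewhere on your lustre scratch space).
# The fallback to a temporary directory is for illustration only.
cache_dir="${SCRATCH_DIR:-${TMPDIR:-/tmp}}/singularity_cache"
mkdir -p "$cache_dir"

# Both variables should point to the same folder.
export SINGULARITY_CACHEDIR="$cache_dir"
export NXF_SINGULARITY_CACHEDIR="$cache_dir"
```

Adding the two `export` lines to `~/.bashrc` (with a fixed path) would make the setting persistent across sessions.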
To use the appropriate Sanger configuration, please run with the `-profile sanger_local` option.
Here is an example command:
```bash
bsub -o nano-rave.o -e nano-rave.e -q long -n 4 -R "select[mem>16000] rusage[mem=16000]" -M16000 \
nano-rave -profile sanger_local --sequencing_manifest ./test_data/pipeline/inputs/test_manifest.csv --reference_manifest ./test_data/pipeline/inputs/reference_manifest.csv --variant_caller medaka_haploid --min_barcode_dir_size 5 --results_dir my_output
```
This will run the whole pipeline, i.e. all per-sample processes, within a single submitted job, so please tailor your resource request accordingly.
NB: we are working on providing a `sanger_lsf` profile that will enable proper use of LSF cluster integration, meaning that each process is executed by submitting it as a separate job on the HPC. Under such a configuration, you would be advised to submit the main job (workflow head process) to the oversubscribed queue.
See usage for all available pipeline options.
- Once your run has finished, check the output in `results_dir` and clean up any intermediate files. To do this (assuming no other pipelines are running from the current working directory), run:
```bash
rm -rf work .nextflow*
```
Usage
```
nextflow run main.nf

Options:
    --sequencing_manifest     Manifest containing paths to sequencing directories and sequencing summary files (mandatory)
    --reference_manifest      Manifest containing reference identifiers and paths to fasta reference files (mandatory)
    --results_dir             Specify results directory [default: ./nextflow_results]
    --variant_caller          Specify a variant caller to use: medaka (default), medaka_haploid, freebayes, clair3
    --clair3_args             Specify clair3 variant calling parameters - must include model, e.g. --clair3_args "--model_path /opt/models/r941_prom_sup_g5014" (optional)
    --min_barcode_dir_size    Specify the expected minimum size of the barcode directories, in MB. Must be > 0. [default: 10]
    --keep_bam_files          Save BAM files in results directory [default: false]
    --help                    Print this help message (optional)
```
Note:
- Please refer to https://github.com/HKU-BAL/Clair3#usage for a comprehensive list of options that can be used with `--clair3_args`. Currently, by default, the software assumes that you want to call variants on human chromosome contigs. If this is not the case, or you wish to use a custom set of contigs, please see the clair3 options `--include_all_ctgs` or `--ctg_name`. You may also want to skip phasing with the clair3 option `--no_phasing_for_fa` if this is not required or useful for you.
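As an illustration of the note above, a clair3 run on non-human references might be assembled as follows. This is a sketch under assumptions rather than a recommended configuration: the model path is simply the one shown in the usage text and must match your own basecalling model, the manifests are the test files used elsewhere in this README, and the `cmd` array is only a convenience for reviewing the command before launching it.

```bash
# Clair3 options taken from the note above; adjust the model to your own data.
clair3_args="--model_path /opt/models/r941_prom_sup_g5014 --include_all_ctgs --no_phasing_for_fa"

# Build the command as an array so that the quoted clair3 options stay a single argument.
cmd=(nextflow run github.com/sanger-pathogens/nano-rave
     --sequencing_manifest ./test_data/pipeline/inputs/test_manifest.csv
     --reference_manifest ./test_data/pipeline/inputs/reference_manifest.csv
     --variant_caller clair3
     --clair3_args "$clair3_args"
     --results_dir my_output)

# Print the command for review; run "${cmd[@]}" to launch it.
printf '%s\n' "${cmd[*]}"
```

Passing all clair3 options as one quoted string keeps them from being interpreted as nano-rave options.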
Sequencing manifest format
The sequencing manifest is in csv format and contains two columns:
- `sequencing_dir`: folder containing all the Oxford Nanopore sequencing data
- `sequence_summary_file`: required for QC; usually found in the sequencing directory. In this file, the paths to the fast5 read files (first column) must be full/absolute paths.
The pipeline assumes that sequencing_dir contains Guppy output for a particular sample. In particular, the parent and child folders of the given sequencing_dir assume the following structure:
<sample>/<sequencing_dir>/fastq_pass/barcode*
Where each barcode* directory contains fastq.gz files. Only barcode* directories whose total size on disk exceeds the threshold set with --min_barcode_dir_size are considered.
Example manifest:
sequencing_dir,sequence_summary_file
./test_data/PIPELINE/inputs/sample/sequencing_dir,./test_data/PIPELINE/inputs/sample/sequencing_dir/sequencing_summary.txt
Note: When using relative paths in the manifest, they are relative to the current working directory (from which nextflow is run).
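To preview which barcode directories are likely to pass the size filter described above, something like the following could be run before starting the pipeline. It is an illustration only, not the pipeline's own check: the function name `list_large_barcode_dirs` is hypothetical, and `du -sm` (size on disk, rounded up to whole MB) is an assumed stand-in for however the pipeline measures directory size.

```bash
# Print the barcode* directories under <sequencing_dir>/fastq_pass whose size on
# disk exceeds a threshold in MB. Hypothetical helper, not part of nano-rave.
list_large_barcode_dirs() {
    local sequencing_dir="$1"
    local min_size_mb="$2"
    local dir size_mb
    for dir in "$sequencing_dir"/fastq_pass/barcode*; do
        # Skip non-directories (including the unexpanded glob when nothing matches).
        [ -d "$dir" ] || continue
        size_mb=$(du -sm "$dir" | cut -f1)
        if [ "$size_mb" -gt "$min_size_mb" ]; then
            echo "$dir"
        fi
    done
}
```

For example, `list_large_barcode_dirs ./test_data/pipeline/inputs/sample/sequencing_dir 5` would list the directories larger than 5 MB, assuming that path exists on your machine.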
Reference manifest format
The reference manifest is in csv format and contains two columns:
- `reference_id`: identifier for the reference (e.g. gene name or reference genome name)
- `reference_path`: path to the reference file (fasta format)
Example for amplicon data:
reference_id,reference_path
ama1,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_ama1.fasta
crt,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_crt.fasta
csp,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_csp.fasta
dhfr,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_dhfr.fasta
dhps,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_dhps.fasta
eba175_3d7,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_eba175_3d7.fasta
k13,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_k13.fasta
mdr1,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_mdr1.fasta
msp1,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_msp1.fasta
msp2,./test_data/PIPELINE/inputs/references/ref_target_gene_cds_seq_msp2.fasta
Note: When using relative paths in the manifest, they are relative to the current working directory (from which nextflow is run).
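Because relative paths are resolved against the current working directory, it can be useful to check a reference manifest from the directory in which nextflow will be launched. The sketch below is one way to do that; `check_reference_manifest` is a hypothetical helper, not something provided by nano-rave, and it assumes a plain two-column csv with the header shown above.

```bash
# Verify the header of a reference manifest and that every reference_path exists.
# Returns 0 if all files are found, 1 otherwise. Hypothetical helper.
check_reference_manifest() {
    local manifest="$1"
    local header reference_id reference_path status=0
    {
        # The first line must be the expected header.
        IFS= read -r header
        if [ "$header" != "reference_id,reference_path" ]; then
            echo "unexpected header: $header" >&2
            return 1
        fi
        # One reference per line; the trailing test handles a last line without a newline.
        while IFS=, read -r reference_id reference_path || [ -n "$reference_id" ]; do
            [ -n "$reference_id" ] || continue
            if [ ! -f "$reference_path" ]; then
                echo "missing reference for $reference_id: $reference_path" >&2
                status=1
            fi
        done
    } < "$manifest"
    return "$status"
}
```

A typical call would be `check_reference_manifest ./reference_manifest.csv && echo "manifest OK"`, run from the launch directory.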
Variant callers
Four variant callers are currently supported:
- `medaka`: See `medaka_variant` usage
- `medaka_haploid`: See `medaka_haploid_variant` usage
- `freebayes`: See `freebayes` usage
- `clair3`: See `run_clair3.sh` usage
Software versions
The pipeline makes use of docker images to ensure reproducibility. This version of the pipeline uses the following software dependencies:
| Software | Version | Image URL |
|-----------|---------|-----------|
| bedtools | 2.29.2 | quay.io/biocontainers/bedtools:2.29.2--hc088bd4_0 |
| clair3 | 1.0.0 | docker.io/hkubal/clair3@sha256:3c4c6db3bb6118e3156630ee62de8f6afef7f7acc9215199f9b6c1b2e1926cf8 |
| freebayes | 1.3.5 | docker.io/gfanz/freebayes@sha256:d32bbce0216754bfc7e01ad6af18e74df3950fb900de69253107dc7bcf4e1351 |
| medaka | 1.4.4 | quay.io/biocontainers/medaka:1.4.4--py38h130def0_0 |
| minimap2 | 2.17 | quay.io/biocontainers/minimap2:2.17--hed695b0_3 |
| nanoplot | 1.38.0 | quay.io/biocontainers/nanoplot:1.38.0--pyhdfd78af_0 |
| pycoqc | 2.5.2 | quay.io/biocontainers/pycoqc:2.5.2--py_0 |
| samtools | 1.15.1 | quay.io/biocontainers/samtools:1.15.1--h1170115_0 |
| tabix | 1.11 | quay.io/biocontainers/tabix:1.11--hdfd78af_0 |
Contributions and testing
Developer contributions to this pipeline will only be accepted if all pipeline tests pass. To check:
- Make your changes.
- Download the test data. A utility script is provided: `python3 scripts/download_test_data.py`
- Install `nf-test` (>=0.7.0) and run the tests: `nf-test test tests/*.nf.test`. If running on the Sanger HPC cluster, add the option `--profile sanger_local`.
- Submit a PR.
Citations
If you use this pipeline for your analysis, please cite our paper:
Drug resistance and vaccine target surveillance of Plasmodium falciparum using nanopore sequencing in Ghana
Sophia T. Girgis, Edem Adika, Felix E. Nenyewodey, Dodzi K. Senoo Jnr, Joyce M. Ngoi, Kukua Bandoh, Oliver Lorenz, Guus van de Steeg, Alexandria J. R. Harrott, Sebastian Nsoh, Kim Judge, Richard D. Pearson, Jacob Almagro-Garcia, Samirah Saiid, Solomon Atampah, Enock K. Amoako, Collins M. Morang’a, Victor Asoala, Elrmion S. Adjei, William Burden, William Roberts-Sengier, Eleanor Drury, Megan L. Pierce, Sónia Gonçalves, Gordon A. Awandare, Dominic P. Kwiatkowski, Lucas N. Amenga-Etego & William L. Hamilton
Nature Microbiology 8:2365–2377 (2023); doi: 10.1038/s41564-023-01516-6.
This work was initially released as a preprint:
Nanopore sequencing for real-time genomic surveillance of Plasmodium falciparum
Sophia T. Girgis, Edem Adika, Felix E. Nenyewodey, Dodzi K. Senoo Jnr, Joyce M. Ngoi, Kukua Bandoh, Oliver Lorenz, Guus van de Steeg, Sebastian Nsoh, Kim Judge, Richard D. Pearson, Jacob Almagro-Garcia, Samirah Saiid, Solomon Atampah, Enock K. Amoako, Collins M. Morang’a, Victor Asoala, Elrmion S. Adjei, William Burden, William Roberts-Sengier, Eleanor Drury, Sónia Gonçalves, Gordon A. Awandare, Dominic P. Kwiatkowski, Lucas N. Amenga-Etego, William L. Hamilton
bioRxiv 2022.12.20.521122; doi: 10.1101/2022.12.20.521122
This pipeline was adapted from the nf-core/nanoseq pipeline.
A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines
Ying Chen, Nadia M. Davidson, Yuk Kei Wan, Harshil Patel, Fei Yao, Hwee Meng Low, Christopher Hendra, Laura Watten, Andre Sim, Chelsea Sawyer, Viktoriia Iakovleva, Puay Leng Lee, Lixia Xin, Hui En Vanessa Ng, Jia Min Loo, Xuewen Ong, Hui Qi Amanda Ng, Jiaxu Wang, Wei Qian Casslynn Koh, Suk Yeah Polly Poon, Dominik Stanojevic, Hoang-Dai Tran, Kok Hao Edwin Lim, Shen Yon Toh, Philip Andrew Ewels, Huck-Hui Ng, N.Gopalakrishna Iyer, Alexandre Thiery, Wee Joo Chng, Leilei Chen, Ramanuj DasGupta, Mile Sikic, Yun-Shen Chan, Boon Ooi Patrick Tan, Yue Wan, Wai Leong Tam, Qiang Yu, Chiea Chuan Khor, Torsten Wüstefeld, Ploy N. Pratanwanich, Michael I. Love, Wee Siong Sho Goh, Sarah B. Ng, Alicia Oshlack, Jonathan Göke, SG-NEx consortium
bioRxiv 610741; doi: 10.1101/610741
A full list of citations for tools used in the pipeline is given in CITATIONS.md
Copyright
Copyright (C) 2022,2023 Genome Research Ltd.
Owner
- Name: Pathogen Informatics, Wellcome Sanger Institute
- Login: sanger-pathogens
- Kind: organization
- Location: Hinxton, Cambs., UK
- Website: http://www.sanger.ac.uk/science/groups/pathogen-informatics
- Repositories: 54
- Profile: https://github.com/sanger-pathogens
Citation (CITATIONS.md)
# nano-rave: Citations

## [nano-rave](https://github.com/sanger-pathogens/nano-rave)

> Sophia T. Girgis, Edem Adika, Felix E. Nenyewodey, Dodzi K. Senoo Jnr, Joyce M. Ngoi, Kukua Bandoh, Oliver Lorenz, Guus van de Steeg, Sebastian Nsoh, Kim Judge, Richard D. Pearson, Jacob Almagro-Garcia, Samirah Saiid, Solomon Atampah, Enock K. Amoako, Collins M. Morang’a, Victor Asoala, Elrmion S. Adjei, William Burden, William Roberts-Sengier, Eleanor Drury, Sónia Gonçalves, Gordon A. Awandare, Dominic P. Kwiatkowski, Lucas N. Amenga-Etego, William L. Hamilton. Nanopore sequencing for real-time genomic surveillance of Plasmodium falciparum. [bioRxiv 521122](https://www.biorxiv.org/content/10.1101/2022.12.20.521122v1); doi: [10.1101/521122](https://doi.org/10.1101/2022.12.20.521122)

## [nf-core/nanoseq](https://github.com/nf-core/nanoseq/)

> Chen Y, Davidson NM, Wan YK, Patel H, Yao F, Low HM, Hendra C, Watten L, Sim A, Sawyer C, Iakovleva V, Lee PL, Xin L, Ng HEV, Loo JM, Ong X, Ng HQA, Wang J, Koh WQC, Poon SYP, Stanojevic D, Tran H-D, Lim KHE, Toh SY, Ewels PA, Ng H-H, Iyer N.G, Thiery A, Chng WJ, Chen L, DasGupta R, Sikic M, Chan Y-S, Tan BOP, Wan Y, Tam WL, Yu Q, Khor CC, Wüstefeld T, Pratanwanich PN, Love MI, Goh WSS, Ng SB, Oshlack A, Göke J, SG-NEx consortium. A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines. [bioRxiv 610741](https://www.biorxiv.org/content/10.1101/2021.04.21.440736v1). doi: [10.1101/610741](https://doi.org/10.1101/2021.04.21.440736)

## [nf-core](https://www.ncbi.nlm.nih.gov/pubmed/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)

  > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.

- [Clair3](https://doi.org/10.1038/s43588-022-00387-x)

  > Zheng Z, Li S, Su J, Leung A W-S, Lam T-W, Luo R. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nature Computational Science. 2022;2(12):797–803. doi: 10.1038/s43588-022-00387-x. [bioRxiv 474431](https://www.biorxiv.org/content/10.1101/2021.12.29.474431v2)

- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/29750242/)

  > Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 Sep 15;34(18):3094-3100. doi: 10.1093/bioinformatics/bty191. PMID: 29750242; PMCID: PMC6137996.

- [Medaka](https://github.com/nanoporetech/medaka)

- [NanoPlot](https://pubmed.ncbi.nlm.nih.gov/29547981/)

  > De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018 Aug 1;34(15):2666-2669. doi: 10.1093/bioinformatics/bty149. PubMed PMID: 29547981; PubMed Central PMCID: PMC6061794.

- [pycoQC](https://doi.org/10.21105/joss.01236)

  > Leger A, Leonardi T. pycoQC, interactive quality control for Oxford Nanopore Sequencing. Journal of Open Source Software. 2019;4(34):1236. doi: 10.21105/joss.01236

- [SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/)

  > Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.