Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to: ncbi.nlm.nih.gov
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.7%) to scientific vocabulary
Repository
Measles Sequence Analysis and Automation
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 6
Metadata Files
README.md
MeaSeq: Measles Sequence Analysis Automation
- Current Updates
- Introduction
- Installation
- Resource Requirements
- Usage
- Outputs
- Steps
- Troubleshooting
- Credits
- Citations
- Contributing
- Legal
Current Updates
2025-09-03
- Illumina and Nanopore workflows fully functional with the same (or equivalent) outputs
- Dependency management fully available with Docker, Singularity, and Conda
- Can assign DSIds from a reference multi-fasta file and give new N450s a Novel-hash label
  - With --dsid_fasta <FASTA>
  - If no DSId fasta file is available, all N450s are assigned a Novel-hash label, with matching hashes when the sequences are identical
Introduction
MeaSeq is a measles virus (MeV) specific pipeline established for use in surveillance and outbreak analysis. This pipeline utilizes a reference-based read mapping approach for Whole Genome or Amplicon sequencing data from both the Illumina and Nanopore platforms to output MeV consensus sequences (whole genome and N450), variant data, sequencing quality information, and custom summary reports.

This project aims to implement an open-source, easy to run, MeV Whole Genome Sequence analysis pipeline that works on both Illumina and Nanopore data. The end goal of this project is to deploy a standardized pipeline focused on final reporting metrics and plots for rapid detection and response to MeV outbreaks in Canada and abroad.
The pipeline builds on two existing pipelines: the Illumina side comes from nf-core's viralrecon pipeline and the Nanopore side from the artic pipeline. Most additions provide measles-specific QC or reporting.
Installation
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test_illumina before running the workflow on actual data.
Installation requires both Nextflow at a minimum version of 24.10.0 and a dependency management system to run.
Steps:
Download and install Nextflow
- Download and install with conda
  - Conda command: conda create -n nextflow -c conda-forge -c bioconda nextflow
- Or install with the instructions at https://www.nextflow.io/
Determine which dependency management system works best for you
- Note: Currently the plotting process is using a custom docker container but it should work for both docker and singularity
- Run the pipeline with one of the following profiles to handle dependencies (or use your own profile if you have one for your institution):
  - conda
  - mamba
  - singularity
  - docker
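Before launching the pipeline, it can help to confirm that the installed Nextflow meets the 24.10.0 minimum. The following is a minimal sketch of such a check. The `version_ge` helper is hypothetical (it is not part of MeaSeq), it relies on `sort -V` from GNU coreutils, and the parsing of the `nextflow -version` banner is an assumption that may need adjusting for your install.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MeaSeq): succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="24.10.0"

if command -v nextflow >/dev/null 2>&1; then
    # Assumes the banner contains a line such as "version 24.10.0 build ...".
    installed=$(nextflow -version 2>/dev/null \
        | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1)
    if [ -n "$installed" ] && version_ge "$installed" "$required"; then
        echo "Nextflow $installed meets the $required minimum"
    else
        echo "Nextflow ${installed:-unknown} does not meet the $required minimum" >&2
    fi
else
    echo "nextflow not found on PATH" >&2
fi
```

A plain string comparison would rank 24.4.2 above 24.10.0, which is why the sketch uses a version-aware sort.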
Resource Requirements
By default, the bwamem2 step has a minimum resource usage allocation set to 12 cpus and 72GB memory using the nf-core process_high label.
This can be adjusted (along with the other labels) by creating and passing a custom configuration file with -c <config>. More info can be found in the usage doc.
The pipeline has also been tested with as few as 2 cpus and 8GB memory; a few steps are throttled at that size, but the pipeline remains functional.
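As an illustration of the custom configuration file mentioned above, the sketch below writes a small Nextflow config that overrides the process_high label. The file name `custom.config` and the values (4 CPUs, 16 GB) are placeholders chosen for this example, not recommendations from the pipeline authors.

```bash
#!/usr/bin/env bash
# Write a small Nextflow config that lowers the resources requested by the
# process_high label. The values below are illustrative only.
cat > custom.config <<'EOF'
process {
    withLabel: process_high {
        cpus   = 4
        memory = 16.GB
    }
}
EOF

# Pass the file to the pipeline with -c, for example:
#   nextflow run phac-nml/measeq -profile docker -c custom.config \
#       --input <SAMPLESHEET> --outdir <OUTDIR> \
#       --reference <REFERENCE FASTA> --platform illumina
echo "Wrote custom.config"
```

Other labels can be overridden in the same `process` block with additional `withLabel` selectors.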
Usage
Illumina
First, prepare a samplesheet with your input data that looks as follows for Illumina paired-end data:
samplesheet.csv:
csv
sample,fastq_1,fastq_2
MeVSample01,/PATH/TO/inputread1_S1_L002_R1_001.fastq.gz,/PATH/TO/inputread1_S1_L002_R2_001.fastq.gz
PosCtrl01,/PATH/TO/inputread2_S1_L003_R1_001.fastq.gz,/PATH/TO/inputread2_S1_L003_R2_001.fastq.gz
Sample3,/PATH/TO/inputread3_S1_L004_R1_001.fastq.gz,/PATH/TO/inputread3_S1_L004_R2_001.fastq.gz
Each row represents a sample and its associated paired-end Illumina read data.
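For a directory with many read pairs, the samplesheet can be generated rather than typed by hand. The sketch below is one possible approach. The `make_samplesheet` function is hypothetical, it assumes the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` naming shown in the example, and it uses the file name without the read suffix as the sample name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a samplesheet row for every *_R1_001.fastq.gz
# file in a directory that has a matching *_R2_001.fastq.gz mate.
make_samplesheet() {
    local dir=$1 r1 r2 sample
    echo "sample,fastq_1,fastq_2"
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        r2=${r1%_R1_001.fastq.gz}_R2_001.fastq.gz
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no R2 mate found" >&2
            continue
        fi
        # Sample name = file name without the read suffix (a naming assumption).
        sample=$(basename "$r1" _R1_001.fastq.gz)
        echo "$sample,$r1,$r2"
    done
}

# Usage: make_samplesheet /PATH/TO/fastqs > samplesheet.csv
```

Review the generated sample names before running, since the lane and sample-number parts of the file name (for example `_S1_L002`) are kept as-is.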
You can then run the pipeline using:
bash
nextflow run phac-nml/measeq \
-profile <docker/singularity/.../institute> \
--input <SAMPLESHEET> \
--outdir <OUTDIR> \
--reference <REFERENCE FASTA> \
--platform illumina
Nanopore
For Nanopore data, prepare a samplesheet that looks as follows (the fastq_2 column is left empty):
samplesheet.csv
csv
sample,fastq_1,fastq_2
MeVSample01,/PATH/TO/inputread1.fastq.gz,
PosCtrl01,/PATH/TO/inputread2.fastq.gz,
Sample3,/PATH/TO/inputread3.fastq.gz,
Each row represents a sample and its single-end nanopore data.
You can then run the pipeline using:
bash
nextflow run phac-nml/measeq \
--input <SAMPLESHEET> \
--outdir <OUTDIR> \
--reference <REFERENCE FASTA> \
--platform nanopore \
--model <CLAIR3_MODEL> \
-profile <docker/singularity/institute/etc>
Clair3 Models
The Nanopore pipeline uses Clair3 to call variants, which requires a model chosen based on the flowcell, pore, translocation speed, and basecalling model.
Some models are built into Clair3 and some need to be downloaded. The pre-trained Clair3 models can be downloaded automatically when running the pipeline (using artic get_models) and are specified with the --model <MODEL> parameter.
Additional or local models can also be used: provide a path to them with the --local_model <PATH> parameter instead.
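When a wrapper script has to handle both named and local models, the choice between the two parameters can be made automatically. The sketch below shows one way to do that. The `clair3_model_args` function is hypothetical, and treating any existing path as a local model is an assumption of this example rather than pipeline behaviour.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the model option for a MeaSeq Nanopore run.
# An existing path is treated as a local model; anything else is passed
# through unchanged as a model name for --model.
clair3_model_args() {
    local model=$1
    if [ -e "$model" ]; then
        printf -- '--local_model %s\n' "$model"
    else
        printf -- '--model %s\n' "$model"
    fi
}

# Usage (the model value is a placeholder):
#   nextflow run phac-nml/measeq --platform nanopore \
#       $(clair3_model_args "<CLAIR3_MODEL>") ...
```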
Amplicon and Primer Files
Both the Illumina and Nanopore workflows support amplicon data using a primer scheme file. To run amplicon data, all you need is a primer bed file in which the primers have been mapped to their locations in the reference genome used, passed with the --primer_bed <PRIMER_BED> parameter. An example primer bed file looks as follows:
primer.bed
<CHROM> <START> <END> <PRIMER_NAME> <POOL> <DIRECTION>
MH356245.1 1 25 MSV_1_LEFT 1 +
MH356245.1 400 425 MSV_2_LEFT 2 +
MH356245.1 500 525 MSV_1_RIGHT 1 -
MH356245.1 900 925 MSV_2_RIGHT 2 -
To properly pair the primers, make sure that the names match up until the _LEFT or _RIGHT that mark the primer direction in the primer name. You can also use the following direction extensions in pairing:
- _LEFT and _RIGHT
- _L and _R
- _FORWARD and _REVERSE
- _F and _R
Note: The first line in the example file is just to display what each line expects and should not be included when creating a primer bed file
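A quick check of a primer bed file before a run can catch primers that will not pair. The sketch below strips the direction suffixes listed above from the primer name and reports any amplicon name that lacks either a `+` or a `-` primer. The `unpaired_primers` function is hypothetical and only approximates the pairing rule described here; the pipeline's own matching may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list amplicon names in a primer BED file that lack
# either a forward (+) or a reverse (-) primer. The direction suffix is
# stripped from the primer name (column 4); strand is read from column 6.
unpaired_primers() {
    awk '
        NF >= 6 {
            name = $4
            sub(/_(LEFT|RIGHT|FORWARD|REVERSE|L|R|F)$/, "", name)
            if ($6 == "+") fwd[name]++
            else if ($6 == "-") rev[name]++
            seen[name] = 1
        }
        END {
            for (name in seen)
                if (!(name in fwd) || !(name in rev)) print name
        }
    ' "$1" | sort
}

# Usage: unpaired_primers primer.bed   (no output means every primer is paired)
```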
DSIds
While 24 MeV genotypes were initially identified, only 2 have been detected since 2021: B3 and D8. Due to this, the Distinct Sequence Identifier (DSId) system was created to designate a unique 4-digit identifier based on the precise N450 sequence as a sub-genotype nomenclature. The Measles Nucleotide Surveillance database (MeaNS) is the global resource for these measles virus genetic sequences that is maintained by the WHO. N450 sequences can be submitted to the database to generate a distinct sequence identifier (DSId) for each unique sequence.
There is no way to query the current database so a multifasta file with DSId calls is required to match them up locally. If a match is found, the matching DSId is assigned! If no match is found, the distinct sequence is given a Novel-<MD5 HASH> (first 7 characters for now) identifier so that it can be submitted to the database. To do this, use the parameter --dsid_fasta <FASTA>. The fasta file would look as such:
dsid_fasta
```
>1931 D8
GTCAGTTCCACATTGGCATCTGAACTCG
>2001 D8
GTCAGTTCCACATTGGCATCAGAACTCG
>2418 B3
GTCAGTTCCACAGTGGCATCTGAACTCG
```
If this parameter is not given, the DSIds will still be generated as hashes to group up samples in the dsid.tsv and in the final report.
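The matching logic described in this section can be sketched as an exact sequence lookup with a hash fallback. This is an illustration, not the pipeline's own code. The `novel_label` and `assign_dsid` functions are hypothetical, FASTA headers are assumed to have the form `>DSId genotype`, and upper-casing the sequence before hashing is an assumption, so the labels produced here may not match those in dsid.tsv.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the DSId matching idea, not the pipeline's own code.

# Label for an unmatched N450: "Novel-" plus the first 7 hex characters of
# the MD5 of the upper-cased sequence (the normalization is an assumption).
novel_label() {
    local seq
    seq=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf 'Novel-%s\n' "$(printf '%s' "$seq" | md5sum | cut -c1-7)"
}

# Print the DSId (first word of the FASTA header) whose sequence equals the
# query exactly, or fall back to a Novel-<hash> label when nothing matches.
assign_dsid() {
    local fasta=$1 query=$2 hit
    hit=$(awk -v q="$query" '
        function check() {
            if (!found && hdr != "" && toupper(seq) == toupper(q)) {
                print hdr
                found = 1
            }
        }
        /^>/ { check(); hdr = substr($0, 2); seq = ""; next }
        { seq = seq $0 }    # join sequences wrapped over several lines
        END { check() }
    ' "$fasta")
    if [ -n "$hit" ]; then
        printf '%s\n' "${hit%% *}"
    else
        novel_label "$query"
    fi
}

# Usage: assign_dsid dsid.fasta "<N450 SEQUENCE>"
```

Because the label depends only on the sequence, identical N450s receive the same Novel label, which is what allows samples to be grouped without a DSId file.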
More Run Options
For more detailed running options including adding metadata, adjusting parameters, adding in DSID matches, and more, please refer to the usage docs.
[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
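As an example of the -params-file route, the sketch below writes the parameters used in the run commands in this README to a YAML file. The file name `params.yaml` and all values are placeholders for illustration.

```bash
#!/usr/bin/env bash
# Write pipeline parameters to a YAML file for Nextflow's -params-file option.
# Keys mirror the CLI flags shown in this README; the values are placeholders.
cat > params.yaml <<'EOF'
input: "samplesheet.csv"
outdir: "results"
reference: "reference.fasta"
platform: "illumina"
EOF

# Then run, for example:
#   nextflow run phac-nml/measeq -profile docker -params-file params.yaml
echo "Wrote params.yaml"
```

Keeping parameters in a file like this makes a run easier to repeat, while resource settings stay in a separate config passed with -c.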
Testing
To test the MeaSeq pipeline and confirm that everything works on your system, a small set of Illumina D8 genotype samples from SRA BioProject PRJNA480551 has been included in the test_data/fastqs directory.
To run the pipeline on these samples run the following command:
bash
nextflow run phac-nml/measeq -profile test_illumina,<docker/singularity/institute/etc>
Outputs
The main outputs of the pipeline are the consensus sequences (N450 and Full), the overall.qc.csv summary table, and the MeaSeq_Report.html. The final MeaSeq report gives a summary of the run including sample quality metrics, plots, and any additional information. Detailed pipeline outputs are described within the output docs
Steps
More detailed steps are available in the output docs
Illumina Steps
- Generate Reference and Primer Intermediates
- FastQC
- Illumina Consensus Workflow
- FastP
- BWAMem2
- iVar Trim (Amplicon input only)
- Freebayes
- Process Freebayes VCF
- Make Depth Mask
- Bcftools Consensus (Ambiguous and Consensus variants)
- Nextclade (N450 and Custom datasets, N450 fasta output)
- Samtools depth
- Compare DSId (Optional with --dsid_fasta parameter)
- Make sample QC
- Amplicon Summary Workflow (Amp only data)
- Bedtools Coverage
- Summarize Amplicon Depth
- Summarize Amplicon Completeness
- MultiQC Amplicon Report
- Report Workflow
- Samtools mpileup
- Pysamstats
- Rmarkdown
Nanopore Steps
- Generate Reference and Primer Intermediates
- FastQC
- Nanopore Consensus Workflow
- Artic Get Models
- NanoQ
- Minimap2
- Amplicon
- Artic Align Trim
- Clair3 Pool
- Artic VCF Merge
- Clair3 No Pool (non-amplicon)
- Make Depth Mask
- VCF Filter
- Artic Mask
- Bcftools Norm
- Bcftools Consensus
- Nextclade (N450 and Custom datasets, N450 fasta output)
- Samtools depth
- Compare DSId (Optional with --dsid_fasta parameter)
- Make sample QC
- Amplicon Summary Workflow (Amp only data)
- Bedtools Coverage
- Summarize Amplicon Depth
- Summarize Amplicon Completeness
- MultiQC Amplicon Report
- Report Workflow
- Samtools mpileup
- Pysamstats
- Rmarkdown
Troubleshooting
For troubleshooting, please open an issue or consult the usage docs to see if they have the information you require.
Credits
MeaSeq was originally written as an Illumina-focused bash pipeline by McMaster University co-op student Ahmed Abdalla. It has since been expanded to cover Nanopore data and fully converted to Nextflow.
For questions please contact either:
- Darian Hole (darian.hole@phac-aspc.gc.ca)
- Molly Pratt (molly.pratt@phac-aspc.gc.ca)
Citations
A citation for this pipeline will be available soon.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
In addition, references of tools and data used in this pipeline are as follows:
Detailed citations for utilized tools are found in CITATIONS.md.
Contributing
Contributions are welcome through creating PRs or Issues
Legal
Copyright 2025 Government of Canada
Licensed under the MIT License (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at:
https://opensource.org/license/mit/
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Owner
- Name: National Microbiology Laboratory
- Login: phac-nml
- Kind: organization
- Website: https://www.nml-lnm.gc.ca/
- Repositories: 50
- Profile: https://github.com/phac-nml
Citation (CITATIONS.md)
# phac-nml/MeaSeq: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

[Artic](https://github.com/artic-network/fieldbioinformatics)

[Clair3](https://github.com/HKU-BAL/Clair3)

> Zheng, Z.; Li, S.; Su, J.; Leung, A. W.-S.; Lam, T.-W.; Luo, R. Symphonizing Pileup and Full-Alignment for Deep Learning-Based Long-Read Variant Calling. Nature Computational Science 2022, 2 (12), 797–803. https://doi.org/10.1038/s43588-022-00387-x.

[FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

> Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc

[fastp](https://github.com/OpenGene/fastp/)

> Chen S. (2023). Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2: e107. https://doi.org/10.1002/imt2.107

[BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)

> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.

[iVar](https://www.ncbi.nlm.nih.gov/pubmed/30621750/)

> Grubaugh ND, Gangavarapu K, Quick J, Matteson NL, De Jesus JG, Main BJ, Tan AL, Paul LM, Brackney DE, Grewal S, Gurfield N, Van Rompay KKA, Isern S, Michael SF, Coffey LL, Loman NJ, Andersen KG. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019 Jan 8;20(1):8. doi: 10.1186/s13059-018-1618-7. PubMed PMID: 30621750; PubMed Central PMCID: PMC6325816.

[MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

[R](https://www.R-project.org/)

> R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

[SAMtools](https://www.ncbi.nlm.nih.gov/pubmed/19505943/)

> Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8. PubMed PMID: 19505943; PubMed Central PMCID: PMC2723002.

[QUAST](https://www.ncbi.nlm.nih.gov/pubmed/23422339/)

> Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013 Apr 15;29(8):1072-5. doi: 10.1093/bioinformatics/btt086. Epub 2013 Feb 19. PubMed PMID: 23422339; PubMed Central PMCID: PMC3624806.

[Nextclade](https://clades.nextstrain.org/)

> Aksamentov, I., Roemer, C., Hodcroft, E. B., & Neher, R. A., (2021). Nextclade: clade assignment, mutation calling and quality control for viral genomes. Journal of Open Source Software, 6(67), 3773, https://doi.org/10.21105/joss.03773

[pysamstats](https://github.com/alimanfoo/pysamstats)

> Miles A. (2014). pysamstats. Available at https://github.com/alimanfoo/pysamstats

[Python](https://github.com/python/)

> Python Software Foundation. Python Language Reference, version 3.8. Available at http://www.python.org

## R Packages

[data.table](https://CRAN.R-project.org/package=data.table)

> Barrett T, Dowle M, Srinivasan A, Gorecki J, Chirico M, Hocking T (2024). _data.table: Extension of `data.frame`_. R package version 1.15.4, https://CRAN.R-project.org/package=data.table

[DT](https://CRAN.R-project.org/package=DT)

> Xie Y, Cheng J, Tan X (2024). _DT: A Wrapper of the JavaScript Library 'DataTables'_. R package version 0.33, https://CRAN.R-project.org/package=DT

[dplyr](https://CRAN.R-project.org/package=dplyr)

> Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A Grammar of Data Manipulation_. R package version 1.1.4, https://CRAN.R-project.org/package=dplyr

[flexdashboard](https://CRAN.R-project.org/package=flexdashboard)

> Aden-Buie G, Sievert C, Iannone R, Allaire J, Borges B (2023). _flexdashboard: R Markdown Format for Flexible Dashboards_. R package version 0.6.2, https://CRAN.R-project.org/package=flexdashboard

[htmltools](https://CRAN.R-project.org/package=htmltools)

> Cheng J, Sievert C, Schloerke B, Chang W, Xie Y, Allen J (2024). _htmltools: Tools for HTML_. R package version 0.5.8.1, https://CRAN.R-project.org/package=htmltools

[htmlwidgets](https://CRAN.R-project.org/package=htmlwidgets)

> Vaidyanathan R, Xie Y, Allaire J, Cheng J, Sievert C, Russell K (2023). _htmlwidgets: HTML Widgets for R_. R package version 1.6.4, https://CRAN.R-project.org/package=htmlwidgets

[plotly](https://plotly-r.com)

> C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida, 2020. https://plotly-r.com

[rmarkdown](https://github.com/rstudio/rmarkdown)

> Allaire J, Xie Y, Dervieux C, McPherson J, Luraschi J, Ushey K, Atkins A, Wickham H, Cheng J, Chang W, Iannone R (2024). _rmarkdown: Dynamic Documents for R_. R package version 2.27, https://github.com/rstudio/rmarkdown

[tidyverse](https://doi.org/10.21105/joss.01686)

> Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” _Journal of Open Source Software_, _4_(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686

[knitr](https://yihui.org/knitr/)

> Xie Y (2024). _knitr: A General-Purpose Package for Dynamic Report Generation in R_. R package version 1.49, https://yihui.org/knitr/

[stringr](https://CRAN.R-project.org/package=stringr)

> Wickham H (2023). _stringr: Simple, Consistent Wrappers for Common String Operations_. R package version 1.5.1, https://CRAN.R-project.org/package=stringr

[readr](https://CRAN.R-project.org/package=readr)

> Wickham H, Hester J, Bryan J (2024). _readr: Read Rectangular Text Data_. R package version 2.1.5, https://CRAN.R-project.org/package=readr

[shidashi](https://CRAN.R-project.org/package=shidashi)

> Wang Z (2024). _shidashi: A Shiny Dashboard Template System_. R package version 0.1.6, https://CRAN.R-project.org/package=shidashi
GitHub Events
Total
- Release event: 2
- Delete event: 5
- Issue comment event: 8
- Member event: 1
- Push event: 39
- Pull request review comment event: 19
- Pull request review event: 9
- Pull request event: 16
- Create event: 8
Last Year
- Release event: 2
- Delete event: 5
- Issue comment event: 8
- Member event: 1
- Push event: 39
- Pull request review comment event: 19
- Pull request review event: 9
- Pull request event: 16
- Create event: 8
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 7
- Average time to close issues: N/A
- Average time to close pull requests: 7 days
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.43
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 7
- Average time to close issues: N/A
- Average time to close pull requests: 7 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.43
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- DarianHole (7)