ngsqi-spores
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CDCgov
- License: mit
- Language: Nextflow
- Default Branch: master
- Size: 1.56 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
:mushroom: SPORES: Simulation, Phylogeny Estimation, Read Optimization, Resistance Mutation Identification and Evaluation, and Sequence Annotation
Introduction
SPORES (Simulation, Phylogeny Estimation, Read Optimization, Resistance Mutation Identification and Evaluation, and Sequence Annotation) is a bioinformatics pipeline that performs quality control and preprocessing on empirical long sequencing reads, incorporates variants of interest into reference genomes, and generates in silico long-read datasets using empirically derived error models and genomes containing variants of interest.
The primary objectives of the SPORES workflow are to:
- Generate long-read in silico datasets based on genome sequences containing variants of interest and empirical long-read error models
- Perform preprocessing and error modeling on empirical long-read datasets
- Verify quality of empirical long reads and simulated in silico datasets
This workflow is being built with Nextflow DSL2 and uses Docker and Singularity containers to modularize the workflow for easier maintenance and reproducibility.
Pipeline Summary
- Input long-read sequencing data (.fastq) and reference genomes (.fna)
- Perform quality control on sequencing reads using NanoPack tools (NanoComp, NanoPlot, NanoQC)
- Preprocess empirical long-read data by filtering reads based on quality and length (chopper)
- Prepare reference genomes for BWA alignment and variant calling (NUCmer, bedtools, BWA, SAMtools)
- Modify reference genomes to contain variants of interest and simulate long sequencing reads (SeqIO, NanoSim)
- Generate versions report
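The read-filtering step above is performed by chopper. As a rough illustration of what a quality-and-length filter does, and not chopper's actual code, the sketch below parses unwrapped four-line FASTQ records and keeps reads that meet two hypothetical thresholds, `min_len` and `min_q`. It assumes mean read quality is computed by averaging per-base error probabilities and converting the result back to a Phred score, a convention common among long-read QC tools; whether this matches chopper's exact calculation is an assumption. All function names are illustrative.

```python
import math
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, quality string) for one FASTQ record
Record = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield records from strict four-line FASTQ text (no line wrapping)."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean quality: average per-base error probabilities, then convert to Phred."""
    if not qual:
        return 0.0
    error_probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def filter_reads(records: Iterable[Record], min_len: int = 500,
                 min_q: float = 10.0) -> List[Record]:
    """Keep reads meeting both thresholds (illustrative values, not SPORES defaults)."""
    return [r for r in records if len(r[1]) >= min_len and mean_phred(r[2]) >= min_q]


if __name__ == "__main__":
    fastq = [
        "@read1\n", "ACGT" * 200 + "\n", "+\n", "I" * 800 + "\n",  # 800 bp, Q40
        "@read2\n", "ACGT" * 50 + "\n", "+\n", "I" * 200 + "\n",   # 200 bp: too short
        "@read3\n", "ACGT" * 200 + "\n", "+\n", "#" * 800 + "\n",  # Q2: too low
    ]
    kept = filter_reads(parse_fastq(fastq), min_len=500, min_q=10.0)
    print([header for header, _, _ in kept])  # ['@read1']
```

Averaging error probabilities rather than raw Phred values keeps a few very low-quality bases from being masked by many high-quality ones.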
Usage
Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.
To run the SPORES minimal test, first access the test data here and update the samplesheet path in the `assets/` folder. Then add your user-specific credentials for the `--ncbi_email` and `--ncbi_api_key` parameters to the profile script located at `conf/test.config`.
Once complete, you can run the minimal test with the following command:
```bash
nextflow run main.nf -profile test,singularity --outdir <OUTDIR>
```
Set Up:
First, prepare a samplesheet describing your empirical long-read input data (passed to the pipeline with `--input`) so that it resembles the following:
samplesheet.csv:
```csv
sample,fastq
Sample1,assets/data/B20592.fastq.gz
Sample2,assets/data/B21256.fastq.gz
```
Each row represents one long-read FASTQ file.
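Before launching, it can help to sanity-check the read samplesheet locally. The minimal sketch below assumes the header must be exactly `sample,fastq`, that sample names must be unique, and that paths end in `.fastq`, `.fastq.gz`, `.fq`, or `.fq.gz`; these rules and the `read_samplesheet` helper are hypothetical, and the pipeline's own input validation remains authoritative. It also tolerates a space after each comma (e.g. `Sample1, assets/data/B20592.fastq.gz`).

```python
import csv
import io
from typing import Dict, List

FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")  # assumed accepted extensions


def read_samplesheet(text: str) -> List[Dict[str, str]]:
    """Parse a sample,fastq samplesheet and apply basic sanity checks."""
    # skipinitialspace drops a space that follows each comma
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    if reader.fieldnames != ["sample", "fastq"]:
        raise ValueError(f"expected header 'sample,fastq', got {reader.fieldnames}")
    rows: List[Dict[str, str]] = []
    seen = set()
    for rownum, row in enumerate(reader, start=2):  # row 1 is the header
        sample = (row.get("sample") or "").strip()
        fastq = (row.get("fastq") or "").strip()
        if not sample or sample in seen:
            raise ValueError(f"row {rownum}: missing or duplicate sample {sample!r}")
        if not fastq.endswith(FASTQ_SUFFIXES):
            raise ValueError(f"row {rownum}: unexpected FASTQ path {fastq!r}")
        seen.add(sample)
        rows.append({"sample": sample, "fastq": fastq})
    return rows


if __name__ == "__main__":
    sheet = (
        "sample,fastq\n"
        "Sample1, assets/data/B20592.fastq.gz\n"
        "Sample2, assets/data/B21256.fastq.gz\n"
    )
    for row in read_samplesheet(sheet):
        print(row["sample"], row["fastq"])
```

Failing fast on a duplicate sample name or a mistyped path is cheaper than discovering it after the pipeline has started pulling containers.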
You will also need to prepare a samplesheet of reference genomes and variants of interest to be used in simulation (passed to the pipeline with `--fastas`).
reference_samplesheet.csv:
```csv
reference,clade,var_id,chrom,pos,var_seq
GCA_016772135.1,1,fks1_hs1,CP060340.1,221636,TACTTGACTTTGTCCTTGAGAGATCCT
GCF_003013715.1,2,fks1_hs1,NC_072812.1,2932580,AGGATCTCTCAAGgacaaagtcaagta
```
Each row contains the following fields:
- reference: Reference genome accession from NCBI
- clade: Clade number associated with the Candida auris reference genome
- var_id: Label for the given variant of interest
- chrom: Chromosome on which the variant of interest is located
- pos: Numerical nucleotide position of the variant of interest (use 0-based indexing)
- var_seq: Variant sequence to be substituted at the given position
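To make the 0-based `pos` convention concrete, the sketch below applies samplesheet rows to a genome. SPORES performs this step with SeqIO; to stay dependency-free, the sketch uses plain Python strings keyed by chromosome name as a stand-in for records parsed from the `.fna` file. It assumes `var_seq` overwrites the same number of reference bases starting at `pos` (an equal-length substitution), which is one reading of "substituted at the given position" rather than confirmed pipeline behavior. The toy genome, the `GCA_TOY` accession, and the `apply_variants` helper are hypothetical.

```python
import csv
import io
from typing import Dict, List


def apply_variants(genome: Dict[str, str], rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Return a copy of `genome` with each var_seq written in at its 0-based `pos`.

    Assumed semantics: var_seq overwrites the same number of reference bases
    starting at pos (an equal-length substitution, no insertion or deletion).
    """
    edited = dict(genome)  # leave the caller's genome untouched
    for row in rows:
        chrom, pos, var_seq = row["chrom"], int(row["pos"]), row["var_seq"]
        seq = edited[chrom]  # raises KeyError if chrom is not in the reference
        if pos < 0 or pos + len(var_seq) > len(seq):
            raise ValueError(f"{row['var_id']}: variant does not fit within {chrom}")
        # 0-based slicing: seq[pos] is the first base that gets replaced
        edited[chrom] = seq[:pos] + var_seq + seq[pos + len(var_seq):]
    return edited


if __name__ == "__main__":
    # Toy stand-in for a parsed .fna file and a one-row reference samplesheet
    toy_genome = {"chr1": "ACGTAC"}
    sheet = "reference,clade,var_id,chrom,pos,var_seq\nGCA_TOY,1,toy_var,chr1,2,TT\n"
    rows = list(csv.DictReader(io.StringIO(sheet)))
    print(apply_variants(toy_genome, rows))  # {'chr1': 'ACTTAC'}
```

With `pos` set to 2, the third base (`G`) is the first one replaced, so `ACGTAC` becomes `ACTTAC`; using 1-based coordinates by mistake would shift every substitution one base downstream.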
For instructions on creating an NCBI account and obtaining an API key, please visit the National Library of Medicine Support Center.
Running SPORES:
Now, you can run the pipeline using:
```bash
nextflow run main.nf \
  --input ont_read_samplesheet.csv \
  --fastas reference_samplesheet.csv \
  --ncbi_email <USER NCBI EMAIL> \
  --ncbi_api_key <API KEY> \
  --outdir <OUTDIR> \
  -profile singularity,cdc
```
Warning: Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration except for parameters; see docs.
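Following the warning above, parameters can be kept in a file and passed with `-params-file`, which accepts JSON or YAML. The minimal sketch below writes a JSON file whose keys mirror the CLI flags in the example command (`--input`, `--fastas`, `--ncbi_email`, `--ncbi_api_key`, `--outdir`); the `build_params` and `write_params_file` helpers and the `params.json` filename are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def build_params(input_csv: str, fastas_csv: str, ncbi_email: str,
                 ncbi_api_key: str, outdir: str) -> Dict[str, str]:
    """Collect the parameters from the CLI example, keyed without leading dashes."""
    return {
        "input": input_csv,
        "fastas": fastas_csv,
        "ncbi_email": ncbi_email,
        "ncbi_api_key": ncbi_api_key,
        "outdir": outdir,
    }


def write_params_file(path: str, params: Dict[str, str]) -> Path:
    """Write params as JSON, one of the formats accepted by -params-file."""
    out = Path(path)
    out.write_text(json.dumps(params, indent=2) + "\n")
    return out


if __name__ == "__main__":
    params = build_params(
        "ont_read_samplesheet.csv",
        "reference_samplesheet.csv",
        "<USER NCBI EMAIL>",  # placeholder: replace with your NCBI account email
        "<API KEY>",          # placeholder: replace with your NCBI API key
        "<OUTDIR>",
    )
    write_params_file("params.json", params)
```

The file could then be used with `nextflow run main.nf -params-file params.json -profile singularity,cdc`.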
Credits
SPORES was originally written by the Next Generation Sequencing (NGS) Quality Initiative (QI) In Silico Team.
We thank the following groups for their extensive assistance in the development of this pipeline:
- CDC Mycotic Diseases Branch (MDB)
- CDC Office of Advanced Molecular Detection (OAMD)
- CDC Office of Laboratory Systems and Response (OLSR)
- CDC Division of Laboratory Systems (DLS)
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
If you use ngsqi/spores for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
CDCgov GitHub Organization Open Source Project
General disclaimer This repository was created for use by CDC programs to collaborate on public health related projects in support of the CDC mission. GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.
Related documents
- Open Practices
- Rules of Behavior
- Thanks and Acknowledgements
- Disclaimer
- Contribution Notice
- Code of Conduct
Public Domain Standard Notice
This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. All contributions to this repository will be released under the CC0 dedication. By submitting a pull request you are agreeing to comply with this waiver of copyright interest.
License Standard Notice
The repository utilizes code licensed under the terms of the Apache Software License and therefore is licensed under ASL v2 or later.
The source code in this repository is free: you can redistribute it and/or modify it under the terms of the Apache Software License version 2, or (at your option) any later version.
The source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache Software License for more details.
You should have received a copy of the Apache Software License along with this program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html
The source code forked from other open source projects will inherit its license.
Privacy Standard Notice
This repository contains only non-sensitive, publicly available data and information. All material and community participation is covered by the Disclaimer and Code of Conduct. For more information about CDC's privacy policy, please visit http://www.cdc.gov/other/privacy.html.
Contributing Standard Notice
Anyone is encouraged to contribute to the repository by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the Apache Software License v2 or later.
All comments, messages, pull requests, and other submissions received through CDC including this GitHub page may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at http://www.cdc.gov/other/privacy.html.
Records Management Standard Notice
This repository is not a source of government records, but is a copy to increase collaboration and collaborative potential. All government records will be published through the CDC web site.
Additional Standard Notices
Please refer to CDC's Template Repository for more information about contributing to this repository, public domain notices and disclaimers, and code of conduct.
Owner
- Name: Centers for Disease Control and Prevention
- Login: CDCgov
- Kind: organization
- Email: data@cdc.gov
- Location: Atlanta, GA
- Website: http://open.cdc.gov/
- Twitter: CDCgov
- Repositories: 114
- Profile: https://github.com/CDCgov
CDC's collaborative software projects to protect America from health, safety, and security threats, both foreign and in the U.S.
Citation (CITATIONS.md)
# ngsqi/spores: Citations
## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)
> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)
> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
## Pipeline tools
- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
## Software packaging/containerisation tools
- [Anaconda](https://anaconda.com)
  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)
  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)
  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)
  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.
- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Release event: 1
- Watch event: 3
- Delete event: 1
- Member event: 1
- Push event: 5
- Create event: 6
Last Year
- Release event: 1
- Watch event: 3
- Delete event: 1
- Member event: 1
- Push event: 5
- Create event: 6
Dependencies
- mshick/add-pr-comment v2 composite
- actions/checkout v4 composite
- nf-core/setup-nextflow v1 composite
- actions/stale v9 composite
- actions/setup-python v5 composite
- eWaterCycle/setup-singularity v7 composite
- nf-core/setup-nextflow v1 composite
- actions/checkout b4ffde65f46336ab88eb53be808477a3936bae11 composite
- actions/setup-python 0a5c61591373683505ea898e09a3ea4f39ef2b9c composite
- peter-evans/create-or-update-comment 71345be0265236311c031f5c7866368bd1eff043 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- nf-core/setup-nextflow v1 composite
- dawidd6/action-download-artifact v3 composite
- marocchino/sticky-pull-request-comment v2 composite
- actions/setup-python v5 composite
- rzr/fediverse-action master composite
- zentered/bluesky-post-action v0.1.0 composite