
Repository

Basic Info
Statistics
  • Stars: 1
  • Watchers: 4
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md

Background

The Foodborne Diseases Active Surveillance Network (FoodNet) monitors illnesses caused by enteric and foodborne pathogens across 10 U.S. sites. FoodNet data is used to track trends in these illnesses and to monitor progress toward federal disease reduction goals.

The original model for analyzing FoodNet data faced limitations, such as sensitivity to single-year aberrations and biases toward more populous sites. To address these issues, this enhanced model (FoodNetTrends) was developed using a Bayesian framework, incorporating thin-plate splines and site-specific interactions.

Key improvements include:

  • Treating the year as a continuous variable.
  • Including site-specific trends.
  • Improving the handling of uncertainty and noisy data.

User Guide

Getting Started

To use the FoodNet enhanced model pipeline, you will need:

  • R (version 4.3.2) with the brms and tidybayes packages.
  • Nextflow (version 24.04.2) for running the pipeline.
  • Access to the FoodNet surveillance dataset.
  • Singularity or Docker for containerization (Singularity is used in the examples).

For a detailed list of software and dependencies, please refer to the GitHub repository.

We highly recommend using Docker or Singularity containers for full pipeline reproducibility. If neither is available, Conda is also supported. See the -profile section for more information.

Preparing Files

Data preparation is crucial for accurate modeling. The FoodNet enhanced model requires the following inputs:

  • MMWR Data File (mmwrFile): The MMWR (Morbidity and Mortality Weekly Report) data file in SAS format (.sas7bdat).
  • Census Data Files: Two census data files in SAS format:
    • Bacterial Census File (censusFile_B)
    • Parasitic Census File (censusFile_P)
  • Parameters:
    • Travel Status (travel): A list of travel statuses to include (e.g., "NO,UNKNOWN,YES").
    • CIDT Variables (cidt): A list of CIDT (Culture-Independent Diagnostic Tests) variables (e.g., "CIDT+,CX+,PARASITIC").
    • Project ID (projID): A unique project identifier (e.g., "20240706").

Ensure that the data files are accessible and properly formatted.
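A quick pre-flight check can catch a wrong path or file type before a long run starts. The sketch below is illustrative only: `check_sas_input` is a hypothetical helper, not part of the pipeline, and it only verifies that a path is absolute, ends in `.sas7bdat`, and is readable. It does not validate the contents of the file.

```bash
# Hypothetical pre-flight check for the SAS inputs (not part of the pipeline).
# The body runs in a subshell, so "exit 1" only ends the check itself.
check_sas_input() (
  label=$1
  path=$2
  case "$path" in
    /*) ;;  # absolute path, as the pipeline parameters expect
    *) echo "ERROR: $label must be an absolute path: $path" >&2; exit 1 ;;
  esac
  case "$path" in
    *.sas7bdat) ;;  # SAS dataset extension named in this guide
    *) echo "ERROR: $label is not a .sas7bdat file: $path" >&2; exit 1 ;;
  esac
  if [ ! -r "$path" ]; then
    echo "ERROR: $label is missing or unreadable: $path" >&2
    exit 1
  fi
  echo "OK: $label -> $path"
)

# Example with the placeholder paths used in this guide:
#   check_sas_input mmwrFile     /path/to/data/mmwr_data.sas7bdat
#   check_sas_input censusFile_B /path/to/data/census_bacteria.sas7bdat
#   check_sas_input censusFile_P /path/to/data/census_parasite.sas7bdat
```

Each call returns a non-zero status on failure, so the three checks can be chained with `&&` in front of the actual run.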

Running the Pipeline

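The pipeline is implemented using Nextflow and can be run in two ways: directly with Nextflow (Method 1) or through the run_workflow.sh script (Method 2). Both methods take the same required parameters.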

Required Parameters

When running the pipeline, the following parameters are required:

  • --outdir: Absolute path to the output directory.
    • Example: /path/to/output/results
  • --mmwrFile: Absolute path to the MMWR data file.
    • Example: /path/to/data/mmwr_data.sas7bdat
  • --censusFile_B: Absolute path to the bacterial census file.
    • Example: /path/to/data/census_bacteria.sas7bdat
  • --censusFile_P: Absolute path to the parasitic census file.
    • Example: /path/to/data/census_parasite.sas7bdat
  • --travel: List of travel statuses to include (comma-separated).
    • Example: "NO,UNKNOWN,YES"
  • --cidt: List of CIDT variables (comma-separated).
    • Example: "CIDT+,CX+,PARASITIC"
  • --projID: Unique project identifier.
    • Example: "20240706"

Method 1: Running Directly with Nextflow

You can run the pipeline directly by invoking Nextflow with the required parameters:

```bash
module load nextflow singularity conda

nextflow run main.nf \
  -entry CDC_SPLINE \
  -profile singularity,conda \
  -with-conda \
  -work-dir /path/to/output/work \
  --outdir /path/to/output/results \
  --mmwrFile "/path/to/data/mmwr_data.sas7bdat" \
  --censusFile_B "/path/to/data/census_bacteria.sas7bdat" \
  --censusFile_P "/path/to/data/census_parasite.sas7bdat" \
  --travel "NO,UNKNOWN,YES" \
  --cidt "CIDT+,CX+,PARASITIC" \
  --projID "20240706"
```

Explanation of the command:

  • Module Loading: Ensure that nextflow, singularity, and conda are loaded in your environment.
  • -entry CDC_SPLINE: Specifies the entry workflow to run (defined in main.nf).
  • -profile singularity,conda: Uses the Singularity container and Conda environment profiles.
  • -with-conda: Enables the use of Conda environments specified in the pipeline.
  • -work-dir: Specifies the working directory for Nextflow.
  • Parameter Flags (--): Provide the necessary parameters as described above.
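When the paths change between runs, it can help to print the full command for review before executing it. The following is a minimal dry-run sketch: `print_foodnet_cmd` and the `TRAVEL`, `CIDT`, and `PROJID` environment variables are hypothetical names introduced here, and the file names are the placeholder examples from the parameter list, so adjust them to your data.

```bash
# Hypothetical dry-run helper: prints the Nextflow command, does not run it.
# File names and default values are the placeholder examples from this guide.
print_foodnet_cmd() (
  outdir=${1:?usage: print_foodnet_cmd OUTDIR DATADIR}
  datadir=${2:?usage: print_foodnet_cmd OUTDIR DATADIR}
  printf 'nextflow run main.nf \\\n'
  printf '  -entry CDC_SPLINE \\\n'
  printf '  -profile singularity,conda \\\n'
  printf '  -with-conda \\\n'
  printf '  -work-dir "%s/work" \\\n' "$outdir"
  printf '  --outdir "%s/results" \\\n' "$outdir"
  printf '  --mmwrFile "%s/mmwr_data.sas7bdat" \\\n' "$datadir"
  printf '  --censusFile_B "%s/census_bacteria.sas7bdat" \\\n' "$datadir"
  printf '  --censusFile_P "%s/census_parasite.sas7bdat" \\\n' "$datadir"
  printf '  --travel "%s" \\\n' "${TRAVEL:-NO,UNKNOWN,YES}"
  printf '  --cidt "%s" \\\n' "${CIDT:-CIDT+,CX+,PARASITIC}"
  printf '  --projID "%s"\n' "${PROJID:-20240706}"
)

# Example:
#   print_foodnet_cmd /path/to/output /path/to/data
```

The printed text can be pasted into a terminal or saved to a file once it looks right. Keeping the print step separate from execution is a design choice that makes typos in parameter names easy to spot.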

Method 2: Using the run_workflow.sh Script

Alternatively, you can use the provided run_workflow.sh script to execute the pipeline.

  1. Update the Script:
    • Open the run_workflow.sh script.
    • Update the outDir and dataDir variables with the appropriate paths.
    • Ensure that the script includes the required parameters (mmwrFile, censusFile_B, censusFile_P, travel, cidt, projID).
  2. Run the Script:

      bash run_workflow.sh run

The script will set up the environment and execute the pipeline with the specified parameters.

Interpreting Output

After the pipeline completes, you'll find several files and directories in your output folder (--outdir). These include:

  • SplineResults/: Contains the results of the spline modeling, including .Rds files and plots (.png files).
  • EstIRRCatch_summary.csv: A summary CSV file combining the estimation results from the spline models.
  • Logs: Detailed logs of the pipeline execution for troubleshooting.

Output Files and Directories

  • SplineResults/: Directory containing:

    • *.Rds: R data files resulting from the spline modeling.
    • *.png: Plots generated from the modeling.
  • EstIRRCatch_summary.csv: A combined CSV file summarizing the estimation of Incidence Rate Ratios (IRR) by catchment area.

Understanding the Results

  • Spline Modeling Results: The .Rds files can be loaded into R for further analysis or visualization.
  • Plots: The .png files provide visual representations of the spline models, trends, and other relevant analyses.
  • Summary CSV: EstIRRCatch_summary.csv contains aggregated results, which can be opened with any spreadsheet software or analyzed programmatically.
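Before opening the results in R or a spreadsheet, a short shell check can confirm that the expected outputs exist. This sketch assumes the layout described above (a `SplineResults/` directory and `EstIRRCatch_summary.csv` directly under `--outdir`); `summarize_outdir` is a hypothetical helper. It prints the CSV header as-is rather than assuming any column names, and the row count is a simple line count.

```bash
# Hypothetical helper: report what a finished run left in the output folder.
summarize_outdir() (
  outdir=${1:?usage: summarize_outdir OUTDIR}
  csv="$outdir/EstIRRCatch_summary.csv"

  # Count model objects and plots under SplineResults/ (0 if it is absent).
  rds_count=$(find "$outdir/SplineResults" -type f -name '*.Rds' 2>/dev/null | wc -l | tr -d ' ')
  png_count=$(find "$outdir/SplineResults" -type f -name '*.png' 2>/dev/null | wc -l | tr -d ' ')
  echo "Rds files: $rds_count"
  echo "PNG plots: $png_count"

  if [ ! -f "$csv" ]; then
    echo "Summary CSV not found: $csv" >&2
    exit 1
  fi
  # Header line, then the number of lines after it.
  echo "Summary columns: $(head -n 1 "$csv")"
  echo "Summary rows: $(tail -n +2 "$csv" | grep -c '')"
)

# Example:
#   summarize_outdir /path/to/output/results
```

A zero count for either file type, or a missing summary CSV, is a cue to check the pipeline logs before going further.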

Advanced Configuration and Optimization

Core Nextflow Arguments

-profile

Use this parameter to choose a configuration profile. Profiles provide presets for different compute environments.

Available Profiles:

  • test: Configuration for automated testing (if test data is available).
  • docker: Uses Docker.
  • singularity: Uses Singularity.
  • conda: Uses Conda.

Note:

  • We recommend using Docker or Singularity for reproducibility.
  • Multiple profiles can be loaded, e.g., -profile singularity,conda.
  • If -profile is not specified, the pipeline runs locally, which is not recommended.
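On shared systems it is not always obvious which of these tools is installed. The sketch below picks a profile name from whatever is on the `PATH`. `pick_profile` is a hypothetical helper, and the preference order (Singularity, then Docker, then Conda) is an assumption based on the recommendation above, not something the pipeline enforces. Finding the executable also does not guarantee that it is usable, for example that a Docker daemon is running.

```bash
# Hypothetical helper: choose a -profile value from the tools on PATH.
# The preference order is an assumption: containers first, Conda as fallback.
pick_profile() {
  if command -v singularity >/dev/null 2>&1; then
    echo singularity
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  elif command -v conda >/dev/null 2>&1; then
    echo conda
  else
    echo "ERROR: none of singularity, docker, or conda found on PATH" >&2
    return 1
  fi
}

# Example:
#   nextflow run main.nf -profile "$(pick_profile)" ...
```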

-resume

Resume a previous pipeline run:

    nextflow run main.nf -resume

-c

Specify a custom Nextflow config file (for resource specifications or infrastructural tweaks):

    nextflow run main.nf -c /path/to/custom.config

Warning: Do not use -c <file> to specify pipeline parameters. Use it only for resource configurations.

Custom Configuration

Resource Requests

Customize compute resources by adjusting the Nextflow configuration. See the Nextflow documentation for details.
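As a starting point, a resource-only config can be generated from the shell and passed with `-c`. This is a sketch under assumptions: `write_resource_config` is a hypothetical helper, and the CPU, memory, and time values are placeholders to tune for your system. The `process` scope with `cpus`, `memory`, and `time` is standard Nextflow configuration. If the pipeline assigns resources through label or name selectors, those settings may take precedence over these defaults.

```bash
# Hypothetical helper: write a minimal resource-only Nextflow config file.
# Prints the path of the file it wrote.
write_resource_config() (
  target=${1:-custom.config}
  cat > "$target" <<'EOF'
// Default resources for every process; the values are placeholders.
process {
    cpus   = 4
    memory = '16 GB'
    time   = '12h'
}
EOF
  echo "$target"
)

# Example:
#   nextflow run main.nf -c "$(write_resource_config /path/to/custom.config)" ...
```

The file holds no pipeline parameters, in line with the warning above about what belongs in a `-c` config.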

Custom Containers

To use different containers or Conda environments for specific tools, adjust the profiles or configuration files accordingly.

Custom Tool Arguments

If you need to provide additional arguments to the R scripts or other tools within the pipeline, you may need to modify the pipeline scripts (main.nf, spline.nf, trendy.nf) accordingly.

Running in the Background

Run Nextflow in the background:

    nextflow run main.nf ... -bg

Alternatively, use screen, tmux, or submit Nextflow as a job to your scheduler.

Nextflow Memory Requirements

Limit Nextflow's Java virtual machine memory usage by adding to your environment:

    export NXF_OPTS='-Xms1g -Xmx4g'


Credits

The FoodNet Trends pipeline was largely developed by Samantha Sevilla and OAMD's SciComp Team with support from Daniel Weller (CDC/DFWED/EDEB), based on R scripts developed by Daniel Weller (CDC/DFWED/EDEB) with support from Beau Bruce (CDC/DFWED/EDEB) and Erica Billig Rose (CDC/DFWED/EDEB). Detailed contributions can be found in our user guides.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

Owner

  • Name: Centers for Disease Control and Prevention
  • Login: CDCgov
  • Kind: organization
  • Email: data@cdc.gov
  • Location: Atlanta, GA

CDC's collaborative software projects to protect America from health, safety, and security threats, both foreign and in the U.S.

Citation (CITATIONS.md)

# cdc/spline: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Watch event: 1
  • Delete event: 3
  • Member event: 2
  • Push event: 277
  • Pull request event: 7
  • Fork event: 2
  • Create event: 10
Last Year
  • Watch event: 1
  • Delete event: 3
  • Member event: 2
  • Push event: 277
  • Pull request event: 7
  • Fork event: 2
  • Create event: 10

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • slsevilla (4)
  • jforstedt (1)

Dependencies

modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan
modules/nf-core/fastqc/meta.yml cpan
modules/nf-core/multiqc/meta.yml cpan
pyproject.toml pypi
.github/workflows/build_mkdocs.yaml actions
  • actions/checkout v4 composite
  • mhausenblas/mkdocs-deploy-gh-pages master composite
docs/requirements.txt pypi
  • mkdocs-git-revision-date-localized-plugin ==1.2.0
  • mkdocs-git-revision-date-plugin ==0.3.2
  • mkdocs-material ==9.1.6
  • mkdocs-material-extensions ==1.1.1
  • mkdocs-minify-plugin ==0.6.4
modules/nf-core/custom/dumpsoftwareversions/environment.yml pypi
modules/nf-core/fastqc/environment.yml pypi
modules/nf-core/multiqc/environment.yml pypi