foodnettrends

https://github.com/cdcgov/foodnettrends

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.6%) to scientific vocabulary

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: CDCgov
License: mit
Language: R
Default Branch: main
Homepage: https://cdcgov.github.io/FoodNetTrends/
Size: 3.28 MB

Statistics

Stars: 1
Watchers: 4
Forks: 2
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed 6 months ago

Metadata Files

Readme Changelog Contributing License Code of conduct Citation

Background

The Foodborne Diseases Active Surveillance Network (FoodNet) monitors illnesses caused by enteric and foodborne pathogens across 10 U.S. sites. FoodNet data is used to track trends in these illnesses and to monitor progress toward federal disease reduction goals.

The original model for analyzing FoodNet data faced limitations, such as sensitivity to single-year aberrations and biases toward more populous sites. To address these issues, this enhanced model (FoodNetTrends) was developed using a Bayesian framework, incorporating thin-plate splines and site-specific interactions.

Key improvements include:

- Treating the year as a continuous variable.
- Including site-specific trends.
- Improving the handling of uncertainty and noisy data.

User Guide

Getting Started
Preparing Files
Running the Pipeline
Interpreting Output
Advanced Configuration and Optimization

Getting Started

To use the FoodNet enhanced model pipeline, you will need:

R (version 4.3.2) with the brms and tidybayes packages.
Nextflow (version 24.04.2) for running the pipeline.
Access to the FoodNet surveillance dataset.
Singularity or Docker for containerization (Singularity is used in the examples).

For a detailed list of software and dependencies, please refer to the GitHub repository.

We highly recommend using Docker or Singularity containers for full pipeline reproducibility. If these are not possible, Conda is also supported. See the -profile section for more information.

Preparing Files

Data preparation is crucial for accurate modeling. The FoodNet enhanced model requires the following inputs:

MMWR Data File (mmwrFile): The MMWR (Morbidity and Mortality Weekly Report) data file in SAS format (.sas7bdat).
Census Data Files: Two census data files in SAS format:
- Bacterial Census File (censusFile_B)
- Parasitic Census File (censusFile_P)
Parameters:
- Travel Status (travel): A list of travel statuses to include (e.g., "NO,UNKNOWN,YES").
- CIDT Variables (cidt): A list of CIDT (Culture-Independent Diagnostic Tests) variables (e.g., "CIDT+,CX+,PARASITIC").
- Project ID (projID): A unique project identifier (e.g., "20240706").

Ensure that the data files are accessible and properly formatted.
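
One way to confirm that the inputs are in place before launching is a small preflight check. The sketch below is not part of the repository: the function names `check_sas_file` and `check_inputs` are hypothetical, and the check only verifies that each path ends in `.sas7bdat` and points to a readable, non-empty regular file. It does not inspect the SAS contents.

```bash
# Preflight check for the SAS input files.
# Illustrative only: these helper names are not part of the FoodNetTrends repository.

# check_sas_file PATH
# Succeeds only if PATH ends in .sas7bdat and is a readable, non-empty regular file.
check_sas_file() {
    local path="$1"
    case "$path" in
        *.sas7bdat) ;;
        *) echo "not a .sas7bdat file: $path" >&2; return 1 ;;
    esac
    if [ ! -f "$path" ] || [ ! -r "$path" ] || [ ! -s "$path" ]; then
        echo "missing, unreadable, or empty: $path" >&2
        return 1
    fi
}

# check_inputs PATH...
# Checks every path and reports all problems before returning non-zero.
check_inputs() {
    local rc=0 f
    for f in "$@"; do
        check_sas_file "$f" || rc=1
    done
    return "$rc"
}
```

For example, `check_inputs /path/to/data/mmwr_data.sas7bdat /path/to/data/census_bacteria.sas7bdat /path/to/data/census_parasite.sas7bdat` returns non-zero and lists each problem file if any of the three inputs is missing.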

Running the Pipeline

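The pipeline is implemented using Nextflow and can be run in two ways: directly with Nextflow (Method 1) or through the provided run_workflow.sh script (Method 2). Both methods need the parameters listed below.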

Required Parameters

When running the pipeline, the following parameters are required:

--outdir: Absolute path to the output directory.
- Example: /path/to/output/results
--mmwrFile: Absolute path to the MMWR data file.
- Example: /path/to/data/mmwr_data.sas7bdat
--censusFile_B: Absolute path to the bacterial census file.
- Example: /path/to/data/census_bacteria.sas7bdat
--censusFile_P: Absolute path to the parasitic census file.
- Example: /path/to/data/census_parasite.sas7bdat
--travel: List of travel statuses to include (comma-separated).
- Example: "NO,UNKNOWN,YES"
--cidt: List of CIDT variables (comma-separated).
- Example: "CIDT+,CX+,PARASITIC"
--projID: Unique project identifier.
- Example: "20240706"
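
Because --travel and --cidt take comma-separated lists, a typo in a single token can be easy to miss. The sketch below shows one way to check such a list before launching. It is not part of the pipeline: the helper name `validate_list` is hypothetical, and the accepted tokens are an assumption taken only from the examples above, so the pipeline itself may accept other values.

```bash
# validate_list VALUES ALLOWED
# VALUES is a comma-separated string (as passed to --travel or --cidt).
# ALLOWED is a space-separated list of accepted tokens (an assumption, see above).
# Returns non-zero if VALUES is empty or contains a token not in ALLOWED.
validate_list() {
    local values="$1" allowed="$2" item old_ifs
    [ -n "$values" ] || { echo "empty list" >&2; return 1; }
    old_ifs="$IFS"
    IFS=','
    set -f               # no globbing, so tokens such as "CX+" are taken literally
    set -- $values       # split on commas into positional parameters
    set +f
    IFS="$old_ifs"
    for item in "$@"; do
        case " $allowed " in
            *" $item "*) ;;
            *) echo "unexpected value: '$item'" >&2; return 1 ;;
        esac
    done
}
```

For example, `validate_list "NO,UNKNOWN,YES" "NO UNKNOWN YES"` succeeds, while `validate_list "NO,MAYBE" "NO UNKNOWN YES"` fails and names the offending token.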

Method 1: Running Directly with Nextflow

You can run the pipeline directly by invoking Nextflow with the required parameters:

```bash
# Make nextflow, singularity, and conda available in the current shell.
module load nextflow singularity conda

# Launch the CDC_SPLINE entry workflow with the required parameters.
nextflow run main.nf \
    -entry CDC_SPLINE \
    -profile singularity,conda \
    -with-conda \
    -work-dir /path/to/output/work \
    --outdir /path/to/output/results \
    --mmwrFile "/path/to/data/mmwr_data.sas7bdat" \
    --censusFile_B "/path/to/data/census_bacteria.sas7bdat" \
    --censusFile_P "/path/to/data/census_parasite.sas7bdat" \
    --travel "NO,UNKNOWN,YES" \
    --cidt "CIDT+,CX+,PARASITIC" \
    --projID "20240706"
```

Explanation of the command:

Module Loading: Ensure that nextflow, singularity, and conda are loaded in your environment.
-entry CDC_SPLINE: Specifies the entry workflow to run (defined in main.nf).
-profile singularity,conda: Uses the Singularity container and Conda environment profiles.
-with-conda: Enables the use of Conda environments specified in the pipeline.
-work-dir: Specifies the working directory for Nextflow.
Parameter Flags (--): Provide the necessary parameters as described above.

Method 2: Using the `run_workflow.sh` Script

Alternatively, you can use the provided run_workflow.sh script to execute the pipeline.

Update the Script:

Open the run_workflow.sh script.
Update the outDir and dataDir variables with the appropriate paths.
Ensure that the script includes the required parameters (mmwrFile, censusFile_B, censusFile_P, travel, cidt, projID).

Run the Script:

bash run_workflow.sh run

The script will set up the environment and execute the pipeline with the specified parameters.

Interpreting Output

After the pipeline completes, you'll find several files and directories in your output folder (--outdir). These include:

SplineResults/: Contains the results of the spline modeling, including .Rds files and plots (.png files).
EstIRRCatch_summary.csv: A summary CSV file combining the estimation results from the spline models.
Logs: Detailed logs of the pipeline execution for troubleshooting.

Output Files and Directories

SplineResults/: Directory containing:
- *.Rds: R data files resulting from the spline modeling.
- *.png: Plots generated from the modeling.
EstIRRCatch_summary.csv: A combined CSV file summarizing the estimation of Incidence Rate Ratios (IRR) by catchment area.

Understanding the Results

Spline Modeling Results: The .Rds files can be loaded into R for further analysis or visualization.
Plots: The .png files provide visual representations of the spline models, trends, and other relevant analyses.
Summary CSV: EstIRRCatch_summary.csv contains aggregated results, which can be opened with any spreadsheet software or analyzed programmatically.
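
Before opening the summary in a spreadsheet, it can help to confirm that the file was written and to see its shape. The sketch below prints the header line and the number of data rows. The helper name `csv_overview` is hypothetical, and no column names are assumed because the layout of EstIRRCatch_summary.csv is not described here. Rows are counted line by line, so a quoted field containing a line break would be counted more than once.

```bash
# csv_overview FILE
# Prints the header line and the number of lines after the header.
# Illustrative helper, not part of the pipeline.
csv_overview() {
    local file="$1"
    [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
    awk 'NR == 1 { print "columns: " $0 }
         NR > 1  { n++ }
         END     { print "data rows: " (n + 0) }' "$file"
}
```

For example, `csv_overview /path/to/output/results/EstIRRCatch_summary.csv`, where the path assumes the --outdir value used in the examples above.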

Advanced Configuration and Optimization

Core Nextflow Arguments

`-profile`

Use this parameter to choose a configuration profile. Profiles provide presets for different compute environments.

Available Profiles:

test: Configuration for automated testing (if test data is available).
docker: Uses Docker.
singularity: Uses Singularity.
conda: Uses Conda.

Note:

We recommend using Docker or Singularity for reproducibility.
Multiple profiles can be loaded, e.g., -profile singularity,conda.
If -profile is not specified, the pipeline runs locally, which is not recommended.

`-resume`

Resume a previous pipeline run:

nextflow run main.nf -resume

`-c`

Specify a custom Nextflow config file (for resource specifications or infrastructural tweaks):

nextflow run main.nf -c /path/to/custom.config

Warning: Do not use -c <file> to specify pipeline parameters. Use it only for resource configurations.

Custom Configuration

Resource Requests

Customize compute resources by adjusting the Nextflow configuration. See the Nextflow documentation for details.
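
As a rough illustration, a resource-only config file for use with -c might be generated as follows. The helper name `write_resource_config` and the CPU, memory, and time values are placeholders chosen for this sketch, not recommendations from the pipeline authors. Process selectors defined in the pipeline's own configuration may still take precedence over these generic defaults.

```bash
# write_resource_config FILE
# Writes a minimal Nextflow config containing only generic process resources.
# The values below are placeholders; adjust them to your compute environment.
write_resource_config() {
    cat > "$1" <<'EOF'
// Resource defaults only; pipeline parameters do not belong in this file.
process {
    cpus   = 4
    memory = '16 GB'
    time   = '12h'
}
EOF
}
```

After running `write_resource_config custom.config`, the file can be passed with `nextflow run main.nf -c custom.config` together with the required parameters.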

Custom Containers

To use different containers or Conda environments for specific tools, adjust the profiles or configuration files accordingly.

Custom Tool Arguments

If you need to provide additional arguments to the R scripts or other tools within the pipeline, you may need to modify the pipeline scripts (main.nf, spline.nf, trendy.nf) accordingly.

Running in the Background

Run Nextflow in the background:

nextflow run main.nf ... -bg

Alternatively, use screen, tmux, or submit Nextflow as a job to your scheduler.
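
Where screen, tmux, and a scheduler are all unavailable, a plain `nohup` wrapper is another option. The sketch below is a generic shell pattern rather than anything provided by the pipeline. The helper name `run_detached` is hypothetical. It starts a command that keeps running after logout, sends its output to a log file, and prints the process ID.

```bash
# run_detached LOGFILE COMMAND [ARGS...]
# Starts COMMAND under nohup in the background, with stdout and stderr
# redirected to LOGFILE, and prints the PID of the started process.
# Illustrative helper, not part of the pipeline.
run_detached() {
    local logfile="$1"
    shift
    nohup "$@" > "$logfile" 2>&1 &
    echo "$!"
}
```

For example, `pid=$(run_detached nextflow_run.log nextflow run main.nf -profile singularity -resume)` starts the run in the background; progress can then be followed with `tail -f nextflow_run.log`.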

Nextflow Memory Requirements

Limit Nextflow's Java virtual machine memory usage by adding to your environment:

export NXF_OPTS='-Xms1g -Xmx4g'

Credits

The FoodNet Trends pipeline was largely developed by Samantha Sevilla and OAMD's SciComp Team with support from Daniel Weller (CDC/DFWED/EDEB), based on R scripts developed by Daniel Weller (CDC/DFWED/EDEB) with support from Beau Bruce (CDC/DFWED/EDEB) and Erica Billig Rose (CDC/DFWED/EDEB). Detailed contributions can be found in our user-guides.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

Owner

Name: Centers for Disease Control and Prevention
Login: CDCgov
Kind: organization
Email: data@cdc.gov
Location: Atlanta, GA

Website: http://open.cdc.gov/
Twitter: CDCgov
Repositories: 114
Profile: https://github.com/CDCgov

CDC's collaborative software projects to protect America from health, safety, and security threats, both foreign and in the U.S.

Citation (CITATIONS.md)

# cdc/spline: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total

Watch event: 1
Delete event: 3
Member event: 2
Push event: 277
Pull request event: 7
Fork event: 2
Create event: 10

Last Year

Watch event: 1
Delete event: 3
Member event: 2
Push event: 277
Pull request event: 7
Fork event: 2
Create event: 10

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Pull Request Authors

slsevilla (4)
jforstedt (1)

Dependencies

modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan

modules/nf-core/fastqc/meta.yml cpan

modules/nf-core/multiqc/meta.yml cpan

pyproject.toml pypi

.github/workflows/build_mkdocs.yaml actions

actions/checkout v4 composite
mhausenblas/mkdocs-deploy-gh-pages master composite

docs/requirements.txt pypi

mkdocs-git-revision-date-localized-plugin ==1.2.0
mkdocs-git-revision-date-plugin ==0.3.2
mkdocs-material ==9.1.6
mkdocs-material-extensions ==1.1.1
mkdocs-minify-plugin ==0.6.4

modules/nf-core/custom/dumpsoftwareversions/environment.yml pypi

modules/nf-core/fastqc/environment.yml pypi

modules/nf-core/multiqc/environment.yml pypi