foodnettrends
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (15.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CDCgov
- License: mit
- Language: R
- Default Branch: main
- Homepage: https://cdcgov.github.io/FoodNetTrends/
- Size: 3.28 MB
Statistics
- Stars: 1
- Watchers: 4
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Background
The Foodborne Diseases Active Surveillance Network (FoodNet) monitors illnesses caused by enteric and foodborne pathogens across 10 U.S. sites. FoodNet data is used to track trends in these illnesses and to monitor progress toward federal disease reduction goals.
The original model for analyzing FoodNet data faced limitations, such as sensitivity to single-year aberrations and biases toward more populous sites. To address these issues, this enhanced model (FoodNetTrends) was developed using a Bayesian framework, incorporating thin-plate splines and site-specific interactions.
Key improvements include:
- Treating the year as a continuous variable.
- Including site-specific trends.
- Improved ability to handle uncertainty and noisy data.
User Guide
Table of Contents
- Getting Started
- Preparing Files
- Running the Pipeline
- Interpreting Output
- Advanced Configuration and Optimization
Getting Started
To use the FoodNet enhanced model pipeline, you will need:
- R (version 4.3.2) with the `brms` and `tidybayes` packages.
- Nextflow (version 24.04.2) for running the pipeline.
- Access to the FoodNet surveillance dataset.
- Singularity or Docker for containerization (Singularity is used in the examples).
For a detailed list of software and dependencies, please refer to the GitHub repository.
We highly recommend using Docker or Singularity containers for full pipeline reproducibility. If these are not possible, Conda is also supported. See the -profile section for more information.
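Before a first run, it can help to confirm that the tools listed above are on the `PATH`. The following is a minimal sketch, not part of the pipeline: the helper name `find_missing_tools` is hypothetical, and on HPC systems the tools may only become visible after a `module load`.

```bash
# Hypothetical helper (not part of the pipeline): print every command
# from the argument list that cannot be found on the current PATH.
find_missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the requirements above; Docker could replace Singularity.
missing=$(find_missing_tools R nextflow singularity)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All required tools were found."
fi
```

The check only confirms that a command exists; it does not verify the versions listed above.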
Preparing Files
Data preparation is crucial for accurate modeling. The FoodNet enhanced model requires the following inputs:
- MMWR Data File (`mmwrFile`): The MMWR (Morbidity and Mortality Weekly Report) data file in SAS format (`.sas7bdat`).
- Census Data Files: Two census data files in SAS format:
  - Bacterial Census File (`censusFile_B`)
  - Parasitic Census File (`censusFile_P`)
- Parameters:
  - Travel Status (`travel`): A list of travel statuses to include (e.g., `"NO,UNKNOWN,YES"`).
  - CIDT Variables (`cidt`): A list of CIDT (Culture-Independent Diagnostic Tests) variables (e.g., `"CIDT+,CX+,PARASITIC"`).
  - Project ID (`projID`): A unique project identifier (e.g., `"20240706"`).
Ensure that the data files are accessible and properly formatted.
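One way to check accessibility before launching is a small pre-flight test on each input path. This is a sketch under assumptions: `check_sas_file` is a hypothetical helper, the paths are the placeholders used elsewhere in this guide, and the check covers only the file name and readability, not the SAS contents.

```bash
# Hypothetical helper: succeed only if the path is a readable regular
# file whose name ends in .sas7bdat.
check_sas_file() {
    case "$1" in
        *.sas7bdat) ;;
        *) echo "Not a .sas7bdat file: $1" >&2; return 1 ;;
    esac
    if [ ! -f "$1" ] || [ ! -r "$1" ]; then
        echo "File missing or unreadable: $1" >&2
        return 1
    fi
}

# Placeholder paths from this guide; replace with real locations.
for f in /path/to/data/mmwr_data.sas7bdat \
         /path/to/data/census_bacteria.sas7bdat \
         /path/to/data/census_parasite.sas7bdat; do
    check_sas_file "$f" || echo "Fix this input before running: $f"
done
```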
Running the Pipeline
The pipeline is implemented using Nextflow and can be run in two ways:
Required Parameters
When running the pipeline, the following parameters are required:
- `--outdir`: Absolute path to the output directory.
  - Example: `/path/to/output/results`
- `--mmwrFile`: Absolute path to the MMWR data file.
  - Example: `/path/to/data/mmwr_data.sas7bdat`
- `--censusFile_B`: Absolute path to the bacterial census file.
  - Example: `/path/to/data/census_bacteria.sas7bdat`
- `--censusFile_P`: Absolute path to the parasitic census file.
  - Example: `/path/to/data/census_parasite.sas7bdat`
- `--travel`: List of travel statuses to include (comma-separated).
  - Example: `"NO,UNKNOWN,YES"`
- `--cidt`: List of CIDT variables (comma-separated).
  - Example: `"CIDT+,CX+,PARASITIC"`
- `--projID`: Unique project identifier.
  - Example: `"20240706"`
Method 1: Running Directly with Nextflow
You can run the pipeline directly by invoking Nextflow with the required parameters:
```bash
# Load the required tools (module names depend on your cluster).
module load nextflow singularity conda

nextflow run main.nf \
    -entry CDC_SPLINE \
    -profile singularity,conda \
    -with-conda \
    -work-dir /path/to/output/work \
    --outdir /path/to/output/results \
    --mmwrFile "/path/to/data/mmwr_data.sas7bdat" \
    --censusFile_B "/path/to/data/census_bacteria.sas7bdat" \
    --censusFile_P "/path/to/data/census_parasite.sas7bdat" \
    --travel "NO,UNKNOWN,YES" \
    --cidt "CIDT+,CX+,PARASITIC" \
    --projID "20240706"
```
Explanation of the command:
- Module Loading: Ensure that `nextflow`, `singularity`, and `conda` are loaded in your environment.
- `-entry CDC_SPLINE`: Specifies the entry workflow to run (defined in `main.nf`).
- `-profile singularity,conda`: Uses the Singularity container and Conda environment profiles.
- `-with-conda`: Enables the use of Conda environments specified in the pipeline.
- `-work-dir`: Specifies the working directory for Nextflow.
- Parameter Flags (`--`): Provide the necessary parameters as described above.
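When the same command is reused across projects, it may be convenient to assemble it from two base directories and inspect it before running. The sketch below assumes a hypothetical function `build_foodnet_cmd` and the placeholder file names from this guide; it only prints the command (a dry run) and does not invoke Nextflow.

```bash
# Hypothetical wrapper: print the Nextflow command for a given output
# directory ($1) and data directory ($2) instead of executing it.
build_foodnet_cmd() {
    out_dir=$1
    data_dir=$2
    set -- nextflow run main.nf \
        -entry CDC_SPLINE \
        -profile singularity,conda \
        -with-conda \
        -work-dir "$out_dir/work" \
        --outdir "$out_dir/results" \
        --mmwrFile "$data_dir/mmwr_data.sas7bdat" \
        --censusFile_B "$data_dir/census_bacteria.sas7bdat" \
        --censusFile_P "$data_dir/census_parasite.sas7bdat" \
        --travel "NO,UNKNOWN,YES" \
        --cidt "CIDT+,CX+,PARASITIC" \
        --projID "20240706"
    # Dry run: show the arguments on one line (quoting is not preserved).
    printf '%s ' "$@"
    printf '\n'
}

build_foodnet_cmd /path/to/output /path/to/data
```

To execute rather than print, the two `printf` lines could be replaced with `"$@"`, which runs the assembled command with its quoting intact.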
Method 2: Using the run_workflow.sh Script
Alternatively, you can use the provided run_workflow.sh script to execute the pipeline.
- Update the Script:
  - Open the `run_workflow.sh` script.
  - Update the `outDir` and `dataDir` variables with the appropriate paths.
  - Ensure that the script includes the required parameters (`mmwrFile`, `censusFile_B`, `censusFile_P`, `travel`, `cidt`, `projID`).
- Run the Script:

```bash
bash run_workflow.sh run
```
The script will set up the environment and execute the pipeline with the specified parameters.
Interpreting Output
After the pipeline completes, you'll find several files and directories in your output folder (--outdir). These include:
- `SplineResults/`: Contains the results of the spline modeling, including `.Rds` files and plots (`.png` files).
- `EstIRRCatch_summary.csv`: A summary CSV file combining the estimation results from the spline models.
- Logs: Detailed logs of the pipeline execution for troubleshooting.
Output Files and Directories
- `SplineResults/`: Directory containing:
  - `*.Rds`: R data files resulting from the spline modeling.
  - `*.png`: Plots generated from the modeling.
- `EstIRRCatch_summary.csv`: A combined CSV file summarizing the estimation of Incidence Rate Ratios (IRR) by catchment area.
Understanding the Results
- Spline Modeling Results: The `.Rds` files can be loaded into R for further analysis or visualization.
- Plots: The `.png` files provide visual representations of the spline models, trends, and other relevant analyses.
- Summary CSV: `EstIRRCatch_summary.csv` contains aggregated results, which can be opened with any spreadsheet software or analyzed programmatically.
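As a quick first look at the summary CSV from the command line, something like the following could be used. It is a sketch only: `csv_overview` is a hypothetical helper, the output path is the placeholder from this guide, and it assumes the CSV sits directly in the output folder with one record per line. No column names are assumed.

```bash
# Hypothetical helper: print the header line and the number of data
# rows of a CSV file (every line after the first counts as one row).
csv_overview() {
    if [ ! -f "$1" ]; then
        echo "No such file: $1" >&2
        return 1
    fi
    echo "Header: $(head -n 1 "$1")"
    echo "Data rows: $(awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$1")"
}

# Placeholder output directory from this guide.
outdir=/path/to/output/results
csv_overview "$outdir/EstIRRCatch_summary.csv" || echo "Run the pipeline first."
```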
Advanced Configuration and Optimization
Core Nextflow Arguments
-profile
Use this parameter to choose a configuration profile. Profiles provide presets for different compute environments.
Available Profiles:
- `test`: Configuration for automated testing (if test data is available).
- `docker`: Uses Docker.
- `singularity`: Uses Singularity.
- `conda`: Uses Conda.
Note:
- We recommend using Docker or Singularity for reproducibility.
- Multiple profiles can be loaded, e.g., `-profile singularity,conda`.
- If `-profile` is not specified, the pipeline runs locally, which is not recommended.
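If it is unclear which profile a machine can support, one option is to pick the first matching tool found on the `PATH`. This is a sketch that assumes the profile names match the command names (`singularity`, `docker`, `conda`); `first_available` is a hypothetical helper, not a pipeline feature.

```bash
# Hypothetical helper: print the first command from the argument list
# that exists on PATH, or return non-zero if none is found.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Prefer containers and fall back to Conda, following the note above.
if profile=$(first_available singularity docker conda); then
    echo "Suggested flag: -profile $profile"
else
    echo "No container runtime or Conda found; install one before running."
fi
```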
-resume
Resume a previous pipeline run:
```bash
nextflow run main.nf -resume
```
-c
Specify a custom Nextflow config file (for resource specifications or infrastructural tweaks):
```bash
nextflow run main.nf -c /path/to/custom.config
```
Warning: Do not use -c <file> to specify pipeline parameters. Use it only for resource configurations.
Custom Configuration
Resource Requests
Customize compute resources by adjusting the Nextflow configuration. See the Nextflow documentation for details.
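For illustration, a resource-only config file could be written as shown below. The values are placeholders rather than recommendations from the pipeline authors, and because no process selector is used, the settings would apply to every process. The file name `custom.config` is an assumption.

```bash
# Write an illustrative Nextflow config that sets default resources for
# all processes. Values are placeholders; tune them to your system.
cat > custom.config <<'EOF'
process {
    cpus   = 4
    memory = '16 GB'
    time   = '12h'
}
EOF

echo "Wrote custom.config; pass it with: nextflow run main.nf -c custom.config"
```

Keeping pipeline parameters out of this file follows the warning in the `-c` section.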
Custom Containers
To use different containers or Conda environments for specific tools, adjust the profiles or configuration files accordingly.
Custom Tool Arguments
If you need to provide additional arguments to the R scripts or other tools within the pipeline, you may need to modify the pipeline scripts (main.nf, spline.nf, trendy.nf) accordingly.
Running in the Background
Run Nextflow in the background:
```bash
nextflow run main.nf ... -bg
```
Alternatively, use screen, tmux, or submit Nextflow as a job to your scheduler.
Nextflow Memory Requirements
Limit Nextflow's Java virtual machine memory usage by adding to your environment:
```bash
export NXF_OPTS='-Xms1g -Xmx4g'
```
Credits
The FoodNet Trends pipeline was largely developed by Samantha Sevilla and OAMD's SciComp Team with support from Daniel Weller (CDC/DFWED/EDEB), based on R scripts developed by Daniel Weller (CDC/DFWED/EDEB) with support from Beau Bruce (CDC/DFWED/EDEB) and Erica Billig Rose (CDC/DFWED/EDEB). Detailed contributions can be found in our user-guides.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
Owner
- Name: Centers for Disease Control and Prevention
- Login: CDCgov
- Kind: organization
- Email: data@cdc.gov
- Location: Atlanta, GA
- Website: http://open.cdc.gov/
- Twitter: CDCgov
- Repositories: 114
- Profile: https://github.com/CDCgov
CDC's collaborative software projects to protect America from health, safety, and security threats, both foreign and in the U.S.
Citation (CITATIONS.md)
# cdc/spline: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Watch event: 1
- Delete event: 3
- Member event: 2
- Push event: 277
- Pull request event: 7
- Fork event: 2
- Create event: 10
Last Year
- Watch event: 1
- Delete event: 3
- Member event: 2
- Push event: 277
- Pull request event: 7
- Fork event: 2
- Create event: 10
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- slsevilla (4)
- jforstedt (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v4 composite
- mhausenblas/mkdocs-deploy-gh-pages master composite
- mkdocs-git-revision-date-localized-plugin ==1.2.0
- mkdocs-git-revision-date-plugin ==0.3.2
- mkdocs-material ==9.1.6
- mkdocs-material-extensions ==1.1.1
- mkdocs-minify-plugin ==0.6.4