Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Comelli-lab
  • License: mit
  • Language: R
  • Default Branch: main
  • Size: 23.4 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

24h-recall-outlier-exploration

Background

Twenty-four-hour dietary recalls are widely used to assess dietary intake but generally necessitate multiple administrations. Longitudinal collection introduces temporal dependencies and complicates data cleaning. Existing approaches are static, largely manual, and use USA- and adult-relevant thresholds, limiting their applicability across ages and populations.

Objective

To establish an all-in-one, automated, systematic software pipeline for detecting, exploring, and evaluating outliers in 24-hour recall data for both cross-sectional and longitudinal contexts.

Methods

We used 251 repeated recalls from 126 children aged 8-12 years from the Microbiota, GROWth and Diet (MiGrowD) study and 15,216 repeated recalls from children and adults in the 2015 Canadian Community Health Survey (CCHS). The pipeline was implemented in code and has three components: outlier detection, exploration, and evaluation. For outlier detection, we digitized the National Cancer Institute (NCI) recommendations and compared values against sex- and age-specific population data from the CCHS. Dietary patterns contributing to outliers were explored with a decision tree.

Results

The pipeline detected 75 outliers across energy intake and seven nutrients, with counts varying by measure (energy intake, 17; carbohydrates, 24; fiber, 18; sugars, 24; total fats, 20; proteins, 24; sodium, 16; vitamin C, 10). Age-specific reference values highlighted significant differences between groups. The longitudinal method outperformed the static method, achieving over 77% sensitivity, 97% specificity, and 88% precision. Rule extraction identified dietary patterns linked to outliers (low consumption of red/orange vegetables, fruits, milk, and potatoes, and high consumption of poultry and eggs).

Conclusions

This is the first automated pipeline detecting outliers in longitudinal dietary data and clarifying the basis for their identification. Findings, validated across two datasets, underscore the longitudinal method's superior ability to identify true outliers while minimizing false positives in both children and adults. This universally applicable, scalable method improves dietary data analysis efficiency and reproducibility, enhancing its quality and advancing nutritional research and public health outcomes.

Keywords

data-driven outlier detection, 24-h dietary recall, longitudinal data, child and adult nutrition

Outlier Detection

For outlier detection, we digitized the National Cancer Institute recommendations and compared values against population data from the Canadian Community Health Survey (2015), specified for sex and age groups.
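As a rough illustration of the comparison described above, the sketch below flags recalls whose energy intake falls outside sex- and age-specific bounds. The reference table, its column names, the cut-off values, and the `flag_outliers()` helper are all hypothetical placeholders; they are not the NCI or CCHS values or the functions used in this repository.

```r
# Hypothetical reference bounds by sex and age group (illustrative numbers only,
# NOT the NCI or CCHS values used by the pipeline).
reference <- data.frame(
  sex       = c("F", "F", "M", "M"),
  age_group = c("9-13", "19-30", "9-13", "19-30"),
  lower     = c(600, 600, 700, 800),
  upper     = c(3500, 4400, 4000, 5000),
  stringsAsFactors = FALSE
)

# Toy recall data: one row per 24-hour recall (made up for this example).
recalls <- data.frame(
  id        = c(1, 1, 2, 3),
  sex       = c("F", "F", "M", "M"),
  age_group = c("9-13", "9-13", "9-13", "19-30"),
  energy    = c(1800, 450, 4200, 2600),
  stringsAsFactors = FALSE
)

# Join each recall to its sex/age-specific bounds and flag values outside them.
flag_outliers <- function(recalls, reference, nutrient) {
  merged <- merge(recalls, reference, by = c("sex", "age_group"), all.x = TRUE)
  merged$outlier <- merged[[nutrient]] < merged$lower |
    merged[[nutrient]] > merged$upper
  merged[order(merged$id), ]
}

flagged <- flag_outliers(recalls, reference, "energy")
print(flagged[, c("id", "sex", "age_group", "energy", "outlier")])
```

Keeping the bounds in a separate table means the same function can be pointed at a different reference population without changing the flagging logic.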

Outlier Exploration

The pipeline features a decision tree-based exploration of dietary patterns contributing to outliers.
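To give a feel for decision tree-based exploration, here is a minimal sketch using the `rpart` package (an assumption; this README does not say which tree implementation the pipeline uses). The data are synthetic, the food-group column names are illustrative, and the labelling rule is invented so that the tree has a pattern to recover.

```r
library(rpart)  # recommended package shipped with most R installations

set.seed(42)
n <- 200

# Synthetic food-group servings per recall (illustrative only, not study data).
diet <- data.frame(
  red_orange_veg = runif(n, 0, 3),
  fruit          = runif(n, 0, 4),
  milk           = runif(n, 0, 3),
  poultry        = runif(n, 0, 3)
)

# Hypothetical labelling rule: low fruit combined with high poultry intake
# marks a recall as an outlier.
diet$outlier <- factor(
  ifelse(diet$fruit < 1 & diet$poultry > 1.5, "outlier", "typical")
)

# Fit a small classification tree predicting the outlier flag from food groups.
tree <- rpart(
  outlier ~ red_orange_veg + fruit + milk + poultry,
  data    = diet,
  method  = "class",
  control = rpart.control(minsplit = 10, cp = 0.01, maxdepth = 3)
)

# Each root-to-leaf path in the printed tree reads as a rule over intakes.
print(tree)

# Food groups that appear in at least one split.
used <- unique(as.character(tree$frame$var[tree$frame$var != "<leaf>"]))
print(used)
```

A shallow tree (`maxdepth = 3` here) keeps the extracted rules short enough to read, at the cost of missing more complex patterns.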

Outlier Evaluation

The pipeline detected 75 outliers across energy intake and seven nutrients, with counts varying by measure. Age-specific reference values highlighted significant differences between groups. The longitudinal method outperformed the static method, achieving over 77% sensitivity, 97% specificity, and 88% precision. Rule extraction identified dietary patterns linked to outliers, such as low consumption of red/orange vegetables, fruits, milk, and potatoes, and high consumption of poultry and eggs.
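For readers unfamiliar with the three metrics reported above, the following sketch computes them from a vector of reference labels and a vector of flags. The toy vectors and the `evaluate_flags()` helper are hypothetical; how reference labels were established in the study is not described in this README.

```r
# Toy labels (illustrative only): TRUE marks an outlier.
truth     <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
predicted <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)

# Compute confusion-matrix counts and derive the three metrics from them.
evaluate_flags <- function(truth, predicted) {
  tp <- sum(predicted & truth)    # flagged and truly an outlier
  fp <- sum(predicted & !truth)   # flagged but actually typical
  fn <- sum(!predicted & truth)   # missed outlier
  tn <- sum(!predicted & !truth)  # correctly left unflagged
  c(
    sensitivity = tp / (tp + fn),  # share of true outliers that were flagged
    specificity = tn / (tn + fp),  # share of typical recalls left unflagged
    precision   = tp / (tp + fp)   # share of flagged recalls that are true outliers
  )
}

metrics <- evaluate_flags(truth, predicted)
print(round(metrics, 3))
```

Running the same function on the flags from two detection methods gives a like-for-like comparison, provided both are scored against the same reference labels.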

Installation

To install and run the pipeline, follow these steps:

Clone the repository: `git clone https://github.com/Comelli-lab/24h-recall-outlier-exploration.git`

Navigate to the project directory: `cd 24h-recall-outlier-exploration`

Install the required dependencies: `Rscript install_packages.R`

Run the code:

```
Rscript cleaningfunctions.R
Rscript 1.cchsdataanalysis.R
Rscript 2.NCIliterature_digitize.R
```

Usage

Detailed usage instructions and examples can be found in the docs directory of this repository.

Contributing

We welcome contributions from the community. Please refer to the CONTRIBUTING.md file for guidelines on how to contribute.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

We would like to thank the participants of the MiGrowD study and the Canadian Community Health Survey for providing the data used in this research.

Owner

  • Name: Dr. Elena Comelli Lab
  • Login: Comelli-lab
  • Kind: organization
  • Email: elena.comelli@utoronto.ca

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as follows:"
title: "A novel codified data-driven approach for detection and explanation of outliers in longitudinal 24-h dietary recall data from children and adults"
version: "1.0.0"
doi: "10.xxxx/your-doi"  # Replace with actual DOI if available
date-released: "2025-02-04"
authors:
  - family-names: "Massara"
    given-names: "Paraskevi"
  - family-names: "Saab"
    given-names: "Stephanie"
  - family-names: "Asrar"
    given-names: "Arooj"
  - family-names: "Omand"
    given-names: "Jessica"
  - family-names: "Anderson"
    given-names: "Laura N."
  - family-names: "Keown-Stoneman"
    given-names: "Charles D.G."
  - family-names: "Maguire"
    given-names: "Jonathon L."
  - family-names: "Bandsma"
    given-names: "Robert H.J."
  - family-names: "Birken"
    given-names: "Catherine S."
  - family-names: "Comelli"
    given-names: "Elena M."
   
affiliations:
  - name: "Department of Nutritional Sciences, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada"
  - name: "Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada"
  - name: "School of Nutrition, Toronto Metropolitan University, Toronto, ON, Canada"
  - name: "Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada"
  - name: "Li Ka Shing Knowledge Institute, St. Michael’s Hospital, Toronto, ON, Canada"
  - name: "Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada"
  - name: "Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada"
  - name: "Department of Pediatrics, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada"
  - name: "Department of Pediatrics, St. Michael’s Hospital, Toronto, ON, Canada"
  - name: "Joannah and Brian Lawson Center for Child Nutrition, University of Toronto, Toronto, ON, Canada"
  - name: "Translational Medicine Program, Research Institute, Hospital for Sick Children"

license: "MIT"
repository-code: "https://github.com/Comelli-lab/24h-recall-outlier-exploration"
keywords:
  - "data-driven"
  - "outlier detection"
  - "24-h dietary recall"
  - "longitudinal data"
  - "child and adult nutrition"

GitHub Events

Total
  • Push event: 7
  • Create event: 2
Last Year
  • Push event: 7
  • Create event: 2