Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.0%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: mahynski
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 33.4 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License Citation Codeowners

README.md

Authenticating Slovenian Fruits and Vegetables

This repository accompanies the manuscript: Comparing Machine Learning Models to Chemometric Ones to Detect Food Fraud: A Case Study in Slovenian Fruits and Vegetables.

"We present a method for comparing models used to detect food fraud based on stable isotopes and trace element (SITE) levels. Existing modeling procedures generally do not provide an uncertainty estimate on a model’s performance due to variations in the training data or preprocessing procedures. Here, we perform a comparison of performance metrics between end-to-end modeling pipelines, enabling hypothesis testing to reveal when differences are statistically significant. When many models have similar performances, the model with the best performance is not always the best to implement in practice due to their complexity or cost. Statistical comparison helps reveal the net benefit of complex models, enabling a better interpretation of their value. We illustrate our approach on six different fruits and vegetables collected in Slovenia from 2018-2022. Models in this study include state-of-the-art machine learning models for tabular data, such as Random Forests (RF), and modern one-class classifiers, such as DD-SIMCA."
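The hypothesis-testing idea in the abstract can be illustrated with a minimal sketch. This is not the procedure used in the manuscript: it swaps in a generic exact paired sign-flip (permutation) test on per-fold score differences, and the function name `paired_sign_flip_test` and the score values are hypothetical. Cross-validation folds are not fully independent, so a test like this tends to be optimistic and should be read as illustrative only.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired sign-flip test on the mean score difference.

    Returns (observed mean difference, p-value). Enumerates all 2**n sign
    patterns, so it is only practical for a small number of paired scores.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / n

    # Under the null hypothesis of no difference between pipelines, each
    # paired difference is equally likely to be positive or negative.
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        flipped = sum(s * d for s, d in zip(signs, diffs)) / n
        # Small tolerance guards against floating-point round-off.
        if abs(flipped) >= abs(observed) - 1e-12:
            as_extreme += 1
        total += 1
    return observed, as_extreme / total


# Hypothetical per-fold accuracies for two end-to-end pipelines,
# evaluated on the same folds (illustrative numbers, not from the study).
complex_model = [0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.91, 0.90]
simple_model = [0.90, 0.90, 0.91, 0.89, 0.92, 0.89, 0.90, 0.90]

diff, p_value = paired_sign_flip_test(complex_model, simple_model)
print(f"mean difference = {diff:.4f}, p = {p_value:.3f}")
```

A large p-value here would suggest that the more complex pipeline's apparent advantage is within the variation across folds, which is the situation in which a simpler or cheaper model may be preferable.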

Datasets are also available on HuggingFace:

1. Apple
2. Asparagus
3. Cherry
4. Garlic
5. Persimmon
6. Strawberry

Final models, trained on the complete datasets listed above, are also available on HuggingFace:

1. Apple
2. Asparagus
3. Cherry
4. Garlic
5. Persimmon
6. Strawberry

Citation

Please cite the associated manuscript as follows:

~~~code
@article{MahynskiStrojnikShenOgrinc2025,
  title={Comparing machine learning models to chemometric ones to detect food fraud: A case study in Slovenian fruits and vegetables},
  volume={},
  journal={Food Chemistry},
  author={Mahynski, Nathan A. and Strojnik, Lidija and Shen, Vincent K. and Ogrinc, Nives},
  year={2025},
  pages={144569},
  doi={10.1016/j.foodchem.2025.144569}
}
~~~

Installation

To reproduce the calculations performed in this work, first set up the conda environment for this project.

~~~code
$ conda env create -f conda-env.yml
$ conda activate test-slo
$ python -m ipykernel install --user --name=test-slo
~~~

Acknowledgements

The Slovenian Forestry and Food, Administration for Food Safety, Veterinary Sector and Plant Protection under GA nos. C2337-18–000044, C2337-19–000033, C2337-20–000048, C2337-21-000062, and C2337-22-000066 is gratefully acknowledged. Financial support was also provided by the Slovenian Research and Innovation Agency under P1-0143 and by the IAEA project “Authenticity of High-Quality Slovenian Food Products Using Advanced Analytical Techniques” (Contract No. 23362).

Certain equipment, instruments, software, or materials, commercial or non-commercial, are identified in this paper to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement of any product or service by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose. Contribution of the National Institute of Standards and Technology, not subject to US Copyright.

Owner

  • Name: Nathan A. Mahynski
  • Login: mahynski
  • Kind: user
  • Location: Gaithersburg, MD
  • Company: NIST

Chemical Engineer at NIST. Interests include: machine learning, nuclear metrology, food science, thermodynamics, tiling, and crystallography.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "Please cite this repository as indicated below."
authors:
  - family-names: Mahynski
    given-names: Nathan
    orcid: https://orcid.org/0000-0002-0008-8749
  - family-names: Strojnik
    given-names: Lidija
    orcid: https://orcid.org/0000-0003-1898-9147
  - family-names: Shen
    given-names: Vincent
  - family-names: Ogrinc
    given-names: Nives
    orcid: https://orcid.org/0000-0002-0773-0095
title: "slovenian-authentication"
version: v0.0.1
date-released: 2024-12-10

GitHub Events

Total
  • Delete event: 1
  • Push event: 3
  • Public event: 1
Last Year
  • Delete event: 1
  • Push event: 3
  • Public event: 1