eatwell_product_classification
https://github.com/leeds-cdrc/eatwell_product_classification
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Leeds-CDRC
- License: gpl-3.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 12.2 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Eatwell Classification Tool 
Overview:
This tool classifies food items into the food group segments of the UK’s Eatwell Guide. It is designed to support automated food group classification for big data sources, such as grocery retailer transaction records.
Version 1.0
This version of the Eatwell classification tool takes product information (e.g. product name, description, shelving categories) and uses text matching algorithms to assign each food product to a segment of the Eatwell Guide. To reflect real-world baskets, in addition to the five standard segments defined in the Eatwell Guide, products can also be classified as an alcoholic beverage, non-alcoholic beverage, discretionary food, composite food, baby/toddler food, other (e.g. spices and flavourings) or non-food item (i.e. items that may be purchased alongside food, such as kitchen foil or toothpaste). The full category descriptions, the logic behind their inclusion and examples are given in Table 1.
|Category |Detail |Example(s)|
|---------|-------|--------|
|Fruit and Vegetables |Eatwell food category, recommended to be 39% of food consumed (by weight) |Carrots, Apple, Kiwi, Salad |
|Potatoes, bread, rice, pasta and other starchy carbohydrates |Eatwell food category, recommended to be 37% of food consumed (by weight) |Wholegrains, Porridge, Cous cous, Cereals |
|Beans, pulses, fish, eggs, meat and other proteins |Eatwell food category, recommended to be 12% of food consumed (by weight) |Lentils, Chickpeas, Meat, Fish, Eggs |
|Dairy and alternatives |Eatwell food category, recommended to be 8% of food consumed (by weight) |Milk, Cheese, Soya milk |
|Oils and spreads |Eatwell food category, recommended to be 1% of food consumed (by weight) |Olive oil, Sunflower spread |
|Discretionary Foods |Foods that should be eaten less often and in small amounts (remaining 3% of food consumed by weight) |Cakes, Crisps, Biscuits, Chips |
|Alcoholic Beverages |Alcoholic drinks (not included in Eatwell guidance) |Wines, Beers, Spirits |
|Non-alcoholic Beverages |Non-alcoholic drinks (not included in Eatwell guidance); user discretion to include as discretionary where appropriate |Squash, Cordial, Juice, Fizzy drinks |
|Composite foods |Foods that are made up of foods in more than one category[^1] |Ready meals, Lasagne, Quiche |
|Toddler and baby food |Toddlers and babies have different dietary recommendations to the Eatwell Guide and are therefore separated out for ease |Formula, Baby purees |
|Other foods |Food items without a significant nutritional contribution, i.e. flavourings, herbs and spices |Dried herbs and spices, Pepper, Salt |
|Non-food items |Products potentially erroneously included as they are typically purchased alongside a food shop |Kitchen foil, Toothpaste, Homeware |
Table 1: Overview of the food categories used in the Eatwell Classification Tool
[^1]: The user can decide how to handle these composite foods depending on the research question being asked. Later versions will assist in calculating fruit and vegetable portions in these food groups.
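
The recommended proportions in Table 1 can be used as a reference point once products have been classified. The following is a minimal sketch, not part of the tool itself: the function names and the `(category, grams)` input format are assumptions made for illustration. It computes each Eatwell category's share of basket weight and compares it with the recommended share.

```python
# Recommended shares of food consumed by weight, copied from Table 1.
RECOMMENDED_SHARE = {
    "Fruit and Vegetables": 0.39,
    "Potatoes, bread, rice, pasta and other starchy carbohydrates": 0.37,
    "Beans, pulses, fish, eggs, meat and other proteins": 0.12,
    "Dairy and alternatives": 0.08,
    "Oils and spreads": 0.01,
    "Discretionary Foods": 0.03,
}


def basket_shares(items):
    """Return each Eatwell category's share of basket weight.

    `items` is an iterable of (category, weight_in_grams) pairs (hypothetical
    input format). Categories without a recommended share, such as beverages
    or non-food items, are ignored.
    """
    totals = {category: 0.0 for category in RECOMMENDED_SHARE}
    for category, grams in items:
        if category in totals:
            totals[category] += grams
    total_weight = sum(totals.values())
    if total_weight == 0:
        return totals  # nothing classifiable in the basket
    return {category: grams / total_weight for category, grams in totals.items()}


def share_gaps(items):
    """Return observed share minus recommended share for each category."""
    observed = basket_shares(items)
    return {c: observed[c] - RECOMMENDED_SHARE[c] for c in RECOMMENDED_SHARE}
```

A positive gap means the category is over-represented in the basket relative to the guide, and a negative gap means it is under-represented. How composite foods and beverages are handled is left to the user, as noted in the footnote above.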
How the Algorithm works
The text mining algorithm uses an iteratively developed lexicon to assign the product of interest to one of the extended Eatwell categories outlined in Table 1. The algorithm first matches the product to any number of candidate categories and then uses rules based on expert domain knowledge to assign the final category. Matching justifications are provided for transparency and are modifiable by the user.
E.g. “Eton Mess: Strawberries and Meringue” would match two categories: Fruit and Vegetables, and Discretionary Foods. Because one of the rules is that any product with a discretionary element is classified as discretionary, the final Eatwell category assigned would be Discretionary Foods.
E.g. “Garden Salad: Lettuce, Tomato, Cucumber” would match four times to the Fruit and Vegetables category, so it would be assigned to that category with an indication of a high probability of correct classification.
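
The two examples above can be reproduced with a small sketch of the match-then-rule idea. This is not the tool's implementation: the miniature `LEXICON`, the function names, and the fallback of picking the most frequently matched category are assumptions made for illustration. Only the "any discretionary element" rule comes from the description above.

```python
import re
from collections import Counter

DISCRETIONARY = "Discretionary Foods"
FRUIT_VEG = "Fruit and Vegetables"

# Hypothetical miniature lexicon (keyword -> category); the real lexicon is far larger.
LEXICON = {
    "strawberries": FRUIT_VEG,
    "salad": FRUIT_VEG,
    "lettuce": FRUIT_VEG,
    "tomato": FRUIT_VEG,
    "cucumber": FRUIT_VEG,
    "meringue": DISCRETIONARY,
}


def match_categories(text):
    """Return (keyword, category) pairs for every lexicon keyword found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(token, LEXICON[token]) for token in tokens if token in LEXICON]


def classify(text):
    """Return (final_category, matches); the matches act as the justification."""
    matches = match_categories(text)
    if not matches:
        return None, matches
    categories = [category for _, category in matches]
    # Rule from the description: any discretionary element makes the product discretionary.
    if DISCRETIONARY in categories:
        return DISCRETIONARY, matches
    # Assumed fallback: choose the most frequently matched category.
    return Counter(categories).most_common(1)[0][0], matches


print(classify("Eton Mess: Strawberries and Meringue"))
print(classify("Garden Salad: Lettuce, Tomato, Cucumber"))
```

Returning the matched keywords alongside the final category keeps the decision inspectable, in the spirit of the matching justifications described above.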
Algorithm Development
Using real-world product data, the algorithm has been designed iteratively to capture a wide range of products. To respect commercial sensitivity, brand names are not used to inform classification. However, users have the option to assign branded items to an Eatwell category to improve business-specific classification. The algorithm and underlying database will continue to be updated to further improve product classification.
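
One way to picture the brand assignment option is as a user-supplied lookup that is checked before the generic classifier runs. The sketch below is an assumption about how such an override could work, not the tool's actual interface: the function name, the `brand_overrides` mapping and the brand "Acme Crunch" are all hypothetical.

```python
def classify_with_brand_overrides(product_name, brand_overrides, fallback):
    """Check user-supplied brand assignments before using the generic classifier.

    `brand_overrides` maps a brand name to an Eatwell category (hypothetical
    format); `fallback` is any callable that classifies a product name.
    """
    name = product_name.lower()
    for brand, category in brand_overrides.items():
        # Simple case-insensitive substring match; a real lookup may need stricter matching.
        if brand.lower() in name:
            return category
    return fallback(product_name)


# Hypothetical brand mapping supplied by a retailer.
overrides = {"Acme Crunch": "Discretionary Foods"}
print(classify_with_brand_overrides("ACME CRUNCH 150g", overrides, lambda name: "Unclassified"))
```

Keeping the brand mapping outside the shared lexicon means business-specific knowledge stays with the user and no brand names need to be stored in the tool.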
Caveats
- Assumptions about the data may need to be modified depending on the end use.
- It is recommended that all classifications are validated against nutritional information.
- We have produced interactive visualisations (see notebook___) to assist in visual validation of the data.
- Check back regularly for code updates.
Upcoming version (2.0)
- An interactive web dashboard is planned for version 2.0 to allow the use of the Eatwell classification tool without programming experience.
Owner
- Name: Leeds-CDRC
- Login: Leeds-CDRC
- Kind: organization
- Repositories: 2
- Profile: https://github.com/Leeds-CDRC
Citation (citation.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Eatwell Classification Tool
message: >-
  Data provided by the Consumer Data Research Centre (CDRC)
  an ESRC Data Investment refs ES/L011840/1; ES/L011891/1
type: software
authors:
  - given-names: Francesca
    family-names: Pontin
    email: f.l.pontin@leeds.ac.uk
    affiliation: University of Leeds
    orcid: 'https://orcid.org/0000-0002-7143-8718'
  - name: Consumer Data Research Centre
    city: Leeds
identifiers:
  - type: doi
    value: 10.5281/zenodo.7074554
repository-code: >-
  https://github.com/Leeds-CDRC/Eatwell_product_classification/tree/main
abstract: >-
  This version of the Eatwell classification tool takes
  product information e.g. (product name, description,
  shelving categories) and uses the developed text matching
  algorithms to assign the food product to a segment of the
  Eatwell Guide. To reflect real-world baskets in addition
  to the five standard segments defined in the Eatwell guide
  products can also be classified as an alcoholic beverage,
  non-alcoholic beverage, discretionary food, composite
  food, baby/toddler foods, other (e.g. spices and
  flavouring) or non-food items (i.e. items that may be
  purchased alongside food items such as kitchen foil, tooth
  paste etc.).
keywords:
  - Eatwell Guide
  - Diet & Nutrition
  - Supermarket data
license: GPL-3.0
version: '1.0'
date-released: '2022-09-13'
preferred-citation:
  type: software
  title: "Eatwell Classification Tool"
  authors:
    - given-names: "Francesca"
      family-names: "Pontin"
      orcid: "https://orcid.org/0000-0002-7143-8718"
  doi: "10.5281/zenodo.7074554"
  month: 9
  year: 2022
  version: 1
  collection-title: "Eatwell Classification Tool"
  url: "https://github.com/Leeds-CDRC/Eatwell_product_classification/tree/main"
GitHub Events
Total
- Push event: 15
Last Year
- Push event: 15