https://github.com/barabasi-lab/grocerydb

Data and Codes for GroceryDB

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Barabasi-Lab
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage: https://www.truefood.tech/
  • Size: 54.6 MB
Statistics
  • Stars: 148
  • Watchers: 10
  • Forks: 13
  • Open Issues: 1
  • Releases: 0
Created about 4 years ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License

README.md

What is GroceryDB?

The offering of grocery stores is a strong driver of consumer decisions. While highly processed foods such as packaged products, processed meat and sweetened soft drinks have been increasingly associated with unhealthy diets, information on the degree of processing characterizing an item in a store is not straightforward to obtain, limiting the ability of individuals to make informed choices. GroceryDB, a database with over 50,000 food items sold by Walmart, Target and Whole Foods, shows the degree of processing of food items and potential alternatives in the surrounding food environment. The extensive data gathered on ingredient lists and nutrition facts enables a large-scale analysis of ingredient patterns and degrees of processing, categorized by store, food category and price range. Furthermore, it allows the quantification of the individual contribution of over 1,000 ingredients to ultra-processing. GroceryDB and the associated http://TrueFood.Tech/ website make this information accessible, guiding consumers toward less processed food choices.

Related Publications

Data Files

  • data/GroceryDB Source Data.xlsx → This file includes all the processed data used to generate the figures in the Nature Food publication.

  • data/GroceryDB_foods.csv → This file includes all the foods in GroceryDB as well as their store, brand, FPro (food processing score), and nutrition facts normalized per 100 grams.

  • data/UpdatedProductsIngredients1115.zip → This is a zipped JSON file of the disambiguated product ingredient trees used to calculate tree features in the Nature Food publication. The schematic trees in the paper are generated from this file.

  • data/GroceryDB_IgFPro.csv → This file includes the IgFPro (ingredient food processing score) that estimates the contribution of over 1,000 ingredients to food ultra-processing.

  • data/GroceryDBtrainingdatasetSRFNDSS20012018NOVA123multicompositions_12Nutrients.csv → This file includes the foods and their manual NOVA labels that we used to train FoodProX and obtain the FPro of products in grocery stores.

  • USDAFDCBFPDApril2021brandedfoodclassifiedFPro12NutPanelmin10_nuts.csv → This file provides FPro (column 'AZ') for foods in the USDA BFPD (Global Branded Food Products Database, April 2021 version) that report at least 10 of the 12 mandated nutrients on their nutrition facts labels.

  • NHANES20032018FoodSourceConsumed.csv → This file provides the source of food consumed by NHANES participants, capturing the variables DR1FS and DR2FS, which correspond to the question "Where did you get (this/most of the ingredients for this)?", found at NHANES.
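
As a minimal sketch of how the CSV files listed above might be explored, the snippet below averages the processing score per store with pandas. The column names `store` and `FPro` are assumptions inferred from the file description, not confirmed headers, and the helper `mean_score_by_store` is hypothetical; inspect `df.columns` on the real file and pass the actual names.

```python
import pandas as pd


def mean_score_by_store(df, store_col="store", score_col="FPro"):
    """Return the mean processing score per store, highest first.

    store_col and score_col are assumed column names; check df.columns
    on the real file and pass the actual headers if they differ.
    """
    missing = [c for c in (store_col, score_col) if c not in df.columns]
    if missing:
        raise KeyError(
            f"columns not found: {missing}; available: {list(df.columns)}"
        )
    return df.groupby(store_col)[score_col].mean().sort_values(ascending=False)


# Example usage with the path listed above:
# foods = pd.read_csv("data/GroceryDB_foods.csv")
# print(foods.columns.tolist())  # confirm the real headers first
# print(mean_score_by_store(foods))
```

Raising a `KeyError` that lists the available columns makes a header mismatch obvious instead of failing deep inside the groupby.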

GroceryDB on MongoDB

MongoDB is a NoSQL database that uses a JSON-like format for data storage. All the data scraped from Target, Walmart, and Whole Foods are available on our MongoDB server. We provide the cleaned data that is used for GroceryDB.

Connecting to MongoDB

You need two files to connect: query_builder.py and config.json. The Python file contains the functions that establish a connection to MongoDB, and the JSON file contains the keys required to connect.

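Install the required Python packages: pymongo, certifi, tqdm, and pandas. The json module is also used, but it is part of the Python standard library and needs no installation.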

Once the packages are installed and both files are on your computer, run the example Jupyter notebook to load the MongoDB data for downstream use.
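
For orientation only, here is a rough sketch of what a connection through pymongo and certifi could look like. It is not the contents of query_builder.py: the config keys `username`, `password`, and `host`, the plain `mongodb://` scheme, and both function names are assumptions made for illustration. Use the functions and keys shipped with the repository for real access.

```python
from urllib.parse import quote_plus


def build_mongo_uri(config):
    """Build a MongoDB connection string from a config dict.

    The keys "username", "password" and "host" are hypothetical; use
    whatever keys the provided config.json actually contains. Some
    deployments use the "mongodb+srv://" scheme instead.
    """
    user = quote_plus(config["username"])  # escape reserved characters
    password = quote_plus(config["password"])
    return f"mongodb://{user}:{password}@{config['host']}/"


def connect(config_path="config.json"):
    """Open a client from the keys in config.json (needs network access)."""
    import json

    import certifi
    from pymongo import MongoClient

    with open(config_path) as fh:
        config = json.load(fh)
    # certifi supplies a current CA bundle for the TLS handshake.
    return MongoClient(build_mongo_uri(config), tlsCAFile=certifi.where())
```

Credentials are percent-encoded because characters such as `@` or `:` would otherwise break the connection string.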

Datasets Available

  • CleanedData: contains the FPro scores of all products from Target, Walmart, and Whole Foods. The FPro scores are calculated using a panel of nutrients from each product's provided nutrient table. For each nutrient, both the reported value and the value converted to g/100g are given. Two FPro scores are given, one based on a 12-nutrient panel and another based on a 10-nutrient panel. Calories, price per calorie, and price per gram are also found in this dataset.

  • ProductIngredients: contains the ingredient list of each product from Target, Walmart, and Whole Foods in the format of an ingredient tree. Each ingredient reports its order in the ingredient list, parent order, depth, and distance to the root node. Both the original ingredient name and our disambiguated name are given. Ingredients are flagged as additives if the USDA considers them additives.
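
To illustrate how an ingredient tree can be rebuilt from per-ingredient order and parent order, here is a small sketch. The field names `order`, `parent_order`, and `name`, the use of `None` for top-level ingredients, and the sample records are all assumptions for illustration; the actual ProductIngredients documents may be structured differently.

```python
from collections import defaultdict


def build_children(ingredients):
    """Map each parent's order to the list of its child records.

    Each record is assumed to be a dict with hypothetical keys "order",
    "parent_order" (None for top-level ingredients) and "name".
    """
    children = defaultdict(list)
    for item in sorted(ingredients, key=lambda r: r["order"]):
        children[item["parent_order"]].append(item)
    return children


def render_tree(ingredients):
    """Return indented lines, one per ingredient, in depth-first order."""
    children = build_children(ingredients)
    lines = []

    def visit(parent_order, depth):
        for item in children.get(parent_order, []):
            lines.append("  " * depth + item["name"])
            visit(item["order"], depth + 1)

    visit(None, 0)
    return lines


# Made-up example records, not taken from GroceryDB.
example = [
    {"order": 1, "parent_order": None, "name": "enriched flour"},
    {"order": 2, "parent_order": 1, "name": "wheat flour"},
    {"order": 3, "parent_order": 1, "name": "niacin"},
    {"order": 4, "parent_order": None, "name": "water"},
]
print("\n".join(render_tree(example)))
```

The indentation level of each printed line equals the ingredient's depth in the tree, so nested sub-ingredients appear under their parent.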

Cite GroceryDB

If you find GroceryDB useful in your research, please add the following citation:

@article{GroceryDB, title={Prevalence of processed foods in major US grocery stores}, author={Babak Ravandi and Gordana Ispirova and Michael Sebek and Peter Mehler and Albert-László Barabási and Giulia Menichetti}, journal={Nature Food}, year={2025}, doi={10.1038/s43016-024-01095-7}, url={https://www.nature.com/articles/s43016-024-01095-7} }

Owner

  • Name: Barabasi Lab
  • Login: Barabasi-Lab
  • Kind: organization
  • Location: Boston, MA

GitHub Events

Total
  • Commit comment event: 1
  • Issues event: 3
  • Watch event: 80
  • Issue comment event: 5
  • Push event: 27
  • Fork event: 9
Last Year
  • Commit comment event: 1
  • Issues event: 3
  • Watch event: 80
  • Issue comment event: 5
  • Push event: 27
  • Fork event: 9

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: 4 days
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 4 days
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • skontopo (1)
  • andreier1 (1)
Pull Request Authors
  • bsilvereagle (1)