validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools

Published in the Journal of Open Source Software (JOSS), 2025

https://github.com/joshschramm94/validatehot

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

🎯 Validate your Holdout Task and other tools for preference measurement techniques

Basic Info
  • Host: GitHub
  • Owner: JoshSchramm94
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 3.53 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 2
  • Open Issues: 0
  • Releases: 2
Created almost 3 years ago · Last pushed 11 months ago
Metadata Files
Readme · Changelog · Contributing · License · Code of conduct · Citation

README.Rmd

---
output: github_document
editor_options: 
  markdown: 
    wrap: 72
bibliography: vignettes/references.bib
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# validateHOT 🎯



[![R-CMD-check](https://github.com/JoshSchramm94/validateHOT/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/JoshSchramm94/validateHOT/actions/workflows/R-CMD-check.yaml)

[![license](https://img.shields.io/badge/license-GPL--3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.en.html)



validateHOT is a package for preference measurement techniques. It provides
functions to evaluate validation tasks, perform market simulations, and
convert raw utility estimates into scores that are easier to interpret.
All three components are essential for preference measurement
techniques such as choice-based conjoint (CBC), adaptive choice-based
conjoint (ACBC), or Maximum Difference Scaling (MaxDiff). This package
is particularly relevant for the [Sawtooth
Software](https://sawtoothsoftware.com/) community who would like to
report their analyses in *R* for open science purposes. In addition, it
is compatible with other packages, for example, the ChoiceModelR package
[@ChoiceModelR]. Further, the validateHOT package is valuable for
practitioners who would like to conduct their analyses using
open-source software.

Researchers and practitioners use preference measurement techniques for
various purposes, such as calculating the importance of specific
attributes or simulating markets [@gilbride2008; @steiner2018]. The
ultimate goal is to predict future behavior [@green1990]. To ensure
valid and reliable results, it is crucial that the collected data is
valid and can predict outcomes for tasks that were not included in the
estimation of the utility scores. Including validation tasks is highly
recommended [@Orme2015; @rao2014]. They not only verify data validity
but can also help to test different models. The validateHOT package
offers helpful tools to facilitate these tasks (i.e., validating a
validation task, performing market simulations, and communicating the
results of preference measurement techniques) - all within an
open-source tool.

> The validateHOT package was primarily developed for use with Sawtooth
> Software [@sawtooth2024] and the ChoiceModelR package [@ChoiceModelR]. Please be cautious about using
> it with other platforms (especially for linear and piecewise-coded
> variables).

👉🏾 **What you need to provide**: 
After collecting your data and running your initial hierarchical Bayes
model, you need to import your raw utility scores. If you plan to
validate a validation task, you also need to provide the actual choice
made in this task. We provide a short tutorial in this README. For a
more comprehensive tutorial, please see the vignette provided with the
validateHOT package
(`vignette("validateHOT", package = "validateHOT")`). 👈🏾

**What you get**:
The validateHOT package currently provides functions for four key
components:

-   validation metrics
-   metrics commonly reported in machine learning (i.e., confusion
    matrix)
-   simulation methods, such as determining optimal product combinations
-   converting raw logit utilities into more interpretable scores

For the first three components, the `create_hot()` function is
essential. This function calculates the total utilities for each
alternative in the validation task or the market scenario to be tested.
The `create_hot()` function computes the total utility of each
alternative based on the additive utility model [@rao2014, p. 82].

### Classical validation metrics

-   `hitrate()`: calculates the hit rate (correctly predicted choices)
    of the validation task.
-   `kl()`: Kullback-Leibler divergence; calculates the divergence
    between the actual choice distribution and the predicted choice
    distribution [@ding2011; @philentropy]. Because the
    Kullback-Leibler divergence is asymmetric, the output includes the
    divergence both from predicted to observed and from observed to
    predicted. The validateHOT package currently provides two logarithm
    bases: $log$ and $log_2$.
-   `mae()`: calculates the mean absolute error, i.e., the deviation
    between predicted and stated choice shares.
-   `medae()`: calculates the median absolute error.
-   `mhp()`: calculates the averaged hit probability of participants'
    actual choices in the validation task.
-   `rmse()`: calculates the root mean square error of the deviation
    between predicted and stated choice shares.

All functions can be extended with the `group` argument to get the
output split by group(s).

### Confusion Matrix

The validateHOT package includes metrics from machine learning, i.e.,
the confusion matrix (e.g., @burger2018). For all of the five provided
functions, a **none** alternative has to be included in the validation
task. The logic of the implemented confusion matrix is to test, for
example, whether a buy or no-buy was correctly predicted. This
information can be used to get a sense of over- and underestimation of
general product demand. In the table below, `TP` stands for true
positives, `FP` for false positives, `TN` for true negatives, and `FN`
for false negatives [@burger2018; @kuhn2008]. To translate this to the
logic of the validateHOT package, imagine you have a validation task
with five alternatives plus the alternative of not buying. The
validateHOT package now calculates whether a buy (participant opts for
one of the five alternatives) or a no-buy (participant opts for the
none alternative), respectively, is correctly predicted. Please be
aware that the validateHOT package applies the following coding of the
*buy* and *no-buy* alternatives: rows refer to the observed decisions
while columns refer to the predicted ones.

|        | Buy | No-buy |
|--------|:---:|:------:|
| Buy    | TP  |   FN   |
| No-buy | FP  |   TN   |

-   `accuracy()`: calculates the number of correctly predicted choices
    (buy or no-buy); $\frac{TP + TN}{TP + TN + FP + FN}$ [@burger2018]
-   `f1()`: defined as $\frac{2 * precision * recall}{precision + recall}$
    or, stated differently by @burger2018, $\frac{2TP}{2TP + FP + FN}$
-   `precision()`: defined as $\frac{TP}{TP + FP}$ [@burger2018]
-   `recall()`: defined as $\frac{TP}{TP + FN}$ [@burger2018]
-   `specificity()`: defined as $\frac{TN}{TN + FP}$ [@burger2018]

Again, all functions can be extended with the `group` argument to get
the output split by group(s). A small sketch of the underlying buy
vs. no-buy logic follows below.
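To make the buy/no-buy coding above more concrete, here is a minimal
base-R sketch (it does not use validateHOT itself) that classifies each
participant as a predicted buy or no-buy, assuming a first-choice rule
for the prediction. The tiny `utilities` data frame and all of its
values are made up for illustration.

```{r}
# Hypothetical total utilities for two products plus a none alternative,
# e.g., as returned by create_hot(); 'choice' is the observed decision
utilities <- data.frame(
  option_1 = c(2.1, -0.5, 0.3, 1.8),
  option_2 = c(1.4, 0.2, -1.0, 2.2),
  none     = c(0.0, 1.1, 0.8, -0.4),
  choice   = c(1, 3, 3, 2) # 3 = the none alternative was chosen
)

# Predicted buy = a product beats the none alternative (first-choice rule);
# observed buy = the chosen alternative is not the none alternative
predicted_buy <- max.col(utilities[, c("option_1", "option_2", "none")]) != 3
observed_buy  <- utilities$choice != 3

# Confusion matrix (rows = observed, columns = predicted) and accuracy
table(observed = observed_buy, predicted = predicted_buy)
mean(predicted_buy == observed_buy) # (TP + TN) / (TP + TN + FP + FN)
```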
### Simulation Methods

-   `turf()`: **T**(otal) **U**(nduplicated) **R**(each) and
    **F**(requency) is a "product line extension model" [@miaoulis1990,
    p. 29] that helps to find the perfect product bundle based on the
    reach (e.g., how many participants consider buying at least one
    product of that assortment) and the frequency (how many products on
    average are a purchase option). `turf()` currently provides both
    the *threshold* approach (`approach = 'thres'`; all products that
    exceed a threshold are considered, e.g., as a purchase option;
    @chrzan2019, p. 112) and the *first choice* approach
    (`approach = 'fc'`; only the product with the highest utility is
    considered as a purchase option; @chrzan2019, p. 111).
-   `turf_ladder()`: starts with one product and then subsequently adds
    one new product, namely the one that adds the maximum possible
    reach. Alternatively, it is also possible to define fixed products
    (i.e., products that must be part of the assortment).
-   `freqassort()`: similar to `turf()`, `freqassort()` will give you
    the average frequency, representing how many products the
    participants will choose from a potential assortment. Again, you
    have to define a `none` alternative. `freqassort()` uses the
    *threshold* approach (see above). While `turf()` calculates the
    reach and frequency for **all** combinations, in `freqassort()` you
    specify the combination you are interested in.
-   `reach()`: similar to `turf()`, `reach()` will give you the average
    percentage of how many participants you can reach (at least one of
    the products resembles a purchase option) with your defined
    assortment. `reach()` also uses the *threshold* approach (see
    above). While `turf()` calculates the reach and frequency for
    **all** combinations, in `reach()` you specify the combination you
    are interested in.
-   `marksim()`: runs market simulations (either the share of
    preference, `sop`, or the first choice rule, `fc`), including the
    standard error and the lower and upper confidence intervals [see
    also @orme2020, p. 94].

### Converting raw utilities

The validateHOT package also includes four functions designed to make
the scores from both (A)CBC and MaxDiff analyses more interpretable,
namely:

-   `att_imp()`: converts the raw utilities of either an ACBC or CBC
    into importance scores for each attribute [see @orme2020,
    pp. 79-81].
-   `prob_scores()`: converts the raw utilities of a MaxDiff to choice
    probabilities by applying the following procedures (see the short
    numeric illustration after this list):
    -   For unanchored MaxDiff: first, the scores are zero-centered,
        and then they are transformed by the following formula:
        $\frac{exp^{U_i}}{(exp^{U_i} + a - 1)}$ [@chrzan2019, p. 56],
        where $U_i$ is the raw utility of item *i* and $a$ is the
        number of items shown per MaxDiff task.
    -   For anchored MaxDiff, the following formula is applied:
        $\frac{exp^{U_i}}{(exp^{U_i} + a - 1)} * 100 * \frac{1}{a}$
        [@chrzan2019, p. 59].
-   `zc_diffs()`: rescales the raw logit utilities to make them
    comparable across participants [@sawtooth2024, p. 330].
-   `zero_anchored()`: rescales the raw logits of a MaxDiff to
    zero-centered diffs [@chrzan2019, p. 64].
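As a quick numeric illustration of the unanchored formula above, the
short base-R calculation below zero-centers a hypothetical set of
MaxDiff utilities and converts them to choice probabilities. This is
*not* a call to `prob_scores()` itself; the utility values and the
choice of $a = 4$ items per task are made up for illustration.

```{r}
# Hypothetical raw MaxDiff utilities for five items (one respondent)
raw_utilities <- c(1.2, 0.4, -0.3, -0.8, 2.1)
a <- 4 # number of items shown per MaxDiff task

# Step 1: zero-center the utilities
u <- raw_utilities - mean(raw_utilities)

# Step 2: apply exp(U_i) / (exp(U_i) + a - 1) [@chrzan2019, p. 56]
prob <- exp(u) / (exp(u) + a - 1)
round(prob, 3)
```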
### Data Frames provided by the validateHOT package

The package includes five data sets that help to better explain the
functions as well as the structure of the input, especially for the
`create_hot()` function.

-   `acbc`: example data set with raw utilities of an ACBC study
    conducted in Sawtooth Software [@sawtooth2024]. The price was
    linear-coded while the other attributes were coded as part-worths
    [@sablotny-wackershauser2024; @sawtooth2024].
-   `acbc_interpolate`: example data set with raw utilities of an ACBC
    study conducted in Sawtooth Software [@sawtooth2024]. Price was
    piecewise-coded, another attribute was linear-coded, while the
    other attributes were coded as part-worths
    [@sablotny-wackershauser2024; @sawtooth2024].
-   `cbc`: example data set with raw utilities of a CBC study conducted
    in Sawtooth Software [@sawtooth2024]. All attributes were coded as
    part-worths [@sablotny-wackershauser2024; @sawtooth2024].
-   `cbc_linear`: example data set with raw utilities of a CBC study
    conducted in Sawtooth Software [@sawtooth2024]. One attribute was
    linear-coded while the other attributes were part-worth coded
    [@sablotny-wackershauser2024; @sawtooth2024].
-   `maxdiff`: example data set with raw utilities of a MaxDiff study
    conducted in Sawtooth Software [@sawtooth2024; @schramm2024].

## The story behind the validateHOT package

The validateHOT package was born out of teaching preference measurement
seminars to students, many of whom have little to no prior experience
with R. One of the chapters in this class is about model validation by
checking holdout tasks, and we teach this, of course, in *R* 😍. We
emphasize open science and providing tools to run the analyses and
share the code afterward. The validateHOT package makes this process
look easy 🤹‍♀️. Of course, there are other great packages (e.g., the
Metrics package by @Metrics); however, these packages require some more
data wrangling to use the appropriate functions with the raw utilities,
which might be a burden or barrier for some users. Moreover, as
@yang2018 report, commercial studies often do not use any validation
tasks. Again, the lack of experience in *R* could be one explanation.
Since these functions are not always implemented in other software, not
knowing how to apply them correctly might be the reason for not
including them in the first place. Having a package to evaluate the
validation task can be very beneficial from this perspective.

## Installation

You can install the development version of the validateHOT package from
[GitHub](https://github.com/) with:

```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("JoshSchramm94/validateHOT",
  dependencies = TRUE,
  build_vignettes = TRUE
)
```

## Example

First, we load the package:

```{r}
library("validateHOT")
```

### Example I - CBC

Since *CBCs* are applied more commonly than *ACBC* and *MaxDiff*, we
will provide an example with a *CBC*. Let us load the `cbc` data frame
for this example [@sablotny-wackershauser2024].

```{r}
data(cbc)
```

Now imagine you included a validation task with six alternatives plus a
no-buy alternative. We specify the `data` argument and the `id`. Since
we also have a *no-buy* alternative in our validation task, we specify
the `none` argument. Afterwards, we define each alternative using the
argument `prod.levels`. If we look back at the data frame, we can see
that the first alternative in the holdout task
(`c(3, 6, 10, 13, 16, 20, 24, 32, 35)`) is composed of the following
attribute levels: `r colnames(cbc)[3]`, `r colnames(cbc)[6]`,
`r colnames(cbc)[10]`, `r colnames(cbc)[13]`, `r colnames(cbc)[16]`,
`r colnames(cbc)[20]`, `r colnames(cbc)[24]`, `r colnames(cbc)[32]`,
and `r colnames(cbc)[35]`. As mentioned above, all the attributes are
part-worth coded and the alternatives have the same price as one of the
levels shown (i.e., no interpolation). Thus, we set
`coding = c(rep(0, times = 9))`.
Finally, we specify the method, which is `method = "cbc"` in our case,
and define the column of the actual participant's choice (`choice`). If
you run the code, a data frame called `hot_cbc` will be returned to the
global environment.

> ❗ `create_hot()` takes both column names and column indexes. However,
> please be aware that if you include linear-coded or piecewise-coded
> attributes, you **have** to provide the column indexes for the input
> in `prod.levels`.

```{r}
hot_cbc <- create_hot(
  data = cbc,
  id = "id",
  none = "none",
  prod.levels = list(
    c(3, 6, 10, 13, 16, 20, 24, 32, 35),
    c(3, 5, 10, 14, 16, 18, 22, 27, 35),
    c(4, 6, 9, 14, 15, 20, 25, 30, 36),
    c(4, 5, 10, 11, 16, 19, 26, 32, 34),
    c(2, 6, 8, 14, 16, 17, 26, 31, 36),
    c(2, 5, 7, 12, 16, 20, 26, 29, 33)
  ),
  coding = c(rep(0, times = 9)),
  method = "cbc",
  choice = "hot"
)
```

> In case you just need to create a market scenario, you can also leave
> the `choice` argument empty.

Sometimes you estimate a part-worth coded attribute but want to treat
this attribute as continuous in the validation task or market
simulations (i.e., interpolate values). Please use the code `2` for
this variable in the `coding` argument.

Let us take a glimpse at the output, which shows the participants'
total raw utilities for each of the six alternatives that were included
in the validation task.

```{r}
head(hot_cbc)
```

In the next step, we would like to see how well our model (from which
we took the raw utilities) predicts the actual choices in the
validation task. First, we will run the `hitrate()` function. We
specify the `data`, the column names of the alternatives (`opts`;
remember there are six alternatives + the *no-buy* alternative), and
finally the actual choice (`choice`).

```{r}
hitrate(
  data = hot_cbc, # data frame
  opts = c(option_1:none), # column names of alternatives
  choice = choice # column name of choice
)
```

Next, we look at the magnitude of the mean absolute error by running
the `mae()` function. The arguments are the same as for the `hitrate()`
function.

```{r}
mae(
  data = hot_cbc, # data frame
  opts = c(option_1:none), # column names of alternatives
  choice = choice # column name of choice
)
```

Finally, let us test how many participants would buy at least one of
three products, assuming that this is one potential assortment we would
like to offer to the consumers. We will use the `reach()` function. To
specify the bundle we are offering, we use the `opts` argument in the
`reach()` function.

```{r}
reach(
  data = hot_cbc, # data frame
  opts = c(option_1:option_3), # products that should be considered
  none = none # column name of none alternative
)
```
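The confusion matrix metrics described earlier work on the same data
frame. The sketch below is not evaluated in this README: the `opts`,
`choice`, and `none` arguments are assumed to follow the same pattern
as `hitrate()` and `reach()` above, so please check `?accuracy` for the
exact interface.

```{r, eval=FALSE}
# Share of correctly predicted buy vs. no-buy decisions; the argument
# names are assumed to mirror hitrate()/reach() - see ?accuracy
accuracy(
  data = hot_cbc, # data frame created with create_hot()
  opts = c(option_1:none), # column names of alternatives
  choice = choice, # column name of observed choice
  none = none # column name of the no-buy alternative
)
```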
### Example II - CBC with linear-coded attribute(s)

In the second example, we again use a *CBC*; however, this time we show
how to use the `create_hot()` function if one of the variables is
linear-coded. All other examples are provided in the accompanying
vignette. We are using the data frame `cbc_linear`
[@sablotny-wackershauser2024]. Again, we first load the data frame.

```{r}
data(cbc_linear)
```

Next, we create the validation task to evaluate it in the next step. We
use the same validation task as defined above (i.e., six alternatives
plus the *no-buy* alternative). The only difference to the previous
example is that the last attribute (`price`) was linear-coded and this
time, we want to interpolate values.

Again, we first define the `data` argument, the `id`, as well as the
`none` alternative. Next, we define the `prod.levels` for each
alternative. Since we have one linear-coded attribute, we need to
specify the column indexes instead of the column names in
`prod.levels`. We tell `create_hot()` that the last attribute needs to
be interpolated by specifying the `coding` argument accordingly. This
tells us that the first eight attributes are part-worth coded (`0`)
while the last attribute is linear-coded (`1`). To interpolate the
value, we need to provide the `interpolate.levels` argument to
`create_hot()`. These **need** to be the same levels as provided to
Sawtooth Software [@sawtooth2024] or `ChoiceModelR`. Extrapolation is
allowed; however, `create_hot()` will give a warning in case
extrapolation is applied. Next, we define the column name of the
linear-coded variable (`lin.p`). Again, we are running a CBC, specified
by the `method` argument. This time, we would like to keep some of the
variables in the data frame, which we specify by using the `varskeep`
argument. We only keep one further variable; however, you can specify
as many as you want. This could be relevant if you would like to
display the results, for example, split by group. Finally, we define
the actual choice (`choice`) in the validation task, and we are all
set.

```{r}
hot_cbc_linear <- create_hot(
  data = cbc_linear,
  id = "id",
  none = "none",
  prod.levels = list(
    c(3, 6, 10, 13, 16, 20, 24, 32, 248.55),
    c(3, 5, 10, 14, 16, 18, 22, 27, 237.39),
    c(4, 6, 9, 14, 15, 20, 25, 30, 273.15),
    c(4, 5, 10, 11, 16, 19, 26, 32, 213.55),
    c(2, 6, 8, 14, 16, 17, 26, 31, 266.10),
    c(2, 5, 7, 12, 16, 20, 26, 29, 184.50)
  ),
  coding = c(rep(0, times = 8), 1),
  lin.p = "price",
  interpolate.levels = list(c(seq(from = 175.99, to = 350.99, by = 35))),
  method = "cbc",
  choice = "hot",
  varskeep = "group"
)
```

The next steps are the same as above. However, let us take a look at
some examples in which we display the results per group. Let us again
begin with the `hitrate()` function.

```{r}
hitrate(
  data = hot_cbc_linear, # data frame
  opts = c(option_1:none), # column names of alternatives
  choice = choice, # column name of choice
  group = group # column name of grouping variable
)
```

Lastly, this time we also want to use a rescaling function, namely
`att_imp()`, which gives us the importance of each attribute included
[@orme2020]. We need the data set with the raw logit coefficients
(`cbc_linear`; @sablotny-wackershauser2024). Next, we define the
`attrib` argument. Here, we need to specify the column names of the
levels for each attribute. Afterwards, we specify the coding again,
and since we have one linear-coded attribute, we need to define the
`interpolate.levels` argument again, as we did for the `create_hot()`
function above. Finally, we set `res` to `agg`, which tells the
`att_imp()` function to display the aggregated results (to get results
for each individual, set the `res` argument to `ind`).

```{r}
att_imp(
  data = cbc_linear,
  attrib = list(
    c(paste0("att1_lev", c(1:3))),
    c(paste0("att2_lev", c(1:2))),
    c(paste0("att3_lev", c(1:4))),
    c(paste0("att4_lev", c(1:4))),
    c(paste0("att5_lev", c(1:2))),
    c(paste0("att6_lev", c(1:4))),
    c(paste0("att7_lev", c(1:6))),
    c(paste0("att8_lev", c(1:6))),
    "price"
  ),
  coding = c(rep(0, times = 8), 1),
  interpolate.levels = list(c(seq(from = 175.99, to = 350.99, by = 35))),
  res = "agg"
)
```

For more examples, please see the accompanying vignette.

## References

Owner

  • Name: Joshua Schramm
  • Login: JoshSchramm94
  • Kind: user

JOSS Publication

validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools
Published
March 03, 2025
Volume 10, Issue 107, Page 6708
Authors
Joshua Benjamin Schramm ORCID
Otto-von-Guericke-University Magdeburg
Marcel Lichters ORCID
Otto-von-Guericke-University Magdeburg
Editor
Sehrish Kanwal ORCID
Tags
MaxDiff · Conjoint Analysis · Market Simulations · Predictive Validity

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Schramm
  given-names: Joshua Benjamin
  orcid: "https://orcid.org/0000-0001-5602-4632"
- family-names: Lichters
  given-names: Marcel
  orcid: "https://orcid.org/0000-0002-3710-2292"
contact:
- family-names: Schramm
  given-names: Joshua Benjamin
  orcid: "https://orcid.org/0000-0001-5602-4632"
doi: 10.5281/zenodo.14868050
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Schramm
    given-names: Joshua Benjamin
    orcid: "https://orcid.org/0000-0001-5602-4632"
  - family-names: Lichters
    given-names: Marcel
    orcid: "https://orcid.org/0000-0002-3710-2292"
  date-published: 2025-03-03
  doi: 10.21105/joss.06708
  issn: 2475-9066
  issue: 107
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6708
  title: validateHOT - an R package for the analysis of
    holdout/validation tasks and other choice modeling tools
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06708"
  volume: 10
title: validateHOT - an R package for the analysis of holdout/validation
  tasks and other choice modeling tools

GitHub Events

Total
  • Release event: 3
  • Push event: 30
  • Create event: 2
Last Year
  • Release event: 3
  • Push event: 30
  • Create event: 2

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 175
  • Total Committers: 2
  • Avg Commits per committer: 87.5
  • Development Distribution Score (DDS): 0.023
Past Year
  • Commits: 35
  • Committers: 1
  • Avg Commits per committer: 35.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
JoshSchramm94 s****4@g****m 171
James Uanhoro J****o@g****m 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 2
  • Average time to close issues: 21 days
  • Average time to close pull requests: 20 days
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 1.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jamesuanhoro (1)
Pull Request Authors
  • jamesuanhoro (4)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • R >= 3.5.0 depends
  • dplyr >= 1.1.0 imports
  • fastDummies >= 1.7.3 imports
  • magrittr >= 2.0.3 imports
  • stats >= 4.2.2 imports
  • tibble >= 3.2.1 imports
  • tidyr >= 1.3.0 imports
  • tidyselect >= 1.2.0 imports
  • utils >= 4.2.3 imports
  • Metrics >= 0.1.4 suggests
  • knitr >= 1.43 suggests
  • labelled >= 2.12.0 suggests
  • rmarkdown >= 2.24 suggests
  • testthat >= 3.0.0 suggests