validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools
validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools - Published in JOSS (2025)
Science Score: 98.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 1 DOI reference(s) in JOSS metadata -
○Academic publication links
-
○Committers with academic emails
-
○Institutional organization owner
-
✓JOSS paper metadata
Published in Journal of Open Source Software
Last synced: 6 months ago
·
JSON representation
·
Repository
🎯 Validate your Holdout Task and other tools for preference measurement techniques
Basic Info
Statistics
- Stars: 0
- Watchers: 0
- Forks: 2
- Open Issues: 0
- Releases: 2
Created almost 3 years ago
· Last pushed 11 months ago
Metadata Files
Readme
Changelog
Contributing
License
Code of conduct
Citation
README.Rmd
---
output: github_document
editor_options:
markdown:
wrap: 72
bibliography: vignettes/references.bib
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# validateHOT 🎯
[](https://github.com/JoshSchramm94/validateHOT/actions/workflows/R-CMD-check.yaml)
[](https://www.gnu.org/licenses/gpl-3.0.en.html)
validateHOT is a package for preference measurement techniques. It provides
functions to evaluate validation tasks, perform market simulations, and
convert raw utility estimates into scores that are easier to interpret.
All three components are key functions for preference measurement
techniques such as choice-based conjoint (CBC), adaptive choice-based
conjoint (ACBC), or Maximum Difference Scaling (MaxDiff). This package
is particularly relevant for the [Sawtooth
Software](https://sawtoothsoftware.com/) community who would like to
report their analysis in *R* for open science purposes. In addition, it
is compatible with other packages, for example, the ChoiceModelR package
[@ChoiceModelR]. Further, the validateHOT package is valuable for
practitioners, who would like to conduct the analyses using an open-source
software.
Researchers and practitioners use preference measurement techniques for
various purposes, such as calculating the importance of specific
attributes or simulating markets [@gilbride2008; @steiner2018]. The
ultimate goal is to predict future behavior [@green1990]. To ensure
valid and reliable results, it is crucial that the collected data is
valid and can predict outcomes for tasks that were not included in the
estimation of the utility scores. Including validation tasks is highly
recommended [@Orme2015; @rao2014]. They do not only verify data validity
but can also help to test different models. The validateHOT package
offers helpful tools, to facilitate these functions (i.e., validate a
validation task, perform market simulations, and communicate results of
preference measurement techniques) - all within an open-source tool.
> The validateHOT package was primarily developed for use with Sawtooth
> Software [@sawtooth2024] and the ChoiceModelR package [@ChoiceModelR]. Please be cautious about using
> it with other platforms (especially for linear and piecewise-coded
> variables).
👉🏾 **What you need to provide**:
After collecting your data
and running your initial hierarchical Bayes model, you need to import your raw
utility scores. If you plan to validate a validation task you also need to
provide the actual choice made in this task. We provide a short tutorial in
this markdown. For a more comprehensive tutorial, please see the vignette
provided with the validateHOT package
(`vignette("validateHOT", package = "validateHOT")`).
👈🏾 **What you get**:
The validateHOT package currently
provides functions for four key components:
- validation metrics
- metrics commonly reported in machine learning (i.e., confusion
matrix)
- simulation methods, such as determining optimal product combinations
- converting raw logit utilities into more interpretable scores
For the first three components, the `create_hot()` function is essential.
This function calculates the total utilities for each alternative in the
validation task or the market scenario to be tested.
The `create_hot()` function computes the total utility of each alternative
based on the additive utility model [@rao2014, p. 82].
### Classical validation metrics
- `hitrate()`: calculates the hit rate (correctly predicted choices) of
the validation task.
- `kl()`: Kullback-Leibler-Divergence calculates the divergence between
the actual choice distribution and the predicted choice distribution
[@ding2011; @philentropy]. Due to the asymmetry of the Kullback-Leibler
divergence, the output includes divergence both from predicted to observed
and from observed to predicted. The validateHOT package currently provides
two logarithm bases: $log$ and $log_2$.
- `mae()`: calculates the mean absolute error, i.e., the deviation between
predicted and stated choice shares
- `medae()`: calculates the median absolute error
- `mhp()`: calculates the averaged hit probability of participant's
actual choice in the validation task
- `rmse()`: calculates the root mean square error of deviation between
predicted and stated choice shares
All functions can be extended with the `group` argument to get the output
split by group(s).
### Confusion Matrix
The validateHOT package includes metrics from machine learning, i.e., the
confusion matrix (e.g., @burger2018). For all of the 5 provided functions, a
**none** alternative has to be included in the validation task. The
logic of the implemented confusion matrix is to test, for example,
whether a buy or no-buy was correctly predicted. Information could be
used to get a sense of overestimation and underestimation of general
product demand. In the table below `TP` stands for true positives, `FP`
for false positives, `TN` for true negatives, and `FN` for false
negatives [@burger2018; @kuhn2008]. To translate this to the logic of
the validateHOT package, imagine you have a validation task with five
alternatives plus the alternative of not buying.
The validateHOT package now calculates whether a buy (participant
opts for one of the five alternatives) or a no-buy (participant opts for
the none alternative), respectively, is correctly predicted.
Please be aware that the validateHOT package applies the following
coding of the *buy* and *no-buy* alternatives. Rows refer to the
observed decisions while columns refer to the predicted ones.
| | Buy | No-buy |
|--------|:---:|:------:|
| Buy | TP | FN |
| No-buy | FP | TN |
- `accuracy()`: calculates the number of correctly predicted choices
(buy or no-buy); $\frac{TP + TN}{TP + TN + FP + FN}$ [@burger2018]
- `f1()`: defined as
$\frac{2 * precision * recall}{precision + recall}$ or stated
differently by @burger2018 $\frac{2TP}{2TP + FP + FN}$
- `precision()`: defined as $\frac{TP}{TP + FP}$ [@burger2018]
- `recall()`: defined as $\frac{TP}{TP + FN}$ [@burger2018]
- `specificity()`: defined as $\frac{TN}{TN + FP}$ [@burger2018]
Again, all functions can be extended with the `group` argument to get
the output split by group(s).
### Simulation Methods
- `turf()`: **T**(otal) **U**(nduplicated) **R**(each) and
**F**(requency) is a "product line extension model"
[@miaoulis1990; p. 29] that helps to find the perfect product bundle
based on the reach (e.g., how many participants consider buying at
least one product of that assortment) and the frequency (how many
products on average are a purchase option). `turf()` currently
provides both the *threshold* approach (`approach ='thres'`; all
products that exceed a threshold are considered, e.g., as purchase
option; @chrzan2019, p. 112) and the *first choice* approach
(`approach = 'fc'`; only product with highest utility is considered
as purchase option; @chrzan2019, p. 111).
- `turf_ladder()`: starts with one product and then subsequently adds one
new product, which adds the maximum possible reach. Alternatively, it
is also possible to define fixed products (i.e., products
that must be part of the assortment).
- `freqassort()`: Similar to `turf()`, `freqassort()` will give you
the average frequency, representing how many products the participants will
choose from a potential assortment. Again, you have to define a
`none` alternative. `freqassort()` uses the *threshold* approach (see
above). While `turf()` calculates the reach and frequency for **all**
combinations, you specify the combination you are interested in
`freqassort()`.
- `reach()`: Similar to `turf()`, `reach()` will give you the average
percentage of how many participants you can reach (at least one of
the products resembles a purchase option) with your defined assortment.
`reach()` also uses the *threshold* approach (see above).
While `turf()` calculates the reach and frequency for **all** combinations,
you specify the combination you are interested in `reach()`.
- `marksim()`: Runs market simulations (either the share of preference,
`sop` or first choice rule, `fc`), including the standard error, and
the lower and upper confidence intervals [see also @orme2020, p. 94].
### Converting raw utilities
The validateHOT package also includes four functions designed to make the
scores from both (A)CBC and MaxDiff analyses more interpretable, namely:
- `att_imp()`: Converts the raw utilities of either an ACBC or CBC
into importance scores for each attribute [see @orme2020, pp. 79-81]
- `prob_scores()`: Converts the raw utilities of a MaxDiff to choice
probabilities by applying the following procedures:
- For unanchored MaxDiff: First, the scores are zero-centered,
and then they are transformed by the following formula
$\frac{exp^{U_i}}{(exp^{U_i} + a - 1)}$ [@chrzan2019, p. 56], where
$U_i$ is the raw utility of item *i* and `a` is the number of items
shown per MaxDiff task.
- For anchored MaxDiff the following formula is applied:
$\frac{exp^{U_i}}{(exp^{U_i} + a - 1)} * 100 * \frac{1}{a}$
[@chrzan2019, p. 59].
- `zc_diffs()`: Rescales the raw logit utilities to make them
comparable across participants [@sawtooth2024, p. 330].
- `zero_anchored()`: Rescales the raw logits of a MaxDiff to
zero-centered diffs [@chrzan2019, p. 64].
### Data Frames provided by the validateHOT package
The package includes five data sets that help to better explain the
functions as well as the structure of the input, especially for the
`create_hot()` function.
- `acbc`: Example data set with raw utilities of an ACBC study conducted in
Sawtooth Software [@sawtooth2024]. The price was linear-coded while the
other attributes were coded as part-worths [@sablotny-wackershauser2024;
@sawtooth2024].
- `acbc_interpolate`: Example data set with raw utilities of an ACBC
study conducted in Sawtooth Software [@sawtooth2024]. Price was
piecewise-coded, another attribute was linear-coded while the other
attributes were coded as part-worth [@sablotny-wackershauser2024;
@sawtooth2024].
- `cbc`: Example data set with raw utilities of a CBC study conducted
in Sawtooth Software [@sawtooth2024]. All attributes were coded as part-worth
[@sablotny-wackershauser2024; @sawtooth2024].
- `cbc_linear`: Example data set with raw utilities of a CBC study
conducted in Sawtooth Software [@sawtooth2024] One attribute was
linear-coded while the other attributes were part-worth coded
[@sablotny-wackershauser2024; @sawtooth2024].
- `maxdiff`: Example data set with raw utilities of a MaxDiff study
conducted in Sawtooth Software [@sawtooth2024; @schramm2024].
## The story behind the validateHOT package
The validateHOT package was born out of teaching preference measurement
seminars to students, many of whom have little to no prior experience with R.
One of the chapters in this class is about model validation by checking holdout
tasks and we teach this, of course, in *R* 😍. We emphasize open science, and
providing tools to run the analyses and share the code afterward. The
validateHOT package makes this process look easy 🤹♀️. Of course, there are
other great packages (i.e., the Metrics package by @Metrics), however, these
packages need some more data wrangling to use the appropriate functions with
the raw utilities, which might be a burden or barrier for some users.
Moreover, as @yang2018 report, commercial studies often do not use any
validation tasks. Again, the lack of experience in *R* could be one
explanation. Since these functions are not always implemented in other
software, not knowing how to apply it correctly might be the reason of not
including it in the first instance. Having a package to evaluate the validation
task can be very beneficial from this perspective.
## Installation
You can install the development version of the validateHOT package from
[GitHub](https://github.com/) with:
```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("JoshSchramm94/validateHOT", dependencies = T, build_vignettes = T)
```
## Example
First, we load the package:
```{r}
library("validateHOT")
```
### Example I - CBC
Since *CBC's* are applied more commonly compared to *ACBC* and
*MaxDiff*, we will provide an example with a *CBC*. Let us load the `cbc`
data frame for this example [@sablotny-wackershauser2024].
```{r}
data(cbc)
```
Now imagine you included a validation task with six alternatives plus a
no-buy alternative. We specify the `data` argument and the
`id`. Since we also have a *no-buy* alternative in our validation task,
we specify the `none` argument.
Afterwards, we define each alternative using the argument `prod.levels`.
If we look back at the data frame, we can see that the first alternative
in the holdout task (`c(3, 6, 10, 13, 16, 20, 24, 32, 35)`) is composed
of the following attribute levels `r colnames(cbc)[3]`,
`r colnames(cbc)[6]`, `r colnames(cbc)[10]`, `r colnames(cbc)[13]`,
`r colnames(cbc)[16]`, `r colnames(cbc)[20]`, `r colnames(cbc)[24]`,
`r colnames(cbc)[32]`, and `r colnames(cbc)[35]`.
As mentioned above, all the attributes are part-worth coded and the
alternatives have the same price as one of the levels shown (i.e., no
interpolation). Thus, we set `coding = c(rep(0, times = 9))`. Finally,
we specify the method, which is `method = "cbc"` in our case, and define
the column of the actual participant's choice (`choice`). If you run the
code, a data frame called `hot_cbc` will be returned to the global
environment.
> ❗ `create_hot()` takes both column names and column indexes. However,
> please be aware, if you include linear-coded or piecewise-coded, you
> **have** to provide the column indexes for the input in `prod.levels`.
```{r}
hot_cbc <- create_hot(
data = cbc,
id = "id",
none = "none",
prod.levels = list(
c(3, 6, 10, 13, 16, 20, 24, 32, 35),
c(3, 5, 10, 14, 16, 18, 22, 27, 35),
c(4, 6, 9, 14, 15, 20, 25, 30, 36),
c(4, 5, 10, 11, 16, 19, 26, 32, 34),
c(2, 6, 8, 14, 16, 17, 26, 31, 36),
c(2, 5, 7, 12, 16, 20, 26, 29, 33)
),
coding = c(rep(0, times = 9)),
method = "cbc",
choice = "hot"
)
```
> In case you just need to create a market scenario, you can also leave
> the `choice` argument empty.
Sometimes you estimate a part-worth coded attribute but want to treat
this attribute as continuous in the validation task or market
simulations (i.e., interpolate values). Please use the code `2` for this
variable in the `coding` argument.
Let us take a glimpse at the output, which shows the participants' total
raw utilities for each of the six alternatives that were included in the
validation task.
```{r}
head(hot_cbc)
```
In the next step, we would like to see how well our model (from which we
took the raw utilities) predicts the actual choices in the validation
task. First, we will run the `hitrate()` function. We specify the
`data`, the column names of the alternatives (`opts`; remember there are
six alternatives + the *no-buy* alternative), and finally the actual
choice (`choice`).
```{r}
hitrate(
data = hot_cbc, # data frame
opts = c(option_1:none), # column names of alternatives
choice = choice # column name of choice
)
```
Next, we look at the magnitude of the mean absolute error by running the
`mae()` function. The arguments are the same as for the `hitrate()` function.
```{r}
mae(
data = hot_cbc, # data frame
opts = c(option_1:none), # column names of alternatives
choice = choice # column name of choice
)
```
Finally, let us test, how many participants would buy at least one of three
products, assuming that this is one potential assortment we would like to offer
to the consumers. We will use the `reach()` function. To specify the bundles we
are offering we use the `opts` argument in the `reach()` function.
```{r}
reach(
data = hot_cbc, # data frame
opts = c(option_1:option_3), # products that should be considered
none = none # column name of none alternative
)
```
### Example II - CBC with linear-coded attribute(s)
In the second example, we again use a *CBC*, however, this time we show
how to use the `create_hot()` function, if one of the variables is linear-coded.
All other examples are provided in the accompanied vignette.
We are using the data frame `cbc_linear` [@sablotny-wackershauser2024].
Again, we first load the data frame.
```{r}
data(cbc_linear)
```
Next, we create the validation task to evaluate it in the next step. We
use the same validation task as defined above (i.e., six alternatives
plus the *no-buy* alternative). The only difference to the previous
example is that the last attribute (`price`) was linear-coded and
this time, we want to interpolate values.
Again, we first define the `data` argument, the `id` as well as the `none`
alternative. Next, we define the `prod.levels` for each alternative.
Since we have one linear coded attribute, we need to specify the column
indexes instead of the column names in `prod.levels`. We tell
`create_hot()` that the last attribute needs to be interpolated by
specifying the `coding` argument accordingly. This tells us that the
first eight attributes are part-worth coded (`0`) while the last
attribute is linear-coded (`1`).
To interpolate the value, we need to provide `create_hot()` the
`interpolate.levels`. These **need** to be the same levels as provided
to Sawtooth Software [@sawtooth2024] or `ChoiceModelR`.
Extrapolation is allowed, however, `create_hot()` will give a warning in
case extrapolation is applied.
Next, we define the column name of the linear coded variable (`lin.p`).
Again, we are running a CBC specified by the `method` argument. This
time, we would like to keep some of the variables in the data frame,
which we specify by using the `varskeep` argument. We only keep one
further variable, however, you can specify as many as you want. This
could be relevant if you would like to display the results, for example, split
by group. Finally, we define the actual choice (`choice`) in the validation
task and we are all set.
```{r}
hot_cbc_linear <- create_hot(
data = cbc_linear,
id = "id",
none = "none",
prod.levels = list(
c(3, 6, 10, 13, 16, 20, 24, 32, 248.55),
c(3, 5, 10, 14, 16, 18, 22, 27, 237.39),
c(4, 6, 9, 14, 15, 20, 25, 30, 273.15),
c(4, 5, 10, 11, 16, 19, 26, 32, 213.55),
c(2, 6, 8, 14, 16, 17, 26, 31, 266.10),
c(2, 5, 7, 12, 16, 20, 26, 29, 184.50)
),
coding = c(rep(0, times = 8), 1),
lin.p = "price",
interpolate.levels = list(c(seq(from = 175.99, to = 350.99, by = 35))),
method = "cbc",
choice = "hot",
varskeep = "group"
)
```
The next steps are the same as above. However, let us take a look at
some examples in which we display the results per group. Let us again
begin with the `hitrate()` function.
```{r}
hitrate(
data = hot_cbc_linear, # data frame
opts = c(option_1:none), # column names of alternatives
choice = choice, # column name of choice
group = group # column name of Grouping variable
)
```
Lastly, this time we also want to use a rescaling function, namely `att_imp()`
which gives us the importance of each attribute included [@orme2020]. We need
the data set with the raw logit coefficients (`cbc_linear`;
@sablotny-wackershauser2024). Next, we define the `attrib` argument. Here, we
need to specify each attribute level for the corresponding level. Afterwards,
we specify the coding again, and since we have one linear coded attribute, we
need to define the `interpolate.levels` argument again, as we did for the
`create_hot()` function above. Finally, we set `res` to `agg`, which
tells the `att_imp()` function to display the aggregated results (to get
results for each individual set argument `res` to `ind`).
```{r}
att_imp(
data = cbc_linear,
attrib = list(
c(paste0("att1_lev", c(1:3))),
c(paste0("att2_lev", c(1:2))),
c(paste0("att3_lev", c(1:4))),
c(paste0("att4_lev", c(1:4))),
c(paste0("att5_lev", c(1:2))),
c(paste0("att6_lev", c(1:4))),
c(paste0("att7_lev", c(1:6))),
c(paste0("att8_lev", c(1:6))),
"price"
),
coding = c(rep(0, times = 8), 1),
interpolate.levels = list(c(seq(from = 175.99, to = 350.99, by = 35))),
res = "agg"
)
```
For more examples, please see the accompanied vignette.
## References
Owner
- Name: Joshua Schramm
- Login: JoshSchramm94
- Kind: user
- Repositories: 1
- Profile: https://github.com/JoshSchramm94
JOSS Publication
validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools
Published
March 03, 2025
Volume 10, Issue 107, Page 6708
Authors
Tags
MaxDiff Conjoint Analysis Market Simulations Predictive ValidityCitation (CITATION.cff)
cff-version: "1.2.0"
authors:
- family-names: Schramm
given-names: Joshua Benjamin
orcid: "https://orcid.org/0000-0001-5602-4632"
- family-names: Lichters
given-names: Marcel
orcid: "https://orcid.org/0000-0002-3710-2292"
contact:
- family-names: Schramm
given-names: Joshua Benjamin
orcid: "https://orcid.org/0000-0001-5602-4632"
doi: 10.5281/zenodo.14868050
message: If you use this software, please cite our article in the
Journal of Open Source Software.
preferred-citation:
authors:
- family-names: Schramm
given-names: Joshua Benjamin
orcid: "https://orcid.org/0000-0001-5602-4632"
- family-names: Lichters
given-names: Marcel
orcid: "https://orcid.org/0000-0002-3710-2292"
date-published: 2025-03-03
doi: 10.21105/joss.06708
issn: 2475-9066
issue: 107
journal: Journal of Open Source Software
publisher:
name: Open Journals
start: 6708
title: validateHOT - an R package for the analysis of
holdout/validation tasks and other choice modeling tools
type: article
url: "https://joss.theoj.org/papers/10.21105/joss.06708"
volume: 10
title: validateHOT - an R package for the analysis of holdout/validation
tasks and other choice modeling tools
GitHub Events
Total
- Release event: 3
- Push event: 30
- Create event: 2
Last Year
- Release event: 3
- Push event: 30
- Create event: 2
Committers
Last synced: 7 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| JoshSchramm94 | s****4@g****m | 171 |
| James Uanhoro | J****o@g****m | 4 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 2
- Average time to close issues: 21 days
- Average time to close pull requests: 20 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 1.5
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jamesuanhoro (1)
Pull Request Authors
- jamesuanhoro (4)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 3.5.0 depends
- dplyr >= 1.1.0 imports
- fastDummies >= 1.7.3 imports
- magrittr >= 2.0.3 imports
- stats >= 4.2.2 imports
- tibble >= 3.2.1 imports
- tidyr >= 1.3.0 imports
- tidyselect >= 1.2.0 imports
- utils >= 4.2.3 imports
- Metrics >= 0.1.4 suggests
- knitr >= 1.43 suggests
- labelled >= 2.12.0 suggests
- rmarkdown >= 2.24 suggests
- testthat >= 3.0.0 suggests
