validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools

validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools - Published in JOSS (2025)

https://github.com/joshschramm94/validatehot

Last synced: 6 months ago · JSON representation ·

Repository

🎯 Validate your Holdout Task and other tools for preference measurement techniques

Basic Info

Host: GitHub
Owner: JoshSchramm94
License: gpl-3.0
Language: R
Default Branch: main
Homepage:
Size: 3.53 MB

Statistics

Stars: 0
Watchers: 0
Forks: 2
Open Issues: 0
Releases: 2

Created almost 3 years ago · Last pushed 11 months ago

Metadata Files

Readme Changelog Contributing License Code of conduct Citation

README.Rmd

---
output: github_document
editor_options: 
  markdown: 
    wrap: 72
bibliography: vignettes/references.bib
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# validateHOT 🎯



[![R-CMD-check](https://github.com/JoshSchramm94/validateHOT/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/JoshSchramm94/validateHOT/actions/workflows/R-CMD-check.yaml)

[![license](https://img.shields.io/badge/license-GPL--3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.en.html)



validateHOT is a package for preference measurement techniques. It provides
functions to evaluate validation tasks, perform market simulations, and
convert raw utility estimates into scores that are easier to interpret.
All three components are key functions for preference measurement
techniques such as choice-based conjoint (CBC), adaptive choice-based
conjoint (ACBC), or Maximum Difference Scaling (MaxDiff). This package
is particularly relevant for the [Sawtooth
Software](https://sawtoothsoftware.com/) community who would like to
report their analysis in *R* for open science purposes. In addition, it
is compatible with other packages, for example, the ChoiceModelR package
[@ChoiceModelR]. Further, the validateHOT package is valuable for
practitioners, who would like to conduct the analyses using an open-source
software.

Researchers and practitioners use preference measurement techniques for
various purposes, such as calculating the importance of specific
attributes or simulating markets [@gilbride2008; @steiner2018]. The
ultimate goal is to predict future behavior [@green1990]. To ensure
valid and reliable results, it is crucial that the collected data is
valid and can predict outcomes for tasks that were not included in the
estimation of the utility scores. Including validation tasks is highly
recommended [@Orme2015; @rao2014]. They do not only verify data validity
but can also help to test different models. The validateHOT package
offers helpful tools, to facilitate these functions (i.e., validate a
validation task, perform market simulations, and communicate results of
preference measurement techniques) - all within an open-source tool.

> The validateHOT package was primarily developed for use with Sawtooth
> Software [@sawtooth2024] and the ChoiceModelR package [@ChoiceModelR]. Please be cautious about using
> it with other platforms (especially for linear and piecewise-coded
> variables).

👉🏾 **What you need to provide**: 
 After collecting your data
and running your initial hierarchical Bayes model, you need to import your raw 
utility scores. If you plan to validate a validation task you also need to 
provide the actual choice made in this task. We provide a short tutorial in 
this markdown. For a more comprehensive tutorial, please see the vignette 
provided with the validateHOT package
(`vignette("validateHOT", package = "validateHOT")`).

👈🏾 **What you get**:
 The validateHOT package currently
provides functions for four key components:

-   validation metrics

-   metrics commonly reported in machine learning (i.e., confusion
    matrix)

-   simulation methods, such as determining optimal product combinations

-   converting raw logit utilities into more interpretable scores

For the first three components, the `create_hot()` function is essential.
This function calculates the total utilities for each alternative in the
validation task or the market scenario to be tested.
The `create_hot()` function computes the total utility of each alternative
based on the additive utility model [@rao2014, p. 82].

### Classical validation metrics

-   `hitrate()`: calculates the hit rate (correctly predicted choices) of
    the validation task.
-   `kl()`: Kullback-Leibler-Divergence calculates the divergence between
    the actual choice distribution and the predicted choice distribution
    [@ding2011; @philentropy]. Due to the asymmetry of the Kullback-Leibler 
    divergence, the output includes divergence both from predicted to observed 
    and from observed to predicted. The validateHOT package currently provides 
    two logarithm bases: $log$ and $log_2$.
-   `mae()`: calculates the mean absolute error, i.e., the deviation between
    predicted and stated choice shares
-   `medae()`: calculates the median absolute error
-   `mhp()`: calculates the averaged hit probability of participant's
    actual choice in the validation task
-   `rmse()`: calculates the root mean square error of deviation between
    predicted and stated choice shares

All functions can be extended with the `group` argument to get the output
split by group(s).

### Confusion Matrix

The validateHOT package includes metrics from machine learning, i.e., the 
confusion matrix (e.g., @burger2018). For all of the 5 provided functions, a
**none** alternative has to be included in the validation task. The
logic of the implemented confusion matrix is to test, for example,
whether a buy or no-buy was correctly predicted. Information could be
used to get a sense of overestimation and underestimation of general
product demand. In the table below `TP` stands for true positives, `FP`
for false positives, `TN` for true negatives, and `FN` for false
negatives [@burger2018; @kuhn2008]. To translate this to the logic of
the validateHOT package, imagine you have a validation task with five
alternatives plus the alternative of not buying.
The validateHOT package now calculates whether a buy (participant
opts for one of the five alternatives) or a no-buy (participant opts for
the none alternative), respectively, is correctly predicted.

Please be aware that the validateHOT package applies the following
coding of the *buy* and *no-buy* alternatives. Rows refer to the
observed decisions while columns refer to the predicted ones.

|        | Buy | No-buy |
|--------|:---:|:------:|
| Buy    | TP  |   FN   |
| No-buy | FP  |   TN   |

-   `accuracy()`: calculates the number of correctly predicted choices
    (buy or no-buy); $\frac{TP + TN}{TP + TN + FP + FN}$ [@burger2018]

-   `f1()`: defined as
    $\frac{2 * precision * recall}{precision + recall}$ or stated
    differently by @burger2018 $\frac{2TP}{2TP + FP + FN}$

-   `precision()`: defined as $\frac{TP}{TP + FP}$ [@burger2018]

-   `recall()`: defined as $\frac{TP}{TP + FN}$ [@burger2018]

-   `specificity()`: defined as $\frac{TN}{TN + FP}$ [@burger2018]

Again, all functions can be extended with the `group` argument to get
the output split by group(s).

### Simulation Methods

-   `turf()`: **T**(otal) **U**(nduplicated) **R**(each) and
    **F**(requency) is a "product line extension model"
    [@miaoulis1990; p. 29] that helps to find the perfect product bundle
    based on the reach (e.g., how many participants consider buying at
    least one product of that assortment) and the frequency (how many
    products on average are a purchase option). `turf()` currently
    provides both the *threshold* approach (`approach ='thres'`; all
    products that exceed a threshold are considered, e.g., as purchase
    option; @chrzan2019, p. 112) and the *first choice* approach
    (`approach = 'fc'`; only product with highest utility is considered
    as purchase option; @chrzan2019, p. 111).
-   `turf_ladder()`: starts with one product and then subsequently adds one
    new product, which adds the maximum possible reach. Alternatively, it
    is also possible to define fixed products (i.e., products
    that must be part of the assortment).
-   `freqassort()`: Similar to `turf()`, `freqassort()` will give you
    the average frequency, representing how many products the participants will
    choose from a potential assortment. Again, you have to define a
    `none` alternative. `freqassort()` uses the *threshold* approach (see 
    above). While `turf()` calculates the reach and frequency for **all** 
    combinations, you specify the combination you are interested in 
    `freqassort()`.
-   `reach()`: Similar to `turf()`, `reach()` will give you the average
    percentage of how many participants you can reach (at least one of
    the products resembles a purchase option) with your defined assortment. 
    `reach()` also uses the *threshold* approach (see above). 
    While `turf()` calculates the reach and frequency for **all** combinations,
    you specify the combination you are interested in `reach()`.
-   `marksim()`: Runs market simulations (either the share of preference,
    `sop` or first choice rule, `fc`), including the standard error, and
    the lower and upper confidence intervals [see also @orme2020, p. 94].

### Converting raw utilities

The validateHOT package also includes four functions designed to make the 
scores from both (A)CBC and MaxDiff analyses more interpretable, namely:

-   `att_imp()`: Converts the raw utilities of either an ACBC or CBC
    into importance scores for each attribute [see @orme2020, pp. 79-81]

-   `prob_scores()`: Converts the raw utilities of a MaxDiff to choice
    probabilities by applying the following procedures:

-   For unanchored MaxDiff: First, the scores are zero-centered,
    and then they are transformed by the following formula
    $\frac{exp^{U_i}}{(exp^{U_i} + a - 1)}$ [@chrzan2019, p. 56], where
    $U_i$ is the raw utility of item *i* and `a` is the number of items
    shown per MaxDiff task.

-   For anchored MaxDiff the following formula is applied:
    $\frac{exp^{U_i}}{(exp^{U_i} + a - 1)} * 100 * \frac{1}{a}$
    [@chrzan2019, p. 59].

-   `zc_diffs()`: Rescales the raw logit utilities to make them
    comparable across participants [@sawtooth2024, p. 330].

-   `zero_anchored()`: Rescales the raw logits of a MaxDiff to
    zero-centered diffs [@chrzan2019, p. 64].

### Data Frames provided by the validateHOT package

The package includes five data sets that help to better explain the
functions as well as the structure of the input, especially for the
`create_hot()` function.

-   `acbc`: Example data set with raw utilities of an ACBC study conducted in
    Sawtooth Software [@sawtooth2024]. The price was linear-coded while the 
    other attributes were coded as part-worths [@sablotny-wackershauser2024; 
    @sawtooth2024].

-   `acbc_interpolate`: Example data set with raw utilities of an ACBC
    study conducted in Sawtooth Software [@sawtooth2024]. Price was 
    piecewise-coded, another attribute was linear-coded while the other 
    attributes were coded as part-worth [@sablotny-wackershauser2024; 
    @sawtooth2024].

-   `cbc`: Example data set with raw utilities of a CBC study conducted
    in Sawtooth Software [@sawtooth2024]. All attributes were coded as part-worth
    [@sablotny-wackershauser2024; @sawtooth2024].

-   `cbc_linear`: Example data set with raw utilities of a CBC study
    conducted in Sawtooth Software [@sawtooth2024] One attribute was 
    linear-coded while the other attributes were part-worth coded
    [@sablotny-wackershauser2024; @sawtooth2024].

-   `maxdiff`: Example data set with raw utilities of a MaxDiff study
    conducted in Sawtooth Software [@sawtooth2024; @schramm2024].

## The story behind the validateHOT package

The validateHOT package was born out of teaching preference measurement 
seminars to students, many of whom have little to no prior experience with R. 
One of the chapters in this class is about model validation by checking holdout
tasks and we teach this, of course, in *R* 😍. We emphasize open science, and 
providing tools to run the analyses and share the code afterward. The 
validateHOT package makes this process look easy 🤹‍♀️. Of course, there are
other great packages (i.e., the Metrics package by @Metrics), however, these 
packages need some more data wrangling to use the appropriate functions with 
the raw utilities, which might be a burden or barrier for some users.

Moreover, as @yang2018 report, commercial studies often do not use any
validation tasks. Again, the lack of experience in *R* could be one
explanation. Since these functions are not always implemented in other
software, not knowing how to apply it correctly might be the reason of not
including it in the first instance. Having a package to evaluate the validation
task can be very beneficial from this perspective.

## Installation

You can install the development version of the validateHOT package from
[GitHub](https://github.com/) with:

```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("JoshSchramm94/validateHOT", dependencies = T, build_vignettes = T)
```

## Example

First, we load the package:

```{r}
library("validateHOT")
```

### Example I - CBC

Since *CBC's* are applied more commonly compared to *ACBC* and
*MaxDiff*, we will provide an example with a *CBC*. Let us load the `cbc` 
data frame for this example [@sablotny-wackershauser2024].

```{r}
data(cbc)
```

Now imagine you included a validation task with six alternatives plus a
no-buy alternative. We specify the `data` argument and the
`id`. Since we also have a *no-buy* alternative in our validation task,
we specify the `none` argument.
Afterwards, we define each alternative using the argument `prod.levels`.
If we look back at the data frame, we can see that the first alternative
in the holdout task (`c(3, 6, 10, 13, 16, 20, 24, 32, 35)`) is composed
of the following attribute levels `r colnames(cbc)[3]`,
`r colnames(cbc)[6]`, `r colnames(cbc)[10]`, `r colnames(cbc)[13]`,
`r colnames(cbc)[16]`, `r colnames(cbc)[20]`, `r colnames(cbc)[24]`,
`r colnames(cbc)[32]`, and `r colnames(cbc)[35]`.

As mentioned above, all the attributes are part-worth coded and the
alternatives have the same price as one of the levels shown (i.e., no
interpolation). Thus, we set `coding = c(rep(0, times = 9))`. Finally,
we specify the method, which is `method = "cbc"` in our case, and define
the column of the actual participant's choice (`choice`). If you run the
code, a data frame called `hot_cbc` will be returned to the global
environment.

> ❗ `create_hot()` takes both column names and column indexes. However,
> please be aware, if you include linear-coded or piecewise-coded, you
> **have** to provide the column indexes for the input in `prod.levels`.

```{r}
hot_cbc <- create_hot(
  data = cbc,
  id = "id",
  none = "none",
  prod.levels = list(
    c(3, 6, 10, 13, 16, 20, 24, 32, 35),
    c(3, 5, 10, 14, 16, 18, 22, 27, 35),
    c(4, 6, 9, 14, 15, 20, 25, 30, 36),
    c(4, 5, 10, 11, 16, 19, 26, 32, 34),
    c(2, 6, 8, 14, 16, 17, 26, 31, 36),
    c(2, 5, 7, 12, 16, 20, 26, 29, 33)
  ),
  coding = c(rep(0, times = 9)),
  method = "cbc",
  choice = "hot"
)
```

> In case you just need to create a market scenario, you can also leave
> the `choice` argument empty.

Sometimes you estimate a part-worth coded attribute but want to treat
this attribute as continuous in the validation task or market
simulations (i.e., interpolate values). Please use the code `2` for this 
variable in the `coding` argument.

Let us take a glimpse at the output, which shows the participants' total
raw utilities for each of the six alternatives that were included in the
validation task.

```{r}
head(hot_cbc)
```

In the next step, we would like to see how well our model (from which we
took the raw utilities) predicts the actual choices in the validation
task. First, we will run the `hitrate()` function. We specify the
`data`, the column names of the alternatives (`opts`; remember there are
six alternatives + the *no-buy* alternative), and finally the actual
choice (`choice`).

```{r}
hitrate(
  data = hot_cbc, # data frame
  opts = c(option_1:none), # column names of alternatives
  choice = choice # column name of choice
)
```

Next, we look at the magnitude of the mean absolute error by running the
`mae()` function. The arguments are the same as for the `hitrate()` function.

```{r}
mae(
  data = hot_cbc, # data frame
  opts = c(option_1:none), # column names of alternatives
  choice = choice # column name of choice
)
```

Finally, let us test, how many participants would buy at least one of three 
products, assuming that this is one potential assortment we would like to offer
to the consumers. We will use the `reach()` function. To specify the bundles we
are offering we use the `opts` argument in the `reach()` function.

```{r}
reach(
  data = hot_cbc, # data frame
  opts = c(option_1:option_3), # products that should be considered
  none = none # column name of none alternative
)
```

### Example II - CBC with linear-coded attribute(s)

In the second example, we again use a *CBC*, however, this time we show
how to use the `create_hot()` function, if one of the variables is linear-coded.
All other examples are provided in the accompanied vignette.

We are using the data frame `cbc_linear` [@sablotny-wackershauser2024].
Again, we first load the data frame.

```{r}
data(cbc_linear)
```

Next, we create the validation task to evaluate it in the next step. We
use the same validation task as defined above (i.e., six alternatives
plus the *no-buy* alternative). The only difference to the previous
example is that the last attribute (`price`) was linear-coded and
this time, we want to interpolate values.

Again, we first define the `data` argument, the `id` as well as the `none`
alternative. Next, we define the `prod.levels` for each alternative.
Since we have one linear coded attribute, we need to specify the column
indexes instead of the column names in `prod.levels`. We tell
`create_hot()` that the last attribute needs to be interpolated by
specifying the `coding` argument accordingly. This tells us that the
first eight attributes are part-worth coded (`0`) while the last
attribute is linear-coded (`1`).

To interpolate the value, we need to provide `create_hot()` the
`interpolate.levels`. These **need** to be the same levels as provided
to Sawtooth Software [@sawtooth2024] or `ChoiceModelR`.
Extrapolation is allowed, however, `create_hot()` will give a warning in
case extrapolation is applied.

Next, we define the column name of the linear coded variable (`lin.p`).
Again, we are running a CBC specified by the `method` argument. This
time, we would like to keep some of the variables in the data frame,
which we specify by using the `varskeep` argument. We only keep one
further variable, however, you can specify as many as you want. This
could be relevant if you would like to display the results, for example, split
by group. Finally, we define the actual choice (`choice`) in the validation
task and we are all set.

```{r}
hot_cbc_linear <- create_hot(
  data = cbc_linear,
  id = "id",
  none = "none",
  prod.levels = list(
    c(3, 6, 10, 13, 16, 20, 24, 32, 248.55),
    c(3, 5, 10, 14, 16, 18, 22, 27, 237.39),
    c(4, 6, 9, 14, 15, 20, 25, 30, 273.15),
    c(4, 5, 10, 11, 16, 19, 26, 32, 213.55),
    c(2, 6, 8, 14, 16, 17, 26, 31, 266.10),
    c(2, 5, 7, 12, 16, 20, 26, 29, 184.50)
  ),
  coding = c(rep(0, times = 8), 1),
  lin.p = "price",
  interpolate.levels = list(c(seq(from = 175.99, to = 350.99, by = 35))),
  method = "cbc",
  choice = "hot",
  varskeep = "group"
)
```

The next steps are the same as above. However, let us take a look at
some examples in which we display the results per group. Let us again
begin with the `hitrate()` function. 

```{r}
hitrate(
  data = hot_cbc_linear, # data frame
  opts = c(option_1:none), # column names of alternatives
  choice = choice, # column name of choice
  group = group # column name of Grouping variable
)
```

Lastly, this time we also want to use a rescaling function, namely `att_imp()` 
which gives us the importance of each attribute included [@orme2020]. We need 
the data set with the raw logit coefficients (`cbc_linear`; 
@sablotny-wackershauser2024). Next, we define the `attrib` argument. Here, we 
need to specify each attribute level for the corresponding level. Afterwards, 
we specify the coding again, and since we have one linear coded attribute, we 
need to define the `interpolate.levels` argument again, as we did for the
`create_hot()` function above. Finally, we set `res` to `agg`, which
tells the `att_imp()` function to display the aggregated results (to get 
results for each individual set argument `res` to `ind`).

```{r}
att_imp(
  data = cbc_linear,
  attrib = list(
    c(paste0("att1_lev", c(1:3))),
    c(paste0("att2_lev", c(1:2))),
    c(paste0("att3_lev", c(1:4))),
    c(paste0("att4_lev", c(1:4))),
    c(paste0("att5_lev", c(1:2))),
    c(paste0("att6_lev", c(1:4))),
    c(paste0("att7_lev", c(1:6))),
    c(paste0("att8_lev", c(1:6))),
    "price"
  ),
  coding = c(rep(0, times = 8), 1),
  interpolate.levels = list(c(seq(from = 175.99, to = 350.99, by = 35))),
  res = "agg"
)
```

For more examples, please see the accompanied vignette.

## References

Owner

Name: Joshua Schramm
Login: JoshSchramm94
Kind: user

Repositories: 1
Profile: https://github.com/JoshSchramm94

JOSS Publication

validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools

Published

March 03, 2025

DOI

10.21105/joss.06708

Volume 10, Issue 107, Page 6708

Authors

Joshua Benjamin Schramm

Otto-von-Guericke-University Magdeburg

Marcel Lichters

Otto-von-Guericke-University Magdeburg

Editor

Sehrish Kanwal

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Schramm
  given-names: Joshua Benjamin
  orcid: "https://orcid.org/0000-0001-5602-4632"
- family-names: Lichters
  given-names: Marcel
  orcid: "https://orcid.org/0000-0002-3710-2292"
contact:
- family-names: Schramm
  given-names: Joshua Benjamin
  orcid: "https://orcid.org/0000-0001-5602-4632"
doi: 10.5281/zenodo.14868050
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Schramm
    given-names: Joshua Benjamin
    orcid: "https://orcid.org/0000-0001-5602-4632"
  - family-names: Lichters
    given-names: Marcel
    orcid: "https://orcid.org/0000-0002-3710-2292"
  date-published: 2025-03-03
  doi: 10.21105/joss.06708
  issn: 2475-9066
  issue: 107
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6708
  title: validateHOT - an R package for the analysis of
    holdout/validation tasks and other choice modeling tools
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06708"
  volume: 10
title: validateHOT - an R package for the analysis of holdout/validation
  tasks and other choice modeling tools

GitHub Events

Total

Release event: 3
Push event: 30
Create event: 2

Last Year

Release event: 3
Push event: 30
Create event: 2

Committers

Last synced: 7 months ago

All Time

Total Commits: 175
Total Committers: 2
Avg Commits per committer: 87.5
Development Distribution Score (DDS): 0.023

Past Year

Commits: 35
Committers: 1
Avg Commits per committer: 35.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
JoshSchramm94	s**4@g**m	171
James Uanhoro	J**o@g**m	4

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 1
Total pull requests: 2
Average time to close issues: 21 days
Average time to close pull requests: 20 days
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 1.0
Average comments per pull request: 1.5
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools

Science Score: 98.0%

Repository

Basic Info

Statistics

Metadata Files

README.Rmd

Owner

JOSS Publication

validateHOT - an R package for the analysis of holdout/validation tasks and other choice modeling tools

Authors

Editor

Tags

Citation (CITATION.cff)

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Dependencies