deltatest

R Package for Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing

https://github.com/hoxo-m/deltatest

Keywords

ab-testing data-science statistics

Last synced: 6 months ago · JSON representation

Repository

R Package for Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing

Basic Info

Host: GitHub
Owner: hoxo-m
License: other
Language: R
Default Branch: main
Homepage: https://hoxo-m.github.io/deltatest/
Size: 2.23 MB

Statistics

Stars: 8
Watchers: 1
Forks: 1
Open Issues: 5
Releases: 1

Topics

ab-testing data-science statistics

Created over 1 year ago · Last pushed 8 months ago

Metadata Files

Readme Changelog License

README.Rmd

---
output: github_document
---



```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = 400,
  message = FALSE
)
```

# deltatest: Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing


[![CRAN-version](https://www.r-pkg.org/badges/version/deltatest)](https://cran.r-project.org/package=deltatest)
[![CRAN-downloads](https://cranlogs.r-pkg.org/badges/deltatest)](https://cran.r-project.org/package=deltatest)
[![R-CMD-check](https://github.com/hoxo-m/deltatest/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/hoxo-m/deltatest/actions/workflows/R-CMD-check.yaml)


## 1. Overview

In online A/B testing, we often face a significant practical challenge: the randomization unit differs from the analysis unit. Typically, control and treatment groups are randomly assigned at the user level, while metrics—such as click-through rate—are measured at a more granular level (e.g., per page-view). In this case, the randomization unit is user, but the analysis unit is page-view.

This discrepancy raises concerns for statistical hypothesis testing, which assumes that data points are independent and identically distributed (i.i.d.). Specifically, a single user can generate multiple page-views, and each user may have a different probability of clicking. Consequently, the data may exhibit within-user correlation, thereby violating the i.i.d. assumption.

When the standard Z-test is applied to such correlated data, the resulting  p-values do not follow the expected uniform distribution under the null hypothesis. As a result, smaller p-values tend to occur more frequently even when there is no true difference, increasing the risk of falsely detecting a significant difference.

```{r p-values-from-z-test, echo=FALSE, fig.height=3, fig.width=4, fig.alt="p-values from standard Z-test on correlated data"}
library(dplyr)
library(ggplot2)

file <- "data-raw/p_values_from_standard_Z_test.rds"

p_values <- 
  if (file.exists(file)) {
    readRDS(file)
  } else {
    source("data-raw/compute_p_values.R")
    readRDS(file)
  }

df <- data.frame(p_value = p_values) |>
  mutate(range = cut(p_value, breaks = seq(0, 1, by = 0.05))) |>
  group_by(range) |>
  summarise(p = factor(ceiling(max(p_value) * 20) / 20), n = n()) |>
  mutate(prop = n / sum(n))

ggplot(df, aes(p, prop)) +
  geom_col() +
  geom_hline(yintercept = 0.05, color = "red") +
  scale_y_continuous(breaks = seq(0, 1, by = 0.05)) +
  xlab("p-value") + ylab("proportion") +
  ggtitle("p-values from standard Z-test on correlated data") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1), 
        plot.title = element_text(size = 11))
```

To address this problem, Deng et al. (2018) proposed a modified statistical hypothesis testing method. Their approach replaces the standard variance estimation formula in the Z-test with an approximate formula derived via the Delta method, which accounts for within-user correlation. To simplify the application of this method, the **deltatest** package has been developed.

To illustrate how to use this package, we prepare a data frame that includes columns for the number of clicks and page-views aggregated for each user. This data frame also contains a column indicating whether each user was assigned to the control or treatment group.

```{r prepare_data}
library(dplyr)

n_user <- 2000

set.seed(314)
data <- deltatest::generate_dummy_data(n_user) |> 
  mutate(group = if_else(group == 0, "control", "treatment")) |>
  group_by(user_id, group) |> 
  summarise(clicks = sum(metric), pageviews = n(), .groups = "drop")

data
```

The statistical hypothesis test using the Delta method can then be performed on this data as follows:

```{r execute}
library(deltatest)

deltatest(data, clicks / pageviews, by = group)
```

This version of the Z-test yields p-values that follow the expected uniform distribution under the null hypothesis, even when within-user correlation is present.

```{r p-values-from-delta-method, echo=FALSE, fig.height=3, fig.width=4, fig.alt="p-values from Z-test with Delta method on correlated data"}
p_values <- readRDS("data-raw/p_values_from_Delta_meethod.rds")

df <- data.frame(p_value = p_values) |>
  mutate(range = cut(p_value, breaks = seq(0, 1, by = 0.05))) |>
  group_by(range) |>
  summarise(p = factor(ceiling(max(p_value) * 20) / 20), n = n()) |>
  mutate(prop = n / sum(n))

ggplot(df, aes(p, prop)) +
  geom_col() +
  geom_hline(yintercept = 0.05, color = "red") +
  scale_y_continuous(breaks = seq(0, 1, by = 0.01)) +
  xlab("p-value") + ylab("proportion") +
  ggtitle("p-values from Z-test with Delta method on correlated data") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1), 
        plot.title = element_text(size = 10))
```

## 2. Installation

You can install the **deltatest** package from [CRAN](https://cran.r-project.org/package=deltatest).

```{r install_cran, eval=FALSE}
install.packages("deltatest")
```

You can also install the development version from [GitHub](https://github.com/) with:

```{r install_github, eval=FALSE}
# install.packages("remotes")
remotes::install_github("hoxo-m/deltatest")
```

## 3. Details

The **deltatest** package provides the `deltatest` function for performing statistical hypothesis tests using the Delta method as proposed by Deng et al. (2018). In this section, we explain the function's arguments and its return value.

### 3.1 `data` Argument

To run `deltatest`, you need to prepare an appropriately aggregated data frame. This data frame must include columns for the numerator and denominator of your metric, aggregated for each randomization unit (typically, each user). For example:

- If your metric is click-through rate per page-view, the numerator is the number of clicks, and the denominator is the number of page-views.
- If your metric is conversion rate per session, the numerator is the number of conversions (or converted sessions), and the denominator is the number of sessions.

Note that the denominator should match the analysis unit.

The **deltatest** package provides the `generate_dummy_data` function to create dummy data. It generates metric values per page-view, so you need to aggregate the data by user.

```{r generate_dummy_data}
library(dplyr)

n_user <- 2000

set.seed(314)
data <- deltatest::generate_dummy_data(n_user) |>
  group_by(user_id, group) |>
  summarise(clicks = sum(metric), pageviews = n(), .groups = "drop")

data
```

This data frame includes the `user_id` column, but this column is not required to run `deltatest`.

### 3.2 `formula` and `by` Arguments

The second argument, `formula`, and the third argument, `by`, specify which columns in the data frame represent the numerator, denominator, and group. There are three input styles available for the `formula` argument.

#### (1) Standard Formula

This is the common formula format, where the left-hand side represents the target variable, and the right-hand side specifies the explanatory variable. In this case, the left-hand side should be of the form `numerator / denominator`, and the right-hand side should be the group column name. When using this style, you do not need to specify the `by` argument.

```{r standard_formula, eval=FALSE}
deltatest(data, clicks / pageviews ~ group)
```

#### (2) Lambda Formula

This is a relatively new way to express functions within a formula, where the function is written on the right-hand side of the formula. Specifically, you can write the function as `~ numerator / denominator`. In this style, you must specify the group column using the `by` argument.

```{r lambda_formula, eval=FALSE}
deltatest(data, ~ clicks / pageviews, by = group)
```

#### (3) NSE (Non-Standard Evaluation)

In this style, you can simply write `numerator / denominator`. The input is parsed using R's non-standard evaluation (NSE) feature, and you must specify the group column using the `by` argument.

```{r NSE, eval=FALSE}
deltatest(data, clicks / pageviews, by = group)
```

#### With Calculation (Applicable to All Styles)

All styles accept calculations. For example, if your data frame contains only columns for the positive count and negative count, you can express the metric as follows:

```{r with_calculation, eval=FALSE}
deltatest(data, pos / (pos + neg), by = group)
```

### 3.3 Other Arguments

#### `group_names`

For this argument, list the two types of elements in the group column in the order of control and treatment. By default, the function assumes that the types are specified in dictionary order for this argument and will display a message to that effect. To suppress the message, set the `quiet` argument to `TRUE`.

#### `type`

By default, `deltatest` tests the difference between two groups. If you specify `type = 'relative_change'`, it tests the rate of change, i.e., $(\mu_{t} - \mu_{c}) / \mu_{c}$ where $\mu_c$ and $\mu_t$ represent the mean values of the control group and the treatment group, respectively.

### 3.4 Return Value

The return value of `deltatest` is an object of class `htest`.

```{r return_value}
result <- deltatest(data, clicks / pageviews, by = group)
result
```

This object contains the estimates, the p-value, the confidence interval, and more. 

```{r return_value_detail}
result$estimate

result$p.value

result$conf.int
```

You can also tidy the results by applying the `tidy` function from the **broom** package.

```{r}
broom::tidy(result)
```

For more details, refer to `help(deltatest)`.

## 4. Related Work

- [tidydelta: Estimation of Standard Errors using Delta Method](https://cran.r-project.org/package=tidydelta)

## 5. References

- Deng, A., Knoblich, U., & Lu, J. (2018). Applying the Delta Method in Metric
Analytics: A Practical Guide with Novel Ideas. *Proceedings of the 24th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining.*
[doi:10.1145/3219819.3219919](https://doi.org/10.1145/3219819.3219919)
- Deng, A., Lu, J., & Litz, J. (2017). Trustworthy Analysis of Online A/B Tests:
Pitfalls, challenges and solutions. *Proceedings of the Tenth ACM International
Conference on Web Search and Data Mining.*
[doi:10.1145/3018661.3018677](https://doi.org/10.1145/3018661.3018677)
- id:sz_dr (2018). Calculating the Mean and Variance of the Ratio of Random
Variables Using the Delta Method [in Japanese]. *If You're Human, Think More
Now.* https://www.szdrblog.info/entry/2018/11/18/154952

Owner

Name: hoxo-m
Login: hoxo-m
Kind: user
Location: Tokyo, Japan
Company: HOXO-M Inc.

Website: https://hoxo-m.com/
Repositories: 24
Profile: https://github.com/hoxo-m

Japanese Translator: "R for Everyone" "Automated Data Collection with R" "Feature Engineering for Machine Learning" "Federated Learning" etc.

GitHub Events

Total

Create event: 3
Release event: 1
Issues event: 8
Watch event: 7
Delete event: 1
Issue comment event: 2
Push event: 68
Pull request event: 3
Fork event: 1

Last Year

Create event: 3
Release event: 1
Issues event: 8
Watch event: 7
Delete event: 1
Issue comment event: 2
Push event: 68
Pull request event: 3
Fork event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 5
Total pull requests: 2
Average time to close issues: 16 days
Average time to close pull requests: about 1 hour
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.4
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 5
Pull requests: 2
Average time to close issues: 16 days
Average time to close pull requests: about 1 hour
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.4
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

hoxo-m (5)

Pull Request Authors

hoxo-m (2)

Top Labels

Issue Labels

enhancement (2)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 191 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

cran.r-project.org: deltatest

Statistical Hypothesis Testing Using the Delta Method

Homepage: https://github.com/hoxo-m/deltatest
Documentation: http://cran.r-project.org/web/packages/deltatest/deltatest.pdf
License: MIT + file LICENSE
Latest release: 0.1.0
published 12 months ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 191 Last month

Rankings

Dependent packages count: 27.1%

Dependent repos count: 33.4%

Average: 49.2%

Downloads: 87.0%

Maintainers (1)

hoxo.smile@gmail.com