deltatest
R Package for Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing
Keywords
ab-testing
data-science
statistics
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: hoxo-m
- License: other
- Language: R
- Default Branch: main
- Homepage: https://hoxo-m.github.io/deltatest/
- Size: 2.23 MB
Statistics
- Stars: 8
- Watchers: 1
- Forks: 1
- Open Issues: 5
- Releases: 1
Created over 1 year ago · Last pushed 8 months ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = 400,
message = FALSE
)
```
# deltatest: Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing
## 1. Overview
In online A/B testing, we often face a significant practical challenge: the randomization unit differs from the analysis unit. Typically, users are randomly assigned to the control or treatment group, while metrics such as click-through rate are measured at a more granular level (e.g., per page-view). In this case, the randomization unit is the user, but the analysis unit is the page-view.
This discrepancy raises concerns for statistical hypothesis testing, which assumes that data points are independent and identically distributed (i.i.d.). Specifically, a single user can generate multiple page-views, and each user may have a different probability of clicking. Consequently, the data may exhibit within-user correlation, thereby violating the i.i.d. assumption.
When the standard Z-test is applied to such correlated data, the resulting p-values do not follow the expected uniform distribution under the null hypothesis. As a result, smaller p-values tend to occur more frequently even when there is no true difference, increasing the risk of falsely detecting a significant difference.
```{r p-values-from-z-test, echo=FALSE, fig.height=3, fig.width=4, fig.alt="p-values from standard Z-test on correlated data"}
library(dplyr)
library(ggplot2)
file <- "data-raw/p_values_from_standard_Z_test.rds"
p_values <-
if (file.exists(file)) {
readRDS(file)
} else {
source("data-raw/compute_p_values.R")
readRDS(file)
}
df <- data.frame(p_value = p_values) |>
mutate(range = cut(p_value, breaks = seq(0, 1, by = 0.05))) |>
group_by(range) |>
summarise(p = factor(ceiling(max(p_value) * 20) / 20), n = n()) |>
mutate(prop = n / sum(n))
ggplot(df, aes(p, prop)) +
geom_col() +
geom_hline(yintercept = 0.05, color = "red") +
scale_y_continuous(breaks = seq(0, 1, by = 0.05)) +
xlab("p-value") + ylab("proportion") +
ggtitle("p-values from standard Z-test on correlated data") +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(size = 11))
```
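The inflation shown in the figure above can be reproduced with a small A/A simulation. The following is a minimal sketch in base R, not the script used to produce the figure: the helper names (`simulate_group`, `naive_z_test_p_value`) and the distribution parameters are assumptions chosen for illustration.

```r
# A/A simulation: both groups come from the same distribution, so every
# "significant" result is a false positive.
set.seed(42)

# Simulate one group. Each user has their own click probability, which
# induces within-user correlation among page-views (illustrative parameters).
simulate_group <- function(n_user) {
  pageviews <- 1 + rpois(n_user, lambda = 10)
  p_click <- rbeta(n_user, shape1 = 1, shape2 = 3)
  clicks <- rbinom(n_user, size = pageviews, prob = p_click)
  list(clicks = sum(clicks), pageviews = sum(pageviews))
}

# Naive two-proportion Z-test that treats every page-view as i.i.d.
naive_z_test_p_value <- function(n_user = 200) {
  ctrl <- simulate_group(n_user)
  trt <- simulate_group(n_user)
  p_c <- ctrl$clicks / ctrl$pageviews
  p_t <- trt$clicks / trt$pageviews
  se <- sqrt(p_c * (1 - p_c) / ctrl$pageviews +
               p_t * (1 - p_t) / trt$pageviews)
  2 * pnorm(-abs((p_t - p_c) / se))
}

p_values <- replicate(1000, naive_z_test_p_value())
false_positive_rate <- mean(p_values < 0.05)
false_positive_rate  # well above the nominal 0.05 under these settings
```

Because the naive standard error ignores the user-to-user variation in click probability, it is too small, and the test rejects far more often than 5% of the time.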
To address this problem, Deng et al. (2018) proposed a modified statistical hypothesis testing method. Their approach replaces the standard variance estimation formula in the Z-test with an approximate formula derived via the Delta method, which accounts for within-user correlation. To simplify the application of this method, the **deltatest** package has been developed.
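For a ratio of per-user means, the standard first-order Delta-method approximation is $\mathrm{Var}(\bar{X}/\bar{Y}) \approx \frac{1}{n}\left(\frac{\sigma_X^2}{\mu_Y^2} - \frac{2\mu_X\sigma_{XY}}{\mu_Y^3} + \frac{\mu_X^2\sigma_Y^2}{\mu_Y^4}\right)$, where $X$ and $Y$ are the per-user numerator and denominator and $n$ is the number of users. The sketch below implements this formula in base R to show the idea; the function names `delta_var_ratio` and `delta_z_test` are hypothetical, and this is not necessarily how the package computes its result internally.

```r
# Delta-method variance of a ratio of means, computed from per-user totals.
# x: numerator per user (e.g. clicks); y: denominator per user (e.g. page-views).
delta_var_ratio <- function(x, y) {
  n <- length(x)
  mu_x <- mean(x)
  mu_y <- mean(y)
  (var(x) / mu_y^2 -
     2 * mu_x * cov(x, y) / mu_y^3 +
     mu_x^2 * var(y) / mu_y^4) / n
}

# Z-test on the difference of two ratio metrics using that variance.
delta_z_test <- function(x_c, y_c, x_t, y_t) {
  diff <- sum(x_t) / sum(y_t) - sum(x_c) / sum(y_c)
  se <- sqrt(delta_var_ratio(x_c, y_c) + delta_var_ratio(x_t, y_t))
  z <- diff / se
  c(estimate = diff, statistic = z, p.value = 2 * pnorm(-abs(z)))
}

# Toy per-user data (hypothetical values, not generated by the package).
set.seed(1)
y_c <- 1 + rpois(500, 10)
x_c <- rbinom(500, y_c, rbeta(500, 1, 3))
y_t <- 1 + rpois(500, 10)
x_t <- rbinom(500, y_t, rbeta(500, 1, 3))
toy_result <- delta_z_test(x_c, y_c, x_t, y_t)
toy_result
```

When every user has exactly one page-view, the covariance and denominator-variance terms vanish and the formula reduces to the usual variance of a sample mean.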
To illustrate how to use this package, we prepare a data frame that includes columns for the number of clicks and page-views aggregated for each user. This data frame also contains a column indicating whether each user was assigned to the control or treatment group.
```{r prepare_data}
library(dplyr)
n_user <- 2000
set.seed(314)
data <- deltatest::generate_dummy_data(n_user) |>
mutate(group = if_else(group == 0, "control", "treatment")) |>
group_by(user_id, group) |>
summarise(clicks = sum(metric), pageviews = n(), .groups = "drop")
data
```
The statistical hypothesis test using the Delta method can then be performed on this data as follows:
```{r execute}
library(deltatest)
deltatest(data, clicks / pageviews, by = group)
```
This version of the Z-test yields p-values that follow the expected uniform distribution under the null hypothesis, even when within-user correlation is present.
```{r p-values-from-delta-method, echo=FALSE, fig.height=3, fig.width=4, fig.alt="p-values from Z-test with Delta method on correlated data"}
p_values <- readRDS("data-raw/p_values_from_Delta_meethod.rds")
df <- data.frame(p_value = p_values) |>
mutate(range = cut(p_value, breaks = seq(0, 1, by = 0.05))) |>
group_by(range) |>
summarise(p = factor(ceiling(max(p_value) * 20) / 20), n = n()) |>
mutate(prop = n / sum(n))
ggplot(df, aes(p, prop)) +
geom_col() +
geom_hline(yintercept = 0.05, color = "red") +
scale_y_continuous(breaks = seq(0, 1, by = 0.01)) +
xlab("p-value") + ylab("proportion") +
ggtitle("p-values from Z-test with Delta method on correlated data") +
theme(axis.text.x = element_text(angle = 60, hjust = 1),
plot.title = element_text(size = 10))
```
## 2. Installation
You can install the **deltatest** package from [CRAN](https://cran.r-project.org/package=deltatest).
```{r install_cran, eval=FALSE}
install.packages("deltatest")
```
You can also install the development version from [GitHub](https://github.com/) with:
```{r install_github, eval=FALSE}
# install.packages("remotes")
remotes::install_github("hoxo-m/deltatest")
```
## 3. Details
The **deltatest** package provides the `deltatest` function for performing statistical hypothesis tests using the Delta method as proposed by Deng et al. (2018). In this section, we explain the function's arguments and its return value.
### 3.1 `data` Argument
To run `deltatest`, you need to prepare an appropriately aggregated data frame. This data frame must include columns for the numerator and denominator of your metric, aggregated for each randomization unit (typically, each user). For example:
- If your metric is click-through rate per page-view, the numerator is the number of clicks, and the denominator is the number of page-views.
- If your metric is conversion rate per session, the numerator is the number of conversions (or converted sessions), and the denominator is the number of sessions.
Note that the denominator should match the analysis unit.
The **deltatest** package provides the `generate_dummy_data` function to create dummy data. It generates metric values per page-view, so you need to aggregate the data by user.
```{r generate_dummy_data}
library(dplyr)
n_user <- 2000
set.seed(314)
data <- deltatest::generate_dummy_data(n_user) |>
group_by(user_id, group) |>
summarise(clicks = sum(metric), pageviews = n(), .groups = "drop")
data
```
This data frame includes the `user_id` column, but this column is not required to run `deltatest`.
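If your data starts as a raw page-view log rather than the dummy data, the same per-user aggregation can be sketched in base R as follows. The log below is a hypothetical example with one row per page-view; the column names are chosen to match the examples in this README.

```r
# Hypothetical page-view log: one row per page-view.
pageview_log <- data.frame(
  user_id   = c(1, 1, 1, 2, 2, 3),
  group     = c("control", "control", "control",
                "treatment", "treatment", "control"),
  clicks    = c(0, 1, 0, 1, 1, 0),  # 1 if the page-view led to a click
  pageviews = 1                     # each row counts as one page-view
)

# Sum clicks and page-views within each user (the randomization unit).
per_user <- aggregate(cbind(clicks, pageviews) ~ user_id + group,
                      data = pageview_log, FUN = sum)
per_user
```

The result has one row per user, with the numerator (`clicks`) and denominator (`pageviews`) columns that `deltatest` expects.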
### 3.2 `formula` and `by` Arguments
The second argument, `formula`, and the third argument, `by`, specify which columns in the data frame represent the numerator, denominator, and group. There are three input styles available for the `formula` argument.
#### (1) Standard Formula
This is the common formula format, where the left-hand side represents the target variable, and the right-hand side specifies the explanatory variable. In this case, the left-hand side should be of the form `numerator / denominator`, and the right-hand side should be the group column name. When using this style, you do not need to specify the `by` argument.
```{r standard_formula, eval=FALSE}
deltatest(data, clicks / pageviews ~ group)
```
#### (2) Lambda Formula
This style expresses a function as a one-sided formula, with the function body written on the right-hand side. Specifically, you write the metric as `~ numerator / denominator`. In this style, you must specify the group column using the `by` argument.
```{r lambda_formula, eval=FALSE}
deltatest(data, ~ clicks / pageviews, by = group)
```
#### (3) NSE (Non-Standard Evaluation)
In this style, you can simply write `numerator / denominator`. The input is parsed using R's non-standard evaluation (NSE) feature, and you must specify the group column using the `by` argument.
```{r NSE, eval=FALSE}
deltatest(data, clicks / pageviews, by = group)
```
#### With Calculation (Applicable to All Styles)
All styles accept calculations. For example, if your data frame contains only columns for the positive count and negative count, you can express the metric as follows:
```{r with_calculation, eval=FALSE}
deltatest(data, pos / (pos + neg), by = group)
```
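The three styles differ only in how the unevaluated expression reaches the function. The base R sketch below shows how each form can be taken apart; it illustrates the language features involved and is not the package's actual parsing code. The helper `capture_metric` is hypothetical.

```r
two_sided <- clicks / pageviews ~ group
one_sided <- ~ clicks / pageviews

# A two-sided formula has three parts: `~`, left-hand side, right-hand side.
length(two_sided)
two_sided[[2]]  # the metric expression
two_sided[[3]]  # the group column

# A one-sided (lambda) formula has two parts: `~` and the right-hand side.
length(one_sided)
one_sided[[2]]  # the metric expression

# NSE style: substitute() captures the argument without evaluating it.
capture_metric <- function(expr) substitute(expr)
captured <- capture_metric(clicks / pageviews)
captured
```

In all three cases the same unevaluated call is recovered, which is why calculations such as `pos / (pos + neg)` can work in every style.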
### 3.3 Other Arguments
#### `group_names`
For this argument, supply the two values that appear in the group column, in the order control, then treatment. By default, the function sorts the two values in dictionary order and treats the first as control and the second as treatment, displaying a message to that effect. To suppress the message, set the `quiet` argument to `TRUE`.
#### `type`
By default, `deltatest` tests the difference between two groups. If you specify `type = 'relative_change'`, it tests the rate of change, i.e., $(\mu_{t} - \mu_{c}) / \mu_{c}$ where $\mu_c$ and $\mu_t$ represent the mean values of the control group and the treatment group, respectively.
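As a sketch of how these arguments fit together, the example below builds toy data in which the control group is labelled `"B"`, so dictionary order would pick the wrong control. It computes the relative change by hand and then, only if the package is installed, makes the corresponding call. The toy data is hypothetical, and passing `group_names` as a length-two character vector is an assumption based on the description above; see `help(deltatest)` for the exact interface.

```r
set.seed(1)
n <- 200
toy <- data.frame(
  group     = rep(c("B", "A"), each = n),  # "B" is the control group here
  pageviews = 1 + rpois(2 * n, 10)
)
toy$clicks <- rbinom(2 * n, toy$pageviews, 0.25)

# Point estimates by hand: overall click-through rate per group.
ctr <- function(d) sum(d$clicks) / sum(d$pageviews)
mu_c <- ctr(toy[toy$group == "B", ])
mu_t <- ctr(toy[toy$group == "A", ])
relative_change <- (mu_t - mu_c) / mu_c
relative_change

# The corresponding call; runs only if deltatest is installed.
if (requireNamespace("deltatest", quietly = TRUE)) {
  print(deltatest::deltatest(
    toy, clicks / pageviews, by = group,
    group_names = c("B", "A"),  # assumed form: control first, then treatment
    type = "relative_change",
    quiet = TRUE
  ))
}
```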
### 3.4 Return Value
The return value of `deltatest` is an object of class `htest`.
```{r return_value}
result <- deltatest(data, clicks / pageviews, by = group)
result
```
This object contains the estimates, the p-value, the confidence interval, and more.
```{r return_value_detail}
result$estimate
result$p.value
result$conf.int
```
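Because `htest` is the same class that base R tests return, code written for those results carries over. As an illustration that does not require **deltatest**, the sketch below inspects an `htest` object produced by `t.test` on the built-in `sleep` dataset; the exact set of components in a `deltatest` result may differ.

```r
# t.test() also returns an object of class "htest".
res <- t.test(extra ~ group, data = sleep)
class(res)

# Components are accessed by name, exactly as with a deltatest result.
names(res)
res$estimate
res$p.value
res$conf.int

# A typical downstream check against a significance level.
is_significant <- res$p.value < 0.05
```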
You can also tidy the results by applying the `tidy` function from the **broom** package.
```{r}
broom::tidy(result)
```
For more details, refer to `help(deltatest)`.
## 4. Related Work
- [tidydelta: Estimation of Standard Errors using Delta Method](https://cran.r-project.org/package=tidydelta)
## 5. References
- Deng, A., Knoblich, U., & Lu, J. (2018). Applying the Delta Method in Metric
Analytics: A Practical Guide with Novel Ideas. *Proceedings of the 24th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining.*
[doi:10.1145/3219819.3219919](https://doi.org/10.1145/3219819.3219919)
- Deng, A., Lu, J., & Litz, J. (2017). Trustworthy Analysis of Online A/B Tests:
Pitfalls, challenges and solutions. *Proceedings of the Tenth ACM International
Conference on Web Search and Data Mining.*
[doi:10.1145/3018661.3018677](https://doi.org/10.1145/3018661.3018677)
- id:sz_dr (2018). Calculating the Mean and Variance of the Ratio of Random
Variables Using the Delta Method [in Japanese]. *If You're Human, Think More
Now.* https://www.szdrblog.info/entry/2018/11/18/154952
Owner
- Name: hoxo-m
- Login: hoxo-m
- Kind: user
- Location: Tokyo, Japan
- Company: HOXO-M Inc.
- Website: https://hoxo-m.com/
- Repositories: 24
- Profile: https://github.com/hoxo-m
Japanese Translator: "R for Everyone" "Automated Data Collection with R" "Feature Engineering for Machine Learning" "Federated Learning" etc.
GitHub Events
Total
- Create event: 3
- Release event: 1
- Issues event: 8
- Watch event: 7
- Delete event: 1
- Issue comment event: 2
- Push event: 68
- Pull request event: 3
- Fork event: 1
Last Year
- Create event: 3
- Release event: 1
- Issues event: 8
- Watch event: 7
- Delete event: 1
- Issue comment event: 2
- Push event: 68
- Pull request event: 3
- Fork event: 1
Issues and Pull Requests
All Time
- Total issues: 5
- Total pull requests: 2
- Average time to close issues: 16 days
- Average time to close pull requests: about 1 hour
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.4
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 2
- Average time to close issues: 16 days
- Average time to close pull requests: about 1 hour
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.4
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- hoxo-m (5)
Pull Request Authors
- hoxo-m (2)
Top Labels
Issue Labels
enhancement (2)
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cran: 191 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
cran.r-project.org: deltatest
Statistical Hypothesis Testing Using the Delta Method
- Homepage: https://github.com/hoxo-m/deltatest
- Documentation: http://cran.r-project.org/web/packages/deltatest/deltatest.pdf
- License: MIT + file LICENSE
- Latest release: 0.1.0 (published 12 months ago)
Rankings
Dependent packages count: 27.1%
Dependent repos count: 33.4%
Average: 49.2%
Downloads: 87.0%
Maintainers (1)