tehtuner

tehtuner: An R package to fit and tune models for the conditional average treatment effect - Published in JOSS (2023)

https://github.com/jackmwolf/tehtuner

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

clinical-trials heterogeneity-of-treatment-effect r subgroup-identification

Scientific Fields

Earth and Environmental Sciences Physical Sciences - 40% confidence
Last synced: 4 months ago · JSON representation

Repository

An R Package to Fit and Tune to Models to Detect Treatment Effect Heterogeneity

Basic Info
  • Host: GitHub
  • Owner: jackmwolf
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 1010 KB
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Topics
clinical-trials heterogeneity-of-treatment-effect r subgroup-identification
Created about 4 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct

README.Rmd

---
output: github_document
editor_options: 
  chunk_output_type: console
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

library(ggplot2)
library(ggtext)
library(magrittr)
library(kableExtra)
library(Cairo)
library(rpart.plot)
```

# tehtuner


[![CRAN status](https://www.r-pkg.org/badges/version/tehtuner)](https://CRAN.R-project.org/package=tehtuner)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.05453/status.svg)](https://doi.org/10.21105/joss.05453)
[![R-CMD-check](https://github.com/jackmwolf/tehtuner/workflows/R-CMD-check/badge.svg)](https://github.com/jackmwolf/tehtuner/actions)


The goal of `tehtuner` is to implement methods to fit models to detect and model
treatment effect heterogeneity (TEH) while controlling the Type-I error of falsely 
detecting a differential effect when the conditional average treatment effect is
uniform across the study population.

Currently `tehtuner` supports Virtual Twins models (Foster 
et al., 2011) for detecting TEH using the permutation procedure proposed in (Wolf et al., 2022).

Virtual Twins is a two-step approach to detecting differential treatment
effects. Subjects' conditional average treatment effects (CATEs) are first
estimated in Step 1 using a flexible model. Then, a simple and interpretable
model is fit in Step 2 to model these estimated CATEs as a function of the
covariates.

The Step 2 model is dependent on some tuning parameter. This parameter is
selected to control the Type-I error rate by permuting the data under the
null hypothesis of a constant treatment effect and identifying the minimal
null penalty parameter (MNPP), which is the smallest penalty parameter that
yields a Step 2 model with no covariate effects. The $1-\alpha$ quantile
of the distribution of is then used to fit the Step 2 model on the original
data.
In dong so, the Type-I error rate is controlled to be $\alpha$.

## Installation

`tehtuner` is available on [CRAN](https://CRAN.R-project.org); you can download the release version with:

``` r
install.packages("tehtuner")
```

You can download the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("jackmwolf/tehtuner")
```
## Example

We consider simulated data from a small clinical trial with 1000 subjects.
Each subject has 10 measured covariates, 8 continuous and 2 binary.
We are interested in estimating and understanding the CATE through Virtual Twins.

```{r}
library(tehtuner)
data("tehtuner_example")
```

We will consider a Virtual Twins model using a random forest to estimate the CATEs in Step 1 and then fitting a regression tree on the estimated CATEs in Step 2 with the Type-I error rate set at $\alpha = 0.2$.

```{r cache=TRUE}
set.seed(100)
vt_cate <- tunevt(
  data = tehtuner_example, Y = "Y", Trt = "Trt", step1 = "randomforest",
  step2 = "rtree", alpha0 = 0.2, p_reps = 100, ntree = 50
)
vt_cate
```

The fitted Step 2 model can be accessed via `$vtmod`. 
In this case, as we used a regression tree in Step 2, our final model model is of class `rpart.object`.
```{r dev='CairoPNG', warning = FALSE}
vt_cate$vtmod

rpart.plot::rpart.plot(vt_cate$vtmod, digits = -2)
```

The fitted model for the CATE is a function of the covariates (`V1`, and `V3`), so we would conclude that there is treatment effect heterogeneity at the 20% level. 

We can also look at the null distribution of the MNPP through `vt_cate$theta_null`.
The 80th percentile of $\hat\theta$ under the null hypothesis is
```{r}
quantile(vt_cate$theta_null, 0.8)
```

while the MNPP of our observed data is
```{r}
vt_cate$mnpp
```

The procedure fit the Step 2 model using the 80th quantile of the null distribution which resulted in a model that included covariates since the MNPP was above the 80th quantile.

```{r mnpp_plot, dev='CairoPNG', echo = FALSE}
ggplot(mapping = aes(x = vt_cate$theta_null, y = after_stat(density))) +
  geom_histogram(color = "black", fill = "white", binwidth = 0.025) +
  theme_minimal() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  geom_vline(
    aes(xintercept = c(vt_cate$mnpp, quantile(vt_cate$theta_null, 0.8))),
    color = c("#0072B2", "#D55E00"),
    linewidth = 2,
    linetype = 2
  ) + 
  labs(
    y = "Density",
    x = expression(hat(theta)),
    title =  expression("Sampling distribution of" ~ hat(theta) ~ "under" ~ H[0]),
    subtitle = paste0(
      "",
        "80th quantile (critical value): ", formatC(quantile(vt_cate$theta_null, 0.8), digits = 2, format = "f"),
      "",
      "; ",
      "",
        "Observed MNPP: ", formatC(vt_cate$mnpp, 2, format = "f"), 
      ""
      )
  ) +
  theme(
    plot.subtitle = element_markdown(size = 12)
  )
```

### Running in Parallel

Version `0.2.0` added the `parallel` option to `tunevt()` which allows the user to perform the permutation procedure in parallel to reduce computation times. 
Before doing so, you must register a parallel backend; see `?foreach::foreach` for more information.

For example, to carry out 100 permutations across 2 processors:

```{r eval = FALSE}
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)

vt_cate_parallel <- tunevt(
  data = tehtuner_example, Y = "Y", Trt = "Trt", step1 = "randomforest",
  step2 = "rtree", alpha0 = 0.2, p_reps = 100, ntree = 50, parallel = TRUE
)

parallel::stopCluster(cl)
```


## References

- Foster, J. C., Taylor, J. M., & Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. _Statistics in Medicine, 30_(24), 2867–2880. https://doi.org/10.1002/sim.4322

- Wolf, J. M., Koopmeiners, J. S., & Vock, D. M. (2022). A permutation procedure to detect heterogeneous treatment effects in randomized clinical trials while controlling the type-I error rate. _Clinical Trials, 19_(5). https://doi.org/10.1177/17407745221095855

- Deng C., Wolf J. M., Vock D. M., Carroll D. M., Hatsukami D. K., Leng N., & Koopmeiners J. S. (2023). “Practical guidance on modeling choices for the virtual twins method.” _Journal of Biopharmaceutical Statistics_. https://doi.org/10.1080/10543406.2023.2170404

- Wolf, J. M., (2023). tehtuner: An R package to fit and tune models for the conditional average treatment effect. _Journal of Open Source Software, 8_(86), 5453. https://doi.org/10.21105/joss.05453

Owner

  • Name: Jack M Wolf
  • Login: jackmwolf
  • Kind: user
  • Company: University of Minnesota Biostatistics

he/him \\ Biostatistics PhD Student

JOSS Publication

tehtuner: An R package to fit and tune models for the conditional average treatment effect
Published
June 26, 2023
Volume 8, Issue 86, Page 5453
Authors
Jack M. Wolf ORCID
University of Minnesota Division of Biostatistics, United States of America
Editor
Chris Hartgerink ORCID
Tags
causal inference clinical trials

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
Last Year
  • Watch event: 1
  • Push event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 61
  • Total Committers: 1
  • Avg Commits per committer: 61.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Jack M Wolf 3****f 61

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 8
  • Total pull requests: 5
  • Average time to close issues: 22 days
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.75
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jackmwolf (7)
  • elimillera (1)
Pull Request Authors
  • jackmwolf (5)
Top Labels
Issue Labels
enhancement (3) documentation (2) bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 184 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: tehtuner

Fit and Tune Models to Detect Treatment Effect Heterogeneity

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 184 Last month
Rankings
Forks count: 28.8%
Dependent packages count: 29.8%
Stargazers count: 31.7%
Dependent repos count: 35.5%
Average: 42.5%
Downloads: 86.8%
Maintainers (1)
Last synced: 4 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • Rdpack * imports
  • SuperLearner * imports
  • earth * imports
  • glmnet * imports
  • party * imports
  • randomForestSRC * imports
  • rpart * imports
  • stringr * imports
  • spelling * suggests
  • testthat >= 3.0.0 suggests
.github/workflows/check-standard.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite