Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.6%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: tiffanymtang
- License: mit
- Language: HTML
- Default Branch: main
- Homepage: https://tiffanymtang.github.io/causalDT/
- Size: 116 MB
Statistics
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 1
Created almost 2 years ago · Last pushed 10 months ago
Metadata Files
Readme
License
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = file.path("causalDT", "man", "figures", "README-")
)
```
# Causal Distillation Trees
[Causal Distillation Trees](https://arxiv.org/abs/2502.07275) (CDT) is a novel machine learning method for estimating interpretable subgroups in causal inference. CDT allows researchers to fit *any* machine learning model of their choice to estimate the individual-level treatment effect, and then leverages a simple, second-stage tree-based model to "distill" the estimated treatment effect into meaningful subgroups. As a result, CDT inherits the improvements in predictive performance from black-box machine learning models while preserving the interpretability of a simple decision tree.

Briefly, CDT is a two-stage learner. In the first stage, a teacher model (e.g., a black-box metalearner) is fit to estimate individual-level treatment effects. In the second stage, a student model (e.g., a decision tree) is fit to predict those estimated effects, thereby distilling them into interpretable subgroups. Both stages are fit on the training data. Finally, using the estimated subgroups, the subgroup average treatment effects are honestly estimated on a held-out estimation set.
For more details, check out [Huang, M., Tang, T. M., Kenney, A. M. "Distilling heterogeneous treatment effects: Stable subgroup estimation in causal inference." (2025).](https://arxiv.org/abs/2502.07275)
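
The two-stage procedure and the honest estimation step described above can be sketched in a few lines of R. This is only an illustrative sketch under stated assumptions, not the `causalDT` implementation: the data are simulated, the teacher is a simple T-learner built from two linear models (a stand-in for any black-box model), the student is an `rpart` tree, and `leaf_id()` is a hypothetical helper introduced here to map observations to leaves.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(1)

# Simulated randomized trial (illustrative data, not from the paper)
n <- 2000
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Z <- rbinom(n, 1, 0.5)                 # treatment indicator
tau <- ifelse(X$x1 > 0, 2, 0)          # true effect differs by sign of x1
Y <- X$x2 + tau * Z + rnorm(n)         # observed response

# Split into a training set and a held-out estimation set
train <- sample(n, n / 2)
est <- setdiff(seq_len(n), train)

# Stage 1 (teacher): a simple T-learner built from two linear models,
# standing in for whatever black-box model one would actually use
train_df <- cbind(X[train, ], Y = Y[train])
fit1 <- lm(Y ~ x1 + x2, data = train_df[Z[train] == 1, ])
fit0 <- lm(Y ~ x1 + x2, data = train_df[Z[train] == 0, ])
tau_hat <- predict(fit1, X[train, ]) - predict(fit0, X[train, ])

# Stage 2 (student): a shallow tree fit to the teacher's estimated effects
student_df <- cbind(X[train, ], tau_hat = tau_hat)
student <- rpart(tau_hat ~ x1 + x2, data = student_df,
                 control = rpart.control(maxdepth = 2))

# Helper (hypothetical): map observations to leaves by relabeling each
# node's fitted value with its row index in the tree's frame
leaf_id <- function(tree, newdata) {
  tree$frame$yval <- seq_len(nrow(tree$frame))
  predict(tree, newdata = newdata)
}

# Honest estimation: difference in means within each leaf, held-out data only
leaves <- leaf_id(student, X[est, ])
subgroup_ate <- sapply(split(est, leaves), function(idx) {
  mean(Y[idx][Z[idx] == 1]) - mean(Y[idx][Z[idx] == 0])
})
subgroup_ate
```

Because the subgroups are learned on the training half and their effects are estimated on the other half, the subgroup estimates are not contaminated by the same noise that shaped the tree splits.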
## Organization
This repository contains:
1. An R package `causalDT` to run causal distillation trees on your own data (see [causalDT/](causalDT/))
2. All code necessary to reproduce the analysis and figures in [Huang et al. (2025)](https://arxiv.org/abs/2502.07275) (see [causalDT-manuscript/](causalDT-manuscript/) and additional results [here](https://tiffanymtang.github.io/causalDT/simulation_results.html))
## Installation of the R package
You can install the `causalDT` R package via:
``` r
# install.packages("devtools")
devtools::install_github("tiffanymtang/causalDT", subdir = "causalDT")
```
## Example Usage
To illustrate the use of `causalDT`, we use data from the AIDS Clinical Trials Group Study 175 (ACTG 175), a randomized controlled trial comparing the effectiveness of monotherapy and combination therapy in HIV-1-infected patients. The data are available in the `speff2trial` R package.
```{r load-data}
# install.packages("speff2trial")
library(speff2trial)
library(dplyr)
# keep only treatment arms 0 and 2 (per the speff2trial documentation:
# zidovudine alone and zidovudine plus zalcitabine)
data <- speff2trial::ACTG175 |>
  dplyr::filter(arms %in% c(0, 2))
# pre-treatment covariates data
X <- data |>
  dplyr::select(
    age, wtkg, hemo, homo, drugs, karnof, race,
    gender, symptom, preanti, strat, cd80
  ) |>
  as.matrix()
# treatment indicator variable
Z <- data |>
  dplyr::pull(treat)
# response variable
Y <- data |>
  dplyr::pull(cens)
```
Given the pre-treatment covariates data $X$, the treatment variable $Z$, and the response variable $Y$, we can run CDT as follows:
```{r causalDT, fig.width=12}
library(causalDT)
set.seed(331)
causal_forest_cdt <- causalDT(
  X = X, Y = Y, Z = Z,
  teacher_model = "causal_forest"
)
plot_cdt(causal_forest_cdt)
```
Note that when using CDT, a teacher model must be chosen (the default is a causal forest). To help researchers select an appropriate teacher model, the Jaccard subgroup stability index (SSI) was introduced in [Huang et al. (2025)](https://arxiv.org/abs/2502.07275). Generally, a higher Jaccard SSI indicates a better teacher model. This teacher model selection procedure can be run as follows:
```{r jaccard}
## uncomment to install rlearner, which is needed to run rboost
# remotes::install_github("xnie/rlearner")
# selecting between causal forest versus rboost
rboost_cdt <- causalDT(
  X = as.matrix(X), Y = Y, Z = Z,
  teacher_model = rlearner_teacher(rlearner::rboost)
)
plot_jaccard(`Causal Forest` = causal_forest_cdt, `Rboost` = rboost_cdt)
```
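
For intuition, the Jaccard index that gives the SSI its name measures the overlap between two sets as the size of their intersection divided by the size of their union. The sketch below shows only this basic set similarity on made-up sample indices; it is not the package's SSI computation, whose exact definition is given in the paper. The function name `jaccard()` and the example vectors are assumptions introduced here for illustration.

```r
# Jaccard similarity between two sets: |A intersect B| / |A union B|
jaccard <- function(a, b) {
  u <- union(a, b)
  if (length(u) == 0) return(NA_real_)  # undefined for two empty sets
  length(intersect(a, b)) / length(u)
}

# Hypothetical example: sample indices assigned to a comparable subgroup
# by two trees fit on different resamples of the data
subgroup_run1 <- c(1, 2, 3, 4, 5)
subgroup_run2 <- c(4, 5, 6, 7)
jaccard(subgroup_run1, subgroup_run2)  # 2 shared indices out of 7 in total: 2/7
```

A value of 1 means the two subgroups contain exactly the same samples, and a value of 0 means they share none, which is why higher values are read as more stable subgroups.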
## Citation
```
@article{huang2025distilling,
  title={Distilling heterogeneous treatment effects: Stable subgroup estimation in causal inference},
  author={Melody Huang and Tiffany M. Tang and Ana M. Kenney},
  year={2025},
  eprint={2502.07275},
  archivePrefix={arXiv},
  primaryClass={stat.ME},
  url={https://arxiv.org/abs/2502.07275},
}
```
Owner
- Name: Tiffany Tang
- Login: tiffanymtang
- Kind: user
- Location: Berkeley, CA
- Company: University of California, Berkeley
- Website: tiffanymtang.github.io
- Repositories: 6
- Profile: https://github.com/tiffanymtang
PhD student in Statistics
GitHub Events
Total
- Release event: 1
- Watch event: 5
- Delete event: 2
- Push event: 25
- Public event: 1
- Pull request event: 4
- Fork event: 1
- Create event: 3
Last Year
- Release event: 1
- Watch event: 5
- Delete event: 2
- Push event: 25
- Public event: 1
- Pull request event: 4
- Fork event: 1
- Create event: 3
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 1 month
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 1 month
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- tiffanymtang (4)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cran: 13 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
cran.r-project.org: causalDT
Causal Distillation Trees
- Homepage: https://tiffanymtang.github.io/causalDT/
- Documentation: http://cran.r-project.org/web/packages/causalDT/causalDT.pdf
- License: MIT + file LICENSE
- Latest release: 1.0.0 (published 10 months ago)
Rankings
Dependent packages count: 25.6%
Dependent repos count: 31.5%
Average: 47.4%
Downloads: 85.3%
Maintainers (1)
Last synced: 10 months ago
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v4 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action v4.5.0 composite
- actions/checkout v4 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
causalDT/DESCRIPTION
cran
- Rcpp * imports
- dplyr * imports
- grf * imports
- partykit * imports
- purrr * imports
- rlearner >= 1.1.0 imports
- rpart * imports
- stringr * imports
- tibble * imports
- tidyselect * imports
- testthat >= 3.0.0 suggests