CRE

CRE: An R package for interpretable discovery and inference of heterogeneous treatment effects - Published in JOSS (2023)

https://github.com/nsaph-software/cre

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in JOSS metadata
- ✓ Academic publication links: Links to arxiv.org, joss.theoj.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software

Last synced: 9 months ago

Repository

The Causal Rule Ensemble Method

Basic Info

Host: GitHub
Owner: NSAPH-Software
License: gpl-3.0
Language: R
Default Branch: main
Homepage: https://nsaph-software.github.io/CRE/
Size: 12.7 MB

Statistics

Stars: 14
Watchers: 1
Forks: 5
Open Issues: 0
Releases: 10

Fork of kwonsang/CRE

Created almost 5 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog License Code of conduct

CRE

Interpretable Discovery and Inference of Heterogeneous Treatment Effects

In health and social sciences, it is critically important to identify subgroups of the study population in which the causal effect of a treatment differs notably from the average treatment effect (ATE). The bulk of the heterogeneous treatment effect (HTE) literature focuses on two major tasks: (i) estimating HTEs by examining the conditional average treatment effect (CATE); (ii) discovering subgroups of a population characterized by HTE.

Several methodologies have been proposed for both tasks, but making the results interpretable is still an open challenge. Bargagli-Stoffi et al. (2023) proposed Causal Rule Ensemble, a new method that characterizes HTE in terms of decision rules by extensively exploring heterogeneity patterns with an ensemble-of-trees approach while enforcing stability in the discovery. CRE is an R package providing a flexible implementation of the Causal Rule Ensemble algorithm.
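To make the idea of a decision rule concrete, the following base R sketch simulates a randomized treatment whose effect is larger inside one rule-defined subgroup, then compares the ATE with the effect inside and outside that subgroup. The data-generating process, the rule, and the helper `diff_in_means` are illustrative assumptions and are not part of the CRE package.

```r
# Hypothetical simulated data (not from the CRE package): a randomized binary
# treatment whose effect is larger when x1 == 1 and x2 == 0.
set.seed(1)
n  <- 5000
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
z  <- rbinom(n, 1, 0.5)

# A decision rule is a conjunction of simple conditions on the covariates
rule <- x1 == 1 & x2 == 0
tau  <- 1 + 3 * rule              # true effect: 4 inside the rule, 1 outside
y    <- x1 + tau * z + rnorm(n)

# Difference in means between treated and control units within a subset
diff_in_means <- function(keep) {
  mean(y[keep & z == 1]) - mean(y[keep & z == 0])
}

ate          <- diff_in_means(rep(TRUE, n))  # average treatment effect
cate_rule    <- diff_in_means(rule)          # effect inside the rule
cate_no_rule <- diff_in_means(!rule)         # effect outside the rule

print(round(c(ATE = ate, rule = cate_rule, no_rule = cate_no_rule), 2))
```

Because the treatment is randomized in this toy setting, a plain difference in means is enough; with observational data, the ITE estimators listed under Notes would take its place.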

Installation

Installing from CRAN.

```r
install.packages("CRE")
```

Installing the latest development version.

```r
library(devtools)
install_github("NSAPH-Software/CRE", ref = "develop")
```

Import.

```r
library("CRE")
```

The full list of required dependencies can be found in the project's DESCRIPTION file.

Arguments

Data (required)
- y: The observed response/outcome vector (binary or continuous).
- z: The treatment/exposure/policy vector (binary).
- X: The covariate matrix (binary or continuous).

Parameters (not required)
- method_parameters: The list of parameters that define the models used, including:
  - ratio_dis: The ratio of data delegated to the discovery sub-sample (default: 0.5).
  - ite_method: The method used to estimate the individual treatment effect (ITE) pseudo-outcome (default: "aipw") [1].
  - learner_ps: The SuperLearner model for the propensity score estimation (default: "SL.xgboost", used only for the "aipw", "bart" and "cf" ITE estimators).
  - learner_y: The SuperLearner model for the outcome estimation (default: "SL.xgboost", used only for the "aipw", "slearner", "tlearner" and "xlearner" ITE estimators).

- hyper_params: The list of hyperparameters used to fine-tune the method, including:
  - intervention_vars: Array with the names of the intervention-able covariates used for rules generation. An empty or null array means that all the covariates are considered intervention-able (default: NULL).
  - ntrees: The number of decision trees for random forest (default: 20).
  - node_size: Minimum size of the trees' terminal nodes (default: 20).
  - max_rules: Maximum number of generated candidate rules (default: 50).
  - max_depth: Maximum rule length (default: 3).
  - t_decay: The decay threshold for rules pruning (default: 0.025).
  - t_ext: The threshold to define too generic or too specific (extreme) rules (default: 0.01).
  - t_corr: The threshold to define correlated rules (default: 1).
  - stability_selection: Method for stability selection when selecting the rules. "vanilla" for stability selection, "error_control" for stability selection with error control and "no" for no stability selection (default: "vanilla").
  - B: Number of bootstrap samples for stability selection in rules selection and uncertainty quantification in estimation (default: 20).
  - subsample: Bootstrap subsample ratio for stability selection in rules selection and uncertainty quantification in estimation (default: 0.5).
  - offset: Name of the covariate to use as offset (e.g., "x1") for T-Poisson ITE estimation. NULL if not used (default: NULL).
  - cutoff: Threshold defining the minimum cutoff value for the stability scores in stability selection (default: 0.9).
  - pfer: Upper bound for the per-family error rate (tolerated amount of falsely selected rules) in error control stability selection (default: 1).
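As an illustration of what a threshold such as t_ext could mean in practice, the sketch below builds a binary rules matrix in base R and drops rules that cover almost no one or almost everyone. It assumes that "extreme" is measured by the share of observations a rule covers; the covariates, the rule strings, and the filtering logic are hypothetical, and the package's internal implementation may differ.

```r
# Hypothetical covariates and candidate rules (names are illustrative only)
set.seed(2)
n <- 1000
X <- data.frame(x1 = rbinom(n, 1, 0.5),
                x2 = rbinom(n, 1, 0.5),
                x3 = as.integer(seq_len(n) <= 5))  # only 5 of 1000 units have x3 = 1

rules <- c("x1 > 0.5",
           "x1 > 0.5 & x2 <= 0.5",
           "x3 > 0.5",     # covers almost nobody: too specific
           "x3 <= 0.5")    # covers almost everybody: too generic

# Rules matrix: one column per rule, TRUE where a unit satisfies the rule
rules_matrix <- sapply(rules, function(r) eval(parse(text = r), envir = X))

# Assumed filter: keep rules whose coverage lies strictly between
# t_ext and 1 - t_ext
t_ext   <- 0.01
support <- colMeans(rules_matrix)
keep    <- support > t_ext & support < 1 - t_ext

print(round(support, 3))
print(names(support)[keep])
```

Under this assumption, raising t_ext removes more rare or near-universal rules before selection, which trades some sensitivity for rules backed by larger subgroups.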

Additional Estimates (not required)
- ite: The estimated ITE vector. If given, both the ITE estimation steps in Discovery and Inference are skipped (default: NULL).

Notes

[1] Options for the ITE estimation are as follows:
- S-Learner (slearner)
- T-Learner (tlearner)
- T-Poisson (tpoisson)
- X-Learner (xlearner)
- Augmented Inverse Probability Weighting (aipw)
- Causal Forests (cf)
- Causal Bayesian Additive Regression Trees (bart)

If other ITE estimates are provided through the ite argument, the ITE estimation steps in both discovery and inference are skipped and the provided estimates are used instead. The ITE estimator also requires an outcome learner and/or a propensity score learner from the SuperLearner package (e.g., "SL.lm", "SL.svm"). Both models are simple classifiers/regressors. By default, the XGBoost algorithm is used for both steps.
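For readers unfamiliar with the default estimator, here is a minimal base R sketch of the textbook AIPW pseudo-outcome. It swaps in `glm` and `lm` for the SuperLearner models so that it runs without extra packages; the simulated data and all variable names are assumptions for illustration, and the package's own implementation may differ in its details.

```r
# Hypothetical simulated data: confounded binary treatment, true effect = 2
set.seed(3)
n  <- 4000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
z  <- rbinom(n, 1, plogis(0.5 * x1))
y  <- 1 + x1 + x2 + 2 * z + rnorm(n)
df <- data.frame(y = y, z = z, x1 = x1, x2 = x2)

# Propensity score model (plays the role of learner_ps)
ps_fit <- glm(z ~ x1 + x2, data = df, family = binomial())
e_hat  <- predict(ps_fit, type = "response")

# Outcome models fitted separately in each arm (play the role of learner_y)
mu1_fit <- lm(y ~ x1 + x2, data = df[df$z == 1, ])
mu0_fit <- lm(y ~ x1 + x2, data = df[df$z == 0, ])
mu1_hat <- predict(mu1_fit, newdata = df)
mu0_hat <- predict(mu0_fit, newdata = df)

# AIPW pseudo-outcome: outcome-model contrast plus inverse-probability-weighted
# residual corrections for the treated and control units
ite_aipw <- (mu1_hat - mu0_hat) +
  z * (y - mu1_hat) / e_hat -
  (1 - z) * (y - mu0_hat) / (1 - e_hat)

print(mean(ite_aipw))
```

A vector built this way has one entry per observation, which is the shape the ite argument described above expects.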

Examples

Example 1 (default parameters)

```R
set.seed(2023)
dataset <- generate_cre_dataset(n = 2000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 5,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

# Fit CRE with default parameters, then inspect and predict
cre_results <- cre(y, z, X)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```

Example 2 (personalized ITE estimation)

```R
set.seed(2023)
dataset <- generate_cre_dataset(n = 2000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 5,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

# personalized ITE estimation (S-Learner with Linear Regression)
model <- lm(y ~ ., data = data.frame(y = y, X = X, z = z))
ite_pred <- predict(model, newdata = data.frame(X = X, z = z))

# Pass the custom estimates through the ite argument
cre_results <- cre(y, z, X, ite = ite_pred)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```

Example 3 (setting parameters)

```R
set.seed(2023)
dataset <- generate_cre_dataset(n = 2000,
                                rho = 0,
                                n_rules = 2,
                                p = 10,
                                effect_size = 2,
                                binary_covariates = TRUE,
                                binary_outcome = FALSE,
                                confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

method_params <- list(ratio_dis = 0.5,
                      ite_method = "aipw",
                      learner_ps = "SL.xgboost",
                      learner_y = "SL.xgboost")

hyper_params <- list(intervention_vars = c("x1", "x2", "x3", "x4", "x5", "x6"),
                     offset = NULL,
                     ntrees = 20,
                     node_size = 20,
                     max_rules = 50,
                     max_depth = 2,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     stability_selection = "vanilla",
                     cutoff = 0.8,
                     pfer = 0.1,
                     B = 50,
                     subsample = 0.1)

cre_results <- cre(y, z, X, method_params, hyper_params)
summary(cre_results)
plot(cre_results)
ite_pred <- predict(cre_results, X)
```

More synthetic data sets can be generated using generate_cre_dataset().

Simulations

Reproduce the simulation experiments in Section 4 of Bargagli-Stoffi et al. (2023), which evaluate the discovery and estimation performance of Causal Rule Ensemble against different benchmarks.

Discovery: Evaluate the performance of the Causal Rule Ensemble algorithm (varying the pseudo-outcome estimator) in rules and effect modifier discovery.

```r
CRE/functional_tests/experiments/discovery.R
```

Estimation: Evaluate the performance of the Causal Rule Ensemble algorithm (varying the pseudo-outcome estimator) in treatment effect estimation and compare it with the corresponding stand-alone ITE estimators.

```r
CRE/functional_tests/experiments/estimation.R
```

More exhaustive simulation studies and real-world experiments with the CRE package can be found at https://github.com/NSAPH-Projects/cre_applications.

Code of Conduct

Please note that the CRE project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms. More information about opening issues and contributing (e.g., the git branching model) can be found on the CRE website.

Owner

Name: National Studies on Air Pollution and Health Software
Login: NSAPH-Software
Kind: organization
Location: United States of America

Website: https://nsaph-software.github.io/intro.html
Repositories: 11
Profile: https://github.com/NSAPH-Software

NSAPH Software is a collection of open-source packages to carry out National Studies on Air Pollution and Health.

JOSS Publication

CRE: An R package for interpretable discovery and inference of heterogeneous treatment effects

Published

December 15, 2023

DOI

10.21105/joss.05587

Volume 8, Issue 92, Page 5587

Authors

Riccardo Cadei

Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America, Department of Computer and Communication Science, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

Naeem Khoshnevis

Research Computing, Harvard University, Cambridge, Massachusetts, United States of America

Kwonsang Lee

Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America

Daniela Maria Garcia

Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America

Falco J. Bargagli Stoffi

Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America

Editor

Susan Holmes

GitHub Events

Total

Release event: 1
Watch event: 1
Issue comment event: 1
Push event: 6
Pull request event: 1
Create event: 1

Last Year

Release event: 1
Watch event: 1
Issue comment event: 1
Push event: 6
Pull request event: 1
Create event: 1

Committers

Last synced: 10 months ago

All Time

Total Commits: 910
Total Committers: 5
Avg Commits per committer: 182.0
Development Distribution Score (DDS): 0.501

Past Year

Commits: 5
Committers: 1
Avg Commits per committer: 5.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Riccardo Cadei	4****i	454
naeemkh	k**m@g**m	260
Daniela Garcia	d**a@s**g	176
Falco J. Bargagli-Stoffi	3****i	17
Kwonsang Lee	k**t@g**m	3

Committer Domains (Top 20 + Academic)

surgofoundation.org: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 34
Total pull requests: 98
Average time to close issues: 6 months
Average time to close pull requests: 11 days
Total issue authors: 3
Total pull request authors: 3
Average comments per issue: 0.53
Average comments per pull request: 1.13
Merged pull requests: 91
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

riccardocadei (18)
Naeemkh (14)
salleuska (2)

Pull Request Authors

riccardocadei (49)
Naeemkh (46)
danielagarcia319 (3)

Top Labels

Issue Labels

enhancement (8) bug (3) documentation (2) good first issue (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 370 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 10
Total maintainers: 1

cran.r-project.org: CRE

Interpretable Discovery and Inference of Heterogeneous Treatment Effects

Homepage: https://github.com/NSAPH-Software/CRE
Documentation: http://cran.r-project.org/web/packages/CRE/CRE.pdf
License: GPL-3
Latest release: 0.2.7
published over 1 year ago

Versions: 10
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 370 Last month

Rankings

Forks count: 14.9%

Stargazers count: 17.9%

Average: 26.4%

Dependent packages count: 29.8%

Downloads: 34.0%

Dependent repos count: 35.5%

Maintainers (1)

fbargaglistoffi@hsph.harvard.edu

Last synced: 9 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/upload-artifact main composite
r-lib/actions/setup-pandoc v1 composite
r-lib/actions/setup-r v1 composite

DESCRIPTION cran

R >= 3.5.0 depends
MASS * imports
RRF * imports
SuperLearner * imports
bartCause * imports
bcf * imports
data.table * imports
dplyr * imports
gbm * imports
ggplot2 * imports
glmnet * imports
inTrees * imports
logger * imports
magrittr * imports
methods * imports
randomForest * imports
stabs * imports
stats * imports
stringr * imports
xgboost * imports
xtable * imports
BART * suggests
baggr * suggests
covr * suggests
gnm * suggests
grf * suggests
knitr * suggests
rmarkdown * suggests
testthat >= 3.0.0 suggests

docker_singularity/Dockerfile docker

rocker/verse 4.1.0 build
