grf

Generalized Random Forests

https://github.com/grf-labs/grf

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    7 of 32 committers (21.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.5%) to scientific vocabulary

Keywords

causal-forest causal-inference econometrics machine-learning random-forest statistics

Keywords from Contributors

date-time
Last synced: 6 months ago

Repository

Generalized Random Forests

Basic Info
Statistics
  • Stars: 1,041
  • Watchers: 46
  • Forks: 264
  • Open Issues: 57
  • Releases: 0
Topics
causal-forest causal-inference econometrics machine-learning random-forest statistics
Created over 9 years ago · Last pushed 7 months ago
Metadata Files
Readme License

README.md

generalized random forests


A package for forest-based statistical estimation and inference. GRF provides non-parametric methods for heterogeneous treatment effects estimation (optionally using right-censored outcomes, multiple treatment arms or outcomes, or instrumental variables), as well as least-squares regression, quantile regression, and survival regression, all with support for missing covariates.
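As a rough illustration of the missing-covariate support mentioned above, the following sketch trains a regression forest on a covariate matrix containing `NA` entries. The simulated data, the 10% missingness rate, and the variable names (`X.missing`, `pred`) are assumptions made for this example only; it assumes a grf version recent enough to accept `NA` values in `X`.

```R
library(grf)

set.seed(1)
n <- 500
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + rnorm(n)

# Illustrative assumption: blank out about 10% of the covariate entries at random.
X.missing <- X
X.missing[sample(length(X), size = round(0.1 * length(X)))] <- NA

# Train directly on the incomplete covariates; no imputation step is performed here.
forest <- regression_forest(X.missing, Y)

# Out-of-bag predictions for the training sample.
pred <- predict(forest)$predictions
head(pred)
```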

In addition, GRF supports 'honest' estimation (where one subset of the data is used for choosing splits, and another for populating the leaves of the tree), and confidence intervals for least-squares regression and treatment effect estimation.
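A minimal sketch of both features for least-squares regression, under assumed simulated data: `honesty = TRUE` requests honest trees (it is already the default, and is spelled out here only for emphasis), and `estimate.variance = TRUE` returns variance estimates from which approximate 95% confidence intervals can be formed. The data-generating process and the names `X.test` and `ci` are illustrative choices, not part of the package.

```R
library(grf)

set.seed(42)
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1] + rnorm(n)

# Honest forest: one subsample chooses the splits, another populates the leaves.
forest <- regression_forest(X, Y, honesty = TRUE, num.trees = 2000)

# A few test points that vary only in the first covariate.
X.test <- matrix(0, 5, p)
X.test[, 1] <- seq(-1, 1, length.out = 5)

# Point predictions plus variance estimates.
pred <- predict(forest, X.test, estimate.variance = TRUE)
sigma.hat <- sqrt(pred$variance.estimates)

# Approximate 95% confidence intervals based on a normal approximation.
ci <- data.frame(
  estimate = pred$predictions,
  lower = pred$predictions - qnorm(0.975) * sigma.hat,
  upper = pred$predictions + qnorm(0.975) * sigma.hat
)
print(ci)
```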


The repository first started as a fork of the ranger repository -- we owe a great deal of thanks to the ranger authors for their useful and free package.

Installation

The latest release of the package can be installed through CRAN:

install.packages("grf")

conda users can install from the conda-forge channel:

conda install -c conda-forge r-grf

The current development version can be installed from source using devtools.

devtools::install_github("grf-labs/grf", subdir = "r-package/grf")

Note that to install from source, a compiler that implements C++11 or later is required. If installing on Windows, the RTools toolchain is also required.

Usage Examples

The following script demonstrates how to use GRF for heterogeneous treatment effect estimation. For examples of how to use other types of forests, please consult the R documentation on the relevant methods.

```R
library(grf)

# Generate data.
n <- 2000
p <- 10
X <- matrix(rnorm(n * p), n, p)
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)

# Train a causal forest.
W <- rbinom(n, 1, 0.4 + 0.2 * (X[, 1] > 0))
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
tau.forest <- causal_forest(X, Y, W)

# Estimate treatment effects for the training data using out-of-bag prediction.
tau.hat.oob <- predict(tau.forest)
hist(tau.hat.oob$predictions)

# Estimate treatment effects for the test sample.
tau.hat <- predict(tau.forest, X.test)
plot(X.test[, 1], tau.hat$predictions,
     ylim = range(tau.hat$predictions, 0, 2),
     xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 2)

# Estimate the conditional average treatment effect on the full sample (CATE).
average_treatment_effect(tau.forest, target.sample = "all")

# Estimate the conditional average treatment effect on the treated sample (CATT).
average_treatment_effect(tau.forest, target.sample = "treated")

# Add confidence intervals for heterogeneous treatment effects; growing more trees is now recommended.
tau.forest <- causal_forest(X, Y, W, num.trees = 4000)
tau.hat <- predict(tau.forest, X.test, estimate.variance = TRUE)
sigma.hat <- sqrt(tau.hat$variance.estimates)
plot(X.test[, 1], tau.hat$predictions,
     ylim = range(tau.hat$predictions + 1.96 * sigma.hat,
                  tau.hat$predictions - 1.96 * sigma.hat, 0, 2),
     xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], tau.hat$predictions + 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], tau.hat$predictions - 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 1)

# In some examples, pre-fitting models for Y and W separately may
# be helpful (e.g., if different models use different covariates).
# In some applications, one may even want to get Y.hat and W.hat
# using a completely different method (e.g., boosting).

# Generate new data.
n <- 4000
p <- 20
X <- matrix(rnorm(n * p), n, p)
TAU <- 1 / (1 + exp(-X[, 3]))
W <- rbinom(n, 1, 1 / (1 + exp(-X[, 1] - X[, 2])))
Y <- pmax(X[, 2] + X[, 3], 0) + rowMeans(X[, 4:6]) / 2 + W * TAU + rnorm(n)

forest.W <- regression_forest(X, W, tune.parameters = "all")
W.hat <- predict(forest.W)$predictions

forest.Y <- regression_forest(X, Y, tune.parameters = "all")
Y.hat <- predict(forest.Y)$predictions

forest.Y.varimp <- variable_importance(forest.Y)

# Note: Forests may have a hard time when trained on very few variables
# (e.g., ncol(X) = 1, 2, or 3). We recommend not being too aggressive
# in selection.
selected.vars <- which(forest.Y.varimp / mean(forest.Y.varimp) > 0.2)

tau.forest <- causal_forest(X[, selected.vars], Y, W,
                            W.hat = W.hat, Y.hat = Y.hat,
                            tune.parameters = "all")

# See if a causal forest succeeded in capturing heterogeneity by plotting
# the TOC and calculating a 95% CI for the AUTOC.
train <- sample(1:n, n / 2)
train.forest <- causal_forest(X[train, ], Y[train], W[train])
eval.forest <- causal_forest(X[-train, ], Y[-train], W[-train])
rate <- rank_average_treatment_effect(eval.forest,
                                      predict(train.forest, X[-train, ])$predictions)
plot(rate)
paste("AUTOC:", round(rate$estimate, 2), "+/-", round(1.96 * rate$std.err, 2))
```
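The average treatment effect call used in the script returns a point estimate together with a standard error, so a confidence interval can be formed by hand. The sketch below is one way to do that, assuming the result is a named numeric vector with elements `estimate` and `std.err`; the simulated data and the names `ate` and `ci` are illustrative and not part of the original script.

```R
library(grf)

set.seed(123)
n <- 2000
p <- 10
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

tau.forest <- causal_forest(X, Y, W)

# Average treatment effect on the full sample: point estimate and standard error.
ate <- average_treatment_effect(tau.forest, target.sample = "all")

# Approximate 95% confidence interval based on a normal approximation.
ci <- ate[["estimate"]] + c(-1, 1) * qnorm(0.975) * ate[["std.err"]]
cat("ATE:", round(ate[["estimate"]], 3),
    "95% CI: [", round(ci[1], 3), ",", round(ci[2], 3), "]\n")
```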

Developing

In addition to providing out-of-the-box forests for quantile regression and causal effect estimation, GRF provides a framework for creating forests tailored to new statistical tasks. If you'd like to develop using GRF, please consult the algorithm reference and development guide.

Funding

Development of GRF is supported by the National Institutes of Health, the National Science Foundation, the Sloan Foundation, the Office of Naval Research (Grant N00014-17-1-2131) and Schmidt Futures.

References

Susan Athey and Stefan Wager. Estimating Treatment Effects with Causal Forests: An Application. Observational Studies, 5, 2019. [paper, arxiv]

Susan Athey, Julie Tibshirani and Stefan Wager. Generalized Random Forests. Annals of Statistics, 47(2), 2019. [paper, arxiv]

Yifan Cui, Michael R. Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests. Journal of the Royal Statistical Society: Series B, 85(2), 2023. [paper, arxiv]

Rina Friedberg, Julie Tibshirani, Susan Athey, and Stefan Wager. Local Linear Forests. Journal of Computational and Graphical Statistics, 30(2), 2020. [paper, arxiv]

Imke Mayer, Erik Sverdrup, Tobias Gauss, Jean-Denis Moyer, Stefan Wager and Julie Josse. Doubly Robust Treatment Effect Estimation with Missing Attributes. Annals of Applied Statistics, 14(3), 2020. [paper, arxiv]

Erik Sverdrup, Maria Petukhova, and Stefan Wager. Estimating Treatment Effect Heterogeneity in Psychiatry: A Review and Tutorial with Causal Forests. International Journal of Methods in Psychiatric Research, 34(2), 2025. [paper, arxiv]

Stefan Wager. Causal Inference: A Statistical Learning Approach. 2024. [pdf]

Stefan Wager and Susan Athey. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association, 113(523), 2018. [paper, arxiv]

Steve Yadlowsky, Scott Fleming, Nigam Shah, Emma Brunskill, and Stefan Wager. Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects. Journal of the American Statistical Association, 120(549), 2025. [paper, arxiv]

Owner

  • Name: GRF Labs
  • Login: grf-labs
  • Kind: organization
  • Location: Stanford University

GitHub Events

Total
  • Issues event: 35
  • Watch event: 80
  • Delete event: 5
  • Issue comment event: 45
  • Push event: 11
  • Pull request review comment event: 2
  • Pull request review event: 2
  • Pull request event: 22
  • Fork event: 12
  • Create event: 3
Last Year
  • Issues event: 35
  • Watch event: 80
  • Delete event: 5
  • Issue comment event: 45
  • Push event: 11
  • Pull request review comment event: 2
  • Pull request review event: 2
  • Pull request event: 22
  • Fork event: 12
  • Create event: 3

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 1,771
  • Total Committers: 32
  • Avg Commits per committer: 55.344
  • Development Distribution Score (DDS): 0.699
Past Year
  • Commits: 95
  • Committers: 3
  • Avg Commits per committer: 31.667
  • Development Distribution Score (DDS): 0.021
Top Committers
Name Email Commits
Erik Sverdrup e****p@g****m 533
Marvin Wright w****t@i****e 490
Julie Tibshirani j****s@g****m 404
Stefan Wager s****r@s****u 96
Julie Tibshirani j****s@c****u 95
Stefan Wager s****r 30
Julie Tibshirani j****i@e****o 27
Vitor Hadad h****d@g****m 26
Rina Friedberg r****g@g****m 15
animusnaturae b****s@s****e 7
Luke Miner l****r 7
Edward Gan e****8@g****m 5
davidahirshberg d****g@g****m 5
ras44 9****4 5
Jinhua Wang s****r@m****n 4
Stefan Wager s****n@J****l 3
Buyan b****t@u****u 2
jakobzeitler j****9@e****k 2
Evan Munro e****o@g****m 2
imkemayer m****e@g****m 1
Maximilian Haupt m****l@m****m 1
jjchern j****n@g****m 1
Nan Xiao n****s@g****m 1
Marvin N. Wright w****k@w****e 1
Scott Fleming s****n@g****m 1
Oliver Keyes I****s 1
rugilmartin 4****n 1
Max Ghenis m****s@g****m 1
Kendon Bell k****B 1
Benjamin Skinner b****r@v****u 1
and 2 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 150
  • Total pull requests: 176
  • Average time to close issues: 7 months
  • Average time to close pull requests: 3 days
  • Total issue authors: 111
  • Total pull request authors: 9
  • Average comments per issue: 2.27
  • Average comments per pull request: 0.19
  • Merged pull requests: 167
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 42
  • Pull requests: 33
  • Average time to close issues: 9 days
  • Average time to close pull requests: 2 days
  • Issue authors: 34
  • Pull request authors: 3
  • Average comments per issue: 1.12
  • Average comments per pull request: 0.24
  • Merged pull requests: 31
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • erikcs (10)
  • LuqianSun (4)
  • aczura2024 (4)
  • Jinwoo-Yi (3)
  • kmoeltner (3)
  • tibshirani (2)
  • njawadekar (2)
  • RamirezAmayaS (2)
  • ras44 (2)
  • minhengw (2)
  • corydeburd (2)
  • nikosGeography (2)
  • MCKnaus (2)
  • anirudhtomer (2)
  • austindenteh (2)
Pull Request Authors
  • erikcs (203)
  • jtibshirani (2)
  • bcjaeger (2)
  • apoorvalal (2)
  • mollyow (2)
  • sebp (1)
  • beniaminogreen (1)
  • nanxstats (1)
  • scottfleming (1)
Top Labels
Issue Labels
question (71) feature (7) documentation (7) bug (4) requires research (2) help wanted (1) performance (1)
Pull Request Labels
bug (6) breaking (3) performance (1)

Packages

  • Total packages: 3
  • Total downloads:
    • cran 6,270 last-month
  • Total docker downloads: 43,440
  • Total dependent packages: 12
    (may contain duplicates)
  • Total dependent repositories: 23
    (may contain duplicates)
  • Total versions: 60
  • Total maintainers: 1
cran.r-project.org: grf

Generalized Random Forests

  • Versions: 24
  • Dependent Packages: 11
  • Dependent Repositories: 21
  • Downloads: 6,270 Last month
  • Docker Downloads: 43,440
Rankings
Forks count: 0.2%
Stargazers count: 0.3%
Docker downloads count: 0.6%
Average: 3.0%
Dependent packages count: 5.0%
Downloads: 5.7%
Dependent repos count: 6.0%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/grf-labs/grf
  • Versions: 24
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 6 months ago
conda-forge.org: r-grf
  • Versions: 12
  • Dependent Packages: 1
  • Dependent Repositories: 2
Rankings
Forks count: 11.8%
Stargazers count: 14.1%
Average: 18.8%
Dependent repos count: 20.3%
Dependent packages count: 29.0%
Last synced: 6 months ago

Dependencies

r-package/grf/DESCRIPTION cran
  • R >= 3.5.0 depends
  • DiceKriging * imports
  • Matrix * imports
  • Rcpp >= 0.12.15 imports
  • lmtest * imports
  • methods * imports
  • sandwich >= 2.4 imports
  • DiagrammeR * suggests
  • MASS * suggests
  • rdd * suggests
  • survival >= 3.2 suggests
  • testthat >= 3.0.4 suggests