Science Score: 59.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 9 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org
- ✓ Committers with academic emails: 7 of 32 committers (21.9%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (18.5%) to scientific vocabulary
Repository
Generalized Random Forests
Basic Info
- Host: GitHub
- Owner: grf-labs
- License: gpl-3.0
- Language: C++
- Default Branch: master
- Homepage: https://grf-labs.github.io/grf/
- Size: 62.7 MB
Statistics
- Stars: 1,041
- Watchers: 46
- Forks: 264
- Open Issues: 57
- Releases: 0
Metadata Files
README.md
generalized random forests 
A package for forest-based statistical estimation and inference. GRF provides non-parametric methods for heterogeneous treatment effect estimation (optionally using right-censored outcomes, multiple treatment arms or outcomes, or instrumental variables), as well as least-squares regression, quantile regression, and survival regression, all with support for missing covariates.
In addition, GRF supports 'honest' estimation (where one subset of the data is used for choosing splits, and another for populating the leaves of the tree), and confidence intervals for least-squares regression and treatment effect estimation.
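To make the idea of honest estimation concrete, here is a minimal base-R sketch of a single honest split on simulated data. It is only an illustration of the sample-splitting principle, not GRF's implementation; the data-generating process and all names (`split.idx`, `est.idx`, `sse`, `best.split`, `leaf.means`) are assumptions made for this example.

```r
set.seed(1)

# Simulated data (illustrative): the outcome jumps from 0 to 1 when x crosses 0.
n <- 400
x <- runif(n, -1, 1)
y <- as.numeric(x > 0) + rnorm(n, sd = 0.5)

# Honest sample splitting: one half chooses the split, the other fills the leaves.
split.idx <- sample(n, n / 2)
est.idx <- setdiff(seq_len(n), split.idx)

# Within-leaf sum of squared errors for a candidate threshold.
sse <- function(threshold, x, y) {
  left <- y[x <= threshold]
  right <- y[x > threshold]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

# Choose the split point using only the splitting half.
candidates <- quantile(x[split.idx], probs = seq(0.1, 0.9, by = 0.05))
errors <- sapply(candidates, sse, x = x[split.idx], y = y[split.idx])
best.split <- unname(candidates[which.min(errors)])

# Populate the two leaves using only the held-out estimation half.
leaf.means <- c(
  left = mean(y[est.idx][x[est.idx] <= best.split]),
  right = mean(y[est.idx][x[est.idx] > best.split])
)
print(best.split)
print(leaf.means)
```

Because the leaf means are computed on observations that played no role in choosing the split, they are not biased by the search over thresholds.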
Some helpful links for getting started:
- The R package documentation contains usage examples and method reference.
- The GRF reference gives a detailed description of the GRF algorithm and includes troubleshooting suggestions.
- For community questions and answers around usage, see GitHub issues labelled 'question'.
The repository started as a fork of the ranger repository; we owe a great deal of thanks to the ranger authors for their useful and free package.
Installation
The latest release of the package can be installed through CRAN:
install.packages("grf")
conda users can install from the conda-forge channel:
conda install -c conda-forge r-grf
The current development version can be installed from source using devtools.
devtools::install_github("grf-labs/grf", subdir = "r-package/grf")
Note that to install from source, a compiler that implements C++11 or later is required. If installing on Windows, the RTools toolchain is also required.
Usage Examples
The following script demonstrates how to use GRF for heterogeneous treatment effect estimation. For examples of how to use other types of forests, please consult the R documentation on the relevant methods.
```R
library(grf)

# Generate data.
n <- 2000
p <- 10
X <- matrix(rnorm(n * p), n, p)
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)

# Train a causal forest.
W <- rbinom(n, 1, 0.4 + 0.2 * (X[, 1] > 0))
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
tau.forest <- causal_forest(X, Y, W)

# Estimate treatment effects for the training data using out-of-bag prediction.
tau.hat.oob <- predict(tau.forest)
hist(tau.hat.oob$predictions)

# Estimate treatment effects for the test sample.
tau.hat <- predict(tau.forest, X.test)
plot(X.test[, 1], tau.hat$predictions,
     ylim = range(tau.hat$predictions, 0, 2),
     xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 2)

# Estimate the conditional average treatment effect on the full sample (CATE).
average_treatment_effect(tau.forest, target.sample = "all")

# Estimate the conditional average treatment effect on the treated sample (CATT).
average_treatment_effect(tau.forest, target.sample = "treated")

# Add confidence intervals for heterogeneous treatment effects; growing more trees is now recommended.
tau.forest <- causal_forest(X, Y, W, num.trees = 4000)
tau.hat <- predict(tau.forest, X.test, estimate.variance = TRUE)
sigma.hat <- sqrt(tau.hat$variance.estimates)
plot(X.test[, 1], tau.hat$predictions,
     ylim = range(tau.hat$predictions + 1.96 * sigma.hat,
                  tau.hat$predictions - 1.96 * sigma.hat, 0, 2),
     xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], tau.hat$predictions + 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], tau.hat$predictions - 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 1)

# In some examples, pre-fitting models for Y and W separately may
# be helpful (e.g., if different models use different covariates).
# In some applications, one may even want to get Y.hat and W.hat
# using a completely different method (e.g., boosting).

# Generate new data.
n <- 4000
p <- 20
X <- matrix(rnorm(n * p), n, p)
TAU <- 1 / (1 + exp(-X[, 3]))
W <- rbinom(n, 1, 1 / (1 + exp(-X[, 1] - X[, 2])))
Y <- pmax(X[, 2] + X[, 3], 0) + rowMeans(X[, 4:6]) / 2 + W * TAU + rnorm(n)

forest.W <- regression_forest(X, W, tune.parameters = "all")
W.hat <- predict(forest.W)$predictions

forest.Y <- regression_forest(X, Y, tune.parameters = "all")
Y.hat <- predict(forest.Y)$predictions

forest.Y.varimp <- variable_importance(forest.Y)

# Note: Forests may have a hard time when trained on very few variables
# (e.g., ncol(X) = 1, 2, or 3). We recommend not being too aggressive
# in selection.
selected.vars <- which(forest.Y.varimp / mean(forest.Y.varimp) > 0.2)

tau.forest <- causal_forest(X[, selected.vars], Y, W,
                            W.hat = W.hat, Y.hat = Y.hat,
                            tune.parameters = "all")

# See if a causal forest succeeded in capturing heterogeneity by plotting
# the TOC and calculating a 95% CI for the AUTOC.
train <- sample(1:n, n / 2)
train.forest <- causal_forest(X[train, ], Y[train], W[train])
eval.forest <- causal_forest(X[-train, ], Y[-train], W[-train])
rate <- rank_average_treatment_effect(eval.forest,
                                      predict(train.forest, X[-train, ])$predictions)
plot(rate)
paste("AUTOC:", round(rate$estimate, 2), "+/-", round(1.96 * rate$std.err, 2))
```
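The script above reports uncertainty as an estimate plus or minus 1.96 standard errors. As a small sketch of that arithmetic, the helper below turns any object with `estimate` and `std.err` entries into a normal-approximation confidence interval. The helper name `conf_interval` and the numbers in `ate` are hypothetical placeholders for illustration, not output from GRF.

```r
# Hypothetical values, shaped like a named vector holding an estimate
# and its standard error. These numbers are placeholders, not real results.
ate <- c(estimate = 0.37, std.err = 0.05)

# Normal-approximation confidence interval: estimate +/- z * std.err.
# Works for named vectors and lists alike, since both support [["name"]].
conf_interval <- function(est, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = est[["estimate"]] - z * est[["std.err"]],
    upper = est[["estimate"]] + z * est[["std.err"]])
}

ci <- conf_interval(ate)
print(round(ci, 3))
```

With `level = 0.95` the multiplier `z` is approximately 1.96, which matches the constant used in the script.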
Developing
In addition to providing out-of-the-box forests for quantile regression and causal effect estimation, GRF provides a framework for creating forests tailored to new statistical tasks. If you'd like to develop using GRF, please consult the algorithm reference and development guide.
Funding
Development of GRF is supported by the National Institutes of Health, the National Science Foundation, the Sloan Foundation, the Office of Naval Research (Grant N00014-17-1-2131) and Schmidt Futures.
References
Susan Athey and Stefan Wager. Estimating Treatment Effects with Causal Forests: An Application. Observational Studies, 5, 2019. [paper, arxiv]
Susan Athey, Julie Tibshirani and Stefan Wager. Generalized Random Forests. Annals of Statistics, 47(2), 2019. [paper, arxiv]
Yifan Cui, Michael R. Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests. Journal of the Royal Statistical Society: Series B, 85(2), 2023. [paper, arxiv]
Rina Friedberg, Julie Tibshirani, Susan Athey, and Stefan Wager. Local Linear Forests. Journal of Computational and Graphical Statistics, 30(2), 2020. [paper, arxiv]
Imke Mayer, Erik Sverdrup, Tobias Gauss, Jean-Denis Moyer, Stefan Wager and Julie Josse. Doubly Robust Treatment Effect Estimation with Missing Attributes. Annals of Applied Statistics, 14(3), 2020. [paper, arxiv]
Erik Sverdrup, Maria Petukhova, and Stefan Wager. Estimating Treatment Effect Heterogeneity in Psychiatry: A Review and Tutorial with Causal Forests. International Journal of Methods in Psychiatric Research, 34(2), 2025. [paper, arxiv]
Stefan Wager. Causal Inference: A Statistical Learning Approach. 2024. [pdf]
Stefan Wager and Susan Athey. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association, 113(523), 2018. [paper, arxiv]
Steve Yadlowsky, Scott Fleming, Nigam Shah, Emma Brunskill, and Stefan Wager. Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects. Journal of the American Statistical Association, 120(549), 2025. [paper, arxiv]
Owner
- Name: GRF Labs
- Login: grf-labs
- Kind: organization
- Location: Stanford University
- Website: https://grf-labs.github.io/grf/
- Repositories: 4
- Profile: https://github.com/grf-labs
GitHub Events
Total
- Issues event: 35
- Watch event: 80
- Delete event: 5
- Issue comment event: 45
- Push event: 11
- Pull request review comment event: 2
- Pull request review event: 2
- Pull request event: 22
- Fork event: 12
- Create event: 3
Last Year
- Issues event: 35
- Watch event: 80
- Delete event: 5
- Issue comment event: 45
- Push event: 11
- Pull request review comment event: 2
- Pull request review event: 2
- Pull request event: 22
- Fork event: 12
- Create event: 3
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Erik Sverdrup | e****p@g****m | 533 |
| Marvin Wright | w****t@i****e | 490 |
| Julie Tibshirani | j****s@g****m | 404 |
| Stefan Wager | s****r@s****u | 96 |
| Julie Tibshirani | j****s@c****u | 95 |
| Stefan Wager | s****r | 30 |
| Julie Tibshirani | j****i@e****o | 27 |
| Vitor Hadad | h****d@g****m | 26 |
| Rina Friedberg | r****g@g****m | 15 |
| animusnaturae | b****s@s****e | 7 |
| Luke Miner | l****r | 7 |
| Edward Gan | e****8@g****m | 5 |
| davidahirshberg | d****g@g****m | 5 |
| ras44 | 9****4 | 5 |
| Jinhua Wang | s****r@m****n | 4 |
| Stefan Wager | s****n@J****l | 3 |
| Buyan | b****t@u****u | 2 |
| jakobzeitler | j****9@e****k | 2 |
| Evan Munro | e****o@g****m | 2 |
| imkemayer | m****e@g****m | 1 |
| Maximilian Haupt | m****l@m****m | 1 |
| jjchern | j****n@g****m | 1 |
| Nan Xiao | n****s@g****m | 1 |
| Marvin N. Wright | w****k@w****e | 1 |
| Scott Fleming | s****n@g****m | 1 |
| Oliver Keyes | I****s | 1 |
| rugilmartin | 4****n | 1 |
| Max Ghenis | m****s@g****m | 1 |
| Kendon Bell | k****B | 1 |
| Benjamin Skinner | b****r@v****u | 1 |
| and 2 more... | | |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 150
- Total pull requests: 176
- Average time to close issues: 7 months
- Average time to close pull requests: 3 days
- Total issue authors: 111
- Total pull request authors: 9
- Average comments per issue: 2.27
- Average comments per pull request: 0.19
- Merged pull requests: 167
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 42
- Pull requests: 33
- Average time to close issues: 9 days
- Average time to close pull requests: 2 days
- Issue authors: 34
- Pull request authors: 3
- Average comments per issue: 1.12
- Average comments per pull request: 0.24
- Merged pull requests: 31
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- erikcs (10)
- LuqianSun (4)
- aczura2024 (4)
- Jinwoo-Yi (3)
- kmoeltner (3)
- tibshirani (2)
- njawadekar (2)
- RamirezAmayaS (2)
- ras44 (2)
- minhengw (2)
- corydeburd (2)
- nikosGeography (2)
- MCKnaus (2)
- anirudhtomer (2)
- austindenteh (2)
Pull Request Authors
- erikcs (203)
- jtibshirani (2)
- bcjaeger (2)
- apoorvalal (2)
- mollyow (2)
- sebp (1)
- beniaminogreen (1)
- nanxstats (1)
- scottfleming (1)
Packages
- Total packages: 3
- Total downloads:
  - cran: 6,270 last-month
- Total docker downloads: 43,440
- Total dependent packages: 12 (may contain duplicates)
- Total dependent repositories: 23 (may contain duplicates)
- Total versions: 60
- Total maintainers: 1
cran.r-project.org: grf
Generalized Random Forests
- Homepage: https://github.com/grf-labs/grf
- Documentation: http://cran.r-project.org/web/packages/grf/grf.pdf
- License: GPL-3
- Latest release: 2.4.0 (published over 1 year ago)
proxy.golang.org: github.com/grf-labs/grf
- Documentation: https://pkg.go.dev/github.com/grf-labs/grf#section-documentation
- License: gpl-3.0
- Latest release: v2.4.0+incompatible (published over 1 year ago)
conda-forge.org: r-grf
- Homepage: https://github.com/grf-labs/grf
- License: GPL-3.0-only
- Latest release: 2.2.0 (published over 3 years ago)
Dependencies
- R >= 3.5.0 depends
- DiceKriging * imports
- Matrix * imports
- Rcpp >= 0.12.15 imports
- lmtest * imports
- methods * imports
- sandwich >= 2.4 imports
- DiagrammeR * suggests
- MASS * suggests
- rdd * suggests
- survival >= 3.2 suggests
- testthat >= 3.0.4 suggests
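As a rough sketch of how the minimum versions listed above could be checked on a local machine, the snippet below compares the running R version and any installed packages against those constraints. The helper `check_pkg` and the `required` vector are assumptions introduced for this example; only the version numbers come from the list.

```r
# Minimum versions taken from the dependency list above.
required <- c(Rcpp = "0.12.15", sandwich = "2.4")

# TRUE if the running R version satisfies the 'R >= 3.5.0' constraint.
r_ok <- getRversion() >= "3.5.0"

# Hypothetical helper: NA if the package is not installed, otherwise
# whether the installed version meets the minimum.
check_pkg <- function(pkg, min_version) {
  if (!nzchar(system.file(package = pkg))) {
    return(NA)
  }
  packageVersion(pkg) >= min_version
}

status <- vapply(names(required),
                 function(pkg) check_pkg(pkg, required[[pkg]]),
                 logical(1))
print(r_ok)
print(status)
```

An `NA` entry in `status` means the package was not found in the current library paths, so no version comparison was made.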