grf

Generalized Random Forests

https://github.com/grf-labs/grf

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 9 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org
✓ Committers with academic emails: 7 of 32 committers (21.9%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (18.5%) to scientific vocabulary

Keywords

causal-forest causal-inference econometrics machine-learning random-forest statistics

Keywords from Contributors

date-time

Last synced: 6 months ago

Repository

Generalized Random Forests

Basic Info

Host: GitHub
Owner: grf-labs
License: gpl-3.0
Language: C++
Default Branch: master
Homepage: https://grf-labs.github.io/grf/
Size: 62.7 MB

Statistics

Stars: 1,041
Watchers: 46
Forks: 264
Open Issues: 57
Releases: 0

Topics

causal-forest causal-inference econometrics machine-learning random-forest statistics

Created over 9 years ago · Last pushed 7 months ago

Metadata Files

Readme License

generalized random forests

A package for forest-based statistical estimation and inference. GRF provides non-parametric methods for heterogeneous treatment effect estimation (optionally using right-censored outcomes, multiple treatment arms or outcomes, or instrumental variables), as well as least-squares regression, quantile regression, and survival regression, all with support for missing covariates.

In addition, GRF supports 'honest' estimation (where one subset of the data is used for choosing splits, and another for populating the leaves of the tree), and confidence intervals for least-squares regression and treatment effect estimation.
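The idea behind honest estimation can be illustrated with a single-split tree in base R. This is a conceptual sketch only, not GRF's internal implementation: the simulated data, the names `split.idx`, `est.idx`, `best.split`, and `honest.predict`, and the minimum leaf size of 10 are all assumptions made for illustration.

```R
# Conceptual sketch of 'honest' estimation with a single-split tree (a stump).
# Illustration only; this is not how GRF grows its trees internally.
set.seed(1)
n <- 400
x <- runif(n, -1, 1)
y <- ifelse(x > 0, 1, 0) + rnorm(n, sd = 0.5)

# Randomly divide the sample into a splitting half and an estimation half.
split.idx <- sample(n, n / 2)
est.idx <- setdiff(seq_len(n), split.idx)

# Choose the split point using only the splitting half, by minimizing the
# within-leaf sum of squared errors over candidate thresholds.
sse <- function(v) sum((v - mean(v))^2)
xs <- x[split.idx]
ys <- y[split.idx]
# Keep at least 10 splitting-half observations on each side of the threshold.
candidates <- sort(xs)[10:(length(xs) - 10)]
loss <- sapply(candidates, function(s) sse(ys[xs <= s]) + sse(ys[xs > s]))
best.split <- candidates[which.min(loss)]

# Populate the two leaves using only the estimation half.
leaf.means <- c(
  left = mean(y[est.idx][x[est.idx] <= best.split]),
  right = mean(y[est.idx][x[est.idx] > best.split])
)

# Predict with the honest stump: the split comes from one half of the data,
# the leaf values from the other.
honest.predict <- function(x.new) {
  unname(ifelse(x.new <= best.split, leaf.means["left"], leaf.means["right"]))
}
print(honest.predict(c(-0.5, 0.5)))
```

Because the outcomes used to fill the leaves played no part in choosing the split, the leaf averages are not biased by the split search. In the grf R package itself, this behaviour is controlled through the `honesty` and `honesty.fraction` arguments of the forest-training functions.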

Some helpful links for getting started:

The R package documentation contains usage examples and method reference.
The GRF reference gives a detailed description of the GRF algorithm and includes troubleshooting suggestions.
For community questions and answers around usage, see GitHub issues labelled 'question'.

The repository started as a fork of the ranger repository; we owe a great deal of thanks to the ranger authors for their useful and free package.

Installation

The latest release of the package can be installed through CRAN:

install.packages("grf")

conda users can install from the conda-forge channel:

conda install -c conda-forge r-grf

The current development version can be installed from source using devtools.

devtools::install_github("grf-labs/grf", subdir = "r-package/grf")

Note that to install from source, a compiler that implements C++11 or later is required. If installing on Windows, the Rtools toolchain is also required.
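After installing by any of the routes above, a quick check along the following lines can confirm that the package is visible to the current R session. This is a minimal sketch; `grf.status` is a hypothetical helper name introduced here and is not part of grf.

```R
# Report whether grf is available in the current R library, and which version.
# `grf.status` is a hypothetical helper used only for this example.
grf.status <- function() {
  if (requireNamespace("grf", quietly = TRUE)) {
    paste("grf", as.character(utils::packageVersion("grf")), "is installed")
  } else {
    "grf is not installed"
  }
}
print(grf.status())
```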

Usage Examples

The following script demonstrates how to use GRF for heterogeneous treatment effect estimation. For examples of how to use other types of forests, please consult the R documentation on the relevant methods.

```R
library(grf)

# Generate data.
n <- 2000
p <- 10
X <- matrix(rnorm(n * p), n, p)
X.test <- matrix(0, 101, p)
X.test[, 1] <- seq(-2, 2, length.out = 101)

# Train a causal forest.
W <- rbinom(n, 1, 0.4 + 0.2 * (X[, 1] > 0))
Y <- pmax(X[, 1], 0) * W + X[, 2] + pmin(X[, 3], 0) + rnorm(n)
tau.forest <- causal_forest(X, Y, W)

# Estimate treatment effects for the training data using out-of-bag prediction.
tau.hat.oob <- predict(tau.forest)
hist(tau.hat.oob$predictions)

# Estimate treatment effects for the test sample.
tau.hat <- predict(tau.forest, X.test)
plot(X.test[, 1], tau.hat$predictions,
     ylim = range(tau.hat$predictions, 0, 2),
     xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 2)

# Estimate the conditional average treatment effect on the full sample (CATE).
average_treatment_effect(tau.forest, target.sample = "all")

# Estimate the conditional average treatment effect on the treated sample (CATT).
average_treatment_effect(tau.forest, target.sample = "treated")

# Add confidence intervals for heterogeneous treatment effects; growing more trees is now recommended.
tau.forest <- causal_forest(X, Y, W, num.trees = 4000)
tau.hat <- predict(tau.forest, X.test, estimate.variance = TRUE)
sigma.hat <- sqrt(tau.hat$variance.estimates)
plot(X.test[, 1], tau.hat$predictions,
     ylim = range(tau.hat$predictions + 1.96 * sigma.hat,
                  tau.hat$predictions - 1.96 * sigma.hat, 0, 2),
     xlab = "x", ylab = "tau", type = "l")
lines(X.test[, 1], tau.hat$predictions + 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], tau.hat$predictions - 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[, 1], pmax(0, X.test[, 1]), col = 2, lty = 1)

# In some examples, pre-fitting models for Y and W separately may
# be helpful (e.g., if different models use different covariates).
# In some applications, one may even want to get Y.hat and W.hat
# using a completely different method (e.g., boosting).

# Generate new data.
n <- 4000
p <- 20
X <- matrix(rnorm(n * p), n, p)
TAU <- 1 / (1 + exp(-X[, 3]))
W <- rbinom(n, 1, 1 / (1 + exp(-X[, 1] - X[, 2])))
Y <- pmax(X[, 2] + X[, 3], 0) + rowMeans(X[, 4:6]) / 2 + W * TAU + rnorm(n)

forest.W <- regression_forest(X, W, tune.parameters = "all")
W.hat <- predict(forest.W)$predictions

forest.Y <- regression_forest(X, Y, tune.parameters = "all")
Y.hat <- predict(forest.Y)$predictions

forest.Y.varimp <- variable_importance(forest.Y)

# Note: Forests may have a hard time when trained on very few variables
# (e.g., ncol(X) = 1, 2, or 3). We recommend not being too aggressive
# in selection.
selected.vars <- which(forest.Y.varimp / mean(forest.Y.varimp) > 0.2)

tau.forest <- causal_forest(X[, selected.vars], Y, W,
                            W.hat = W.hat, Y.hat = Y.hat,
                            tune.parameters = "all")

# See if a causal forest succeeded in capturing heterogeneity by plotting
# the TOC and calculating a 95% CI for the AUTOC.
train <- sample(1:n, n / 2)
train.forest <- causal_forest(X[train, ], Y[train], W[train])
eval.forest <- causal_forest(X[-train, ], Y[-train], W[-train])
rate <- rank_average_treatment_effect(eval.forest,
                                      predict(train.forest, X[-train, ])$predictions)
plot(rate)
paste("AUTOC:", round(rate$estimate, 2), "+/-", round(1.96 * rate$std.err, 2))
```
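The average-treatment-effect calls in the script above report an estimate together with its standard error. A normal-approximation 95% confidence interval can be formed from those two numbers, as in the following sketch. The `ate` vector here is a hypothetical stand-in with made-up values, not real output, and `ci95` is a helper name introduced only for this example.

```R
# Hypothetical stand-in for an estimate/standard-error pair such as the one
# reported by the average treatment effect calls; the numbers are made up.
ate <- c(estimate = 0.40, std.err = 0.05)

# Normal-approximation 95% confidence interval: estimate +/- z * std.err.
# `ci95` is an illustrative helper, not a grf function.
ci95 <- function(est) {
  z <- qnorm(0.975)
  c(lower = unname(est["estimate"] - z * est["std.err"]),
    upper = unname(est["estimate"] + z * est["std.err"]))
}
print(ci95(ate))
```

The script uses the rounded multiplier 1.96 for the same purpose; `qnorm(0.975)` is the unrounded value.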

Developing

In addition to providing out-of-the-box forests for quantile regression and causal effect estimation, GRF provides a framework for creating forests tailored to new statistical tasks. If you'd like to develop using GRF, please consult the algorithm reference and development guide.

Funding

Development of GRF is supported by the National Institutes of Health, the National Science Foundation, the Sloan Foundation, the Office of Naval Research (Grant N00014-17-1-2131) and Schmidt Futures.

References

Susan Athey and Stefan Wager. Estimating Treatment Effects with Causal Forests: An Application. Observational Studies, 5, 2019. [paper, arxiv]

Susan Athey, Julie Tibshirani and Stefan Wager. Generalized Random Forests. Annals of Statistics, 47(2), 2019. [paper, arxiv]

Yifan Cui, Michael R. Kosorok, Erik Sverdrup, Stefan Wager, and Ruoqing Zhu. Estimating Heterogeneous Treatment Effects with Right-Censored Data via Causal Survival Forests. Journal of the Royal Statistical Society: Series B, 85(2), 2023. [paper, arxiv]

Rina Friedberg, Julie Tibshirani, Susan Athey, and Stefan Wager. Local Linear Forests. Journal of Computational and Graphical Statistics, 30(2), 2020. [paper, arxiv]

Imke Mayer, Erik Sverdrup, Tobias Gauss, Jean-Denis Moyer, Stefan Wager and Julie Josse. Doubly Robust Treatment Effect Estimation with Missing Attributes. Annals of Applied Statistics, 14(3), 2020. [paper, arxiv]

Erik Sverdrup, Maria Petukhova, and Stefan Wager. Estimating Treatment Effect Heterogeneity in Psychiatry: A Review and Tutorial with Causal Forests. International Journal of Methods in Psychiatric Research, 34(2), 2025. [paper, arxiv]

Stefan Wager. Causal Inference: A Statistical Learning Approach. 2024. [pdf]

Stefan Wager and Susan Athey. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association, 113(523), 2018. [paper, arxiv]

Steve Yadlowsky, Scott Fleming, Nigam Shah, Emma Brunskill, and Stefan Wager. Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects. Journal of the American Statistical Association, 120(549), 2025. [paper, arxiv]

Owner

Name: GRF Labs
Login: grf-labs
Kind: organization
Location: Stanford University

Website: https://grf-labs.github.io/grf/
Repositories: 4
Profile: https://github.com/grf-labs

GitHub Events

Total

Issues event: 35
Watch event: 80
Delete event: 5
Issue comment event: 45
Push event: 11
Pull request review comment event: 2
Pull request review event: 2
Pull request event: 22
Fork event: 12
Create event: 3

Last Year

Issues event: 35
Watch event: 80
Delete event: 5
Issue comment event: 45
Push event: 11
Pull request review comment event: 2
Pull request review event: 2
Pull request event: 22
Fork event: 12
Create event: 3

Committers

Last synced: over 2 years ago

All Time

Total Commits: 1,771
Total Committers: 32
Avg Commits per committer: 55.344
Development Distribution Score (DDS): 0.699

Past Year

Commits: 95
Committers: 3
Avg Commits per committer: 31.667
Development Distribution Score (DDS): 0.021

Top Committers

Name	Email	Commits
Erik Sverdrup	e**p@g**m	533
Marvin Wright	w**t@i**e	490
Julie Tibshirani	j**s@g**m	404
Stefan Wager	s**r@s**u	96
Julie Tibshirani	j**s@c**u	95
Stefan Wager	s****r	30
Julie Tibshirani	j**i@e**o	27
Vitor Hadad	h**d@g**m	26
Rina Friedberg	r**g@g**m	15
animusnaturae	b**s@s**e	7
Luke Miner	l****r	7
Edward Gan	e**8@g**m	5
davidahirshberg	d**g@g**m	5
ras44	9****4	5
Jinhua Wang	s**r@m**n	4
Stefan Wager	s**n@J**l	3
Buyan	b**t@u**u	2
jakobzeitler	j**9@e**k	2
Evan Munro	e**o@g**m	2
imkemayer	m**e@g**m	1
Maximilian Haupt	m**l@m**m	1
jjchern	j**n@g**m	1
Nan Xiao	n**s@g**m	1
Marvin N. Wright	w**k@w**e	1
Scott Fleming	s**n@g**m	1
Oliver Keyes	I****s	1
rugilmartin	4****n	1
Max Ghenis	m**s@g**m	1
Kendon Bell	k****B	1
Benjamin Skinner	b**r@v**u	1
and 2 more...
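The all-time summary figures can be reproduced from the counts listed above. The sketch below assumes that the Development Distribution Score is defined as one minus the share of commits made by the most active committer; that definition is an assumption, not stated on this page, although it matches the reported all-time value.

```R
# Figures copied from the "All Time" statistics and the first table row.
total.commits <- 1771
total.committers <- 32
top.committer.commits <- 533

# Average commits per committer.
avg.commits <- total.commits / total.committers

# Assumed definition of DDS: share of commits NOT made by the top committer.
dds <- 1 - top.committer.commits / total.commits

print(c(avg.commits = avg.commits, dds = dds))
```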

Committer Domains (Top 20 + Academic)

wrig.de: 2
virginia.edu: 1
maximilianhaupt.com: 1
exeter.ac.uk: 1
umass.edu: 1
msn.cn: 1
student.uni-luebeck.de: 1
elastic.co: 1
cs.stanford.edu: 1
stanford.edu: 1
imbs.uni-luebeck.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 150
Total pull requests: 176
Average time to close issues: 7 months
Average time to close pull requests: 3 days
Total issue authors: 111
Total pull request authors: 9
Average comments per issue: 2.27
Average comments per pull request: 0.19
Merged pull requests: 167
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 42
Pull requests: 33
Average time to close issues: 9 days
Average time to close pull requests: 2 days
Issue authors: 34
Pull request authors: 3
Average comments per issue: 1.12
Average comments per pull request: 0.24
Merged pull requests: 31
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

erikcs (10)
LuqianSun (4)
aczura2024 (4)
Jinwoo-Yi (3)
kmoeltner (3)
tibshirani (2)
njawadekar (2)
RamirezAmayaS (2)
ras44 (2)
minhengw (2)
corydeburd (2)
nikosGeography (2)
MCKnaus (2)
anirudhtomer (2)
austindenteh (2)

Pull Request Authors

erikcs (203)
jtibshirani (2)
bcjaeger (2)
apoorvalal (2)
mollyow (2)
sebp (1)
beniaminogreen (1)
nanxstats (1)
scottfleming (1)

Top Labels

Issue Labels

question (71)
feature (7)
documentation (7)
bug (4)
requires research (2)
help wanted (1)
performance (1)

Pull Request Labels

bug (6)
breaking (3)
performance (1)

Packages

Total packages: 3
Total downloads:
- cran 6,270 last-month
Total docker downloads: 43,440

Total dependent packages: 12
(may contain duplicates)
Total dependent repositories: 23
(may contain duplicates)
Total versions: 60
Total maintainers: 1

cran.r-project.org: grf

Generalized Random Forests

Homepage: https://github.com/grf-labs/grf
Documentation: http://cran.r-project.org/web/packages/grf/grf.pdf
License: GPL-3
Latest release: 2.4.0
published over 1 year ago

Versions: 24
Dependent Packages: 11
Dependent Repositories: 21
Downloads: 6,270 Last month
Docker Downloads: 43,440

Rankings

Forks count: 0.2%

Stargazers count: 0.3%

Docker downloads count: 0.6%

Average: 3.0%

Dependent packages count: 5.0%

Downloads: 5.7%

Dependent repos count: 6.0%

Maintainers (1)

erik.sverdrup@monash.edu

Last synced: 6 months ago

proxy.golang.org: github.com/grf-labs/grf

Documentation: https://pkg.go.dev/github.com/grf-labs/grf#section-documentation
License: gpl-3.0
Latest release: v2.4.0+incompatible
published over 1 year ago

Versions: 24
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 6.5%

Average: 6.7%

Dependent repos count: 7.0%

Last synced: 6 months ago

conda-forge.org: r-grf

Homepage: https://github.com/grf-labs/grf
License: GPL-3.0-only
Latest release: 2.2.0
published over 3 years ago

Versions: 12
Dependent Packages: 1
Dependent Repositories: 2

Rankings

Forks count: 11.8%

Stargazers count: 14.1%

Average: 18.8%

Dependent repos count: 20.3%

Dependent packages count: 29.0%

Last synced: 6 months ago

Dependencies

r-package/grf/DESCRIPTION cran

R >= 3.5.0 depends
DiceKriging * imports
Matrix * imports
Rcpp >= 0.12.15 imports
lmtest * imports
methods * imports
sandwich >= 2.4 imports
DiagrammeR * suggests
MASS * suggests
rdd * suggests
survival >= 3.2 suggests
testthat >= 3.0.4 suggests
