SuperNOVA
SuperNOVA: Semi-Parametric Identification and Estimation of Interaction and Effect Modification in Mixed Exposures using Stochastic Interventions in R - Published in JOSS (2023)
Keywords
interactions
machine-learning
mixed-exposure
statistics
targeted-learning
variable-importance
Keywords from Contributors
causal-effects
causal-inference
decision-trees
exposure-mixtures
robust-statistics
Repository
:dizzy: :dart: Automatic identification of variable and interaction importance using basis functions and non-parametric estimation of interactions/effect modification using joint stochastic interventions.
Basic Info
Statistics
- Stars: 9
- Watchers: 2
- Forks: 1
- Open Issues: 2
- Releases: 2
Created about 4 years ago
· Last pushed over 2 years ago
README.Rmd
---
output:
rmarkdown::github_document
bibliography: "inst/references.bib"
always_allow_html: true
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# R/`SuperNOVA`
> Efficient Estimation of the Causal Effects of Non-Parametric Interactions, Effect Modifications and Mediation using Stochastic Interventions
__Authors:__ [David McCoy](https://davidmccoy.org)
---
## What's `SuperNOVA`?
The `SuperNOVA` R package offers a comprehensive toolset for identifying predictive variable sets, be it subsets of exposures, exposure-covariates, or exposure-mediators, for a specified outcome. It further assists in creating efficient estimators for the counterfactual mean of the outcome under stochastic interventions on these variable sets. This means making exposure changes that depend on naturally observed values, as described in past literature [@diaz2012population; @haneuse2013estimation].
`SuperNOVA` introduces several estimators, constructed based on the patterns found in the data. At present, semi-parametric estimators are available for four contexts: interaction, effect modification, marginal impacts, and mediation. The target parameters that this package estimates are grounded in previously published works: one regarding interaction and effect modification [@mccoy2023semiparametric] and another dedicated to mediation [@McCoy2023mediation].
The `SuperNOVA` package builds upon the capabilities of the `txshift` package, which implements the TML estimator for a stochastic shift causal parameter [@diaz2012population]. A notable extension in SuperNOVA is its support for joint stochastic interventions on two exposures. This allows for the creation of a non-parametric interaction parameter, shedding light on the combined effect of concurrently shifting two variables relative to the cumulative effect of individual shifts. At this stage, the focus is on two-way shifts.
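One way to write the comparison described above is sketched here; it is an illustration, not the package's formal definition. The symbols $A_1, A_2$ (two exposures), $\delta_1, \delta_2$ (their shifts), and $Y_{a_1, a_2}$ (the counterfactual outcome with the exposures set to $a_1, a_2$) are notation assumed for this sketch and do not appear elsewhere in this README. See [@mccoy2023semiparametric] for the exact parameter.

$$
\begin{aligned}
\psi_{1} &= \mathbb{E}\left[Y_{A_1+\delta_1,\,A_2}\right] - \mathbb{E}[Y], \\
\psi_{2} &= \mathbb{E}\left[Y_{A_1,\,A_2+\delta_2}\right] - \mathbb{E}[Y], \\
\psi_{12} &= \mathbb{E}\left[Y_{A_1+\delta_1,\,A_2+\delta_2}\right] - \mathbb{E}[Y], \\
\psi_{\text{intxn}} &= \psi_{12} - \left(\psi_{1} + \psi_{2}\right).
\end{aligned}
$$

Under this reading, a value of $\psi_{\text{intxn}}$ near zero means the effect of the joint shift is approximately the sum of the two individual shift effects.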
Various parameters are calculated based on identified patterns:
* Interaction Parameter: When two exposures show signs of interaction, this parameter is calculated.
* Individual Stochastic Interventions: When marginal effects are identified, this estimates the difference in outcomes upon making a specific shift compared to no intervention.
* Effect Modification: In scenarios where effect modification is evident, the target parameter for it is the mean outcome under intervention within specific regions of the covariate space that the data suggests. This contrasts with the marginal effect by diving deeper into strata of covariates, for example, gauging the effects of exposure shifts among distinct genders.
* Mediation: In scenarios where mediation is found, the target parameters are the natural direct and indirect effects as defined by stochastic interventions. That is, `SuperNOVA` estimates the natural indirect effect, which is the impact of shifting an exposure that operates through a mediator, and the natural direct effect, which is the impact that does not operate through the mediator.
Mediation in `SuperNOVA` is a new feature. Here, `SuperNOVA` brings to the table estimates initially conceived in another work [@Diaz2020a], while also supporting mediation estimates for continuous exposures.
The package uses a k-fold cross-validation framework to estimate a data-adaptive parameter: the stochastic shift target parameters for the variable sets identified as influential for the outcome. The process begins by partitioning the data into parameter-generating and estimation samples. In the parameter-generating sample, a collection of basis function estimators is fit to the data, and the one with the lowest cross-validated mean squared error is selected. Key variable sets are then extracted using an ANOVA-like variance decomposition. In the estimation sample, targeted learning is used to estimate the causal target parameters for each context: interaction, mediation, effect modification, and individual variable shifts.
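The sample splitting described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the object names (`fold_id`, `splits`) are hypothetical, and it assumes that in each fold the held-out rows form the estimation sample while the remaining rows form the parameter-generating sample.

```r
set.seed(1)
n <- 90
n_folds <- 3

# Assign each row to one of n_folds folds at random, with balanced fold sizes
fold_id <- sample(rep(seq_len(n_folds), length.out = n))

splits <- lapply(seq_len(n_folds), function(k) {
  list(
    # rows used to discover variable sets (basis function fitting)
    parameter_generating = which(fold_id != k),
    # held-out rows used to estimate the target parameters
    estimation = which(fold_id == k)
  )
})

# Number of rows in each estimation sample
sapply(splits, function(s) length(s$estimation))
```

Every row appears in exactly one estimation sample, so the fold-specific estimates together use each observation once for estimation.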
`SuperNOVA` reports both fold-specific and pooled results for each target parameter. The accompanying vignette gives a more in-depth walkthrough.
To use the package, users provide the exposures, covariates, mediators (if any), and outcome. They also specify the $\delta$ for each exposure (the size of the shift) and whether this delta should be adapted in response to positivity violations. A detailed guide is provided in the vignette. With these inputs, `SuperNOVA` processes the data and returns tables of fold-specific and pooled results.
`SuperNOVA` also incorporates features from the `sl3` package [@coyle-sl3-rpkg], facilitating ensemble machine learning in the estimation process. If the user does not specify any stack parameters, `SuperNOVA` will automatically create an ensemble of machine learning algorithms that strike a balance between flexibility and computational efficiency.
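As a hedged sketch of what a user-specified stack might look like, the snippet below builds a small `sl3` stack from two simple learners. It only constructs the stack; the name of the `SuperNOVA()` argument that accepts it is not shown in this README, so consult the vignette before passing it in. The object name `custom_stack` is hypothetical, and the code is guarded so that it still runs when `sl3` is not installed.

```r
if (requireNamespace("sl3", quietly = TRUE)) {
  # Two simple candidate learners: an intercept-only model and a GLM
  lrnr_mean <- sl3::Lrnr_mean$new()
  lrnr_glm <- sl3::Lrnr_glm$new()
  # A Stack groups candidate learners so they can be trained together
  custom_stack <- sl3::Stack$new(lrnr_mean, lrnr_glm)
} else {
  custom_stack <- NULL
  message("sl3 is not installed; see the Installation section.")
}
```

More flexible learners can be added to the stack in the same way, at the cost of longer run times.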
---
## Installation
*Note:* `SuperNOVA` (currently) depends on `sl3`, which allows ensemble
machine learning to be used for nuisance parameter estimation. Because
`sl3` is not on CRAN, `SuperNOVA` is not available on CRAN either and
must be installed from GitHub as described here.
`SuperNOVA` has many dependencies, so it is easier to install the
packages in stages to ensure proper installation.
First install the basis estimators used in the data-adaptive
variable discovery of the exposure and covariate space:
```{r tree-packages, eval = FALSE}
install.packages("earth")
install.packages("hal9001")
```
`SuperNOVA` uses the `sl3` package to build ensemble machine learners for each nuisance parameter.
`sl3` has to be installed from its development branch. First install these packages, which `sl3` uses:
```{r super-learner-packages, eval = FALSE}
install.packages(c("ranger", "arm", "xgboost", "nnls"))
```
Now install `sl3` on devel:
```{r sl3_devel, eval = FALSE}
remotes::install_github("tlverse/sl3@devel")
```
Make sure `sl3` installs correctly, then install `SuperNOVA`:
```{r SuperNOVA_install, eval = FALSE}
remotes::install_github("blind-contours/SuperNOVA@main")
```
`SuperNOVA` has some other miscellaneous dependencies that are used in the examples as well as in the plotting functions.
```{r msc_package, eval = FALSE}
install.packages(c("kableExtra", "hrbrthemes", "viridis"))
```
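After working through the steps above, a quick check like the following can show which of the packages named in this section are still missing. This is a convenience sketch rather than part of the package; `pkgs`, `installed`, and `missing_pkgs` are hypothetical names.

```r
# Packages named in the installation steps above
pkgs <- c(
  "earth", "hal9001", "ranger", "arm", "xgboost", "nnls",
  "sl3", "SuperNOVA", "kableExtra", "hrbrthemes", "viridis"
)

# requireNamespace() returns TRUE when a package can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- names(installed)[!installed]

if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```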
---
## Example
To illustrate how `SuperNOVA` may be used to ascertain the effect of a mixed exposure, consider the following example:
```{r example, warning=FALSE}
library(SuperNOVA)
library(devtools)
library(kableExtra)
library(sl3)
set.seed(429153)
# simulate simple data
n_obs <- 100000
```
`simulate_data` is a function that generates synthetic data with a complex structure for studying the causal effects of shifting values in a mixture of exposures. Its primary purpose is to provide a controlled environment for testing and validating the estimates given by the `SuperNOVA` package.
The `simulate_data` function generates synthetic data for `n_obs` observations with a pre-specified covariance structure (`sigma_mod`) and a shift parameter (`delta`). It simulates four mixture components (M1, M2, M3, M4) and three covariates (W1, W2, W3) with specific relationships between them. The outcome variable Y is generated as a function of these mixtures and covariates.
After generating the data, the function applies a shift (delta) to each mixture component separately and calculates the average treatment effect for each component. Additionally, it calculates the interaction effect of shifting two mixture components simultaneously (m14_intxn). These ground truth effects can be used for validating and comparing the performance of various causal inference methods on this synthetic dataset.
The function returns a list containing the generated data and the ground truth effects of the shifts applied to each mixture component, the interaction effect, and the modified effect results based on a specific level in the W3 covariate.
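To show how ground-truth shift effects of this kind can be computed in a simulation, here is a much smaller toy example. It is not the data-generating process used by `simulate_data`: the outcome function, coefficients, and variable names below are hypothetical and chosen only so the arithmetic is easy to follow.

```r
set.seed(42)
n <- 1e5
delta <- 1

# Two hypothetical exposures
m1 <- rnorm(n)
m4 <- rnorm(n)

# Hypothetical outcome regression with a product (interaction) term
outcome_mean <- function(m1, m4) 2 * m1 + 0.5 * m4 + 1.5 * m1 * m4

# Average outcome with exposures as observed, then under each shift
mean_obs <- mean(outcome_mean(m1, m4))
m1_effect <- mean(outcome_mean(m1 + delta, m4)) - mean_obs
m4_effect <- mean(outcome_mean(m1, m4 + delta)) - mean_obs
m14_effect <- mean(outcome_mean(m1 + delta, m4 + delta)) - mean_obs

# Interaction: joint shift effect minus the sum of the individual shift effects
m14_intxn <- m14_effect - (m1_effect + m4_effect)
m14_intxn
```

In this toy model the main-effect terms cancel in the contrast, so `m14_intxn` equals `1.5 * delta^2` (that is, 1.5) up to floating-point error.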
```{r simulate data, eval = TRUE}
sim_out <- simulate_data(n_obs = n_obs)
data <- sim_out$data
head(data) %>%
kbl(caption = "Simulated Data") %>%
kable_classic(full_width = F, html_font = "Cambria")
```
Because `simulate_data` returns the true effects, we would expect most of the
fold-specific CIs, as well as the pooled estimate, to cover these true values. Let's
run `SuperNOVA` to see if it correctly identifies the exposures that drive
the outcome and any interaction/effect modification that exists in the DGP.
Of note, there are four exposures, M1 through M4. M1 and M4 have individual effects
and an interaction that drive the outcome. There is also effect modification
between M3 and W3.
```{r run SuperNOVA, eval = TRUE, message=FALSE, warning=FALSE}
data_sample <- data[sample(nrow(data), 4000), ]
w <- data_sample[, c("W1", "W2", "W3")]
a <- data_sample[, c("M1", "M2", "M3", "M4")]
y <- data_sample$Y
deltas <- list("M1" = 1, "M2" = 1, "M3" = 1, "M4" = 1)
ptm <- proc.time()
sim_results <- SuperNOVA(
w = w,
a = a,
y = y,
delta = deltas,
n_folds = 3,
num_cores = 6,
outcome_type = "continuous",
quantile_thresh = 0,
seed = 294580
)
proc.time() - ptm
basis_in_folds <- sim_results$`Basis Fold Proportions`
indiv_shift_results <- sim_results$`Indiv Shift Results`
em_results <- sim_results$`Effect Mod Results`
joint_shift_results <- sim_results$`Joint Shift Results`
```
Let's first look at the variable relationships used in the folds:
```{r basis in folds}
basis_in_folds
```
The above list shows that marginal effects for exposures M1 and M4 were found, an interaction for M1 and M4, and effect modification for M3 and W3 - all of which are correct as the outcome is generated from these relationships. There is no effect for M2 which we correctly reject.
Next, let's look at the results for individual stochastic shifts by delta compared to no shift:
```{r individual shift results}
indiv_shift_results$M1 %>%
kbl(caption = "Individual Stochastic Intervention Results for M1") %>%
kable_classic(full_width = F, html_font = "Cambria")
```
The true effect for a shifted M1 vs observed M1 is:
```{r m1 truth}
sim_out$m1_effect
```
And so we see that we have proper coverage.
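Checking coverage amounts to asking whether the true value lies between the lower and upper confidence limits. The sketch below uses made-up intervals in a hypothetical data frame; the column names and numbers are assumptions for illustration, not `SuperNOVA` output.

```r
# TRUE when the interval [lower, upper] contains the true value
covers_truth <- function(lower, upper, truth) {
  lower <= truth & truth <= upper
}

# Hypothetical fold-specific and pooled intervals (not SuperNOVA output)
ci_table <- data.frame(
  fold = c("1", "2", "3", "pooled"),
  lower = c(1.2, 1.8, 1.4, 1.5),
  upper = c(1.9, 2.3, 2.0, 1.9)
)

ci_table$covers <- covers_truth(ci_table$lower, ci_table$upper, truth = 1.7)
ci_table
```

In this made-up table the fold 2 interval misses the truth while the pooled interval covers it, which is why the fold-specific and pooled rows are worth reading together.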
Next we can look at effect modifications:
```{r effect modification results}
em_results$M3W3 %>%
kbl(caption = "Effect Modification Stochastic Intervention Results for M3 and W3") %>%
kable_classic(full_width = F, html_font = "Cambria")
```
Now let's look at the truth:
```{r em truth}
sim_out$effect_mod
```
When W3 is 1 the true effect is 11, and our estimate is 11 with CI coverage. When W3 is 0 the true effect is 1, and our estimate is 1.9 with a CI that also covers the truth.
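To make the idea of a stratum-specific shift effect concrete, here is a toy calculation in base R. The outcome function and coefficients are hypothetical and are not the ones used by `simulate_data`; the point is only that the same shift can have a different average effect within each level of a binary covariate.

```r
set.seed(7)
n <- 1e4
delta <- 1

w3 <- rbinom(n, 1, 0.5)
m3 <- rnorm(n)

# Hypothetical outcome regression: the slope on m3 depends on w3
outcome_mean <- function(m3, w3) m3 * (1 + 2 * w3)

# Individual-level change in the expected outcome under the shift
shift_effect <- outcome_mean(m3 + delta, w3) - outcome_mean(m3, w3)

# Average shift effect within each level of w3
effect_by_w3 <- tapply(shift_effect, w3, mean)
effect_by_w3
```

Here the average effect of the shift is 1 when `w3` is 0 and 3 when `w3` is 1 (up to floating-point error), so a single marginal effect would hide the difference between strata.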
Finally, we look at the joint shift results, which compare a joint shift of two exposures with the sum of the individual shifts.
```{r joint results}
joint_shift_results$M1M4 %>%
kbl(caption = "Interactions Stochastic Intervention Results for M1 and M4") %>%
kable_classic(full_width = F, html_font = "Cambria")
```
Let's look at the truth again:
```{r interaction truths}
sim_out$m1_effect
sim_out$m4_effect
sim_out$m14_effect
sim_out$m14_intxn
```
Comparing these values with the pooled rows of the table above, we can see that the estimates for the marginal shifts, the dual shift, and the difference between the dual shift and the sum of the marginal shifts all have CIs that cover the truth.
---
## Issues
If you encounter any bugs or have any specific feature requests,
please [file an
issue](https://github.com/blind-contours/SuperNOVA/issues). Further details
on filing issues are provided in our [contribution
guidelines](https://github.com/blind-contours/SuperNOVA/main/contributing.md).
---
## Contributions
Contributions are very welcome. Interested contributors should consult our
[contribution
guidelines](https://github.com/blind-contours/SuperNOVA/blob/master/CONTRIBUTING.md)
prior to submitting a pull request.
---
## Citation
After using the `SuperNOVA` R package, please cite the following:
---
## Related
* [R/`tmle3shift`](https://github.com/tlverse/tmle3shift) - An R package
providing an independent implementation of the same core routines for the TML
estimation procedure and statistical methodology as is made available here,
through reliance on a unified interface for Targeted Learning provided by the
[`tmle3`](https://github.com/tlverse/tmle3) engine of the [`tlverse`
ecosystem](https://github.com/tlverse).
* [R/`medshift`](https://github.com/nhejazi/medshift) - An R package providing
facilities to estimate the causal effect of stochastic treatment regimes in
the mediation setting, including classical (IPW) and augmented double robust
(one-step) estimators. This is an implementation of the methodology explored
by @diaz2020causal.
* [R/`haldensify`](https://github.com/nhejazi/haldensify) - A minimal package
for estimating the conditional density treatment mechanism component of this
parameter based on using the [highly adaptive
lasso](https://github.com/tlverse/hal9001) [@coyle-hal9001-rpkg;
@hejazi2020hal9001-joss] in combination with a pooled hazard regression. This
package implements a variant of the approach advocated by @diaz2011super.
---
## Funding
The development of this software was supported in part through grants from the
---
## License
© 2020-2022 [David B. McCoy](https://davidmccoy.org)
The contents of this repository are distributed under the MIT license. See below
for details:
```
MIT License
Copyright (c) 2020-2022 David B. McCoy
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
---
## References
Owner
- Name: David McCoy
- Login: blind-contours
- Kind: user
- Repositories: 3
- Profile: https://github.com/blind-contours
JOSS Publication
SuperNOVA: Semi-Parametric Identification and Estimation of Interaction and Effect Modification in Mixed Exposures using Stochastic Interventions in R
Published
November 05, 2023
Volume 8, Issue 91, Page 5422
Authors
David McCoy
Department of Biostatistics, University of California Berkeley, Berkeley, CA 94704, U.S.A.
Alejandro Schuler
Department of Biostatistics, University of California Berkeley, Berkeley, CA 94704, U.S.A.
Tags
causal inference
machine learning
stochastic interventions
efficient estimation
targeted learning
mixed exposures
GitHub Events
Total
- Issues event: 1
- Watch event: 1
Last Year
- Issues event: 1
- Watch event: 1
Committers
Top Committers
| Name | Email | Commits |
|---|---|---|
| blind-contours | d****y@b****u | 142 |
| David Brenton McCoy | d****y@l****c | 10 |
| David Brenton McCoy | d****y@l****c | 7 |
| David McCoy | d****y@P****l | 4 |
| David Brenton McCoy | d****y@l****c | 3 |
| David McCoy | d****y@P****n | 1 |
Issues and Pull Requests
All Time
- Total issues: 6
- Total pull requests: 0
- Average time to close issues: 10 days
- Average time to close pull requests: N/A
- Total issue authors: 4
- Total pull request authors: 0
- Average comments per issue: 1.5
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- lucasmiranda42 (3)
- jlhanson5 (1)
- hasdk (1)
- rkmccord (1)
Dependencies
.github/workflows/draft-pdf.yml
actions
- actions/checkout v3 composite
- actions/upload-artifact v1 composite
- openjournals/openjournals-draft-action master composite
.github/workflows/r.yml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-tinytex v1 composite
DESCRIPTION
cran
- R >= 2.10 depends
- MASS * imports
- Rdpack * imports
- assertthat * imports
- cvTools * imports
- data.table * imports
- dplyr * imports
- foreach * imports
- furrr * imports
- future * imports
- ggplot2 * imports
- haldensify * imports
- magrittr * imports
- partykit * imports
- polspline * imports
- pracma * imports
- purrr * imports
- rlang * imports
- sl3 * imports
- stringr * imports
- kableExtra * suggests
- knitr * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests