jointVIP

jointVIP: Prioritizing variables in observational study design with joint variable importance plot in R - Published in JOSS (2024)

https://github.com/ldliao/jointvip

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

causal-inference observational-study r rstats study-design
Last synced: 6 months ago

Repository

Prioritize variables in observational study design through the joint variable importance plot; shiny app: https://ldliao.shinyapps.io/jointVIP/

Statistics
  • Stars: 7
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog Contributing License Citation

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "80%"
)

devtools::load_all(".")
```

# Joint variable importance plot 


[![CRAN_Status_Badge](https://img.shields.io/cran/v/jointVIP?color=952100)](https://cran.r-project.org/package=jointVIP) [![CRAN_Downloads_Badge](https://cranlogs.r-pkg.org/badges/jointVIP?color=952100)](https://cran.r-project.org/package=jointVIP)
[![R-CMD-check](https://github.com/ldliao/jointVIP/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ldliao/jointVIP/actions/workflows/R-CMD-check.yaml)


Joint variable importance plot (jointVIP) visualizes each variable's outcome importance via Pearson's correlation and treatment importance via cross-sample standardized mean differences. Bias curves enable comparisons to support variable prioritization among potential confounders.
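
The two measures can be illustrated by hand on simulated data. The following is a minimal base R sketch, not the package's implementation: the data, the variable names (`pilot`, `analysis`, `x`, `y`, `treat`), and the choice to summarize bias as the product of the two measures are all assumptions made for illustration.

```r
# Toy data; every name and value below is hypothetical.
set.seed(1)
n <- 200

# Pilot sample: control units with observed outcomes
pilot <- data.frame(x = rnorm(n))
pilot$y <- 0.5 * pilot$x + rnorm(n)

# Analysis sample: treated and control units
analysis <- data.frame(treat = rep(c(0, 1), each = n))
analysis$x <- rnorm(2 * n, mean = 0.3 * analysis$treat)

# Outcome importance: Pearson correlation between the covariate
# and the outcome, computed in the pilot sample
outcome_cor <- cor(pilot$x, pilot$y)

# Treatment importance: treated-minus-control mean difference in the
# analysis sample, scaled by the pilot-sample standard deviation
# ("cross-sample" because numerator and denominator use different samples)
mean_diff <- mean(analysis$x[analysis$treat == 1]) -
  mean(analysis$x[analysis$treat == 0])
cross_smd <- mean_diff / sd(pilot$x)

# Assumed summary of bias: the product of the two measures
bias <- cross_smd * outcome_cor

c(outcome_cor = outcome_cor, cross_smd = cross_smd, bias = bias)
```

A variable matters for adjustment only when both measures are non-negligible, which is why the plot shows them jointly rather than ranking on either one alone.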

## Installation

You can install the released version of `jointVIP` from CRAN, or the development version from GitHub:

``` r
# for version on CRAN
install.packages("jointVIP")

# for development version on github
devtools::install_github("ldliao/jointVIP")
```

## BRFSS Example

To demonstrate, we use the 2015 Behavioral Risk Factor Surveillance System (BRFSS) data to answer the causal question: does smoking increase the risk of chronic obstructive pulmonary disease (COPD)? The data and background are inspired by [Clay Ford's work at the University of Virginia Library](https://data.library.virginia.edu/getting-started-with-matching-methods/). First, the data are cleaned so that all variables are numeric, i.e., all factor variables are transformed via one-hot encoding. The treatment variable `smoke` contains only 0 (control) and 1 (treatment).
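
For your own data, the one-hot encoding step might look like the sketch below. It uses base R's `model.matrix()` on a small hypothetical data frame (`raw` and its columns are invented for illustration); this is not necessarily how the bundled `brfss` data were prepared.

```r
# Hypothetical raw data with one factor column
raw <- data.frame(
  smoke = c(0, 1, 0, 1),
  age_group = factor(c("18to24", "25to34", "over65", "25to34")),
  average_drinks = c(1.5, 0, 2, 3)
)

# model.matrix() expands factors into 0/1 indicator columns.
# Dropping the intercept ("- 1") gives the first factor in the formula
# one column per level; any later factors would still lose a reference
# level, so this shortcut suits data with a single factor.
encoded <- as.data.frame(model.matrix(~ . - 1, data = raw))

names(encoded)
```

Every column of `encoded` is numeric, which is the form the example below assumes for the covariates.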

With the cleaned data, you can specify details in the function `create_jointVIP()` like so:

```{r example}
library(jointVIP)
## basic example code
library(dplyr)

# load data
data('brfss', package='jointVIP')

treatment = 'smoke'
outcome = 'COPD'
covariates = names(brfss)[!names(brfss) %in% c(treatment, outcome)]

## select the pilot sample from random portion
## pilot data here are considered as 'external controls'
## can be a separate dataset; should be chosen with caution
set.seed(1234895)
pilot_prop = 0.2 
pilot_sample_num = sample(which(brfss %>% pull(treatment) == 0),
                          length(which(brfss %>% pull(treatment) == 0)) *
                          pilot_prop)

## set up pilot and analysis data
## we want to make sure these two data are non-overlapping

pilot_df = brfss[pilot_sample_num, ]
analysis_df = brfss[-pilot_sample_num, ]

## minimal example
brfss_jointVIP = create_jointVIP(treatment = treatment,
                                 outcome = outcome,
                                 covariates = covariates,
                                 pilot_df = pilot_df,
                                 analysis_df = analysis_df)
```
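
The pilot/analysis split above can also be wrapped in a small helper that checks the two properties the comments call for: the pilot rows are controls only, and the two data sets do not overlap. `split_pilot()` is a hypothetical function written for this illustration; it is not part of `jointVIP`.

```r
# Hypothetical helper: draw a pilot sample from control rows only
split_pilot <- function(df, treatment, prop) {
  control_rows <- which(df[[treatment]] == 0)
  n_pilot <- floor(length(control_rows) * prop)
  # Guard: with zero pilot rows, df[-pilot_rows, ] would return no rows
  stopifnot(n_pilot >= 1)
  # Index via sample.int() so a single control row is handled correctly
  pilot_rows <- control_rows[sample.int(length(control_rows), n_pilot)]
  list(
    pilot_df = df[pilot_rows, , drop = FALSE],
    analysis_df = df[-pilot_rows, , drop = FALSE]
  )
}

# Toy data: 30 controls and 20 treated units
toy <- data.frame(id = 1:50, smoke = rep(c(0, 1), times = c(30, 20)))

set.seed(1)
parts <- split_pilot(toy, treatment = "smoke", prop = 0.2)

# Pilot rows are all controls, and no unit appears in both data sets
stopifnot(
  all(parts$pilot_df$smoke == 0),
  length(intersect(parts$pilot_df$id, parts$analysis_df$id)) == 0,
  nrow(parts$pilot_df) + nrow(parts$analysis_df) == nrow(toy)
)
```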

Generic functions can be used on the `jointVIP` object to extract information at a glance with `summary()` and `print()`.

```{r generic}
summary(brfss_jointVIP)
print(brfss_jointVIP)
```


```{r plot, dpi=300, fig.asp = 0.75, fig.width = 6, fig.align = "center", message=FALSE}
plot(brfss_jointVIP)
```

In this example, `age_over65` and `average_drinks` are the two most important variables to adjust for. At a bias tolerance of 0.01, three variables (`age_over65`, `average_drinks`, and `age_25to34`) are above the tolerance threshold. Moreover, `age_over65` and `average_drinks` are of higher importance for adjustment than `age_25to34`. Although `race_black` and `age_over65` have similar absolute standardized mean differences (0.322 and 0.333, respectively), `age_over65` is more important to adjust for since it is highly correlated with the outcome.
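
The reasoning behind this reading can be sketched with made-up numbers. The code below assumes that a variable's bias is summarized as the product of its absolute standardized mean difference and its absolute outcome correlation; the variables and values are hypothetical and are not taken from the BRFSS output.

```r
# Hypothetical measures for three covariates
measures <- data.frame(
  variable = c("var_a", "var_b", "var_c"),
  abs_smd = c(0.33, 0.32, 0.05),
  abs_cor = c(0.20, 0.02, 0.10)
)

# Assumed bias summary: product of the two importance measures
measures$bias <- measures$abs_smd * measures$abs_cor

# Flag variables whose bias exceeds the chosen tolerance
bias_tol <- 0.01
measures$above_tol <- measures$bias > bias_tol

# Rank from most to least important under this summary
measures[order(-measures$bias), ]
```

Here `var_a` and `var_b` have nearly identical standardized mean differences, but only `var_a` exceeds the tolerance because its outcome correlation is ten times larger.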

## Acknowledgement
Ford, C. 2018. “Getting Started with Matching Methods.” UVA Library StatLab. https://library.virginia.edu/data/articles/getting-started-with-matching-methods/ (accessed Jan 29, 2024).

Owner

  • Name: Lauren Liao
  • Login: ldliao
  • Kind: user
  • Location: Berkeley, CA

Aspiring Data Scientist and Biostatistician. Doctoral Biostatistics student at UC Berkeley

JOSS Publication

jointVIP: Prioritizing variables in observational study design with joint variable importance plot in R
Published
November 12, 2024
Volume 9, Issue 103, Page 6093
Authors
Lauren D. Liao
Division of Biostatistics, University of California, Berkeley, USA
Samuel D. Pimentel
Department of Statistics, University of California, Berkeley, USA
Editor
Andrew Stewart
Tags
observational study, study design, visualization, causal inference

Citation (citation.cff)

cff-version: "1.2.0"
authors:
- family-names: Liao
  given-names: Lauren D.
  orcid: "https://orcid.org/0000-0003-4697-6909"
- family-names: Pimentel
  given-names: Samuel D.
  orcid: "https://orcid.org/0000-0002-0409-6586"
contact:
- family-names: Liao
  given-names: Lauren D.
  orcid: "https://orcid.org/0000-0003-4697-6909"
doi: 10.5281/zenodo.14020544
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Liao
    given-names: Lauren D.
    orcid: "https://orcid.org/0000-0003-4697-6909"
  - family-names: Pimentel
    given-names: Samuel D.
    orcid: "https://orcid.org/0000-0002-0409-6586"
  date-published: 2024-11-12
  doi: 10.21105/joss.06093
  issn: 2475-9066
  issue: 103
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6093
  title: "jointVIP: Prioritizing variables in observational study design
    with joint variable importance plot in R"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06093"
  volume: 9
title: "jointVIP: Prioritizing variables in observational study design
  with joint variable importance plot in R"

GitHub Events

Total
  • Release event: 1
  • Watch event: 1
  • Push event: 24
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 1
  • Push event: 24
  • Create event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 110
  • Total Committers: 2
  • Avg Commits per committer: 55.0
  • Development Distribution Score (DDS): 0.055
Past Year
  • Commits: 16
  • Committers: 1
  • Avg Commits per committer: 16.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
ldliao l****9@g****m 104
Lauren Liao l****o@B****l 6

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 4
  • Average time to close issues: 5 months
  • Average time to close pull requests: 2 minutes
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 5.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jackmwolf (1)
  • nhejazi (1)
Pull Request Authors
  • ldliao (6)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 244 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
cran.r-project.org: jointVIP

Prioritize Variables with Joint Variable Importance Plot in Observational Study Design

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 244 Last month
Rankings
Stargazers count: 28.5%
Forks count: 28.8%
Dependent packages count: 29.8%
Average: 33.5%
Dependent repos count: 35.5%
Downloads: 44.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • dplyr * imports
  • ggplot2 * imports
  • ggpubr * imports
  • ggrepel * imports
  • testthat * imports
  • tidyr * imports
jointVIP.Rcheck/00_pkg_src/jointVIP/DESCRIPTION cran
  • R >= 3.3 depends
  • ggplot2 >= 3.4.0 imports
  • ggrepel >= 0.9.2 imports
  • MatchIt * suggests
  • WeightIt * suggests
  • causaldata * suggests
  • devtools >= 2.4.5 suggests
  • knitr * suggests
  • optmatch * suggests
  • optweight >= 0.2.4 suggests
  • rmarkdown >= 2.18 suggests
  • testthat >= 3.0.0 suggests
jointVIP.Rcheck/jointVIP/DESCRIPTION cran
  • R >= 3.3 depends
  • ggplot2 >= 3.4.0 imports
  • ggrepel >= 0.9.2 imports
  • MatchIt * suggests
  • WeightIt * suggests
  • causaldata * suggests
  • devtools >= 2.4.5 suggests
  • knitr * suggests
  • optmatch * suggests
  • optweight >= 0.2.4 suggests
  • rmarkdown >= 2.18 suggests
  • testthat >= 3.0.0 suggests