https://github.com/bdwilliamson/lvimp

Perform Inference on Summaries of Longitudinal Algorithm-Agnostic Variable Importance

Repository

Basic Info
  • Host: GitHub
  • Owner: bdwilliamson
  • License: other
  • Language: R
  • Default Branch: main
  • Size: 53.7 KB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed almost 2 years ago
README.md

R/lvimp: inference on longitudinal summaries of algorithm-agnostic variable importance

Software author: Brian Williamson

Methodology authors: Brian Williamson, Erica Moodie, and Susan Shortreed

Introduction

In prediction settings where data are collected over time, it is often of interest to understand both the importance of variables for predicting the response at each time point and the importance summarized over the time series. Building on recent advances in estimation and inference for variable importance measures (specifically, the vimp package), we define summaries of variable importance trajectories. These measures can be estimated and the same approaches for inference can be applied regardless of the choice of the algorithm(s) used to estimate the prediction function. This package provides functions that, given fitted values from prediction algorithms, compute algorithm-agnostic estimates that summarize population variable importance over time.

More detail may be found in our paper.

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

R installation

You may install a development release of lvimp from GitHub via pak by running the following code:

pak::pkg_install("bdwilliamson/lvimp")

Example

This example shows how to use lvimp in a simple setting with simulated data.

```r
# load required functions and packages
library("vimp")
library("SuperLearner")
library("lvimp")

# generate some data from a simple setting -------------------------------------
set.seed(4747)
p <- 2
n <- 5e4
T <- 3
timepoints <- seq_len(T) - 1
beta_01 <- rep(1, T)
beta_02 <- 1 + timepoints / 4
beta_0 <- lapply(as.list(seq_len(T)), function(t) {
  matrix(c(beta_01[t], beta_02[t]))
})
# generate 2 covariates
x <- lapply(as.list(1:T), function(t) as.data.frame(replicate(p, stats::rnorm(n, 0, 1))))
# apply the function to the x's
y <- lapply(as.list(1:T), function(t) as.matrix(x[[t]]) %*% beta_0[[t]] + rnorm(n, 0, 1))
# "true" outcome variance
true_var <- unlist(lapply(as.list(1:T), function(t) mean((y[[t]] - mean(y[[t]])) ^ 2)))
# note that true difference in R-squareds for variable j, under independence, is
# beta_j^2 * var(x_j) / var(y)
mse_one <- unlist(lapply(as.list(1:T), function(t) mean((y[[t]] - beta_01[t] * x[[t]][, 1]) ^ 2)))
mse_two <- unlist(lapply(as.list(1:T), function(t) mean((y[[t]] - beta_02[t] * x[[t]][, 2]) ^ 2)))
mse_full <- unlist(lapply(as.list(1:T), function(t) mean((y[[t]] - as.matrix(x[[t]]) %*% beta_0[[t]]) ^ 2)))
r2_one <- 1 - mse_one / true_var
r2_two <- 1 - mse_two / true_var
r2_full <- 1 - mse_full / true_var

# estimate predictiveness, variable importance at each timepoint ---------------
set.seed(1234)
# in this case, glm is correctly specified (so only use one learner to speed things up)
vim_list_1 <- lapply(as.list(1:T), function(t) {
  vimp::cv_vim(Y = y[[t]], X = x[[t]], indx = 1, V = 10, type = "r_squared", SL.library = c("SL.glm"))
})
set.seed(5678)
vim_list_2 <- lapply(as.list(1:T), function(t) {
  vimp::cv_vim(Y = y[[t]], X = x[[t]], indx = 2, V = 10, type = "r_squared", SL.library = c("SL.glm"))
})

# obtain the average, linear trend, and AUTC for the time series ---------------
lvim_obj <- lvim(vim_list_1, timepoints = 1:3)
est_average <- lvim_average(lvim_obj, indices = 1:3)
est_trend <- lvim_trend(lvim_obj, indices = 1:3)
est_autc <- lvim_autc(lvim_obj, indices = 1:3)
```
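To give a sense of what these three summaries measure, the sketch below works out the population importance of variable 1 in the simulation above, using the formula noted in the code comments (coefficient squared times covariate variance, divided by outcome variance). It then summarizes that trajectory in base R. The helper `summarize_trajectory` is hypothetical and is not part of lvimp. The specific definitions used here are assumptions made for illustration: a plain mean, an ordinary least squares slope, and a trapezoid-rule area. The package's own estimators may differ in scaling and also provide inference, which this sketch does not.

```r
# Population R-squared importance of variable 1 at each timepoint, assuming
# unit-variance independent covariates and unit noise variance as simulated.
timepoints <- 0:2
beta_01 <- rep(1, 3)
beta_02 <- 1 + timepoints / 4
var_y <- beta_01^2 + beta_02^2 + 1
true_vim_1 <- beta_01^2 / var_y

# Hypothetical helper (not an lvimp function): point summaries of a trajectory.
summarize_trajectory <- function(vim, times) {
  n_t <- length(vim)
  # average importance over the time series
  avg <- mean(vim)
  # slope from a least squares fit of importance on time
  slope <- unname(stats::coef(stats::lm(vim ~ times))[2])
  # trapezoid-rule area under the trajectory curve
  autc <- sum(diff(times) * (vim[-n_t] + vim[-1]) / 2)
  c(average = avg, trend = slope, autc = autc)
}

summaries <- summarize_trajectory(true_vim_1, timepoints)
print(round(true_vim_1, 3))
print(round(summaries, 3))
```

Because the coefficient on variable 2 grows over time while the coefficient on variable 1 stays fixed, the importance of variable 1 falls across timepoints, so the trend summary is negative. With a large sample, the estimates from the example should land near these population values, up to any difference in how the package scales each summary.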

Owner

  • Name: Brian Williamson
  • Login: bdwilliamson
  • Kind: user
  • Location: Seattle, Washington USA
  • Company: Kaiser Permanente Washington Health Research Institute

Assistant Investigator at Kaiser Permanente Washington Health Research Institute. Interested in inference in high-dimensional settings.
