https://github.com/bdwilliamson/lvimp

Perform Inference on Summaries of Longitudinal Algorithm-Agnostic Variable Importance

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: bdwilliamson
License: other
Language: R
Default Branch: main
Size: 53.7 KB

Statistics

Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created over 3 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License

`R/lvimp`: inference on longitudinal summaries of algorithm-agnostic variable importance

Software author: Brian Williamson

Methodology authors: Brian Williamson, Erica Moodie, and Susan Shortreed

Introduction

In prediction settings where data are collected over time, it is often of interest to understand both the importance of variables for predicting the response at each time point and the importance summarized over the time series. Building on recent advances in estimation and inference for variable importance measures (specifically, the vimp package), we define summaries of variable importance trajectories. These measures can be estimated and the same approaches for inference can be applied regardless of the choice of the algorithm(s) used to estimate the prediction function. This package provides functions that, given fitted values from prediction algorithms, compute algorithm-agnostic estimates that summarize population variable importance over time.

More detail may be found in our paper.
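As a rough illustration of what "summarizing a variable importance trajectory" can mean, the sketch below computes three summaries (an average, a linear trend, and an area under the trajectory curve) from a made-up vector of per-timepoint importance estimates using only base R. The numbers, the variable names, and the trapezoidal rule used for the area are assumptions chosen for illustration; the definitions used by the package (for example, how the area is scaled) may differ, and this sketch does no inference.

```r
# Hypothetical per-timepoint variable importance estimates (not real output)
vim_est <- c(0.30, 0.35, 0.42)
timepoints <- c(0, 1, 2)

# Average importance over the time series
vim_average <- mean(vim_est)

# Linear trend: slope from a least-squares fit of importance on time
vim_trend <- unname(coef(lm(vim_est ~ timepoints))[2])

# Area under the trajectory curve via the trapezoidal rule (an assumed choice)
vim_autc <- sum(diff(timepoints) * (head(vim_est, -1) + tail(vim_est, -1)) / 2)

print(c(average = vim_average, trend = vim_trend, autc = vim_autc))
```

Point summaries like these are easy to compute by hand; the package is meant to add valid standard errors and confidence intervals for them.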

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

R installation

You may install a development release of lvimp from GitHub via pak by running the following code:

`pak::pkg_install("bdwilliamson/lvimp")`

Example

This example shows how to use lvimp in a simple setting with simulated data.

```r
# load required functions and packages
library("vimp")
library("lvimp")
library("SuperLearner")

# generate some data from a simple setting -------------------------------------
set.seed(4747)
p <- 2    # number of covariates
n <- 5e4  # sample size at each timepoint
T <- 3    # number of timepoints (note: this masks the TRUE shorthand `T`)
timepoints <- seq_len(T) - 1
beta_01 <- rep(1, T)            # coefficient on x1 is constant over time
beta_02 <- 1 + timepoints / 4   # coefficient on x2 grows over time
beta_0 <- lapply(as.list(seq_len(T)), function(t) {
  matrix(c(beta_01[t], beta_02[t]))
})
# generate 2 covariates
x <- lapply(as.list(1:T), function(t) {
  as.data.frame(replicate(p, stats::rnorm(n, 0, 1)))
})
# apply the function to the x's
y <- lapply(as.list(1:T), function(t) {
  as.matrix(x[[t]]) %*% beta_0[[t]] + rnorm(n, 0, 1)
})
# "true" outcome variance
true_var <- unlist(lapply(as.list(1:T), function(t) {
  mean((y[[t]] - mean(y[[t]])) ^ 2)
}))
# note that true difference in R-squareds for variable j, under independence, is
# beta_j^2 * var(x_j) / var(y)
mse_one <- unlist(lapply(as.list(1:T), function(t) {
  mean((y[[t]] - beta_01[t] * x[[t]][, 1]) ^ 2)
}))
mse_two <- unlist(lapply(as.list(1:T), function(t) {
  mean((y[[t]] - beta_02[t] * x[[t]][, 2]) ^ 2)
}))
mse_full <- unlist(lapply(as.list(1:T), function(t) {
  mean((y[[t]] - as.matrix(x[[t]]) %*% beta_0[[t]]) ^ 2)
}))
r2_one <- 1 - mse_one / true_var
r2_two <- 1 - mse_two / true_var
r2_full <- 1 - mse_full / true_var

# estimate predictiveness, variable importance at each timepoint ---------------
set.seed(1234)
# in this case, glm is correctly specified (so only use one learner to speed things up)
vim_list_1 <- lapply(as.list(1:T), function(t) {
  vimp::cv_vim(Y = y[[t]], X = x[[t]], indx = 1, V = 10,
               type = "r_squared", SL.library = c("SL.glm"))
})
set.seed(5678)
vim_list_2 <- lapply(as.list(1:T), function(t) {
  vimp::cv_vim(Y = y[[t]], X = x[[t]], indx = 2, V = 10,
               type = "r_squared", SL.library = c("SL.glm"))
})

# obtain the average, linear trend, and AUTC for the time series ---------------
lvim_obj <- lvim(vim_list_1, timepoints = 1:3)
est_average <- lvim_average(lvim_obj, indices = 1:3)
est_trend <- lvim_trend(lvim_obj, indices = 1:3)
est_autc <- lvim_autc(lvim_obj, indices = 1:3)
```
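The comment in the example states that, under independence, the true difference in R-squared for variable j is beta_j^2 * var(x_j) / var(y). The sketch below evaluates that expression at the population level for the simulated setting, assuming unit-variance independent covariates and unit-variance noise (as in the `rnorm()` calls above). The names `var_y`, `true_vim_1`, and `true_vim_2` are introduced here for illustration and are not part of the package.

```r
# Population-level importance implied by the simulation design (a sketch)
timepoints <- 0:2
beta_01 <- rep(1, length(timepoints))
beta_02 <- 1 + timepoints / 4

# var(y) = beta_01^2 * var(x1) + beta_02^2 * var(x2) + var(noise),
# with var(x1) = var(x2) = var(noise) = 1 by assumption
var_y <- beta_01^2 + beta_02^2 + 1

# True difference in R-squared for each variable at each timepoint
true_vim_1 <- beta_01^2 / var_y
true_vim_2 <- beta_02^2 / var_y

print(round(rbind(x1 = true_vim_1, x2 = true_vim_2), 3))
```

Because the coefficient on the second covariate grows over time, its true importance rises across timepoints while that of the first covariate falls; estimated trajectories from the example can be compared against these values.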

Owner

Name: Brian Williamson
Login: bdwilliamson
Kind: user
Location: Seattle, Washington USA
Company: Kaiser Permanente Washington Health Research Institute

Website: https://bdwilliamson.github.io/
Repositories: 46
Profile: https://github.com/bdwilliamson

Assistant Investigator at Kaiser Permanente Washington Health Research Institute. Interested in inference in high-dimensional settings.

