https://github.com/bdwilliamson/vimpy
Perform inference on algorithm-agnostic variable importance in Python
Science Score: 20.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (4.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: bdwilliamson
- License: MIT
- Language: Python
- Default Branch: master
- Homepage: https://pypi.org/project/vimpy/
- Size: 407 KB
Statistics
- Stars: 20
- Watchers: 3
- Forks: 5
- Open Issues: 3
- Releases: 0
Metadata Files
- README.html
- README.utf8.md
vimpy: inference on algorithm-agnostic variable importance

Software author: Brian Williamson
Methodology authors: Brian Williamson, Peter Gilbert, Noah Simon, Marco Carone
Introduction
In predictive modeling applications, it is often of interest to determine the relative contribution of subsets of features in explaining an outcome; this is often called variable importance. It is useful to consider variable importance as a function of the unknown, underlying data-generating mechanism rather than the specific predictive algorithm used to fit the data. This package provides functions that, given fitted values from predictive algorithms, compute nonparametric estimates of variable importance based on \(R^2\), deviance, classification accuracy, and area under the receiver operating characteristic curve, along with asymptotically valid confidence intervals for the true importance.
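As a rough sketch of the estimand (paraphrasing the manuscripts rather than quoting them): for the \(R^2\)-based measure, the importance of a feature subset \(s\) is the drop in population \(R^2\) incurred by removing those features,

\[
\psi_{0,s} \;=\; \frac{E\big[\{f_0(X) - f_{0,s}(X)\}^2\big]}{\operatorname{var}(Y)},
\]

where \(f_0(x) = E(Y \mid X = x)\) is the full conditional mean and \(f_{0,s}\) is its counterpart with the features in \(s\) removed. The package estimates \(\psi_{0,s}\) by plugging in fitted values from any pair of full and reduced regression estimators, which is what makes the procedure algorithm-agnostic.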
For more details, please see the accompanying manuscripts “Nonparametric variable importance assessment using machine learning techniques” by Williamson, Gilbert, Carone, and Simon (Biometrics, 2020) and “A unified approach for inference on algorithm-agnostic variable importance” by Williamson, Gilbert, Simon, and Carone (arXiv, 2020).
Installation
You may install a stable release of vimpy using pip by running `pip install vimpy` from a Terminal window. Alternatively, you may install within a virtualenv environment.

You may install the current dev release of vimpy by downloading this repository directly.

Issues
If you encounter any bugs or have any specific feature requests, please file an issue.
Example
This example shows how to use vimpy in a simple setting with simulated data and using a single regression function. For more examples and detailed explanation, please see the R vignette.

```python
## load required libraries
import numpy as np
import vimpy
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

## -------------------------------------------------------------
## problem setup
## -------------------------------------------------------------
## define a function for the conditional mean of Y given X
def cond_mean(x = None):
    f1 = np.where(np.logical_and(-2 <= x[:, 0], x[:, 0] < 2), np.floor(x[:, 0]), 0)
    f2 = np.where(x[:, 1] <= 0, 1, 0)
    f3 = np.where(x[:, 2] > 0, 1, 0)
    f6 = np.absolute(x[:, 5] / 4) ** 3
    f7 = np.absolute(x[:, 6] / 4) ** 5
    f11 = (7. / 3) * np.cos(x[:, 10] / 2)
    return f1 + f2 + f3 + f6 + f7 + f11

## create data
np.random.seed(4747)
n = 100
p = 15
s = 1  # importance desired for X_1
x = np.zeros((n, p))
for i in range(0, x.shape[1]):
    x[:, i] = np.random.normal(0, 2, n)

y = cond_mean(x) + np.random.normal(0, 1, n)

## -------------------------------------------------------------
## preliminary step: get regression estimators
## -------------------------------------------------------------
## use grid search to get optimal number of trees and learning rate
ntrees = np.arange(100, 3500, 500)
lr = np.arange(.01, .5, .05)
param_grid = [{'n_estimators': ntrees, 'learning_rate': lr}]

## set up cv objects
## note: scikit-learn >= 1.0 renamed loss 'ls' to 'squared_error'
cv_full = GridSearchCV(GradientBoostingRegressor(loss = 'ls', max_depth = 1), param_grid = param_grid, cv = 5)
cv_small = GridSearchCV(GradientBoostingRegressor(loss = 'ls', max_depth = 1), param_grid = param_grid, cv = 5)

## fit the full regression
cv_full.fit(x, y)
full_fit = cv_full.best_estimator_.predict(x)

## fit the reduced regression (on the fitted values from the full regression)
x_small = np.delete(x, s, 1)  # delete the columns in s
cv_small.fit(x_small, full_fit)
small_fit = cv_small.best_estimator_.predict(x_small)

## -------------------------------------------------------------
## get variable importance estimates
## -------------------------------------------------------------
## set up the vimp object
vimp = vimpy.vim(y = y, x = x, s = 1, pred_func = cv_full, measure_type = "r_squared")
## get the point estimate of variable importance
vimp.get_point_est()
## get the influence function estimate
vimp.get_influence_function()
## get a standard error
vimp.get_se()
## get a confidence interval
vimp.get_ci()
## do a hypothesis test, compute p-value
vimp.hypothesis_test(alpha = 0.05, delta = 0)
## display the estimates, etc.
vimp.vimp_
vimp.se_
vimp.ci_
vimp.p_value_
vimp.hyp_test_

## -------------------------------------------------------------
## get variable importance estimates using cross-validation
## -------------------------------------------------------------
## set up the vimp object (note: cv_vim lives in the vimpy namespace)
vimp_cv = vimpy.cv_vim(y = y, x = x, s = 1, pred_func = cv_full, V = 5, measure_type = "r_squared")
## get the point estimate
vimp_cv.get_point_est()
## get the standard error
vimp_cv.get_influence_function()
vimp_cv.get_se()
## get a confidence interval
vimp_cv.get_ci()
## do a hypothesis test, compute p-value
vimp_cv.hypothesis_test(alpha = 0.05, delta = 0)
## display estimates, etc.
vimp_cv.vimp_
vimp_cv.se_
vimp_cv.ci_
vimp_cv.p_value_
vimp_cv.hyp_test_
```
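As a quick follow-up, here is a minimal sketch (not from the package documentation) of inspecting the results produced above; it assumes the vimp object from the example exists and that ci_ can be flattened to a (lower, upper) pair:

```python
## minimal sketch of inspecting the fitted vimp object from the example above;
## assumes ci_ flattens to a (lower, upper) pair -- check your version's docs
import numpy as np

est = float(vimp.vimp_)                # point estimate of R^2-based importance of X_1
lower, upper = np.ravel(vimp.ci_)[:2]  # 95% confidence interval endpoints
print(f"estimated importance: {est:.3f}")
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
print(f"p-value: {float(vimp.p_value_):.3f}, reject H0? {bool(vimp.hyp_test_)}")
```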
Owner
- Name: Brian Williamson
- Login: bdwilliamson
- Kind: user
- Location: Seattle, Washington USA
- Company: Kaiser Permanente Washington Health Research Institute
- Website: https://bdwilliamson.github.io/
- Repositories: 46
- Profile: https://github.com/bdwilliamson
Assistant Investigator at Kaiser Permanente Washington Health Research Institute. Interested in inference in high-dimensional settings.
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 74
- Total Committers: 4
- Avg Commits per committer: 18.5
- Development Distribution Score (DDS): 0.176 (see the note after the committer table below)
Top Committers
| Name | Email | Commits |
|---|---|---|
| Brian Williamson | b****6@u****u | 61 |
| Brian Williamson | b****n@k****g | 10 |
| Jean Feng | j****g@g****m | 2 |
| Jenny | j****t@g****m | 1 |
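A note on the DDS reported above: assuming the common definition \(\mathrm{DDS} = 1 - \frac{\text{top committer's commits}}{\text{total commits}}\), the table is consistent with the reported score, since \(1 - 61/74 \approx 0.176\); that is, only about 17.6% of commits come from someone other than the lead author.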
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 2
- Average time to close issues: about 1 hour
- Average time to close pull requests: about 1 hour
- Total issue authors: 4
- Total pull request authors: 2
- Average comments per issue: 4.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Anaraquelpengelly (1)
- shaayaansayed (1)
- Tim-Re (1)
- mizano924 (1)
Pull Request Authors
- JennyLeeStat (1)
- jjfeng (1)
Packages
- Total packages: 1
- Total downloads: 63 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 11
- Total maintainers: 1
pypi.org: vimpy
vimpy: perform inference on algorithm-agnostic variable importance in python
- Homepage: https://github.com/bdwilliamson/vimpy
- Documentation: https://vimpy.readthedocs.io/
- License: MIT
- Latest release: 2.0.2 (published over 5 years ago)
Dependencies
- numpy *
- scipy *