https://github.com/cvxgrp/ls-spa
A package for efficient Shapley performance attribution for least-squares problems
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.3%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 8
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 5
Metadata Files
README.md
Least-Squares Shapley Performance Attribution (LS-SPA)
Installation - Usage - Hello world - Example notebook - Optional arguments - Citing
Library companion to the paper "Efficient Shapley Performance Attribution for Least-Squares Regression" by Logan Bell, Nikhil Devanathan, and Stephen Boyd.
The results in the reference paper were generated with a faster but harder-to-use implementation of the same algorithm. That benchmark code, along with the numerical experiments from the reference paper, is available at cvxgrp/ls-spa-benchmark. We recommend caution when using the benchmark code.
Installation
To install this package, execute
```bash
pip install ls_spa
```
Import ls_spa by adding
```python
from ls_spa import ls_spa
```
to the top of your Python file.
ls_spa has the following dependencies:
- numpy
- scipy
- pandas

Optional dependencies are:
- marimo, for using the demo notebook
- matplotlib, for plotting in the demo notebook
Usage
We assume that you have imported ls_spa and have an $N\times p$
matrix of training data X_train, an $M\times p$ matrix of testing data X_test,
an $N$-vector of training labels y_train, and an $M$-vector of testing labels y_test,
for positive integers $p, N, M$ with $N,M\geq p$. In this case, you can find the
Shapley attribution of the out-of-sample $R^2$ on your data by executing
```python
attrs = ls_spa(X_train, X_test, y_train, y_test).attribution
```
attrs will be a NumPy array containing the Shapley values of your features.
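To make the attributed quantity concrete, here is a minimal NumPy sketch of the out-of-sample $R^2$ of a least-squares fit restricted to a subset of features. It assumes centered data, so $R^2$ is measured relative to $\|y_\text{test}\|^2$, and assigns $R^2 = 0$ to the empty feature set; the optional ridge penalty `reg` is handled through an augmented least-squares problem. These conventions are our assumptions and may not match ls_spa's internals exactly, and the helper `subset_r2` and the synthetic data are hypothetical.

```python
import numpy as np


def subset_r2(X_train, X_test, y_train, y_test, subset, reg=0.0):
    """Out-of-sample R^2 of a least-squares fit on the columns in `subset`.

    Assumption: centered data, so R^2 is measured against ||y_test||^2,
    and the empty subset (always predict 0) scores exactly 0.
    `reg` adds a ridge penalty reg * ||theta||^2 to the training objective.
    """
    cols = list(subset)
    if not cols:
        return 0.0
    A = X_train[:, cols]
    k = len(cols)
    # Ridge as plain least squares on [A; sqrt(reg) I] theta ~ [y_train; 0].
    A_aug = np.vstack([A, np.sqrt(reg) * np.eye(k)])
    y_aug = np.concatenate([y_train, np.zeros(k)])
    theta, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    resid = y_test - X_test[:, cols] @ theta
    return 1.0 - (resid @ resid) / (y_test @ y_test)


# Synthetic data with the shapes described above (N, M >= p).
rng = np.random.default_rng(0)
N, M, p = 200, 100, 4
theta_true = np.array([1.0, -2.0, 0.5, 0.0])
X_train = rng.standard_normal((N, p))
X_test = rng.standard_normal((M, p))
y_train = X_train @ theta_true + 0.1 * rng.standard_normal(N)
y_test = X_test @ theta_true + 0.1 * rng.standard_normal(M)

assert X_train.shape[0] >= p and X_test.shape[0] >= p
full_r2 = subset_r2(X_train, X_test, y_train, y_test, range(p))
print(f"full-model out-of-sample R^2: {full_r2:.3f}")
```

The Shapley attribution splits this full-model $R^2$ among the $p$ features.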
The ls_spa function computes Shapley values for the given data using
the LS-SPA method described in the companion paper. It takes arguments:
- X_train: Training feature matrix.
- X_test: Testing feature matrix.
- y_train: Training response vector.
- y_test: Testing response vector.
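For small $p$, the Shapley values that LS-SPA works with can be computed by brute force from the standard Shapley formula $\phi_j = \sum_{S \subseteq [p]\setminus\{j\}} \frac{|S|!\,(p-|S|-1)!}{p!}\big(R^2(S\cup\{j\}) - R^2(S)\big)$. The sketch below is a reference computation only, not the LS-SPA algorithm: it fits all $2^p$ subset models, which is practical only for a handful of features. It assumes centered data (so $R^2$ is measured against $\|y_\text{test}\|^2$) and $R^2(\varnothing) = 0$; `subset_r2`, `exact_shapley`, and the synthetic data are hypothetical.

```python
import itertools
import math

import numpy as np


def subset_r2(X_train, X_test, y_train, y_test, cols):
    """Out-of-sample R^2 of plain least squares on `cols` (0 for the empty set)."""
    cols = list(cols)
    if not cols:
        return 0.0
    theta, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    resid = y_test - X_test[:, cols] @ theta
    return 1.0 - (resid @ resid) / (y_test @ y_test)


def exact_shapley(X_train, X_test, y_train, y_test):
    """Exact Shapley attribution of R^2 by enumerating all 2^p feature subsets."""
    p = X_train.shape[1]
    # Cache R^2(S) for every subset S, keyed by its sorted tuple of column indices.
    value = {
        S: subset_r2(X_train, X_test, y_train, y_test, S)
        for k in range(p + 1)
        for S in itertools.combinations(range(p), k)
    }
    phi = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for k in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for subsets of size k.
            w = math.factorial(k) * math.factorial(p - k - 1) / math.factorial(p)
            for S in itertools.combinations(others, k):
                with_j = tuple(sorted(S + (j,)))
                phi[j] += w * (value[with_j] - value[S])
    return phi, value[tuple(range(p))]


rng = np.random.default_rng(1)
N, M, p = 150, 80, 4
X_train, X_test = rng.standard_normal((N, p)), rng.standard_normal((M, p))
theta_true = np.array([2.0, 1.0, 0.0, -0.5])
y_train = X_train @ theta_true + 0.3 * rng.standard_normal(N)
y_test = X_test @ theta_true + 0.3 * rng.standard_normal(M)

phi, full_r2 = exact_shapley(X_train, X_test, y_train, y_test)
# Efficiency property: the attributions add up to the full-model R^2.
print(phi, phi.sum(), full_r2)
```

The number of subsets doubles with every added feature, which is why an estimator such as LS-SPA is needed beyond small $p$.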
Hello world
We present a complete Python script that uses LS-SPA to compute the Shapley attribution on the data from the toy example described in the companion paper.
```python
# Imports
import numpy as np
from ls_spa import ls_spa

# Data loading: load the archive once, then pull out the four arrays by key
data = np.load("./data/toy_data.npz")
X_train, X_test, y_train, y_test = [data[key] for key in ["X_train", "X_test", "y_train", "y_test"]]

# Compute Shapley attribution with LS-SPA
results = ls_spa(X_train, X_test, y_train, y_test)

# Print attribution
print(results)
```
This example uses data from the data
directory of this repository.
The line print(results) prints a dashboard of information generated while
computing the Shapley attribution, such as the attribution, the $R^2$ of the
model fitted with all of the features, the feature coefficients of the fitted
model, and an error estimate on the attribution (since LS-SPA is an
estimation method).
To extract just the vector of Shapley values, use results.attribution.
For more information, see Optional arguments.
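The error estimate arises because, for larger $p$, the attribution is computed from sampled permutations rather than from all of them. The sketch below shows the basic idea with plain Monte Carlo: the Shapley value of feature $j$ equals its expected marginal gain in $R^2$ when features are added in a uniformly random order, so averaging over sampled permutations gives an estimate, and the standard error of that average gives a per-feature error estimate. This is a simplified illustration that refits every nested model with `np.linalg.lstsq`; ls_spa's own sampling schemes and error estimates are described in the companion paper and may differ. The names `subset_r2` and `permutation_shapley` are hypothetical, and centered data with $R^2(\varnothing) = 0$ is assumed.

```python
import numpy as np


def subset_r2(X_train, X_test, y_train, y_test, cols):
    """Out-of-sample R^2 of plain least squares on `cols` (0 for the empty set)."""
    cols = list(cols)
    if not cols:
        return 0.0
    theta, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    resid = y_test - X_test[:, cols] @ theta
    return 1.0 - (resid @ resid) / (y_test @ y_test)


def permutation_shapley(X_train, X_test, y_train, y_test, num_perms=256, seed=42):
    """Monte Carlo Shapley estimate from uniformly random feature orderings.

    Returns the mean marginal R^2 gain of each feature and its standard error.
    """
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    samples = np.zeros((num_perms, p))
    for t in range(num_perms):
        perm = rng.permutation(p)
        prev = 0.0  # R^2 of the empty model
        for k in range(p):
            cur = subset_r2(X_train, X_test, y_train, y_test, perm[: k + 1])
            samples[t, perm[k]] = cur - prev  # marginal gain of feature perm[k]
            prev = cur
    estimate = samples.mean(axis=0)
    std_err = samples.std(axis=0, ddof=1) / np.sqrt(num_perms)
    return estimate, std_err


rng = np.random.default_rng(2)
N, M, p = 150, 80, 5
X_train, X_test = rng.standard_normal((N, p)), rng.standard_normal((M, p))
theta_true = np.array([1.5, -1.0, 0.5, 0.0, 0.25])
y_train = X_train @ theta_true + 0.3 * rng.standard_normal(N)
y_test = X_test @ theta_true + 0.3 * rng.standard_normal(M)

estimate, std_err = permutation_shapley(X_train, X_test, y_train, y_test)
full_r2 = subset_r2(X_train, X_test, y_train, y_test, range(p))
print(estimate, std_err, estimate.sum(), full_r2)
```

Because each permutation's marginal gains telescope to the full-model $R^2$, the estimate always sums to it; sampling error only affects how that total is split among features.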
Example notebook
In this demo, we walk through the process of
computing Shapley values on the data for the toy example in the
companion paper. We then use ls_spa to compute the Shapley attribution
on the same data.
Optional arguments
ls_spa takes the optional arguments:
- reg: Regularization parameter (default 0).
- method: Permutation sampling method. Options include 'random', 'permutohedron', 'argsort', and 'exact'. If None, 'argsort' is used if the number of features is greater than 10; otherwise, 'exact' is used.
- batch_size: Number of permutations in each batch (default 2**7).
- num_batches: Maximum number of batches (default 2**7).
- tolerance: Convergence tolerance for the Shapley values (default 1e-2).
- seed: Seed for random number generation (default 42).
- return_history: Flag to determine whether to return the history of error estimates and attributions for each feature chain (default False).
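As a rough illustration of how batch_size, num_batches, tolerance, and seed might interact, here is a hypothetical batched version of plain Monte Carlo permutation sampling. It draws permutations in batches and stops early once the largest per-feature standard error falls below tolerance, or after num_batches batches. `default_method` mirrors the documented rule for method=None. The stopping criterion and error estimate are our assumptions, not ls_spa's exact logic, and the 'permutohedron' and 'argsort' schemes are not reproduced; all function names are hypothetical.

```python
import numpy as np


def subset_r2(X_train, X_test, y_train, y_test, cols):
    """Out-of-sample R^2 of plain least squares on `cols` (0 for the empty set)."""
    cols = list(cols)
    if not cols:
        return 0.0
    theta, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    resid = y_test - X_test[:, cols] @ theta
    return 1.0 - (resid @ resid) / (y_test @ y_test)


def default_method(p, method=None):
    """Documented default: 'argsort' if there are more than 10 features, else 'exact'."""
    if method is not None:
        return method
    return "argsort" if p > 10 else "exact"


def batched_shapley(X_train, X_test, y_train, y_test,
                    batch_size=2**7, num_batches=2**7, tolerance=1e-2, seed=42):
    """Hypothetical batched Monte Carlo loop with a tolerance-based early stop.

    Returns (attribution, per-feature standard errors, number of batches used).
    """
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    rows = []
    for b in range(num_batches):
        for _ in range(batch_size):
            perm = rng.permutation(p)
            row, prev = np.zeros(p), 0.0
            for k in range(p):
                cur = subset_r2(X_train, X_test, y_train, y_test, perm[: k + 1])
                row[perm[k]] = cur - prev  # marginal gain of feature perm[k]
                prev = cur
            rows.append(row)
        draws = np.array(rows)
        err = draws.std(axis=0, ddof=1) / np.sqrt(len(draws))
        if err.max() < tolerance:  # every feature's error is below tolerance
            break
    return draws.mean(axis=0), err, b + 1


rng = np.random.default_rng(3)
N, M, p = 150, 80, 4
X_train, X_test = rng.standard_normal((N, p)), rng.standard_normal((M, p))
y_train = X_train @ np.array([1.0, 0.5, -0.5, 0.0]) + 0.3 * rng.standard_normal(N)
y_test = X_test @ np.array([1.0, 0.5, -0.5, 0.0]) + 0.3 * rng.standard_normal(M)

# Smaller batches than the defaults keep this demo fast.
attr, err, used = batched_shapley(X_train, X_test, y_train, y_test,
                                  batch_size=16, num_batches=50)
full_r2 = subset_r2(X_train, X_test, y_train, y_test, range(p))
print(default_method(p), attr, err.max(), used)
```

A looser tolerance stops sooner at the cost of a less precise attribution; num_batches caps the work when the tolerance is never met.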
ls_spa returns a ShapleyResults object. The ShapleyResults object
has the fields:
- attribution: Array of Shapley values for each feature.
- attribution_history: Array of Shapley values for each iteration; None if return_history=False in the ls_spa call.
- theta: Array of regression coefficients.
- overall_error: Mean absolute error of the Shapley values.
- error_history: Array of mean absolute errors for each iteration; None if return_history=False in the ls_spa call.
- attribution_errors: Array of absolute errors for each feature.
- r_squared: Out-of-sample R-squared statistic of the regression.
Citing
If you use this code for research, please cite the associated paper.
```bibtex
@article{Bell2024,
  title = {Efficient Shapley performance attribution for least-squares regression},
  volume = {34},
  ISSN = {1573-1375},
  url = {http://dx.doi.org/10.1007/s11222-024-10459-9},
  DOI = {10.1007/s11222-024-10459-9},
  number = {5},
  journal = {Statistics and Computing},
  publisher = {Springer Science and Business Media LLC},
  author = {Bell, Logan and Devanathan, Nikhil and Boyd, Stephen},
  year = {2024},
  month = jul
}
```
Owner
- Name: Stanford University Convex Optimization Group
- Login: cvxgrp
- Kind: organization
- Location: Stanford, CA
- Website: www.stanford.edu/~boyd
- Repositories: 102
- Profile: https://github.com/cvxgrp
GitHub Events
Total
- Watch event: 3
Last Year
- Watch event: 3
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: about 1 hour
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 1 hour
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- NDevanathan (10)