shapr

shapr: An R-package for explaining machine learning models with dependence-aware Shapley values - Published in JOSS (2019)

https://github.com/norskregnesentral/shapr

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords

explainable-ai explainable-ml rcpp rcpparmadillo rstats shapley

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 69% confidence

Earth and Environmental Sciences Physical Sciences - 40% confidence

Economics Social Sciences - 40% confidence

Last synced: 6 months ago

Repository

Explaining the output of machine learning models with more accurately estimated Shapley values

Basic Info

Host: GitHub
Owner: NorskRegnesentral
License: other
Language: HTML
Default Branch: master
Homepage: https://norskregnesentral.github.io/shapr/
Size: 120 MB

Statistics

Stars: 164
Watchers: 7
Forks: 36
Open Issues: 7
Releases: 14

Topics

explainable-ai explainable-ml rcpp rcpparmadillo rstats shapley

Created almost 8 years ago · Last pushed 6 months ago

Metadata Files

Readme Changelog Contributing License Code of conduct

README.Rmd

---
output: github_document
bibliography: ./inst/REFERENCES.bib
link-citations: yes
---



```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  tidy = "styler"
)
```

# shapr 


[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version-last-release/shapr)](https://cran.r-project.org/package=shapr)
[![CRAN_Downloads_Badge](https://cranlogs.r-pkg.org/badges/grand-total/shapr)](https://cran.r-project.org/package=shapr)
[![R build status](https://github.com/NorskRegnesentral/shapr/workflows/R-CMD-check/badge.svg)](https://github.com/NorskRegnesentral/shapr/actions?query=workflow%3AR-CMD-check)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit)
[![JOSS (v0.2.3)](https://img.shields.io/badge/JOSS%20(v0.2.3)-10.21105/joss.02027-brightgreen.svg)](https://doi.org/10.21105/joss.02027)
[![arXiv (v1.0.4)](https://img.shields.io/badge/arXiv%20(v1.0.4)-2504.01842-b31b1b.svg)](https://arxiv.org/abs/2504.01842)


See the pkgdown site at [norskregnesentral.github.io/shapr/](https://norskregnesentral.github.io/shapr/)
for a complete introduction with examples and documentation of the package.

For an overview of the methodology and capabilities of the package (as of `shapr` v1.0.4),
see the software paper @jullum2025shapr, available as a preprint [here](https://arxiv.org/abs/2504.01842).




## NEWS

With `shapr` version 1.0.0 (GitHub only, Nov 2024) and version 1.0.1 (CRAN, Jan 2025),
the package underwent a major update, with a complete restructuring of the code base and
a broad set of new functionality, including:

* A wide range of approaches for estimating the contribution/value function $v(S)$, including variational autoencoders
  and regression-based methods
* Iterative Shapley value estimation with convergence detection
* Parallelized computations with progress updates
* Reweighted Kernel SHAP for faster convergence
* New function `explain_forecast()` for explaining forecasts
* Asymmetric and causal Shapley values
* Several other methodological, computational and user-experience improvements
* Python wrapper `shaprpy` making the core functionality of `shapr` available in Python


See the [NEWS](https://norskregnesentral.github.io/shapr/news/index.html) for a complete list.

### Coming from shapr < 1.0.0?
`shapr` version >= 1.0.0 comes with a number of breaking changes.
Most notably, we moved from using two functions (`shapr()` and `explain()`) to
one function (`explain()`).
In addition, custom models are now explained by passing the prediction function directly to `explain()`.
Several input arguments were renamed, and a few functions for edge cases were removed to simplify the code base.

Click [here](https://github.com/NorskRegnesentral/shapr/blob/cranversion_0.2.2/README.md) to view a version of this README with the old syntax (v0.2.2).
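
As a minimal sketch of the new syntax for a model that `shapr` does not support natively, the chunk below stores least-squares coefficients in a hypothetical S3 class `my_linear_model` and passes a hypothetical prediction function `my_predict()` to `explain()` through its `predict_model` argument. It assumes `shapr` >= 1.0.0 is installed; the `"gaussian"` approach is used only to keep the example quick, and the commented-out lines show roughly how the pre-1.0.0 two-step call looked.

```{r, eval = FALSE}
library(shapr)

data <- data.table::as.data.table(airquality)
data <- data[complete.cases(data), ]

x_var <- c("Solar.R", "Wind", "Temp", "Month")
x_train <- data[-(1:6), ..x_var]
y_train <- data[-(1:6), Ozone]
x_explain <- data[1:6, ..x_var]

# Hypothetical custom model: least-squares coefficients stored in an S3 class
# for which shapr has no built-in prediction method
fit <- lm.fit(cbind(1, as.matrix(x_train)), y_train)
my_model <- structure(list(coef = fit$coefficients), class = "my_linear_model")

# Hypothetical prediction function, called as predict_model(model, newdata)
my_predict <- function(model, newdata) {
  as.vector(cbind(1, as.matrix(newdata)) %*% model$coef)
}

# Old (< 1.0.0) style, for comparison only; this no longer runs:
# explainer <- shapr(x_train, model)
# explanation <- explain(x_explain, explainer = explainer,
#                        approach = "empirical", prediction_zero = p0)

# New style: one call, with the prediction function passed directly
explanation <- explain(
  model = my_model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "gaussian",
  phi0 = mean(y_train),
  seed = 1,
  predict_model = my_predict
)
print(explanation)
```

Since no `get_model_specs` function is supplied here, model/data consistency checks are (to our understanding) skipped for such a custom class, so make sure `my_predict()` expects the same feature columns as `x_train`.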

### Python wrapper

We provide a Python wrapper (`shaprpy`) which allows explaining Python models with the methodology
implemented in `shapr`, directly from Python.
The wrapper calls R internally and therefore requires an installation of R.
See [here](https://norskregnesentral.github.io/shapr/shaprpy.html) for installation instructions and examples.


## The package

The `shapr` R package implements an enhanced version of the Kernel SHAP method for approximating Shapley values,
with a strong focus on conditional Shapley values.
The core idea is to remain completely model-agnostic while offering a variety of methods for estimating contribution
functions, enabling accurate computation of conditional Shapley values across different feature types, dependencies,
and distributions.
The package also includes evaluation metrics for comparing the different approaches.
With parallelized computations, convergence detection, progress updates, and extensive plotting options,
`shapr` aims to be an efficient and user-friendly tool for estimating conditional Shapley values,
which are critical for understanding how features truly contribute to predictions.
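
To make the Kernel SHAP idea concrete, here is a minimal base-R sketch (our own illustration, not `shapr`'s implementation). It solves the Kernel SHAP weighted least-squares problem over all $2^M$ coalitions for a hypothetical toy contribution function `v_toy()`, treating the empty and full coalitions as exact constraints. For `v_toy()` the exact Shapley values are 4, 2 and 0: feature 3 never changes $v(S)$, and the interaction term of 2 is split equally between features 1 and 2. With every coalition included, the weighted least-squares solution should reproduce these values up to floating-point error. For larger numbers of features, `shapr` works with a sampled subset of coalitions, and in all cases it estimates each $v(S)$ from data, which is where the dependence-aware approaches enter.

```{r, eval = FALSE}
# Shapley kernel weight for a coalition of size s (0 < s < M) out of M features
shapley_kernel <- function(M, s) {
  (M - 1) / (choose(M, s) * s * (M - s))
}

# v: function taking a logical vector (TRUE = feature in S) and returning v(S)
kernel_shap_all_coalitions <- function(v, M) {
  # Every coalition as a 0/1 row (2^M rows, M columns)
  Z <- as.matrix(expand.grid(rep(list(0:1), M)))
  s <- rowSums(Z)
  v_all <- unname(apply(Z, 1, function(z) v(z == 1)))
  v_empty <- v_all[s == 0]
  delta <- v_all[s == M] - v_empty

  # Empty and full coalitions act as exact constraints:
  # phi_0 = v(empty) and sum(phi) = v(full) - v(empty).
  # Substituting phi_M = delta - sum(phi_1, ..., phi_{M-1}) removes the sum constraint.
  keep <- s > 0 & s < M
  Zk <- Z[keep, , drop = FALSE]
  w <- shapley_kernel(M, s[keep])
  y <- v_all[keep] - v_empty - Zk[, M] * delta
  X <- Zk[, -M, drop = FALSE] - Zk[, M]

  # Weighted least squares without intercept
  phi <- unname(lm.wfit(X, y, w)$coefficients)
  c(phi0 = v_empty, phi = c(phi, delta - sum(phi)))
}

# Hypothetical toy contribution function with known Shapley values:
# feature 1 adds 3, feature 2 adds 1, both together add an extra 2,
# and feature 3 adds nothing
v_toy <- function(in_S) {
  3 * in_S[1] + 1 * in_S[2] + 2 * (in_S[1] && in_S[2])
}

res <- kernel_shap_all_coalitions(v_toy, M = 3)
res
```

The result is named `phi0`, `phi1`, `phi2`, `phi3`, and by construction `sum(res)` equals $v$ of the full coalition (the efficiency property).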

A basic example is provided below.
Otherwise, we refer to the [pkgdown website](https://norskregnesentral.github.io/shapr/) and the vignettes there for details and further examples.


## Installation

`shapr` is available on [CRAN](https://cran.r-project.org/package=shapr) and can be installed in R as:

```{r, eval = FALSE}
install.packages("shapr")
```

To install the development version of `shapr`, available on GitHub, use

```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr")
```

To also install all suggested packages (required by some of the approaches and examples), use

```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)
```


## Example
`shapr` supports computation of Shapley values with any predictive model that takes a set of numeric features and produces a numeric outcome.

The following example shows how a simple `xgboost` model is trained using the *airquality* dataset, and how `shapr` explains the individual predictions. 

We first enable parallel computation and progress updates with the following code chunk.
These are optional, but recommended for improved performance and user-friendliness,
particularly for problems with many features.

```{r init_no_eval,eval = FALSE}
# Enable parallel computation
# Requires the future and future.apply packages
future::plan("multisession", workers = 2) # Increase the number of workers for increased performance with many features

# Enable progress updates of the v(S) computations
# Requires the progressr package
progressr::handlers(global = TRUE)
progressr::handlers("cli") # Using the cli package as backend (recommended for the estimates of the remaining time)
```
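
If you prefer to use parallel workers only for a particular call, one option is to keep the plan returned by `future::plan()` (which returns the previously active plan when a new one is set) and restore it afterwards, as in this minimal sketch:

```{r, eval = FALSE}
# Keep the previously active plan so it can be restored afterwards
old_plan <- future::plan("multisession", workers = 2)

# ... call explain() here ...

# Restore the previous plan (sequential unless configured otherwise)
future::plan(old_plan)
```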

Here is the actual example:
```{r basic_example, warning = FALSE}
library(xgboost)
library(shapr)

data("airquality")
data <- data.table::as.data.table(airquality)
data <- data[complete.cases(data), ]

x_var <- c("Solar.R", "Wind", "Temp", "Month")
y_var <- "Ozone"

ind_x_explain <- 1:6
x_train <- data[-ind_x_explain, ..x_var]
y_train <- data[-ind_x_explain, get(y_var)]
x_explain <- data[ind_x_explain, ..x_var]

# Look at the dependence between the features
cor(x_train)

# Fit a basic xgboost model to the training data
model <- xgboost(
  data = as.matrix(x_train),
  label = y_train,
  nrounds = 20,
  verbose = FALSE
)

# Specify phi_0, i.e., the expected prediction without any features
p0 <- mean(y_train)

# Compute Shapley values with Kernel SHAP, accounting for feature dependence using
# the empirical (conditional) distribution approach with bandwidth parameter sigma = 0.1 (default)
explanation <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0,
  seed = 1
)

# Print the Shapley values for the observations to explain.
print(explanation)

# Provide a formatted summary of the shapr object
summary(explanation)

# Finally, we plot the resulting explanations
plot(explanation)
```
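
A simple sanity check on the output is the efficiency property: in each row, the `none` column ($\phi_0$) plus the feature contributions should add up to the model's prediction. The sketch below assumes the returned object contains `shapley_values_est` (with an `explain_id` column) and `pred_explain`, as in `shapr` 1.0.x, and uses an `lm()` model (natively supported) instead of `xgboost`. The tolerance is deliberately loose, since the constraint may only be enforced approximately in floating point.

```{r, eval = FALSE}
library(shapr)

data <- data.table::as.data.table(airquality)
data <- data[complete.cases(data), ]
x_var <- c("Solar.R", "Wind", "Temp", "Month")
ind_x_explain <- 1:6

train <- data[-ind_x_explain]
x_train <- train[, ..x_var]
x_explain <- data[ind_x_explain, ..x_var]

model_lm <- lm(Ozone ~ Solar.R + Wind + Temp + Month, data = train)

explanation_lm <- explain(
  model = model_lm,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = mean(train$Ozone),
  seed = 1
)

# Efficiency: phi_0 ("none") plus the feature contributions should
# reproduce each prediction being explained
sv <- explanation_lm$shapley_values_est
phi_cols <- setdiff(names(sv), "explain_id")
sum_phi <- rowSums(as.matrix(sv[, ..phi_cols]))
all.equal(unname(sum_phi), unname(explanation_lm$pred_explain), tolerance = 1e-4)
```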

See @jullum2025shapr (preprint available [here](https://arxiv.org/abs/2504.01842)) for a software paper with an overview of the methodology and capabilities of the
package (as of v1.0.4).
See the [general usage vignette](https://norskregnesentral.github.io/shapr/articles/general_usage.html) for further 
basic usage examples and brief introductions to the methodology. 
For more thorough information about the underlying methodology, see the methodological papers
@aas2019explaining, @redelmeier2020explaining, @jullum2021efficient, @olsen2022using, and @olsen2024comparative.
See also @sellereite2019shapr for a very brief paper about a previous version (v0.1.1) of the package
(with a different structure, syntax, and significantly less functionality).
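
For intuition about the `"empirical"` approach used in the example above, the chunk below is a simplified base-R sketch of the idea described in @aas2019explaining, written under our own assumptions rather than taken from `shapr`. Each training observation gets a Gaussian kernel weight $\exp(-D_S^2 / (2\sigma^2))$, where $D_S^2$ is the squared Mahalanobis distance to the observation being explained, computed on the features in $S$ and divided by $|S|$. $v(S)$ is then estimated as the weighted mean of predictions with the features in $S$ fixed at their explained values. `shapr` differs in details (for example, it may use only the largest weights), and `empirical_v()` and `predict_fn()` are hypothetical names.

```{r, eval = FALSE}
# S: integer indices of the features in the coalition
# x_star: named numeric vector with the observation to explain
# predict_fn: hypothetical prediction function taking a numeric matrix
empirical_v <- function(S, x_star, x_train, predict_fn, sigma = 0.1) {
  x_train <- as.matrix(x_train)
  if (length(S) == 0) {
    # No features conditioned on: the mean prediction over the training data
    return(mean(predict_fn(x_train)))
  }
  # Squared Mahalanobis distance on the features in S, scaled by |S|
  x_S <- x_train[, S, drop = FALSE]
  d2 <- mahalanobis(x_S, center = x_star[S], cov = cov(x_S)) / length(S)
  # Gaussian kernel weight for each training observation
  w <- exp(-d2 / (2 * sigma^2))
  # Fix the features in S at the values of x_star; the rest keep training values
  x_cond <- x_train
  x_cond[, S] <- matrix(x_star[S], nrow = nrow(x_cond), ncol = length(S), byrow = TRUE)
  sum(w * predict_fn(x_cond)) / sum(w)
}

# Usage on simulated data with a hypothetical linear model
set.seed(1)
x_train <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
predict_fn <- function(x) 2 * x[, "x1"] - x[, "x2"]
x_star <- x_train[1, ]

v_empty <- empirical_v(integer(0), x_star, x_train, predict_fn)
v_1 <- empirical_v(1, x_star, x_train, predict_fn)
v_full <- empirical_v(1:2, x_star, x_train, predict_fn)
```

Two checks follow from the construction: with $S$ empty the estimate is the mean prediction over the training data, and with all features in $S$ every prediction equals $f(x^*)$, so the weighted mean returns it exactly.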

## Contribution

All feedback and suggestions are very welcome. Details on how to contribute can be found 
[here](https://norskregnesentral.github.io/shapr/CONTRIBUTING.html). If you have any questions or comments, feel
free to open an issue [here](https://github.com/NorskRegnesentral/shapr/issues). 

Please note that the `shapr` project is released with a
[Contributor Code of Conduct](https://norskregnesentral.github.io/shapr/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms. 

## References

Owner

Name: Norsk Regnesentral (Norwegian Computing Center)
Login: NorskRegnesentral
Kind: organization
Location: Oslo, Norway

Website: https://www.nr.no/
Repositories: 15
Profile: https://github.com/NorskRegnesentral

Norwegian Computing Center is a private foundation performing research in statistical modeling, machine learning and information/communication technology.

JOSS Publication

shapr: An R-package for explaining machine learning models with dependence-aware Shapley values

Published

February 05, 2020

DOI

10.21105/joss.02027

Volume 5, Issue 46, Page 2027

Authors

Nikolai Sellereite

Norwegian Computing Center

Martin Jullum

Norwegian Computing Center

Editor

Yuan Tang

GitHub Events

Total

Create event: 52
Release event: 4
Issues event: 35
Watch event: 20
Delete event: 31
Member event: 6
Issue comment event: 76
Push event: 278
Pull request review event: 76
Pull request review comment event: 82
Pull request event: 80
Fork event: 5

Last Year

Create event: 52
Release event: 4
Issues event: 35
Watch event: 20
Delete event: 31
Member event: 6
Issue comment event: 76
Push event: 278
Pull request review event: 76
Pull request review comment event: 82
Pull request event: 80
Fork event: 5

Committers

Last synced: 7 months ago

All Time

Total Commits: 277
Total Committers: 11
Avg Commits per committer: 25.182
Development Distribution Score (DDS): 0.549

Past Year

Commits: 38
Committers: 4
Avg Commits per committer: 9.5
Development Distribution Score (DDS): 0.237

Top Committers

Name	Email	Commits
Martin Jullum	j**m@n**o	125
Nikolai Sellereite	n**e@h**m	105
Lars H. B. Olsen	9****O	22
Anders Loland	a**d@n**o	12
Camilla Lingjærde	3****g	3
Jens Christian Wahl	j**l@n**o	3
jonlachmann	j**n@l**u	2
Jens Christian Wahl	j**l@g**m	2
Øystein Sørensen	o**n@h**m	1
Rawan Mahdi	1****i	1
Annabelle Redelmeier	a**r@g**m	1

Committer Domains (Top 20 + Academic)

nr.no: 3
lachmann.nu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 76
Total pull requests: 182
Average time to close issues: about 1 year
Average time to close pull requests: about 1 month
Total issue authors: 37
Total pull request authors: 12
Average comments per issue: 2.8
Average comments per pull request: 0.79
Merged pull requests: 152
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 13
Pull requests: 74
Average time to close issues: about 1 month
Average time to close pull requests: 4 days
Issue authors: 10
Pull request authors: 5
Average comments per issue: 2.69
Average comments per pull request: 0.39
Merged pull requests: 64
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

martinju (18)
AbdollahiAz (6)
aliamini-uq (4)
JensWahl (3)
jonlachmann (3)
mlds2020 (3)
LHBO (2)
fredriklaa (2)
hanneleer (2)
samkodes (2)
nikolase90 (2)
grant-roy (2)
niklasfries (2)
rawanmahdi (2)
ajoules (1)

Pull Request Authors

martinju (137)
LHBO (46)
JensWahl (12)
jonlachmann (12)
aredelmeier (6)
ungvilde (3)
igbucur (2)
andersloland (2)
MichaelChirico (2)
osorensen (1)
julienbj (1)
rawanmahdi (1)

Top Labels

Issue Labels

bug (4)
postpone (3)

Pull Request Labels

bug (1)

Packages

Total packages: 2
Total downloads:
- cran 1,611 last-month
- pypi 118 last-month

Total dependent packages: 0 (may contain duplicates)
Total dependent repositories: 2 (may contain duplicates)
Total versions: 12
Total maintainers: 2

cran.r-project.org: shapr

Prediction Explanation with Dependence-Aware Shapley Values

Homepage: https://norskregnesentral.github.io/shapr/
Documentation: http://cran.r-project.org/web/packages/shapr/shapr.pdf
License: MIT + file LICENSE
Latest release: 1.0.5
published 6 months ago

Versions: 11
Dependent Packages: 0
Dependent Repositories: 2
Downloads: 1,611 Last month

Rankings

Forks count: 2.6%

Stargazers count: 3.2%

Average: 13.4%

Downloads: 13.4%

Dependent repos count: 19.1%

Dependent packages count: 28.6%

Maintainers (1)

Martin.Jullum@nr.no

Last synced: 6 months ago

pypi.org: shaprpy

Python wrapper for the R package shapr (via rpy2)

Homepage: https://github.com/NorskRegnesentral/shapr
Documentation: https://norskregnesentral.github.io/shapr/shaprpy.html
Latest release: 0.3.0
published 6 months ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 118 Last month

Rankings

Dependent packages count: 8.6%

Average: 28.6%

Dependent repos count: 48.7%

Maintainers (1)

jullum

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
Matrix * imports
Rcpp >= 0.12.15 imports
condMVNorm * imports
data.table * imports
mvnfast * imports
stats * imports
MASS * suggests
caret * suggests
gbm * suggests
ggplot2 * suggests
knitr * suggests
mgcv * suggests
party * suggests
partykit * suggests
ranger * suggests
rmarkdown * suggests
roxygen2 * suggests
testthat * suggests
xgboost * suggests

.github/workflows/R-CMD-check.yaml actions

actions/checkout v2 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/lint-changed-files.yaml actions

actions/checkout v3 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/lint.yaml actions

actions/checkout v3 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/pkgdown.yaml actions

JamesIves/github-pages-deploy-action v4.4.1 composite
actions/checkout v3 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/pr-commands.yaml actions

actions/checkout v3 composite
r-lib/actions/pr-fetch v2 composite
r-lib/actions/pr-push v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/remove-old-artifacts.yml actions

c-hive/gha-remove-artifacts v1 composite

python/setup.py pypi

numpy >=1.22.3
pandas >=1.4.2
rpy2 >=3.5.1
scikit-learn >=1.0.0
