effectplots

Fast Effect Plots in R

https://github.com/mayer79/effectplots

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.6%) to scientific vocabulary

Keywords

machine-learning r regression xai

Last synced: 9 months ago

Repository

Fast Effect Plots in R

Basic Info

Host: GitHub
Owner: mayer79
License: gpl-3.0
Language: R
Default Branch: main
Homepage: https://mayer79.github.io/effectplots/
Size: 11 MB

Statistics

Stars: 21
Watchers: 2
Forks: 1
Open Issues: 1
Releases: 4

Topics

machine-learning r regression xai

Created over 1 year ago · Last pushed 12 months ago

Metadata Files

Readme Changelog License

effectplots

{effectplots} is an R package for calculating and plotting feature effects of any model. It is very fast thanks to {collapse}.

The main function feature_effects() crunches these statistics per feature X over values/bins:

- Average observed y values: Descriptive associations between response y and features.
- Average predictions: Combined effect of X and other features (M plots, Apley [1]).
- Partial dependence (Friedman [2]): How the average prediction reacts to X, keeping other features fixed.
- Accumulated local effects (Apley [1]): Alternative to partial dependence.
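To make these four statistics concrete, here is a minimal base-R sketch for one numeric feature. It assumes quantile bins, bin midpoints as the partial dependence grid, and an ALE curve that is left uncentered. The helper effect_stats_sketch() is hypothetical and is not how {effectplots} computes the values internally.

```r
# Hypothetical helper, not the {effectplots} implementation: per-bin
# statistics for one numeric feature v of a model with a predict() method.
effect_stats_sketch <- function(fit, data, v, y, breaks = 5) {
  x <- data[[v]]
  probs <- seq(0, 1, length.out = breaks + 1)
  cuts <- unique(quantile(x, probs = probs, names = FALSE))
  bin <- cut(x, breaks = cuts, include.lowest = TRUE)
  mids <- (head(cuts, -1) + tail(cuts, -1)) / 2

  # M plot ingredients: average observed y and average prediction per bin
  y_mean <- as.numeric(tapply(y, bin, mean))
  pred_mean <- as.numeric(tapply(predict(fit, newdata = data), bin, mean))

  # Partial dependence: set the feature to a grid value for all rows, then average
  pd <- vapply(mids, function(z) {
    data_z <- data
    data_z[[v]] <- z
    mean(predict(fit, newdata = data_z))
  }, numeric(1))

  # ALE: move the rows of each bin from its lower to its upper edge,
  # average the prediction change, then accumulate over bins
  local_eff <- vapply(seq_along(mids), function(k) {
    rows <- which(as.integer(bin) == k)
    if (length(rows) == 0) return(0)
    lower <- upper <- data[rows, , drop = FALSE]
    lower[[v]] <- cuts[k]
    upper[[v]] <- cuts[k + 1]
    mean(predict(fit, newdata = upper) - predict(fit, newdata = lower))
  }, numeric(1))
  ale <- cumsum(local_eff)  # left uncentered for simplicity

  data.frame(bin = levels(bin), mid = mids, y_mean = y_mean,
             pred_mean = pred_mean, pd = pd, ale = ale)
}

# Usage on simulated data where y depends linearly on x1 and x2
set.seed(1)
d <- data.frame(x1 = runif(200), x2 = rnorm(200))
d$y <- 2 * d$x1 + d$x2 + rnorm(200, sd = 0.1)
fit_lm <- lm(y ~ x1 + x2, data = d)
effect_stats_sketch(fit_lm, data = d, v = "x1", y = d$y, breaks = 4)
```

For a linear model, the partial dependence curve is a straight line whose slope equals the coefficient of the feature, which makes the output easy to sanity-check.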

Furthermore, feature_effects() calculates counts, weight sums, average residuals, and standard deviations of observed y and residuals. All statistics respect optional case weights.
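The case-weighted versions of these summaries can be sketched per bin as follows. The helper weighted_bin_stats_sketch() is hypothetical, and dividing the weighted variance by the weight sum is an assumption made for illustration.

```r
# Hypothetical helper: weighted per-bin summaries of y and of residuals y - pred.
# The weighted SD uses sum(w) as denominator (an assumption for this sketch).
weighted_bin_stats_sketch <- function(bin, y, pred, w = rep(1, length(y))) {
  resid <- y - pred
  w_sd <- function(z, w) {
    m <- weighted.mean(z, w)
    sqrt(sum(w * (z - m)^2) / sum(w))
  }
  rows <- split(seq_along(y), bin)
  do.call(rbind, lapply(rows, function(i) {
    data.frame(
      N = length(i),                 # count
      weight = sum(w[i]),            # weight sum
      y_mean = weighted.mean(y[i], w[i]),
      y_sd = w_sd(y[i], w[i]),
      resid_mean = weighted.mean(resid[i], w[i]),
      resid_sd = w_sd(resid[i], w[i])
    )
  }))
}

# Toy usage: two bins with case weights 1, 3, 1, 1
weighted_bin_stats_sketch(
  bin = factor(c("a", "a", "b", "b")),
  y = c(1, 3, 2, 2),
  pred = c(2, 2, 2, 2),
  w = c(1, 3, 1, 1)
)
```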

We highly recommend Christoph Molnar's book [3] for more info on feature effects.

Computing all statistics for 10 features on 10 million rows takes 1 second on a normal laptop (plus prediction time).

Workflow

- Crunch values via feature_effects() or the little helpers average_observed(), partial_dependence() etc.
- Update the results with update(): Combine rare levels of categorical features, sort results by importance, turn discrete numeric features into factors etc.
- Plot the results with plot(): Choose between ggplot2/patchwork and plotly.
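Combining rare levels could look like the sketch below, which keeps the m levels with the largest weight sum and merges the rest. The helper collapse_levels_sketch(), this particular rule and the level name "other" are assumptions; update() has its own arguments for this, described in its help page.

```r
# Hypothetical helper: keep the m levels with the largest weight sum and
# merge all other levels into a single level
collapse_levels_sketch <- function(x, w = rep(1, length(x)), m = 2, other = "other") {
  x <- as.character(x)
  level_weight <- vapply(split(w, x), sum, numeric(1))
  keep <- names(sort(level_weight, decreasing = TRUE))[seq_len(min(m, length(level_weight)))]
  x[!(x %in% keep)] <- other
  factor(x)
}

# Levels "c" and "d" are rare and end up in "other"
towns <- c("a", "a", "a", "b", "b", "c", "d")
collapse_levels_sketch(towns, m = 2)
```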

Outlier capping: Extreme outliers in numeric features are capped by default (but not deleted). To avoid capping, set outlier_iqr = Inf.
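One common capping rule moves values below Q1 - k * IQR or above Q3 + k * IQR onto those fences. The sketch below assumes that outlier_iqr plays the role of k; the exact rule used by {effectplots} may differ, and cap_outliers_sketch() is hypothetical.

```r
# Sketch of IQR-based capping; the exact rule in {effectplots} may differ
cap_outliers_sketch <- function(x, outlier_iqr) {
  if (is.infinite(outlier_iqr)) return(x)  # Inf disables capping
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  fence <- outlier_iqr * (q[2] - q[1])
  # Values beyond the fences are moved onto them, not removed
  pmin(pmax(x, q[1] - fence), q[2] + fence)
}

x <- c(1:10, 100)
cap_outliers_sketch(x, outlier_iqr = 2)    # 100 becomes 18.5
cap_outliers_sketch(x, outlier_iqr = Inf)  # unchanged
```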

Installation

You can install the development version of {effectplots} from GitHub with:

```r
install.packages("pak")
pak::pak("mayer79/effectplots", dependencies = TRUE)
```

Usage

We use a 1 million row dataset on Motor TPL insurance. The aim is to model claim frequency. Before modeling, we want to study the association between features and response.

```r
library(effectplots)
library(OpenML)
library(lightgbm)

set.seed(1)

df <- getOMLDataSet(data.id = 45106L)$data

xvars <- c("year", "town", "driver_age", "car_weight", "car_power", "car_age")

# 0.1s on laptop
average_observed(df[xvars], y = df$claim_nb) |>
  update(to_factor = TRUE) |>  # turn discrete numerics to factors
  plot(share_y = "all")
```

A shared y axis helps to compare the strength of the association across features.

Fit model

Next, let's fit a boosted trees model.

```r
ix <- sample(nrow(df), 0.8 * nrow(df))
train <- df[ix, ]
test <- df[-ix, ]
X_train <- data.matrix(train[xvars])
X_test <- data.matrix(test[xvars])

# Training, using slightly optimized parameters found via cross-validation
params <- list(
  learning_rate = 0.05,
  objective = "poisson",
  num_leaves = 7,
  min_data_in_leaf = 50,
  min_sum_hessian_in_leaf = 0.001,
  colsample_bynode = 0.8,
  bagging_fraction = 0.8,
  lambda_l1 = 3,
  lambda_l2 = 5,
  num_threads = 7
)

fit <- lgb.train(
  params = params,
  data = lgb.Dataset(X_train, label = train$claim_nb),
  nrounds = 300
)
```

Inspect model

Let's crunch all statistics on the test data. Sorting is done by weighted variance of partial dependence, a main-effect importance measure related to [4].

The average predictions closely follow the average observed values, i.e., the model seems to do a good job. Comparing partial dependence/ALE with average predictions shows whether an effect mainly comes from the feature on the x axis or from other, correlated features.
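The difference between average predictions and partial dependence can be seen in a small simulation, here with made-up data and a linear model instead of boosted trees: x2 drives the response, while x1 is merely correlated with x2. The average prediction per x1 bin still rises steeply, whereas the partial dependence of x1 stays almost flat.

```r
# Made-up data: only x2 drives y, but x1 is strongly correlated with x2
set.seed(1)
n <- 1000
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)
y <- x2 + rnorm(n, sd = 0.1)
d <- data.frame(x1, x2, y)
fit_lm <- lm(y ~ x1 + x2, data = d)

# Quintile bins of x1 and their midpoints
cuts <- quantile(d$x1, probs = seq(0, 1, by = 0.2), names = FALSE)
bin <- cut(d$x1, breaks = cuts, include.lowest = TRUE)
mids <- (head(cuts, -1) + tail(cuts, -1)) / 2

# Average prediction per bin: picks up the effect of x2 via the correlation
pred_mean <- as.numeric(tapply(predict(fit_lm), bin, mean))

# Partial dependence of x1: x2 keeps its observed values, so x1 alone barely matters
pd <- vapply(mids, function(z) {
  d_z <- d
  d_z$x1 <- z
  mean(predict(fit_lm, newdata = d_z))
}, numeric(1))

round(rbind(pred_mean, pd), 2)
```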

```r
# 0.1s + 0.15s prediction time
feature_effects(fit, v = xvars, data = X_test, y = test$claim_nb) |>
  update(sort_by = "pd") |>
  plot()
```
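Sorting by the weighted variance of partial dependence can be sketched as follows: a flat partial dependence curve gets an importance near zero, a steep one a larger value. The helper pd_importance_sketch(), with bin weights as weights and the weight sum as denominator, is an assumption for illustration, and the numbers are toy values rather than results from the model above.

```r
# Hypothetical helper: weighted variance of a partial dependence curve,
# using bin weights (denominator sum(w) is an assumption)
pd_importance_sketch <- function(pd, w) {
  m <- weighted.mean(pd, w)
  sum(w * (pd - m)^2) / sum(w)
}

# Toy PD curves and bin weights for two features
imp <- c(
  x1 = pd_importance_sketch(pd = c(0.10, 0.12, 0.15), w = c(5, 3, 2)),
  x2 = pd_importance_sketch(pd = c(0.11, 0.11, 0.11), w = c(4, 4, 2))
)
names(sort(imp, decreasing = TRUE))  # "x1" "x2"
```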

Flexibility

What about combining training and test results? Or comparing different models or subgroups? No problem:

```r
m_train <- feature_effects(fit, v = xvars, data = X_train, y = train$claim_nb)
m_test <- feature_effects(fit, v = xvars, data = X_test, y = test$claim_nb)

# Pick top 3 based on train
m_train <- m_train |>
  update(sort_by = "pd") |>
  head(3)
m_test <- m_test[names(m_train)]

# Concatenate train and test results and plot them
c(m_train, m_test) |>
  plot(
    share_y = "rows",
    ncol = 2,
    byrow = FALSE,
    stats = c("y_mean", "pred_mean"),
    subplot_titles = FALSE,
    # plotly = TRUE,
    title = "Left: Train - Right: Test"
  )
```

To look closer at bias, let's select the statistic "resid_mean" along with pointwise 95% confidence intervals for the true conditional bias.

```r
c(m_train, m_test) |>
  update(drop_below_n = 50) |>
  plot(
    ylim = c(-0.07, 0.08),
    ncol = 2,
    byrow = FALSE,
    stats = "resid_mean",
    subplot_titles = FALSE,
    title = "Left: Train - Right: Test",
    # plotly = TRUE,
    interval = "ci"
  )
```
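A pointwise interval for the mean residual of a bin can be sketched with a normal approximation, mean ± z * sd / sqrt(n). The helper resid_ci_sketch() is hypothetical; it assumes that drop_below_n removes bins with fewer than that many rows, and the package may compute its intervals differently.

```r
# Hypothetical helper: per-bin mean residual with a normal-approximation
# confidence interval; bins with fewer than drop_below_n rows are dropped
resid_ci_sketch <- function(resid, bin, level = 0.95, drop_below_n = 50) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for level = 0.95
  out <- do.call(rbind, lapply(split(resid, bin), function(r) {
    n <- length(r)
    m <- mean(r)
    se <- sd(r) / sqrt(n)
    data.frame(N = n, resid_mean = m, lower = m - z * se, upper = m + z * se)
  }))
  out[out$N >= drop_below_n, , drop = FALSE]
}

# Simulated residuals in three bins; "high" has only 10 rows and is dropped
set.seed(1)
resid <- rnorm(300, mean = 0.01, sd = 0.2)
bin <- factor(rep(c("low", "mid", "high"), times = c(150, 140, 10)),
              levels = c("low", "mid", "high"))
resid_ci_sketch(resid, bin)
```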

More examples

Most models work out of the box, including DALEX explainers and Tidymodels models. If not, a tailored prediction function can be specified.
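As an example of why a tailored prediction function can be useful: predict() on a logistic glm returns log-odds by default, so a small wrapper can switch to probabilities. The two-argument (model, data) signature of the hypothetical pred_prob() is an assumption; check the help page of feature_effects() for how the package expects a custom prediction function.

```r
# Hypothetical tailored prediction function with a (model, data) signature,
# returning probabilities instead of the default link-scale (log-odds) values
pred_prob <- function(m, X) {
  as.numeric(predict(m, newdata = X, type = "response"))
}

fit_glm <- glm(am ~ wt + hp, data = mtcars, family = binomial())
p <- pred_prob(fit_glm, mtcars)
summary(p)
```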

DALEX

```r
library(effectplots)
library(DALEX)
library(ranger)

set.seed(1)

fit <- ranger(Sepal.Length ~ ., data = iris)
ex <- DALEX::explain(fit, data = iris[, -1], y = iris[, 1])

feature_effects(ex, breaks = 5) |>
  plot(share_y = "all")
```

Tidymodels

Note that ALE plots are only available for continuous variables.

```r
library(effectplots)
library(tidymodels)

set.seed(1)

xvars <- c("carat", "color", "clarity", "cut")

split <- initial_split(diamonds)
train <- training(split)
test <- testing(split)

dia_recipe <- train |>
  recipe(reformulate(xvars, "price"))

mod <- rand_forest(trees = 100) |>
  set_engine("ranger") |>
  set_mode("regression")

dia_wf <- workflow() |>
  add_recipe(dia_recipe) |>
  add_model(mod)

fit <- dia_wf |>
  fit(train)

M_train <- feature_effects(fit, v = xvars, data = train, y = "price")
M_test <- feature_effects(fit, v = xvars, data = test, y = "price")

plot(
  M_train + M_test,
  byrow = FALSE,
  ncol = 2,
  share_y = "rows",
  rotate_x = rep(45 * xvars %in% c("clarity", "cut"), each = 2),
  subplot_titles = FALSE,
  # plotly = TRUE,
  title = "Left: train - Right: test"
)
```

Probabilistic classification

We focus on a single class.

```r
library(effectplots)
library(ranger)

set.seed(1)

fit <- ranger(Species ~ ., data = iris, probability = TRUE)

M <- partial_dependence(
  fit,
  v = colnames(iris[1:4]),
  data = iris,
  which_pred = 1  # "setosa" is the first class
)
plot(M, bar_height = 0.33, ylim = c(0, 0.7))
```

References

[1] Apley, Daniel W., and Jingyu Zhu. 2020. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. Journal of the Royal Statistical Society Series B: Statistical Methodology 82 (4): 1059–1086. doi:10.1111/rssb.12377.
[2] Friedman, Jerome H. 2001. Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics 29 (5): 1189–1232. doi:10.1214/aos/1013203451.
[3] Molnar, Christoph. 2019. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/.
[4] Greenwell, Brandon M., Bradley C. Boehmke, and Andrew J. McCarthy. 2018. A Simple and Effective Model-Based Variable Importance Measure. arXiv preprint. https://arxiv.org/abs/1805.04755.

Owner

Name: Michael Mayer
Login: mayer79
Kind: user

Repositories: 12
Profile: https://github.com/mayer79

Responsible statistics | ML

GitHub Events

Total

Create event: 57
Issues event: 16
Release event: 4
Watch event: 19
Delete event: 56
Issue comment event: 10
Push event: 250
Pull request event: 99
Fork event: 1

Last Year

Create event: 57
Issues event: 16
Release event: 4
Watch event: 19
Delete event: 56
Issue comment event: 10
Push event: 250
Pull request event: 99
Fork event: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 10
Total pull requests: 65
Average time to close issues: 10 days
Average time to close pull requests: about 2 hours
Total issue authors: 3
Total pull request authors: 2
Average comments per issue: 1.0
Average comments per pull request: 0.08
Merged pull requests: 60
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 10
Pull requests: 65
Average time to close issues: 10 days
Average time to close pull requests: about 2 hours
Issue authors: 3
Pull request authors: 2
Average comments per issue: 1.0
Average comments per pull request: 0.08
Merged pull requests: 60
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

mayer79 (10)
SebKrantz (1)

Pull Request Authors

mayer79 (97)
btupper (1)

Top Labels

Issue Labels

enhancement (4) bug (1)

Pull Request Labels

enhancement (10)

Packages

Total packages: 1
Total downloads:
- cran 647 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 4
Total maintainers: 1

cran.r-project.org: effectplots

Effect Plots

Homepage: https://github.com/mayer79/effectplots
Documentation: http://cran.r-project.org/web/packages/effectplots/effectplots.pdf
License: GPL (≥ 3)
Latest release: 0.2.2
published about 1 year ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 647 Last month

Rankings

Dependent packages count: 27.8%

Dependent repos count: 34.2%

Average: 49.7%

Downloads: 87.0%

Maintainers (1)

mayermichael79@gmail.com

Last synced: 10 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions

actions/checkout v4 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/pkgdown.yaml actions

JamesIves/github-pages-deploy-action v4.5.0 composite
actions/checkout v4 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

DESCRIPTION cran

R >= 4.1.0 depends
ggplot2 * imports
patchwork * imports
plotly * imports
stats * imports
testthat >= 3.0.0 suggests
