stabs

Stability Selection with Error Control

https://github.com/hofnerb/stabs

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary

Keywords

cran machine-learning r-language r-package resampling stability-selection variable-importance variable-selection
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 26
  • Watchers: 4
  • Forks: 9
  • Open Issues: 11
  • Releases: 0
Created about 11 years ago · Last pushed about 5 years ago
Metadata Files
Readme Changelog

README.md

stabs


stabs implements resampling procedures to assess the stability of selected variables, with additional finite-sample error control, for high-dimensional variable selection procedures such as the lasso or boosting. Both standard stability selection (Meinshausen & Bühlmann, 2010, doi:10.1111/j.1467-9868.2010.00740.x) and complementary pairs stability selection with improved error bounds (Shah & Samworth, 2013, doi:10.1111/j.1467-9868.2011.01034.x) are implemented. The package can be combined with arbitrary user-specified variable selection approaches.
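The resampling idea can be illustrated with a minimal base-R sketch. This is not the stabs implementation: `top_q_correlation()` and `selection_frequency()` are hypothetical helpers, the selector is a deliberately simple stand-in for the lasso or boosting, and the data are simulated. The sketch refits the selector on random subsamples of half the data and records how often each variable is chosen; variables whose selection frequency reaches a cutoff are treated as stable.

```r
## Illustrative sketch only; not the stabs implementation.
## `top_q_correlation()` and `selection_frequency()` are hypothetical helpers.

## Toy selector: pick the q predictors with the largest absolute
## marginal correlation with the response.
top_q_correlation <- function(x, y, q) {
    score <- abs(drop(cor(x, y)))
    selected <- logical(ncol(x))
    selected[order(score, decreasing = TRUE)[seq_len(q)]] <- TRUE
    names(selected) <- colnames(x)
    selected
}

## Refit the selector on B random subsamples of size floor(n / 2) and
## return the relative selection frequency of each variable.
selection_frequency <- function(x, y, q, B = 100) {
    n <- nrow(x)
    hits <- replicate(B, {
        idx <- sample.int(n, floor(n / 2))
        top_q_correlation(x[idx, , drop = FALSE], y[idx], q)
    })
    rowMeans(hits)
}

## simulated data: only V1 and V2 carry signal
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 20), nrow = n,
            dimnames = list(NULL, paste0("V", 1:20)))
y <- 2 * x[, 1] - 2 * x[, 2] + rnorm(n)

freq <- selection_frequency(x, y, q = 3)
## variables whose selection frequency reaches the cutoff count as stable
stable <- names(freq)[freq >= 0.75]
print(round(sort(freq, decreasing = TRUE)[1:5], 2))
print(stable)
```

The package adds what this sketch leaves out: error bounds that link the cutoff, `q`, and the per-family error rate (PFER), as well as complementary pairs sampling.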

For an expanded and executable version of this file, please see `vignette("Using_stabs", package = "stabs")`.

Installation

  • Current version (from CRAN):

    install.packages("stabs")

  • Latest development version from GitHub:

    library("devtools")
    install_github("hofnerb/stabs")

To use `install_github()`, install devtools first:

    install.packages("devtools")

Using stabs

A simple example of how to use stabs with package lars:

```r
library("stabs")
library("lars")

## make data set available
data("bodyfat", package = "TH.data")
## set seed
set.seed(1234)

## lasso
(stab.lasso <- stabsel(x = bodyfat[, -2], y = bodyfat[, 2],
                       fitfun = lars.lasso, cutoff = 0.75,
                       PFER = 1))

## stepwise selection
(stab.stepwise <- stabsel(x = bodyfat[, -2], y = bodyfat[, 2],
                          fitfun = lars.stepwise, cutoff = 0.75,
                          PFER = 1))

## plot results
par(mfrow = c(2, 1))
plot(stab.lasso, main = "Lasso")
plot(stab.stepwise, main = "Stepwise Selection")
```

We can see that stepwise selection seems to be quite unstable even in this low-dimensional example!

User-specified variable selection approaches

To use stabs with a user-specified selection method, supply your own `fitfun`. It must accept the arguments `x` (the predictors), `y` (the outcome), and `q` (the number of selected variables, as defined for stability selection). Additional arguments to the variable selection method can be passed through `...`; in `stabsel()`, specify them as a named list given to `args.fitfun`.

The `fitfun` function must return a named list with two elements, `selected` and `path`:

* `selected` is a logical vector that indicates which variables were selected.
* `path` is a logical matrix that indicates which variable was selected in which step. Each row represents one variable; the columns represent the steps. This element is optional and only needed to draw the complete selection paths.

The following example shows how `lars.lasso` is implemented:

```r
lars.lasso <- function(x, y, q, ...) {
    if (!requireNamespace("lars"))
        stop("Package ", sQuote("lars"), " needed but not available")

    if (is.data.frame(x)) {
        message("Note: ", sQuote("x"),
                " is coerced to a model matrix without intercept")
        x <- model.matrix(~ . - 1, x)
    }

    ## fit model
    fit <- lars::lars(x, y, max.steps = q, ...)

    ## which coefficients are non-zero?
    selected <- unlist(fit$actions)
    ## check if variables are removed again from the active set
    ## and remove these from selected
    if (any(selected < 0)) {
        idx <- which(selected < 0)
        idx <- c(idx, which(selected %in% abs(selected[idx])))
        selected <- selected[-idx]
    }

    ret <- logical(ncol(x))
    ret[selected] <- TRUE
    names(ret) <- colnames(x)
    ## compute selection paths
    cf <- fit$beta
    sequence <- t(cf != 0)
    ## return both
    return(list(selected = ret, path = sequence))
}
```

To see more examples, simply print, e.g., `lars.stepwise`, `glmnet.lasso`, or `glmnet.lasso_maxCoef`. Please contact me if you need help to integrate your method of choice.
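As a further illustration of the interface described above, here is a minimal sketch of a user-defined `fitfun`. The name `cor.topq` and its ranking rule (absolute marginal correlation) are hypothetical and chosen only because they need nothing beyond base R; the data are simulated. The function returns the `selected` vector and a `path` matrix with one row per variable and one column per step. The `stabsel()` call is guarded so that the block also runs where stabs is not installed.

```r
## Hypothetical user-defined fitfun; `cor.topq` is an illustrative name,
## not a function provided by stabs.
cor.topq <- function(x, y, q, ...) {
    if (is.data.frame(x))
        x <- model.matrix(~ . - 1, x)

    ## rank predictors by absolute marginal correlation with y;
    ## extra arguments (e.g. method = "spearman") are passed on to cor()
    score <- abs(drop(cor(x, y, ...)))
    ranking <- order(score, decreasing = TRUE)[seq_len(q)]

    ## logical vector: TRUE for the q selected predictors
    selected <- logical(ncol(x))
    selected[ranking] <- TRUE
    names(selected) <- colnames(x)

    ## path: rows are variables, columns are steps; step k contains the
    ## k top-ranked predictors
    position <- match(seq_len(ncol(x)), ranking)
    position[is.na(position)] <- Inf
    path <- outer(position, seq_len(q), "<=")
    rownames(path) <- colnames(x)

    list(selected = selected, path = path)
}

## simulated data: only V1 carries signal
set.seed(1234)
x <- matrix(rnorm(100 * 10), nrow = 100,
            dimnames = list(NULL, paste0("V", 1:10)))
y <- 3 * x[, 1] + rnorm(100)

## direct call to check the return value
res <- cor.topq(x, y, q = 2)
print(res$selected)

## use it with stabsel(), passing extra arguments via args.fitfun
if (requireNamespace("stabs", quietly = TRUE)) {
    stab <- stabs::stabsel(x = x, y = y, fitfun = cor.topq,
                           args.fitfun = list(method = "spearman"),
                           cutoff = 0.75, PFER = 1)
    print(stab)
}
```

Returning `path` is optional; dropping it from the list keeps the function valid but means complete selection paths cannot be drawn.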

Using boosting with stability selection

Instead of specifying a fitting function, one can also use `stabsel` directly on computed boosting models from mboost.

```r
library("stabs")
library("mboost")

### low-dimensional example
mod <- glmboost(DEXfat ~ ., data = bodyfat)

## compute cutoff ahead of running stabsel to see if it is a sensible
## parameter choice.
##   p = ncol(bodyfat) - 1 (= Outcome) + 1 ( = Intercept)
stabsel_parameters(q = 3, PFER = 1, p = ncol(bodyfat) - 1 + 1,
                   sampling.type = "MB")
## the same:
stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", eval = FALSE)

### now run stability selection
(sbody <- stabsel(mod, q = 3, PFER = 1, sampling.type = "MB"))
opar <- par(mai = par("mai") * c(1, 1, 1, 2.7))
plot(sbody, type = "paths")
par(opar)

plot(sbody, type = "maxsel", ymargin = 6)
```
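To make the relationship between `q`, `PFER`, `p`, and the cutoff more concrete, the following sketch evaluates the error bound of Meinshausen & Bühlmann (2010), PFER <= q^2 / ((2 * cutoff - 1) * p), by hand. The helpers `mb_pfer_bound()` and `mb_cutoff()` are hypothetical and not part of stabs, whose output may be rounded or presented differently; `p = 10` assumes that `bodyfat` has 10 columns.

```r
## Hand computation of the Meinshausen & Buehlmann (2010) bound.
## `mb_pfer_bound()` and `mb_cutoff()` are hypothetical helpers,
## not functions from stabs.

## upper bound on the per-family error rate for given q, cutoff and p
mb_pfer_bound <- function(q, cutoff, p) {
    stopifnot(cutoff > 0.5, cutoff <= 1)
    q^2 / ((2 * cutoff - 1) * p)
}

## smallest cutoff for which the bound does not exceed PFER (capped at 1),
## obtained by solving the bound for the cutoff
mb_cutoff <- function(q, PFER, p) {
    min(1, (q^2 / (PFER * p) + 1) / 2)
}

## assumed: ncol(bodyfat) - 1 (outcome) + 1 (intercept) = 10
p <- 10
cutoff <- mb_cutoff(q = 3, PFER = 1, p = p)
bound <- mb_pfer_bound(q = 3, cutoff = cutoff, p = p)
print(c(cutoff = cutoff, bound = bound))
```

Because the cutoff is capped at 1, a large `q` relative to `p` can make a requested PFER unattainable, which is one reason to inspect the parameters before running the resampling.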

Citation

To cite the package in publications, please use `citation("stabs")`,

which will currently give you

```r
To cite package 'stabs' in publications use:

  Benjamin Hofner and Torsten Hothorn (2021). stabs: Stability Selection with Error Control, R package version R package version 0.6-4, https://CRAN.R-project.org/package=stabs.

  Benjamin Hofner, Luigi Boccuto and Markus Goeker (2015). Controlling false discoveries in high-dimensional situations: Boosting with stability selection. BMC Bioinformatics, 16:144. doi:10.1186/s12859-015-0575-3

To cite the stability selection for 'gamboostLSS' models use:

  Thomas, J., Mayr, A., Bischl, B., Schmid, M., Smith, A., and Hofner, B. (2017). Gradient boosting for distributional regression - faster tuning and improved variable selection via noncyclical updates. Statistics and Computing. Online First. DOI 10.1007/s11222-017-9754-6

Use ‘toBibtex(citation("stabs"))’ to extract BibTeX references.
```

To obtain BibTeX references, use `toBibtex(citation("stabs"))`.

Owner

  • Name: Benjamin Hofner
  • Login: hofnerb
  • Kind: user
  • Location: Langen, GERMANY
  • Company: Paul-Ehrlich-Institut

Statistical Assessor and Researcher. Statistician by training and with love. Member of @boost-R and @openml.

GitHub Events

Total
  • Issues event: 2
  • Issue comment event: 1
Last Year
  • Issues event: 2
  • Issue comment event: 1

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 128
  • Total Committers: 5
  • Avg Commits per committer: 25.6
  • Development Distribution Score (DDS): 0.039
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Benjamin Hofner | b****r@f****e | 123 |
| Richard Beare | R****e@g****m | 2 |
| Gökçen Eraslan | g****n@g****m | 1 |
| stefan7th | s****n@n****m | 1 |
| Andrey Tovchigrechko | t****a@m****m | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 27
  • Total pull requests: 6
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 13 days
  • Total issue authors: 14
  • Total pull request authors: 3
  • Average comments per issue: 1.19
  • Average comments per pull request: 2.67
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: 2 days
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hofnerb (12)
  • ptaitAtMcMaster (2)
  • Gambleruin (2)
  • mdinh186 (1)
  • gokceneraslan (1)
  • raechin (1)
  • LeonhardU (1)
  • richardbeare (1)
  • dmresearch15 (1)
  • competulix (1)
  • richabatra (1)
  • moxgreen (1)
  • sbrockhaus (1)
  • Naeemkh (1)
Pull Request Authors
  • richardbeare (3)
  • gokceneraslan (2)
  • andreyto (1)
Top Labels
Issue Labels
  • enhancement (3)
  • bug (2)
  • question (2)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 3,082 last-month
  • Total docker downloads: 46,257
  • Total dependent packages: 10
    (may contain duplicates)
  • Total dependent repositories: 17
    (may contain duplicates)
  • Total versions: 8
  • Total maintainers: 1
cran.r-project.org: stabs

Stability Selection with Error Control

  • Versions: 6
  • Dependent Packages: 7
  • Dependent Repositories: 17
  • Downloads: 3,082 Last month
  • Docker Downloads: 46,257
Rankings
Dependent repos count: 6.9%
Downloads: 7.1%
Dependent packages count: 7.3%
Forks count: 7.3%
Stargazers count: 10.3%
Average: 10.5%
Docker downloads count: 24.2%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: r-stabs
  • Versions: 2
  • Dependent Packages: 3
  • Dependent Repositories: 0
Rankings
Dependent packages count: 15.6%
Average: 33.9%
Dependent repos count: 34.0%
Forks count: 42.2%
Stargazers count: 43.7%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 2.14.0 depends
  • methods * depends
  • parallel * depends
  • stats * depends
  • grDevices * imports
  • graphics * imports
  • utils * imports
  • TH.data * suggests
  • gamboostLSS >= 1.2 suggests
  • glmnet * suggests
  • hdi * suggests
  • knitr * suggests
  • lars * suggests
  • mboost > 2.3 suggests
  • rmarkdown * suggests
  • testthat * suggests