stacks

stacks: Stacked Ensemble Modeling with Tidy Data Principles - Published in JOSS (2022)

https://github.com/tidymodels/stacks

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to joss.theoj.org
○ Committers with academic emails
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords from Contributors

tidyverse

Scientific Fields

Economics Social Sciences - 40% confidence

Last synced: 6 months ago

Repository

An R package for tidy stacked ensemble modeling

Basic Info

Host: GitHub
Owner: tidymodels
License: other
Language: R
Default Branch: main
Homepage: https://stacks.tidymodels.org
Size: 231 MB

Statistics

Stars: 300
Watchers: 9
Forks: 27
Open Issues: 8
Releases: 14

Created over 5 years ago · Last pushed 6 months ago

Metadata Files

Readme Changelog License Code of conduct

README.Rmd

---
output: github_document
---


[![DOI badge](https://joss.theoj.org/papers/10.21105/joss.04471/status.svg)](https://doi.org/10.21105/joss.04471)
[![R build status](https://github.com/simonpcouch/stacks/workflows/R-CMD-check/badge.svg)](https://github.com/tidymodels/stacks/actions)
[![CRAN status](https://www.r-pkg.org/badges/version/stacks)](https://CRAN.R-project.org/package=stacks)


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## stacks - tidy model stacking 

stacks is an R package for model stacking that aligns with the tidymodels framework. Model stacking is an ensembling method that combines the outputs of many models into a new model (referred to as an _ensemble_ in this package) whose predictions are informed by each of its _members_.

The process goes something like this:

1. Define candidate ensemble members using functionality from [rsample](https://rsample.tidymodels.org/), [parsnip](https://parsnip.tidymodels.org/), [workflows](https://workflows.tidymodels.org/), [recipes](https://recipes.tidymodels.org/), and [tune](https://tune.tidymodels.org/)
2. Initialize a `data_stack` object with `stacks()`  
3. Iteratively add candidate ensemble members to the `data_stack` with `add_candidates()`  
4. Evaluate how to combine their predictions with `blend_predictions()`  
5. Fit candidate ensemble members with non-zero stacking coefficients with `fit_members()`  
6. Predict on new data with `predict()`
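
The six steps above can be sketched end to end as follows. This is a minimal illustration rather than the package's canonical example: it assumes the tidymodels meta-package is installed, uses the `tree_frogs` data that ships with stacks, and picks two arbitrary model definitions (a linear regression and a decision tree tuned over `tree_depth`). Object names such as `frogs_train` and `tree_wf` are made up for the example.

```r
library(tidymodels)
library(stacks)

# Example data shipped with stacks; the numeric outcome is `latency`.
data("tree_frogs", package = "stacks")
frogs <- tree_frogs %>%
  filter(!is.na(latency)) %>%
  select(-clutch, -hatched)

set.seed(1)
frogs_split <- initial_split(frogs)
frogs_train <- training(frogs_split)
frogs_test  <- testing(frogs_split)

# 1. Define candidate members: shared resamples, a preprocessor, two models.
folds <- vfold_cv(frogs_train, v = 5)

rec <- recipe(latency ~ ., data = frogs_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors())

lin_wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

tree_wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(decision_tree(mode = "regression", tree_depth = tune()))

# The control_stack_*() helpers keep the resampled predictions stacks needs.
lin_res <- fit_resamples(
  lin_wf, resamples = folds, control = control_stack_resamples()
)
tree_res <- tune_grid(
  tree_wf, resamples = folds, grid = 4, control = control_stack_grid()
)

# 2-3. Initialize a data stack and add the candidates to it.
frogs_data_st <- stacks() %>%
  add_candidates(lin_res) %>%
  add_candidates(tree_res)

# 4-5. Estimate stacking coefficients, then fit the retained members.
frogs_model_st <- frogs_data_st %>%
  blend_predictions() %>%
  fit_members()

# 6. Predict on new data.
frogs_pred <- predict(frogs_model_st, new_data = frogs_test)
```

Both sets of results are generated from the same `folds` object, which is what allows them to be combined in one data stack.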

You can install the package from CRAN with the following code:

```{r, eval = FALSE}
install.packages("stacks")
```

Install the development version with:

```{r, eval = FALSE}
# install.packages("pak")
pak::pak("tidymodels/stacks")
```

stacks is generalized with respect to:

* Model type: Any model type implemented in [parsnip](https://parsnip.tidymodels.org/) or extension packages is fair game to add to a stacks model stack. [Here](https://www.tidymodels.org/find/parsnip/)'s a table of many of the implemented model types in the tidymodels core, with a link there to an article about implementing your own model classes as well.
* Cross-validation scheme: Any resampling algorithm implemented in [rsample](https://rsample.tidymodels.org/) or extension packages is fair game for resampling data for use in training a model stack.
* Error metric: Any metric function implemented in [yardstick](https://yardstick.tidymodels.org/) or extension packages is fair game for evaluating model stacks and their members. That package provides some infrastructure for creating your own metric functions as well!

stacks uses a regularized linear model to combine predictions from ensemble members, though this model type is only one of many possible learning algorithms that could be used to fit a stacked ensemble model. For implementations of additional ensemble learning algorithms, check out [h2o](https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.stackedEnsemble.html) and [SuperLearner](https://CRAN.R-project.org/package=SuperLearner).

Rather than diving right into the implementation, we'll focus here on how the pieces fit together, conceptually, in building an ensemble with `stacks`. See the `basics` vignette for an example of the API in action!

## a grammar

At the highest level, ensembles are formed from _model definitions_. In this package, model definitions are an instance of a minimal [workflow](https://workflows.tidymodels.org/), containing a _model specification_ (as defined in the [parsnip](https://parsnip.tidymodels.org/) package) and, optionally, a _preprocessor_ (as defined in the [recipes](https://recipes.tidymodels.org/) package). Model definitions specify the form of candidate ensemble members.

![A diagram representing "model definitions," which specify the form of candidate ensemble members. Three colored boxes represent three different model types; a K-nearest neighbors model (in salmon), a linear regression model (in yellow), and a support vector machine model (in green).](man/figures/model_defs.png)

To be used in the same ensemble, each of these model definitions must share the same _resample_. This [rsample](https://rsample.tidymodels.org/) `rset` object, when paired with the model definitions, can be used to generate the tuning/fitting results objects for the candidate _ensemble members_ with tune.

![A diagram representing "candidate members" generated from each model definition. Four salmon-colored boxes labeled "KNN" represent K-nearest neighbors models trained on the resamples with differing hyperparameters. Similarly, the linear regression model generates one candidate member, and the support vector machine model generates six.](man/figures/candidates.png)

Candidate members first come together in a `data_stack` object through the `add_candidates()` function. Principally, these objects are just [tibble](https://tibble.tidyverse.org/)s, where the first column gives the true outcome in the assessment set (the portion of the training set used for model validation), and the remaining columns give the predictions from each candidate ensemble member. (When the outcome is numeric, there's only one column per candidate ensemble member. Classification requires as many columns per candidate as there are levels in the outcome variable.) They also bring along a few extra attributes to keep track of model definitions.

![A diagram representing a "data stack," a specific kind of data frame. Colored "columns" depict, in white, the true value of the outcome variable in the validation set, followed by four columns (in salmon) representing the predictions from the K-nearest neighbors model, one column (in tan) representing the linear regression model, and six (in green) representing the support vector machine model.](man/figures/data_stack.png)
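
As a quick way to see this structure, the sketch below builds a data stack from two sets of resampling results that ship with stacks and looks at it as a plain tibble. It assumes the example objects `reg_res_lr` and `reg_res_sp` (documented under the package's example data) are available in your installed version; `data_st` and `stack_tbl` are illustrative names.

```r
library(stacks)

# Start from an empty data stack and add two sets of candidates.
data_st <- stacks()
data_st <- add_candidates(data_st, reg_res_lr)
data_st <- add_candidates(data_st, reg_res_sp)

# Drop the extra class to view the underlying tibble:
# the first column holds the outcome, the rest hold candidate predictions.
stack_tbl <- tibble::as_tibble(data_st)
colnames(stack_tbl)
dim(stack_tbl)
```

Each prediction column is named after the object passed to `add_candidates()` unless a `name` argument is supplied.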

Then, the data stack can be evaluated using `blend_predictions()` to determine how best to combine the outputs from each of the candidate members. In the stacking literature, this process is commonly called _metalearning_.

The outputs of the members are likely to be highly correlated. Thus, depending on the degree of regularization you choose, the coefficients of (possibly) many members will shrink to zero: their predictions will have no influence on the final output, and those terms are dropped from the ensemble.

![A diagram representing "stacking coefficients," the coefficients of the linear model combining each of the candidate member predictions to generate the ensemble's ultimate prediction. Boxes for each of the candidate members are placed beside each other, filled in with color if the coefficient for the associated candidate member is nonzero.](man/figures/coefs.png)
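
To illustrate how blending selects members, the sketch below fits the blending model over a small, arbitrarily chosen grid of penalties and then lists the stacking coefficient of each spline candidate. It assumes the shipped example objects `reg_res_lr` and `reg_res_sp` are available; the penalty values and the names `model_st` and `sp_coefs` are illustrative only.

```r
library(stacks)

data_st <- stacks()
data_st <- add_candidates(data_st, reg_res_lr)
data_st <- add_candidates(data_st, reg_res_sp)

set.seed(1)
# Fit the regularized blending model over several candidate penalties.
# Larger penalties tend to push more coefficients to exactly zero.
model_st <- blend_predictions(data_st, penalty = c(0.001, 0.01, 0.1, 1))

# One row per candidate from `reg_res_sp`, with its hyperparameters
# and its stacking coefficient.
sp_coefs <- collect_parameters(model_st, "reg_res_sp")
sp_coefs

# Candidates that would be kept as ensemble members.
sp_coefs[!is.na(sp_coefs$coef) & sp_coefs$coef != 0, ]
```

Calling `autoplot()` on the blended object gives a visual summary of the same trade-off between penalty, performance, and number of retained members.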

These stacking coefficients determine which candidate ensemble members will become ensemble members. Candidates with non-zero stacking coefficients are then fitted on the whole training set, altogether making up a `model_stack` object. 

![A diagram representing the "model stack" class, which collates the stacking coefficients and members (candidate members with nonzero stacking coefficients that are trained on the full training set). The representation of the stacking coefficients is as before, where the members (shown next to their associated stacking coefficients) are colored-in pentagons. Model stacks are a list subclass.](man/figures/class_model_stack.png)

This model stack object, outputted from `fit_members()`, is ready to predict on new data! The trained ensemble members are often referred to as _base models_ in the stacking literature.
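
A short sketch of this last stage, again assuming the shipped example objects (`reg_res_lr`, `reg_res_sp`, and the held-out data `tree_frogs_reg_test`) are available in your installed version; the other object names are illustrative.

```r
library(stacks)

data_st <- stacks()
data_st <- add_candidates(data_st, reg_res_lr)
data_st <- add_candidates(data_st, reg_res_sp)

set.seed(1)
# Blend, then refit the candidates with non-zero coefficients
# on the full training set.
model_st <- fit_members(blend_predictions(data_st))

# Ensemble predictions on held-out data.
ens_pred <- predict(model_st, tree_frogs_reg_test)

# members = TRUE also returns the predictions of each individual member,
# which helps when comparing the ensemble against its base models.
member_pred <- predict(model_st, tree_frogs_reg_test, members = TRUE)
```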

The full visual outline for these steps can be found [here](https://github.com/tidymodels/stacks/blob/main/inst/figs/outline.png). The API for the package closely mirrors these ideas. See the `basics` vignette for an example of how this grammar is implemented!

## contributing

This project is released with a [Contributor Code of Conduct](https://github.com/tidymodels/stacks/blob/main/.github/CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).

- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/stacks/issues).

- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.

- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).

In the stacks package, some test objects take too long to build with every commit. If your contribution changes the structure of `data_stack` or `model_stack` objects, please regenerate these test objects by running the scripts in `man-roxygen/example_models.Rmd`, including those with chunk options `eval = FALSE`.

Owner

Name: tidymodels
Login: tidymodels
Kind: organization

Repositories: 59
Profile: https://github.com/tidymodels

JOSS Publication

stacks: Stacked Ensemble Modeling with Tidy Data Principles

Published

July 06, 2022

DOI

10.21105/joss.04471

Volume 7, Issue 75, Page 4471

Authors

Simon P. Couch
RStudio PBC

Max Kuhn
RStudio PBC

Editor

Øystein Sørensen

Papers & Mentions

Total mentions: 22

Host plant‐related genomic differentiation in the European cherry fruit fly, Rhagoletis cerasi

DOI: 10.1111/mec.15239
OpenAlex ID: https://openalex.org/W2972963679
Published: October 2019

Last synced: 4 months ago

BsRADseq: screening DNA methylation in natural populations of non‐model species

DOI: 10.1111/mec.13550
OpenAlex ID: https://openalex.org/W2344301702
Published: March 2016

Last synced: 4 months ago

Chromosome‐scale assembly of the genome of Salix dunnii reveals a male‐heterogametic sex determination system on chromosome 7

DOI: 10.1111/1755-0998.13362
OpenAlex ID: https://openalex.org/W3129396171
Published: March 2021

Last synced: 4 months ago

A genomic approach to inferring kinship reveals limited intergenerational dispersal in the yellow fever mosquito

DOI: 10.1111/1755-0998.13043
OpenAlex ID: https://openalex.org/W2949584696
Published: June 2019

Last synced: 4 months ago

An Africa-wide genomic evolution of insecticide resistance in the malaria vector Anopheles funestus involves selective sweeps, copy number variations, gene conversion and transposons

DOI: 10.1371/journal.pgen.1008822
OpenAlex ID: https://openalex.org/W3032993744
Published: June 2020

Last synced: 4 months ago

Rapid SNP Discovery and a RAD-Based High-Density Linkage Map in Jujube (Ziziphus Mill.)

DOI: 10.1371/journal.pone.0109850
OpenAlex ID: https://openalex.org/W2011809951
Published: October 2014

Last synced: 4 months ago

Fairy circles in Namibia are assembled from genetically distinct grasses

DOI: 10.1038/s42003-020-01431-0
OpenAlex ID: https://openalex.org/W3102331553
Published: November 2020

Last synced: 4 months ago

Outlier analyses to test for local adaptation to breeding grounds in a migratory arctic seabird

DOI: 10.1002/ece3.2819
OpenAlex ID: https://openalex.org/W2593316148
Published: March 2017

Last synced: 4 months ago

Phylogeography and population genetics of pine butterflies: Sky islands increase genetic divergence

DOI: 10.1002/ece3.5793
OpenAlex ID: https://openalex.org/W2989268571
Published: November 2019

Last synced: 4 months ago

Sex matters: Otolith shape and genomic variation in deacon rockfish (Sebastes diaconus)

DOI: 10.1002/ece3.5763
OpenAlex ID: https://openalex.org/W2983128931
Published: November 2019

Last synced: 4 months ago

Insights into the neutral and adaptive processes shaping the spatial distribution of genomic variation in the economically important Moroccan locust (Dociostaurus maroccanus)

DOI: 10.1002/ece3.6165
OpenAlex ID: https://openalex.org/W3014272654
Published: March 2020

Last synced: 4 months ago

Rattus population genomics across the Haida Gwaii archipelago provides a framework for guiding invasive species management

DOI: 10.1111/eva.12907
OpenAlex ID: https://openalex.org/W2996456721
Published: January 2020

Last synced: 4 months ago

Genome‐wide diversity and habitat underlie fine‐scale phenotypic differentiation in the rainbow darter (Etheostoma caeruleum)

DOI: 10.1111/eva.13135
OpenAlex ID: https://openalex.org/W3087954769
Published: October 2020

Last synced: 4 months ago

Applying landscape genomic tools to forest management and restoration of Hawaiian koa (Acacia koa) in a changing environment

DOI: 10.1111/eva.12534
OpenAlex ID: https://openalex.org/W2750381744
Published: September 2017

Last synced: 4 months ago

Contrasting genetic structure between mitochondrial and nuclear markers in the dengue fever mosquito from Rio de Janeiro: implications for vector control

DOI: 10.1111/eva.12301
OpenAlex ID: https://openalex.org/W1962735074
Published: September 2015

Last synced: 4 months ago

Adaptive markers distinguish North and South Pacific Albacore amid low population differentiation

DOI: 10.1111/eva.13202
OpenAlex ID: https://openalex.org/W3126819064
Published: February 2021

Last synced: 4 months ago

Population history provides foundational knowledge for utilizing and developing native plant restoration materials

DOI: 10.1111/eva.12704
OpenAlex ID: https://openalex.org/W2888885225
Published: September 2018

Last synced: 4 months ago

The genomic basis of cichlid fish adaptation within the deepwater “twilight zone” of Lake Malawi

DOI: 10.1002/evl3.20
OpenAlex ID: https://openalex.org/W2752154464
Published: September 2017

Last synced: 4 months ago

Construction of ddRADseq-Based High-Density Genetic Map and Identification of Quantitative Trait Loci for Trans-resveratrol Content in Peanut Seeds

DOI: 10.3389/fpls.2021.644402
OpenAlex ID: https://openalex.org/W3137615823
Published: March 2021

Last synced: 4 months ago

Salmonid chromosome evolution as revealed by a novel method for comparing RADseq linkage maps

DOI: 10.1093/gbe/evw262
OpenAlex ID: https://openalex.org/W2404399045
Published: November 2016

Last synced: 4 months ago

Extreme mito-nuclear discordance in a peninsular lizard: the role of drift, selection, and climate

DOI: 10.1038/s41437-019-0204-4
OpenAlex ID: https://openalex.org/W2918411622
Published: March 2019

Last synced: 4 months ago

Population genomics and conservation management of a declining tropical rodent

DOI: 10.1038/s41437-021-00418-9
OpenAlex ID: https://openalex.org/W3134429773
Published: March 2021

Last synced: 4 months ago

GitHub Events

Total

Create event: 7
Release event: 2
Issues event: 12
Watch event: 6
Delete event: 3
Issue comment event: 2
Push event: 25
Pull request review comment event: 2
Pull request review event: 3
Pull request event: 7

Last Year

Create event: 7
Release event: 2
Issues event: 12
Watch event: 6
Delete event: 3
Issue comment event: 2
Push event: 25
Pull request review comment event: 2
Pull request review event: 3
Pull request event: 7

Committers

Last synced: 7 months ago

All Time

Total Commits: 670
Total Committers: 8
Avg Commits per committer: 83.75
Development Distribution Score (DDS): 0.061

Past Year

Commits: 16
Committers: 2
Avg Commits per committer: 8.0
Development Distribution Score (DDS): 0.063

Top Committers

Name	Email	Commits
Simon P. Couch	s**h@g**m	629
Max Kuhn	m**n@g**m	32
Emil Hvitfeldt	e**t@g**m	3
Hannah Frick	h**h@p**o	2
Øystein Sørensen	o**n@h**m	1
asmae-toumi	a**u@g**m	1
Joscelin Rocha Hidalgo	j**a@g**m	1
Gábor Csárdi	c**r@g**m	1

Committer Domains (Top 20 + Academic)

posit.co: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 81
Total pull requests: 61
Average time to close issues: about 1 month
Average time to close pull requests: 13 days
Total issue authors: 26
Total pull request authors: 5
Average comments per issue: 1.89
Average comments per pull request: 0.95
Merged pull requests: 57
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 13
Pull requests: 8
Average time to close issues: 1 day
Average time to close pull requests: about 16 hours
Issue authors: 6
Pull request authors: 3
Average comments per issue: 0.62
Average comments per pull request: 0.13
Merged pull requests: 7
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

simonpcouch (49)
juliasilge (3)
TSI-PTG (2)
cgoo4 (2)
osorensen (2)
topepo (2)
Joscelinrocha (1)
frankiethull (1)
rcannood (1)
Saarialho (1)
rdavis120 (1)
amin0511ss (1)
mcavs (1)
JoeSydlowski (1)
pbulsink (1)

Pull Request Authors

simonpcouch (67)
Joscelinrocha (3)
gaborcsardi (2)
EmilHvitfeldt (1)
osorensen (1)

Top Labels

Issue Labels

upkeep (4)
documentation 📜 (2)
feature (2)
bug (1)
tidy-dev-day :nerd_face: (1)
documentation (1)

Pull Request Labels

Packages

Total packages: 3
Total downloads:
- cran 2,292 last-month
Total docker downloads: 8

Total dependent packages: 5
(may contain duplicates)
Total dependent repositories: 19
(may contain duplicates)
Total versions: 33
Total maintainers: 1

proxy.golang.org: github.com/tidymodels/stacks

Documentation: https://pkg.go.dev/github.com/tidymodels/stacks#section-documentation
License: other
Latest release: v1.1.1
published 9 months ago

Versions: 14
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.4%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 6 months ago

cran.r-project.org: stacks

Tidy Model Stacking

Homepage: https://stacks.tidymodels.org/
Documentation: http://cran.r-project.org/web/packages/stacks/stacks.pdf
License: MIT + file LICENSE
Latest release: 1.1.1
published 9 months ago

Versions: 13
Dependent Packages: 5
Dependent Repositories: 18
Downloads: 2,292 Last month
Docker Downloads: 8

Rankings

Stargazers count: 1.4%

Forks count: 3.3%

Dependent repos count: 6.7%

Downloads: 8.9%

Average: 9.7%

Dependent packages count: 10.9%

Docker downloads count: 27.4%

Maintainers (1)

simon.couch@posit.co

Last synced: 6 months ago

conda-forge.org: r-stacks

Homepage: https://stacks.tidymodels.org/
License: MIT
Latest release: 1.0.0
published over 3 years ago

Versions: 6
Dependent Packages: 0
Dependent Repositories: 1

Rankings

Dependent repos count: 24.3%

Average: 37.9%

Dependent packages count: 51.6%

Last synced: 6 months ago

Dependencies

.github/workflows/R-CMD-check-hard.yaml actions

actions/checkout v2 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/R-CMD-check.yaml actions

actions/checkout v2 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/lock.yaml actions

dessant/lock-threads v2 composite

.github/workflows/pkgdown.yaml actions

JamesIves/github-pages-deploy-action 4.1.4 composite
actions/checkout v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/test-coverage.yaml actions

actions/checkout v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

DESCRIPTION cran

R >= 2.10 depends
butcher >= 0.1.3 imports
cli * imports
dplyr >= 1.0.99.9000 imports
foreach * imports
generics * imports
ggplot2 * imports
glmnet * imports
glue * imports
parsnip >= 1.0.2 imports
purrr >= 0.3.2 imports
recipes >= 0.2.0 imports
rlang >= 0.4.0 imports
rsample >= 0.1.1 imports
stats * imports
tibble >= 2.1.3 imports
tidyr * imports
tune >= 0.1.3 imports
workflows >= 0.2.3 imports
workflowsets >= 0.1.0 imports
yardstick * imports
SuperLearner * suggests
covr * suggests
h2o * suggests
kernlab * suggests
kknn * suggests
knitr * suggests
mockr * suggests
modeldata * suggests
nnet * suggests
ranger * suggests
rmarkdown * suggests
testthat >= 3.0.0 suggests
