stacks
stacks: Stacked Ensemble Modeling with Tidy Data Principles - Published in JOSS (2022)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Keywords from Contributors
- tidyverse
Scientific Fields
- Economics, Social Sciences: 40% confidence
Repository
An R package for tidy stacked ensemble modeling
Basic Info
- Host: GitHub
- Owner: tidymodels
- License: other
- Language: R
- Default Branch: main
- Homepage: https://stacks.tidymodels.org
- Size: 231 MB
Statistics
- Stars: 300
- Watchers: 9
- Forks: 27
- Open Issues: 8
- Releases: 14
Created over 5 years ago · Last pushed 5 months ago
Metadata Files
Readme
Changelog
License
Code of conduct
README.Rmd
---
output: github_document
---
[JOSS paper (DOI: 10.21105/joss.04471)](https://doi.org/10.21105/joss.04471)
[GitHub Actions build status](https://github.com/tidymodels/stacks/actions)
[CRAN package page](https://CRAN.R-project.org/package=stacks)
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## stacks - tidy model stacking
stacks is an R package for model stacking that aligns with the tidymodels. Model stacking is an ensembling method that takes the outputs of many models and combines them to generate a new model—referred to as an _ensemble_ in this package—that generates predictions informed by each of its _members_.
The process goes something like this:
1. Define candidate ensemble members using functionality from [rsample](https://rsample.tidymodels.org/), [parsnip](https://parsnip.tidymodels.org/), [workflows](https://workflows.tidymodels.org/), [recipes](https://recipes.tidymodels.org/), and [tune](https://tune.tidymodels.org/)
2. Initialize a `data_stack` object with `stacks()`
3. Iteratively add candidate ensemble members to the `data_stack` with `add_candidates()`
4. Evaluate how to combine their predictions with `blend_predictions()`
5. Fit candidate ensemble members with non-zero stacking coefficients with `fit_members()`
6. Predict on new data with `predict()`
You can install the package with the following code:
```{r, eval = FALSE}
install.packages("stacks")
```
Install the development version with:
```{r, eval = FALSE}
# install.packages("pak")
pak::pak("tidymodels/stacks")
```
stacks is generalized with respect to:
* Model type: Any model type implemented in [parsnip](https://parsnip.tidymodels.org/) or extension packages is fair game to add to a stacks model stack. [Here](https://www.tidymodels.org/find/parsnip/)'s a table of many of the implemented model types in the tidymodels core, with a link there to an article about implementing your own model classes as well.
* Cross-validation scheme: Any resampling algorithm implemented in [rsample](https://rsample.tidymodels.org/) or extension packages is fair game for resampling data for use in training a model stack.
* Error metric: Any metric function implemented in [yardstick](https://yardstick.tidymodels.org/) or extension packages is fair game for evaluating model stacks and their members. That package provides some infrastructure for creating your own metric functions as well!
stacks uses a regularized linear model to combine predictions from ensemble members, though this model type is only one of many possible learning algorithms that could be used to fit a stacked ensemble model. For implementations of additional ensemble learning algorithms, check out [h2o](https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.stackedEnsemble.html) and [SuperLearner](https://CRAN.R-project.org/package=SuperLearner).
Rather than diving right into the implementation, we'll focus here on how the pieces fit together, conceptually, in building an ensemble with `stacks`. See the `basics` vignette for an example of the API in action!
## a grammar
At the highest level, ensembles are formed from _model definitions_. In this package, each model definition is an instance of a minimal [workflow](https://workflows.tidymodels.org/), containing a _model specification_ (as defined in the [parsnip](https://parsnip.tidymodels.org/) package) and, optionally, a _preprocessor_ (as defined in the [recipes](https://recipes.tidymodels.org/) package). Model definitions specify the form of candidate ensemble members.

To be used in the same ensemble, each of these model definitions must share the same _resample_. This [rsample](https://rsample.tidymodels.org/) `rset` object, when paired with the model definitions, can be used to generate the tuning/fitting results objects for the candidate _ensemble members_ with tune.

Candidate members first come together in a `data_stack` object through the `add_candidates()` function. Principally, these objects are just [tibble](https://tibble.tidyverse.org/)s, where the first column gives the true outcome in the assessment set (the portion of the training set used for model validation), and the remaining columns give the predictions from each candidate ensemble member. (When the outcome is numeric, there's only one column per candidate ensemble member. Classification requires as many columns per candidate as there are levels in the outcome variable.) They also bring along a few extra attributes to keep track of model definitions.
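As a rough illustration of that layout for a numeric outcome, the sketch below builds a plain tibble by hand. It is not an actual `data_stack` (those come from `add_candidates()` and carry extra attributes), and the column names and values are made up for this example.

```r
library(tibble)

# Hypothetical layout only: a real data stack is produced by add_candidates().
# First column: the true outcome in the assessment set.
# Remaining columns: one prediction column per candidate (numeric outcome).
toy_stack <- tibble(
  outcome     = c(21.0, 22.8, 18.7, 14.3),
  candidate_a = c(20.4, 23.5, 17.9, 15.0),
  candidate_b = c(21.6, 21.9, 19.4, 13.8)
)

toy_stack
```

For a classification outcome, each candidate would instead contribute one column per outcome level, as described above.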

Then, the data stack can be evaluated using `blend_predictions()` to determine how best to combine the outputs from each of the candidate members. In the stacking literature, this process is commonly called _metalearning_.
The outputs of each member are likely highly correlated. Thus, depending on the degree of regularization you choose, the coefficients for the inputs of (possibly) many of the members will zero out—their predictions will have no influence on the final output, and those terms will thus be thrown out.
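To make the effect of regularization on correlated inputs concrete, here is a minimal sketch that calls glmnet directly on simulated data. glmnet is listed among the package's imports, but this is an illustration of the idea, not the package's internal code. The simulated columns, the penalty values, and the choices `alpha = 1` and `lower.limits = 0` are assumptions made for this example.

```r
library(glmnet)

set.seed(2022)
n <- 200
signal  <- rnorm(n)
outcome <- signal + rnorm(n, sd = 0.3)

# Three simulated "candidate" prediction columns that all track the same
# signal, so they are highly correlated with one another.
preds <- cbind(
  cand_1 = signal + rnorm(n, sd = 0.2),
  cand_2 = signal + rnorm(n, sd = 0.4),
  cand_3 = signal + rnorm(n, sd = 0.8)
)

# Lasso fits (alpha = 1) with coefficients constrained to be non-negative,
# once with a light penalty and once with a heavier one.
light <- glmnet(preds, outcome, alpha = 1, lambda = 0.01, lower.limits = 0)
heavy <- glmnet(preds, outcome, alpha = 1, lambda = 0.5,  lower.limits = 0)

# Side-by-side coefficients (first row is the intercept). A larger penalty
# tends to leave fewer non-zero terms.
coefs <- cbind(
  light = as.matrix(coef(light))[, 1],
  heavy = as.matrix(coef(heavy))[, 1]
)
coefs
```

Columns whose coefficient is exactly zero contribute nothing to the blended prediction, which is the sense in which a candidate is dropped from the ensemble.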

These stacking coefficients determine which candidate ensemble members will become ensemble members. Candidates with non-zero stacking coefficients are then fitted on the whole training set, altogether making up a `model_stack` object.

This model stack object, outputted from `fit_members()`, is ready to predict on new data! The trained ensemble members are often referred to as _base models_ in the stacking literature.
The full visual outline for these steps can be found [here](https://github.com/tidymodels/stacks/blob/main/inst/figs/outline.png). The API for the package closely mirrors these ideas. See the `basics` vignette for an example of how this grammar is implemented!
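The grammar above can be sketched end to end as follows. This is a minimal example under stated assumptions, not a substitute for the `basics` vignette: the `mtcars` data, the two model definitions, the fold count, and the object names (`lin_res`, `tree_res`, `model_st`) are all chosen for illustration.

```r
library(tidymodels)
library(stacks)

set.seed(1)

# A shared resample: every candidate must be evaluated on the same rset.
folds <- vfold_cv(mtcars, v = 5)

# Model definition 1: a linear regression workflow, fit across the resamples.
lin_wf <- workflow() |>
  add_formula(mpg ~ wt + hp + disp) |>
  add_model(linear_reg())

lin_res <- fit_resamples(
  lin_wf,
  resamples = folds,
  control = control_stack_resamples()  # keeps predictions and the workflow
)

# Model definition 2: a decision tree with a tuned depth (rpart engine).
tree_wf <- workflow() |>
  add_formula(mpg ~ wt + hp + disp) |>
  add_model(
    decision_tree(tree_depth = tune(), min_n = 5) |>
      set_engine("rpart") |>
      set_mode("regression")
  )

tree_res <- tune_grid(
  tree_wf,
  resamples = folds,
  grid = 3,
  control = control_stack_grid()
)

# Data stack -> blended coefficients -> fitted model stack.
model_st <- stacks() |>
  add_candidates(lin_res) |>
  add_candidates(tree_res) |>
  blend_predictions() |>
  fit_members()

# Predict on new data (the training data is reused here only for brevity).
stack_preds <- predict(model_st, new_data = mtcars)
head(stack_preds)
```

On a data set this small, some tuned tree configurations may produce identical predictions, and the resampling inside `blend_predictions()` may emit warnings. A larger data set would be a more realistic test of the workflow.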
## contributing
This project is released with a [Contributor Code of Conduct](https://github.com/tidymodels/stacks/blob/main/.github/CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.
- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).
- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/stacks/issues).
- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.
- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).
In the stacks package, some test objects take too long to build with every commit. If your contribution changes the structure of `data_stack` or `model_stack` objects, please regenerate these test objects by running the scripts in `man-roxygen/example_models.Rmd`, including those with chunk options `eval = FALSE`.
Owner
- Name: tidymodels
- Login: tidymodels
- Kind: organization
- Repositories: 59
- Profile: https://github.com/tidymodels
JOSS Publication
stacks: Stacked Ensemble Modeling with Tidy Data Principles
Published
July 06, 2022
Volume 7, Issue 75, Page 4471
Tags
data science, tidyverse, model stacking, ensembling
Papers & Mentions
Total mentions: 22
GitHub Events
Total
- Create event: 7
- Release event: 2
- Issues event: 12
- Watch event: 6
- Delete event: 3
- Issue comment event: 2
- Push event: 25
- Pull request review comment event: 2
- Pull request review event: 3
- Pull request event: 7
Last Year
- Create event: 7
- Release event: 2
- Issues event: 12
- Watch event: 6
- Delete event: 3
- Issue comment event: 2
- Push event: 25
- Pull request review comment event: 2
- Pull request review event: 3
- Pull request event: 7
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Simon P. Couch | s****h@g****m | 629 |
| Max Kuhn | m****n@g****m | 32 |
| Emil Hvitfeldt | e****t@g****m | 3 |
| Hannah Frick | h****h@p****o | 2 |
| Øystein Sørensen | o****n@h****m | 1 |
| asmae-toumi | a****u@g****m | 1 |
| Joscelin Rocha Hidalgo | j****a@g****m | 1 |
| Gábor Csárdi | c****r@g****m | 1 |
Committer Domains (Top 20 + Academic)
posit.co: 1
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 81
- Total pull requests: 61
- Average time to close issues: about 1 month
- Average time to close pull requests: 13 days
- Total issue authors: 26
- Total pull request authors: 5
- Average comments per issue: 1.89
- Average comments per pull request: 0.95
- Merged pull requests: 57
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 13
- Pull requests: 8
- Average time to close issues: 1 day
- Average time to close pull requests: about 16 hours
- Issue authors: 6
- Pull request authors: 3
- Average comments per issue: 0.62
- Average comments per pull request: 0.13
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- simonpcouch (49)
- juliasilge (3)
- TSI-PTG (2)
- cgoo4 (2)
- osorensen (2)
- topepo (2)
- Joscelinrocha (1)
- frankiethull (1)
- rcannood (1)
- Saarialho (1)
- rdavis120 (1)
- amin0511ss (1)
- mcavs (1)
- JoeSydlowski (1)
- pbulsink (1)
Pull Request Authors
- simonpcouch (67)
- Joscelinrocha (3)
- gaborcsardi (2)
- EmilHvitfeldt (1)
- osorensen (1)
Top Labels
Issue Labels
upkeep (4)
documentation 📜 (2)
feature (2)
bug (1)
tidy-dev-day :nerd_face: (1)
documentation (1)
Pull Request Labels
Packages
- Total packages: 3
- Total downloads: cran 2,292 last-month
- Total docker downloads: 8
- Total dependent packages: 5 (may contain duplicates)
- Total dependent repositories: 19 (may contain duplicates)
- Total versions: 33
- Total maintainers: 1
proxy.golang.org: github.com/tidymodels/stacks
- Documentation: https://pkg.go.dev/github.com/tidymodels/stacks#section-documentation
- License: other
- Latest release: v1.1.1 (published 7 months ago)
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 4 months ago
cran.r-project.org: stacks
Tidy Model Stacking
- Homepage: https://stacks.tidymodels.org/
- Documentation: http://cran.r-project.org/web/packages/stacks/stacks.pdf
- License: MIT + file LICENSE
- Latest release: 1.1.1 (published 7 months ago)
Rankings
Stargazers count: 1.4%
Forks count: 3.3%
Dependent repos count: 6.7%
Downloads: 8.9%
Average: 9.7%
Dependent packages count: 10.9%
Docker downloads count: 27.4%
Maintainers (1)
Last synced: 4 months ago
conda-forge.org: r-stacks
- Homepage: https://stacks.tidymodels.org/
- License: MIT
- Latest release: 1.0.0 (published over 3 years ago)
Rankings
Dependent repos count: 24.3%
Average: 37.9%
Dependent packages count: 51.6%
Last synced: 4 months ago
Dependencies
.github/workflows/R-CMD-check-hard.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/lock.yaml
actions
- dessant/lock-threads v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action 4.1.4 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 2.10 depends
- butcher >= 0.1.3 imports
- cli * imports
- dplyr >= 1.0.99.9000 imports
- foreach * imports
- generics * imports
- ggplot2 * imports
- glmnet * imports
- glue * imports
- parsnip >= 1.0.2 imports
- purrr >= 0.3.2 imports
- recipes >= 0.2.0 imports
- rlang >= 0.4.0 imports
- rsample >= 0.1.1 imports
- stats * imports
- tibble >= 2.1.3 imports
- tidyr * imports
- tune >= 0.1.3 imports
- workflows >= 0.2.3 imports
- workflowsets >= 0.1.0 imports
- yardstick * imports
- SuperLearner * suggests
- covr * suggests
- h2o * suggests
- kernlab * suggests
- kknn * suggests
- knitr * suggests
- mockr * suggests
- modeldata * suggests
- nnet * suggests
- ranger * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests
