parsnip

A tidy unified interface to models

https://github.com/tidymodels/parsnip

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 51 committers (3.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (19.3%) to scientific vocabulary

Keywords from Contributors

tidy-data package-creation tidyverse data-manipulation grammar date-time network-analysis odbc pandoc reproducibility
Last synced: 10 months ago

Repository

Basic Info
Statistics
  • Stars: 626
  • Watchers: 26
  • Forks: 94
  • Open Issues: 94
  • Releases: 31
Created over 8 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# parsnip


[![R-CMD-check](https://github.com/tidymodels/parsnip/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidymodels/parsnip/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidymodels/parsnip/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidymodels/parsnip?branch=main)
[![CRAN status](https://www.r-pkg.org/badges/version/parsnip)](https://CRAN.R-project.org/package=parsnip)
[![Downloads](https://cranlogs.r-pkg.org/badges/parsnip)](https://CRAN.R-project.org/package=parsnip)
[![lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html)


## Introduction

The goal of parsnip is to provide a tidy, unified interface to models that can be used to try a range of models without getting bogged down in the syntactical minutiae of the underlying packages. 

## Installation

```{r, eval = FALSE}
# The easiest way to get parsnip is to install all of tidymodels:
install.packages("tidymodels")

# Alternatively, install just parsnip:
install.packages("parsnip")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidymodels/parsnip")
```


## Getting started

One challenge with different modeling functions available in R _that do the same thing_ is that they can have different interfaces and arguments. For example, to fit a random forest regression model, we might have:

```{r eval = FALSE}
# From randomForest
rf_1 <- randomForest(
  y ~ ., 
  data = dat, 
  mtry = 10, 
  ntree = 2000, 
  importance = TRUE
)

# From ranger
rf_2 <- ranger(
  y ~ ., 
  data = dat, 
  mtry = 10, 
  num.trees = 2000, 
  importance = "impurity"
)

# From sparklyr
rf_3 <- ml_random_forest(
  dat, 
  intercept = FALSE, 
  response = "y", 
  features = names(dat)[names(dat) != "y"], 
  col.sample.rate = 10,
  num.trees = 2000
)
```

Note that the model syntax can be very different and that the argument names (and formats) are also different. This is a pain if you switch between implementations. 

In this example: 

* the **type** of model is "random forest", 
* the **mode** of the model is "regression" (as opposed to classification, etc.), and 
* the computational **engine** is the name of the R package. 


The goals of parsnip are to:

* Separate the definition of a model from its evaluation.
* Decouple the model specification from the implementation (whether the implementation is in R, Spark, or something else). For example, the user would call `rand_forest()` instead of `ranger::ranger()` or another package-specific function. 
* Harmonize argument names (e.g. `n.trees`, `ntrees`, `trees`) so that users only need to remember a single name. This will help _across_ model types too so that `trees` will be the same argument across random forest as well as boosting or bagging. 

Using the example above, the parsnip approach would be:

```{r}
library(parsnip)

rand_forest(mtry = 10, trees = 2000) |>
  set_engine("ranger", importance = "impurity") |>
  set_mode("regression")
```
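
To see how the harmonized `trees` argument maps onto each engine's own argument name, parsnip's `translate()` can print the fit template that a specification would produce. The following is a minimal sketch; the object names `spec`, `ranger_call`, and `rf_call` are illustrative, and the exact printed template may differ between parsnip versions.

```{r}
library(parsnip)

# One specification, written with parsnip's harmonized argument names
spec <- rand_forest(mtry = 10, trees = 2000) |>
  set_mode("regression")

# translate() fills in the engine-specific call without fitting anything
ranger_call <- spec |> set_engine("ranger") |> translate()
rf_call <- spec |> set_engine("randomForest") |> translate()

# Printing shows the template; `trees` appears under each engine's own name
ranger_call
rf_call
```

Only the `set_engine()` call differs between the two; the main arguments are written once.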

The engine can be changed easily. To use Spark, only the `set_engine()` call changes:

```{r}
rand_forest(mtry = 10, trees = 2000) |>
  set_engine("spark") |>
  set_mode("regression")
```

Either one of these model specifications can be fit in the same way:

```{r}
set.seed(192)
rand_forest(mtry = 10, trees = 2000) |>
  set_engine("ranger", importance = "impurity") |>
  set_mode("regression") |>
  fit(mpg ~ ., data = mtcars)
```
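
Once a model is fit, predictions come from `predict()` regardless of the engine. The sketch below uses `linear_reg()` with the `"lm"` engine purely as an assumption of convenience, since it needs no packages beyond base R; the same `fit()` and `predict()` calls apply to the `rand_forest()` specification above. The names `lm_fit` and `preds` are illustrative.

```{r}
library(parsnip)

# A different model type, fit through the same interface
lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ ., data = mtcars)

# predict() takes `new_data` and returns a tibble with one row per input row;
# for regression models the predictions are in the `.pred` column
preds <- predict(lm_fit, new_data = mtcars[1:5, ])
preds
```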

A list of all parsnip models across different CRAN packages can be found at https://www.tidymodels.org/find/parsnip/.
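
From within an R session, `show_engines()` gives a similar overview for a single model type. This is a small sketch; the result reflects only parsnip and any extension packages that are currently loaded, so it may be shorter than the list on the website. The name `engines` is illustrative.

```{r}
library(parsnip)

# Engines (and the modes they support) registered for rand_forest()
engines <- show_engines("rand_forest")
engines
```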

## Contributing

This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling, and machine learning, please [post on Posit Community](https://forum.posit.co/new-topic?category_id=15&tags=tidymodels,question).

- If you think you have encountered a bug, please [submit an issue](https://github.com/tidymodels/parsnip/issues).

- Either way, learn how to create and share a [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) (a minimal, reproducible example), to clearly communicate about your code.

- Check out further details on [contributing guidelines for tidymodels packages](https://www.tidymodels.org/contribute/) and [how to get help](https://www.tidymodels.org/help/).

Owner

  • Name: tidymodels
  • Login: tidymodels
  • Kind: organization

GitHub Events

Total
  • Create event: 35
  • Release event: 3
  • Issues event: 54
  • Watch event: 28
  • Delete event: 29
  • Issue comment event: 130
  • Push event: 121
  • Pull request review event: 39
  • Pull request review comment event: 52
  • Pull request event: 49
  • Fork event: 7
Last Year
  • Create event: 35
  • Release event: 3
  • Issues event: 54
  • Watch event: 28
  • Delete event: 29
  • Issue comment event: 130
  • Push event: 121
  • Pull request review event: 39
  • Pull request review comment event: 52
  • Pull request event: 49
  • Fork event: 7

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 1,695
  • Total Committers: 51
  • Avg Commits per committer: 33.235
  • Development Distribution Score (DDS): 0.496
Past Year
  • Commits: 119
  • Committers: 13
  • Avg Commits per committer: 9.154
  • Development Distribution Score (DDS): 0.546
Top Committers
Name Email Commits
Max Kuhn m****n@g****m 855
Julia Silge j****e@g****m 183
Emil Hvitfeldt e****t@g****m 182
Hannah Frick h****h@r****m 167
Simon P. Couch s****h@g****m 163
DavisVaughan d****s@r****m 46
Patrick Miller p****r@c****m 20
Malcolm Barrett m****t@g****m 6
Qiushi Yan q****n@g****m 6
Mine Çetinkaya-Rundel c****e@g****m 5
Rory Nolan r****n@g****m 4
Max Kuhn m****x@i****l 4
‘topepo’ ‘****n@g****’ 3
Gray g****o@g****m 3
Steven Pawley d****y@g****m 3
Max Kuhn m****x@i****t 2
irkaal i****v@g****m 2
artichaud1 k****i@h****m 2
Y. Yu 5****e 2
Steve Hummel 4****1 2
Omi Johnson o****n@b****g 2
Matt Dancho m****o@g****m 2
Kyle Scott k****9@m****u 2
Byron b****r@g****m 2
Tomasz Kalinowski k****t@g****m 1
Tiago Maié t****e@h****m 1
Tan Ho 3****3 1
StefanBRas 2****s 1
Jonathan Marshall j****l@m****z 1
Paige Bailey p****y@m****m 1
and 21 more...

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 234
  • Total pull requests: 290
  • Average time to close issues: 6 months
  • Average time to close pull requests: 11 days
  • Total issue authors: 76
  • Total pull request authors: 16
  • Average comments per issue: 1.72
  • Average comments per pull request: 1.14
  • Merged pull requests: 255
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 59
  • Pull requests: 82
  • Average time to close issues: 2 days
  • Average time to close pull requests: 12 days
  • Issue authors: 14
  • Pull request authors: 6
  • Average comments per issue: 0.36
  • Average comments per pull request: 0.87
  • Merged pull requests: 71
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • simonpcouch (41)
  • hfrick (34)
  • EmilHvitfeldt (34)
  • topepo (32)
  • chillerb (7)
  • jxu (4)
  • marcelglueck (3)
  • juliasilge (3)
  • tolliam (2)
  • SHo-JANG (2)
  • joscani (2)
  • Freestyleyang (2)
  • exsell-jc (2)
  • ZWael (2)
  • cb12991 (1)
Pull Request Authors
  • simonpcouch (105)
  • topepo (63)
  • EmilHvitfeldt (57)
  • hfrick (37)
  • shum461 (5)
  • kscott-1 (4)
  • dajmcdon (3)
  • JamesHWade (2)
  • RodDalBen (2)
  • RobLBaker (2)
  • luisDVA (2)
  • bcjaeger (2)
  • gaborcsardi (2)
  • corybrunson (2)
  • gmcmacran (1)
Top Labels
Issue Labels
feature (34) upkeep (33) bug (26) documentation (23) tidy-dev-day :nerd_face: (16) discussion (2) reprex (2) question (2) help wanted :heart: (2)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • cran 34,670 last-month
  • Total docker downloads: 33,521,067
  • Total dependent packages: 79
    (may contain duplicates)
  • Total dependent repositories: 185
    (may contain duplicates)
  • Total versions: 49
  • Total maintainers: 1
cran.r-project.org: parsnip

A Common API to Modeling and Analysis Functions

  • Versions: 29
  • Dependent Packages: 69
  • Dependent Repositories: 184
  • Downloads: 34,670 Last month
  • Docker Downloads: 33,521,067
Rankings
Stargazers count: 0.7%
Forks count: 1.0%
Dependent repos count: 1.4%
Dependent packages count: 1.5%
Average: 1.5%
Downloads: 2.2%
Docker downloads count: 2.3%
Maintainers (1)
Last synced: 10 months ago
conda-forge.org: r-parsnip
  • Versions: 20
  • Dependent Packages: 10
  • Dependent Repositories: 1
Rankings
Dependent packages count: 5.9%
Average: 17.4%
Stargazers count: 17.5%
Forks count: 21.7%
Dependent repos count: 24.3%
Last synced: 10 months ago