Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails (1 of 3 committers, 33.3%, from academic institutions)
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 16.7%, to scientific vocabulary)
Keywords
model
predict
prediction
r
regression
tidy-data
Last synced: 6 months ago
Repository
Tidy, Type-Safe 'prediction()' Methods
Basic Info
- Host: GitHub
- Owner: leeper
- License: other
- Language: R
- Default Branch: main
- Homepage: https://cran.r-project.org/package=prediction
- Size: 491 KB
Statistics
- Stars: 88
- Watchers: 6
- Forks: 14
- Open Issues: 23
- Releases: 4
Topics
model
predict
prediction
r
regression
tidy-data
Created over 9 years ago · Last pushed over 1 year ago
Metadata Files
Readme
Changelog
Contributing
License
README.Rmd
Owner
- Name: Thomas J. Leeper
- Login: leeper
- Kind: user
- Location: London, United Kingdom
- Website: http://www.thomasleeper.com
- Repositories: 153
- Profile: https://github.com/leeper
Behavioral scientist and R hacker
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Thomas J. Leeper | t****r@g****m | 190 |
| carlganz | c****z@g****m | 3 |
| Vincent Arel-Bundock | v****l@u****u | 1 |
Committer Domains (Top 20 + Academic)
umich.edu: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 44
- Total pull requests: 12
- Average time to close issues: 3 months
- Average time to close pull requests: 2 months
- Total issue authors: 25
- Total pull request authors: 7
- Average comments per issue: 1.18
- Average comments per pull request: 2.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- leeper (16)
- tzoltak (3)
- vincentarelbundock (2)
- benwhalley (2)
- alexpghayes (1)
- sam-crawley (1)
- ballardao (1)
- matguidi (1)
- mronkko (1)
- sgenter1 (1)
- puterleat (1)
- ghost (1)
- arcruz0 (1)
- bbolker (1)
- lucasfreitas1988 (1)
Pull Request Authors
- vincentarelbundock (3)
- carlganz (3)
- bbolker (2)
- benwhalley (2)
- danschrage (1)
- dfrankow (1)
- tzoltak (1)
Top Labels
Issue Labels
bug (13)
enhancement (11)
question (10)
help wanted (1)
Pull Request Labels
enhancement (2)
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 5
- Total dependent repositories: 2
- Total versions: 6
conda-forge.org: r-prediction
- Homepage: https://github.com/leeper/prediction
- License: MIT
- Latest release: 0.3.14 (published over 6 years ago)
Rankings
Dependent packages count: 10.4%
Dependent repos count: 20.1%
Average: 27.2%
Stargazers count: 34.6%
Forks count: 43.6%
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.5.0 depends
- AER * enhances
- MASS * enhances
- MNP * enhances
- VGAM * enhances
- aod * enhances
- betareg * enhances
- biglm * enhances
- brglm * enhances
- caret * enhances
- crch * enhances
- e1071 * enhances
- earth * enhances
- ff * enhances
- ffbase * enhances
- gam >= 1.15 enhances
- gee * enhances
- glmnet * enhances
- glmx * enhances
- kernlab * enhances
- lme4 * enhances
- mclogit * enhances
- mda * enhances
- mlogit * enhances
- nlme * enhances
- nnet * enhances
- ordinal * enhances
- plm * enhances
- pscl * enhances
- quantreg * enhances
- rpart * enhances
- sampleSelection * enhances
- speedglm * enhances
- survey >= 3.31 enhances
- survival * enhances
- truncreg * enhances
- data.table * imports
- stats * imports
- utils * imports
- datasets * suggests
- methods * suggests
- testthat * suggests
The **prediction** and **margins** packages are a combined effort to port the functionality of Stata's (closed source) [`margins`](http://www.stata.com/help.cgi?margins) command to (open source) R. **prediction** is focused on one function - `prediction()` - that provides type-safe methods for generating predictions from fitted regression models. `prediction()` is an S3 generic whose methods always return a `"data.frame"` class object rather than the mix of vectors, lists, etc. returned by the `predict()` methods for various model types. It provides a key piece of underlying infrastructure for the **margins** package. Users interested in generating marginal (partial) effects, like those generated by Stata's `margins, dydx(*)` command, should consider using `margins()` from the sibling project, [**margins**](https://cran.r-project.org/package=margins).
In addition to `prediction()`, this package provides a number of utility functions for generating useful predictions:
- `find_data()`, an S3 generic with methods that find the data frame used to estimate a regression model. This is a wrapper around `get_all_vars()` that attempts to locate data as well as modify it according to `subset` and `na.action` arguments used in the original modelling call.
- `mean_or_mode()` and `median_or_mode()`, which provide a convenient way to compute the data needed for predicted values *at means* (or *at medians*), respecting the differences between factor and numeric variables.
- `seq_range()`, which generates a vector of *n* values based upon the range of values in a variable.
- `build_datalist()`, which generates a list of data frames from an input data frame and a specified set of replacement `at` values (mimicking the `atlist` option of Stata's `margins` command).
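As a rough illustration of these helpers, the sketch below calls each one on the built-in `mtcars` data. It assumes the **prediction** package is installed; the model `fit` and the particular `at` values are arbitrary choices made for this example.

```r
library("prediction")

# An arbitrary example model (chosen for illustration only)
fit <- lm(mpg ~ cyl * hp + wt, data = mtcars)

# Recover the data used to estimate the model
d <- find_data(fit)
nrow(d)

# Five evenly spaced values spanning the observed range of hp
hp_grid <- seq_range(mtcars$hp, n = 5)
hp_grid

# For a numeric variable, mean_or_mode() returns the mean
mean_or_mode(mtcars$wt)

# One copy of the data per combination of `at` values (2 x 2 = 4 copies)
dl <- build_datalist(mtcars, at = list(cyl = c(4, 6), am = c(0, 1)))
length(dl)
```

Each element of `dl` is a full copy of the input data with the `at` variables overwritten, which is the shape of input that predictions "at" specific values need.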
## Simple code examples
```{r opts, echo = FALSE}
library("knitr")
options(width = 100)
opts_knit$set(upload.fun = imgur_upload, base.url = NULL)
opts_chunk$set(fig.width=7, fig.height=4)
```
A major downside of the `predict()` methods for common modelling classes is that the result is not type-safe. Consider the following simple example:
```{r predict}
library("stats")
library("datasets")
x <- lm(mpg ~ cyl * hp + wt, data = mtcars)
class(predict(x))
class(predict(x, se.fit = TRUE))
```
**prediction** solves this issue by providing a wrapper around `predict()`, called `prediction()`, that always returns a tidy data frame with a very simple `print()` method:
```{r prediction}
library("prediction")
(p <- prediction(x))
class(p)
head(p)
```
The output always contains the original data (i.e., either data found using the `find_data()` function or passed to the `data` argument to `prediction()`). This makes it much simpler to pass predictions to, e.g., further summary or plotting functions.
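For instance, because the predictions sit alongside the original variables, they can be summarised with ordinary data-frame tools. The sketch below assumes the prediction column is named `fitted` and refits the same model as above; grouping by `cyl` is only an illustrative choice.

```r
library("prediction")

fit <- lm(mpg ~ cyl * hp + wt, data = mtcars)
p <- prediction(fit)

# Average fitted value within each cylinder group, using base R only
avg_by_cyl <- tapply(p$fitted, p$cyl, mean)
avg_by_cyl

# For OLS with an intercept, the mean fitted value equals the mean outcome
all.equal(mean(p$fitted), mean(mtcars$mpg))
```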
Additionally, the vast majority of methods accept an `at` argument, which can be used to obtain predicted values from a modified version of `data` in which the named variables are held at specific values:
```{r at_arg}
prediction(x, at = list(hp = seq_range(mtcars$hp, 5)))
```
This serves, more or less, as a direct R port of the subset of Stata's `margins` functionality that calculates predictive marginal means and related quantities. For calculation of marginal or partial effects, see the [**margins**](https://cran.r-project.org/package=margins) package.
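To make the idea of a predictive margin concrete, the following base-R sketch computes average predictions by hand: for each value of `hp`, every observation has `hp` replaced by that value, predictions are generated, and the results are averaged. This is only an illustration of the concept under that assumption, not the package's implementation; the names `counterfactual` and `avg_pred` are made up for the example.

```r
fit <- lm(mpg ~ cyl * hp + wt, data = mtcars)

# Five evenly spaced values across the observed range of hp
hp_values <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 5)

# For each value: hold hp fixed for all rows, predict, then average
avg_pred <- vapply(hp_values, function(h) {
  counterfactual <- mtcars
  counterfactual$hp <- h
  mean(predict(fit, newdata = counterfactual))
}, numeric(1))

data.frame(hp = hp_values, avg_prediction = avg_pred)
```

All other variables keep their observed values, so each average reflects the sample's actual mix of `cyl` and `wt` rather than a single "typical" car.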
## Supported model classes
The currently supported model classes are:
- "lm" from `stats::lm()`
- "glm" from `stats::glm()`, `MASS::glm.nb()`, `glmx::glmx()`, `glmx::hetglm()`, `brglm::brglm()`
- "ar" from `stats::ar()`
- "Arima" from `stats::arima()`
- "arima0" from `stats::arima0()`
- "biglm" from `biglm::biglm()` (including `"ffdf"` backed models)
- "betareg" from `betareg::betareg()`
- "bruto" from `mda::bruto()`
- "clm" from `ordinal::clm()`
- "coxph" from `survival::coxph()`
- "crch" from `crch::crch()`
- "earth" from `earth::earth()`
- "fda" from `mda::fda()`
- "Gam" from `gam::gam()`
- "gausspr" from `kernlab::gausspr()`
- "gee" from `gee::gee()`
- "glimML" from `aod::betabin()`, `aod::negbin()`
- "glimQL" from `aod::quasibin()`, `aod::quasipois()`
- "glmnet" from `glmnet::glmnet()`
- "gls" from `nlme::gls()`
- "hurdle" from `pscl::hurdle()`
- "hxlr" from `crch::hxlr()`
- "ivreg" from `AER::ivreg()`
- "knnreg" from `caret::knnreg()`
- "kqr" from `kernlab::kqr()`
- "ksvm" from `kernlab::ksvm()`
- "lda" from `MASS::lda()`
- "lme" from `nlme::lme()`
- "loess" from `stats::loess()`
- "lqs" from `MASS::lqs()`
- "mars" from `mda::mars()`
- "mca" from `MASS::mca()`
- "mclogit" from `mclogit::mclogit()`
- "mda" from `mda::mda()`
- "merMod" from `lme4::lmer()` and `lme4::glmer()`
- "mnp" from `MNP::mnp()`
- "naiveBayes" from `e1071::naiveBayes()`
- "nlme" from `nlme::nlme()`
- "nls" from `stats::nls()`
- "nnet" from `nnet::nnet()`, `nnet::multinom()`
- "plm" from `plm::plm()`
- "polr" from `MASS::polr()`
- "ppr" from `stats::ppr()`
- "princomp" from `stats::princomp()`
- "qda" from `MASS::qda()`
- "rlm" from `MASS::rlm()`
- "rpart" from `rpart::rpart()`
- "rq" from `quantreg::rq()`
- "selection" from `sampleSelection::selection()`
- "speedglm" from `speedglm::speedglm()`
- "speedlm" from `speedglm::speedlm()`
- "survreg" from `survival::survreg()`
- "svm" from `e1071::svm()`
- "svyglm" from `survey::svyglm()`
- "tobit" from `AER::tobit()`
- "train" from `caret::train()`
- "truncreg" from `truncreg::truncreg()`
- "zeroinfl" from `pscl::zeroinfl()`
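The calling convention is the same across these classes. As a hedged sketch, assuming the `"glm"` method accepts `type = "response"` (the default) and `type = "link"`, a logistic regression can be handled just like the linear model earlier; the model below is an arbitrary example.

```r
library("prediction")

# An arbitrary logistic regression on the built-in mtcars data
fit_logit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Predicted probabilities (response scale)
p_resp <- prediction(fit_logit)

# Linear predictor (link scale)
p_link <- prediction(fit_logit, type = "link")

# Both results are data frames with one row per observation
class(p_resp)
nrow(p_link)

# Applying the inverse logit to the link-scale values recovers the probabilities
all.equal(plogis(p_link$fitted), p_resp$fitted, check.attributes = FALSE)
```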
## Requirements and Installation
The development version of this package can be installed directly from GitHub using `remotes`:
``` r
if (!require("remotes")) {
    install.packages("remotes")
}
remotes::install_github("leeper/prediction")
```