r-prediction

Tidy, Type-Safe 'prediction()' Methods

https://github.com/leeper/prediction

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.7%) to scientific vocabulary

Keywords

model predict prediction r regression tidy-data

Repository

Statistics
  • Stars: 88
  • Watchers: 6
  • Forks: 14
  • Open Issues: 23
  • Releases: 4
Created over 9 years ago · Last pushed over 1 year ago

README.Rmd

---
title: "Tidy, Type-Safe 'prediction()' Methods"
output: github_document
---



The **prediction** and **margins** packages are a combined effort to port the functionality of Stata's (closed source) [`margins`](http://www.stata.com/help.cgi?margins) command to (open source) R. **prediction** is focused on one function, `prediction()`, that provides type-safe methods for generating predictions from fitted regression models. `prediction()` is an S3 generic whose methods always return a `"data.frame"` class object rather than the mix of vectors, lists, etc. that are returned by the `predict()` methods for various model types. It provides a key piece of underlying infrastructure for the **margins** package. Users interested in generating marginal (partial) effects, like those generated by Stata's `margins, dydx(*)` command, should consider using `margins()` from the sibling project, [**margins**](https://cran.r-project.org/package=margins).

In addition to `prediction()`, this package provides a number of utility functions for generating useful predictions:

 - `find_data()`, an S3 generic with methods that find the data frame used to estimate a regression model. This is a wrapper around `get_all_vars()` that attempts to locate data as well as modify it according to `subset` and `na.action` arguments used in the original modelling call.
 - `mean_or_mode()` and `median_or_mode()`, which provide a convenient way to compute the data needed for predicted values *at means* (or *at medians*), respecting the differences between factor and numeric variables.
 - `seq_range()`, which generates a vector of *n* values spanning the range of values in a variable.
 - `build_datalist()`, which generates a list of data frames from an input data frame and a specified set of replacement `at` values (mimicking the `atlist` option of Stata's `margins` command).
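
As a rough illustration of how these helpers fit together, the sketch below calls each of them on the built-in `mtcars` and `iris` data. The object names (`hp_grid`, `dl`, `m`) are chosen here for illustration only, and the calls assume the argument names `n` and `at` as described in the list above.

```r
library("prediction")

# seq_range(): n evenly spaced values spanning the observed range of a variable
hp_grid <- seq_range(mtcars$hp, n = 5)
hp_grid

# mean_or_mode(): the mean for a numeric vector; applied to a data frame,
# each column is summarised according to its type
mean_or_mode(mtcars$wt)
mean_or_mode(iris)

# build_datalist(): one copy of the data per combination of `at` values,
# here 2 values of cyl crossed with 2 values of wt
dl <- build_datalist(mtcars, at = list(cyl = c(4, 6), wt = c(2.5, 3)))
length(dl)
sapply(dl, nrow)

# find_data(): recover the data frame used to fit a model
m <- lm(mpg ~ wt, data = mtcars)
nrow(find_data(m))
```

Each element of `dl` is a full copy of the input data with the `at` variables overwritten, which is the kind of counterfactual data that predictions "at" specific values are computed from.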

## Simple code examples

```{r opts, echo = FALSE}
library("knitr")
options(width = 100)
opts_knit$set(upload.fun = imgur_upload, base.url = NULL)
opts_chunk$set(fig.width=7, fig.height=4)
```

A major downside of the `predict()` methods for common modelling classes is that the result is not type-safe. Consider the following simple example:

```{r predict}
library("stats")
library("datasets")
x <- lm(mpg ~ cyl * hp + wt, data = mtcars)
class(predict(x))
class(predict(x, se.fit = TRUE))
```
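
The inconsistency goes beyond vectors and lists. As a small additional illustration using only base R, requesting confidence intervals from the same `lm` fit yields a third shape, a matrix, so code that consumes predictions has to branch on what it received. The object names below are chosen for illustration.

```r
library("stats")
library("datasets")

x <- lm(mpg ~ cyl * hp + wt, data = mtcars)

p_vec  <- predict(x)                           # named numeric vector
p_list <- predict(x, se.fit = TRUE)            # list: fit, se.fit, df, residual.scale
p_mat  <- predict(x, interval = "confidence")  # matrix with columns fit, lwr, upr

# Three calls to the same method, three different return shapes
is.numeric(p_vec) && is.null(dim(p_vec))
is.list(p_list)
is.matrix(p_mat)
```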

**prediction** solves this issue by providing a wrapper around `predict()`, called `prediction()`, that always returns a tidy data frame with a very simple `print()` method:

```{r prediction}
library("prediction")
(p <- prediction(x))
class(p)
head(p)
```

The output always contains the original data (i.e., either data found using the `find_data()` function or passed to the `data` argument to `prediction()`). This makes it much simpler to pass predictions to, e.g., further summary or plotting functions.
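
For instance, because the original columns travel with the predictions, group summaries and residuals need no extra merging step. The sketch below assumes the predicted values are stored in a column named `fitted`, as in the printed output of `head(p)`; the object names are illustrative.

```r
library("prediction")

x <- lm(mpg ~ cyl * hp + wt, data = mtcars)
p <- prediction(x)

# Average prediction within each level of cyl, using the original cyl column
avg_by_cyl <- tapply(p$fitted, p$cyl, mean)
avg_by_cyl

# Residuals computed from the original outcome and the fitted values
res <- p$mpg - p$fitted
summary(res)
```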

Additionally, the vast majority of methods accept an `at` argument, which can be used to obtain predicted values from modified versions of `data` in which the named variables are held at specific values:

```{r at_arg}
prediction(x, at = list(hp = seq_range(mtcars$hp, 5)))
```
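
Conceptually, holding a variable at a value means overwriting that column for every row, predicting, and averaging. The base R sketch below illustrates that idea only; `avg_pred_at()` is a hypothetical helper written for this example and is not the package's implementation.

```r
library("stats")
library("datasets")

x <- lm(mpg ~ cyl * hp + wt, data = mtcars)

# Hypothetical helper (not part of any package): average prediction with
# one variable held at a fixed value for every row
avg_pred_at <- function(model, data, var, value) {
    data[[var]] <- value                  # overwrite the column for all rows
    mean(predict(model, newdata = data))  # average over the modified data
}

# Five evenly spaced hp values across the observed range
hp_values <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 5)

res <- data.frame(
    hp = hp_values,
    avg_fitted = vapply(hp_values,
                        function(v) avg_pred_at(x, mtcars, "hp", v),
                        numeric(1))
)
res
```

All other variables keep their observed values, so each row of `res` is an average over the sample rather than a prediction for a single "typical" case.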

The `at` argument more or less serves as a direct R port of the subset of Stata's `margins` functionality that calculates predictive marginal means, etc. For calculation of marginal or partial effects, see the [**margins**](https://cran.r-project.org/package=margins) package.

## Supported model classes

The currently supported model classes are:

 - "lm" from `stats::lm()`
 - "glm" from `stats::glm()`, `MASS::glm.nb()`, `glmx::glmx()`, `glmx::hetglm()`, `brglm::brglm()`
 - "ar" from `stats::ar()`
 - "Arima" from `stats::arima()`
 - "arima0" from `stats::arima0()`
 - "biglm" from `biglm::biglm()` (including `"ffdf"` backed models)
 - "betareg" from `betareg::betareg()`
 - "bruto" from `mda::bruto()`
 - "clm" from `ordinal::clm()`
 - "coxph" from `survival::coxph()`
 - "crch" from `crch::crch()`
 - "earth" from `earth::earth()`
 - "fda" from `mda::fda()`
 - "Gam" from `gam::gam()`
 - "gausspr" from `kernlab::gausspr()`
 - "gee" from `gee::gee()`
 - "glimML" from `aod::betabin()`, `aod::negbin()`
 - "glimQL" from `aod::quasibin()`, `aod::quasipois()`
 - "glmnet" from `glmnet::glmnet()`
 - "gls" from `nlme::gls()`
 - "hurdle" from `pscl::hurdle()`
 - "hxlr" from `crch::hxlr()`
 - "ivreg" from `AER::ivreg()`
 - "knnreg" from `caret::knnreg()`
 - "kqr" from `kernlab::kqr()`
 - "ksvm" from `kernlab::ksvm()`
 - "lda" from `MASS::lda()`
 - "lme" from `nlme::lme()`
 - "loess" from `stats::loess()`
 - "lqs" from `MASS::lqs()`
 - "mars" from `mda::mars()`
 - "mca" from `MASS::mca()`
 - "mclogit" from `mclogit::mclogit()`
 - "mda" from `mda::mda()`
 - "merMod" from `lme4::lmer()` and `lme4::glmer()`
 - "mnp" from `MNP::mnp()`
 - "naiveBayes" from `e1071::naiveBayes()`
 - "nlme" from `nlme::nlme()`
 - "nls" from `stats::nls()`
 - "nnet" from `nnet::nnet()`, `nnet::multinom()`
 - "plm" from `plm::plm()`
 - "polr" from `MASS::polr()`
 - "ppr" from `stats::ppr()`
 - "princomp" from `stats::princomp()`
 - "qda" from `MASS::qda()`
 - "rlm" from `MASS::rlm()`
 - "rpart" from `rpart::rpart()`
 - "rq" from `quantreg::rq()`
 - "selection" from `sampleSelection::selection()`
 - "speedglm" from `speedglm::speedglm()`
 - "speedlm" from `speedglm::speedlm()`
 - "survreg" from `survival::survreg()`
 - "svm" from `e1071::svm()`
 - "svyglm" from `survey::svyglm()`
 - "tobit" from `AER::tobit()`
 - "train" from `caret::train()`
 - "truncreg" from `truncreg::truncreg()`
 - "zeroinfl" from `pscl::zeroinfl()`
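
The same call works across these classes. As a brief sketch for one of the listed classes, the example below fits a logistic regression with `stats::glm()`; it passes `type = "response"` explicitly so that predictions are on the probability scale, and it assumes the predicted values are stored in a column named `fitted`.

```r
library("prediction")

# Logistic regression: probability of a manual transmission
m <- glm(am ~ hp + wt, data = mtcars, family = binomial)

p <- prediction(m, type = "response")
class(p)          # includes "data.frame", as for the lm example
range(p$fitted)   # predicted probabilities
```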

## Requirements and Installation

[![CRAN](https://www.r-pkg.org/badges/version/prediction)](https://cran.r-project.org/package=prediction)
![Downloads](https://cranlogs.r-pkg.org/badges/prediction)
[![Build Status](https://travis-ci.org/leeper/prediction.svg?branch=master)](https://travis-ci.org/leeper/prediction)
[![Build status](https://ci.appveyor.com/api/projects/status/a4tebeoa98cq07gy/branch/master?svg=true)](https://ci.appveyor.com/project/leeper/prediction/branch/master)
[![codecov.io](https://codecov.io/github/leeper/prediction/coverage.svg?branch=master)](https://codecov.io/github/leeper/prediction?branch=master)
[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)

The development version of this package can be installed directly from GitHub using `remotes`:

``` r
# Install remotes first if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}
remotes::install_github("leeper/prediction")
```

Owner

  • Name: Thomas J. Leeper
  • Login: leeper
  • Kind: user
  • Location: London, United Kingdom

Behavioral scientist and R hacker

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 194
  • Total Committers: 3
  • Avg Commits per committer: 64.667
  • Development Distribution Score (DDS): 0.021
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Thomas J. Leeper (t****r@g****m): 190 commits
  • carlganz (c****z@g****m): 3 commits
  • Vincent Arel-Bundock (v****l@u****u): 1 commit

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 44
  • Total pull requests: 12
  • Average time to close issues: 3 months
  • Average time to close pull requests: 2 months
  • Total issue authors: 25
  • Total pull request authors: 7
  • Average comments per issue: 1.18
  • Average comments per pull request: 2.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • leeper (16)
  • tzoltak (3)
  • vincentarelbundock (2)
  • benwhalley (2)
  • alexpghayes (1)
  • sam-crawley (1)
  • ballardao (1)
  • matguidi (1)
  • mronkko (1)
  • sgenter1 (1)
  • puterleat (1)
  • ghost (1)
  • arcruz0 (1)
  • bbolker (1)
  • lucasfreitas1988 (1)
Pull Request Authors
  • vincentarelbundock (3)
  • carlganz (3)
  • bbolker (2)
  • benwhalley (2)
  • danschrage (1)
  • dfrankow (1)
  • tzoltak (1)
Top Labels
Issue Labels
  • bug (13)
  • enhancement (11)
  • question (10)
  • help wanted (1)
Pull Request Labels
enhancement (2)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 5
  • Total dependent repositories: 2
  • Total versions: 6
conda-forge.org: r-prediction
  • Versions: 6
  • Dependent Packages: 5
  • Dependent Repositories: 2
Rankings
Dependent packages count: 10.4%
Dependent repos count: 20.1%
Average: 27.2%
Stargazers count: 34.6%
Forks count: 43.6%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • AER * enhances
  • MASS * enhances
  • MNP * enhances
  • VGAM * enhances
  • aod * enhances
  • betareg * enhances
  • biglm * enhances
  • brglm * enhances
  • caret * enhances
  • crch * enhances
  • e1071 * enhances
  • earth * enhances
  • ff * enhances
  • ffbase * enhances
  • gam >= 1.15 enhances
  • gee * enhances
  • glmnet * enhances
  • glmx * enhances
  • kernlab * enhances
  • lme4 * enhances
  • mclogit * enhances
  • mda * enhances
  • mlogit * enhances
  • nlme * enhances
  • nnet * enhances
  • ordinal * enhances
  • plm * enhances
  • pscl * enhances
  • quantreg * enhances
  • rpart * enhances
  • sampleSelection * enhances
  • speedglm * enhances
  • survey >= 3.31 enhances
  • survival * enhances
  • truncreg * enhances
  • data.table * imports
  • stats * imports
  • utils * imports
  • datasets * suggests
  • methods * suggests
  • testthat * suggests