missRanger

Fast multivariate imputation by random forests.

https://github.com/mayer79/missranger

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary

Keywords

imputation machine-learning missing-values r random-forest rstats
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 70
  • Watchers: 10
  • Forks: 11
  • Open Issues: 2
  • Releases: 9
Created over 9 years ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License

README.md

{missRanger}

Overview

{missRanger} is a multivariate imputation algorithm based on random forests. It is a fast alternative to the famous 'MissForest' algorithm (Stekhoven and Buehlmann, 2012), and uses the {ranger} package (Wright and Ziegler, 2017) to fit the random forests. Since version 2.6.0, out-of-sample application is possible.

Installation

```r
# From CRAN
install.packages("missRanger")

# Development version
devtools::install_github("mayer79/missRanger")
```

Usage

```r
library(missRanger)

set.seed(3)

# Introduce 10% missing values into each column
iris_NA <- generateNA(iris, p = 0.1)
head(iris_NA)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4          NA  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2    <NA>
# 5           NA         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4    <NA>

# Impute with random forests, followed by predictive mean matching
iris_filled <- missRanger(iris_NA, pmm.k = 5, num.trees = 100)
head(iris_filled)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.2         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
```
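The Overview mentions that out-of-sample application is possible since version 2.6.0. The following is a minimal sketch of what that could look like. It assumes that the `data_only` and `keep_forests` arguments of `missRanger()` and the `predict()` method for the fitted object behave as I understand them from the package documentation; the train/new split and all object names are purely illustrative.

```r
library(missRanger)

set.seed(3)

# Illustrative split: 100 rows to fit the imputation models, 50 "new" rows
idx <- sample(nrow(iris), 100)
train_NA <- generateNA(iris[idx, ], p = 0.1)
new_NA <- generateNA(iris[-idx, ], p = 0.1)

# Keep the forests so they can be reused later (assumed arguments, see above)
imp <- missRanger(
  train_NA,
  pmm.k = 5,
  num.trees = 100,
  data_only = FALSE,
  keep_forests = TRUE,
  verbose = 0
)

# Imputed training data is stored in the returned object
train_filled <- imp$data

# Apply the stored models to unseen data with missing values
new_filled <- predict(imp, newdata = new_NA)
anyNA(new_filled)
```

Keeping the forests can use a lot of memory, so this option is probably best reserved for cases where new data really has to be imputed with the same models.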

How it works

The algorithm iterates until the average out-of-bag (OOB) error of the forests stops improving. The missing values are filled by OOB predictions of the best iteration, optionally followed by predictive mean matching (PMM). The PMM step avoids values not present in the original data (like a value 0.3334 in a 0-1 coded variable). Furthermore, PMM raises the variance in the resulting conditional distributions to a more realistic level, a crucial property for multiple imputation.
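To make the PMM step more concrete, here is a small base R sketch of the idea. It is not the package's implementation: the function `pmm_sketch()` and its argument names are hypothetical. For each prediction of a missing value, it finds the `k` observed cases whose predictions are closest and draws one of their actually observed values.

```r
# Hypothetical helper illustrating predictive mean matching (PMM)
# pred_obs: predictions for cases with observed response
# pred_mis: predictions for cases with missing response
# y_obs:    observed response values (same order as pred_obs)
pmm_sketch <- function(pred_obs, pred_mis, y_obs, k = 5L) {
  k <- min(k, length(y_obs))
  vapply(
    pred_mis,
    function(p) {
      # Indices of the k observed cases with the closest predictions
      nearest <- order(abs(pred_obs - p))[seq_len(k)]
      # Draw one of their observed values at random
      y_obs[nearest[sample.int(k, 1L)]]
    },
    FUN.VALUE = numeric(1)
  )
}

set.seed(1)
y_obs <- c(0, 0, 1, 1, 0, 1)                 # 0-1 coded variable
pred_obs <- c(0.1, 0.2, 0.8, 0.9, 0.3, 0.7)  # made-up predictions
pred_mis <- c(0.3334, 0.75)                  # raw predictions for missing cases

imputed <- pmm_sketch(pred_obs, pred_mis, y_obs, k = 3)
all(imputed %in% y_obs)
```

Because every imputed value is drawn from the observed values, a raw prediction like 0.3334 can never end up in the data, and the random draw among the `k` donors is what adds variance back to the imputations.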

Check out the vignettes for more information and for how to use missRanger() in multiple imputation.
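As a rough illustration of multiple imputation, the sketch below creates several imputed data sets, fits the same linear model on each, and pools the coefficients with Rubin's rules written out in base R. The number of imputations, the model, and all object names are assumptions made for this example; the vignettes remain the authoritative source, and a dedicated package such as {mice} (listed under Suggests) may be more convenient for pooling.

```r
library(missRanger)

set.seed(19)
irisNA <- generateNA(iris, p = 0.1)

# Create m imputed data sets; pmm.k > 0 adds the randomness needed
m <- 10
filled <- replicate(
  m,
  missRanger(irisNA, pmm.k = 5, num.trees = 50, verbose = 0),
  simplify = FALSE
)

# Fit the same analysis model on each imputed data set
models <- lapply(filled, function(d) lm(Sepal.Length ~ ., data = d))

# Pool with Rubin's rules
coefs <- sapply(models, coef)                       # one column per imputation
pooled_est <- rowMeans(coefs)                       # pooled point estimates
within_var <- rowMeans(sapply(models, function(fit) diag(vcov(fit))))
between_var <- apply(coefs, 1, var)                 # variance across imputations
total_var <- within_var + (1 + 1 / m) * between_var

data.frame(estimate = pooled_est, std_error = sqrt(total_var))
```

The between-imputation variance is what reflects the uncertainty due to missing values; without PMM or another source of randomness, the imputed data sets would be too similar and the pooled standard errors too small.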

References

  • Stekhoven, D. J., Buehlmann, P. (2012). MissForest - non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118.
  • Wright, M. N., Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1-17. doi:10.18637/jss.v077.i01

Owner

  • Name: Michael Mayer
  • Login: mayer79
  • Kind: user

Responsible statistics | ML

GitHub Events

Total
  • Create event: 4
  • Release event: 1
  • Issues event: 8
  • Watch event: 4
  • Delete event: 4
  • Issue comment event: 5
  • Push event: 10
  • Pull request event: 6
Last Year
  • Create event: 4
  • Release event: 1
  • Issues event: 8
  • Watch event: 4
  • Delete event: 4
  • Issue comment event: 5
  • Push event: 10
  • Pull request event: 6

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 213
  • Total Committers: 8
  • Avg Commits per committer: 26.625
  • Development Distribution Score (DDS): 0.221
Past Year
  • Commits: 74
  • Committers: 1
  • Avg Commits per committer: 74.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Michael Mayer | m****9@g****m | 166 |
| Michael Mayer | m****r@c****h | 31 |
| MMA | M****A@I****L | 8 |
| olivroy | 5****y | 4 |
| Thierry Gosselin | t****n@i****m | 1 |
| Jamey McDowell | j****l@g****m | 1 |
| Andrew Landgraf | a****d | 1 |
| MMA | M****A@n****2 | 1 |

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 42
  • Total pull requests: 44
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 16 hours
  • Total issue authors: 28
  • Total pull request authors: 6
  • Average comments per issue: 2.33
  • Average comments per pull request: 0.64
  • Merged pull requests: 41
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 18
  • Average time to close issues: 12 days
  • Average time to close pull requests: about 20 hours
  • Issue authors: 5
  • Pull request authors: 1
  • Average comments per issue: 1.75
  • Average comments per pull request: 0.28
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mayer79 (8)
  • thierrygosselin (5)
  • DarioS (3)
  • bgall (2)
  • jeandigitale (1)
  • lime-n (1)
  • ldliao (1)
  • AurelieMich (1)
  • LouChen-med (1)
  • stephematician (1)
  • joannawolthuis (1)
  • visokie (1)
  • NamLQ (1)
  • James-Yong-XIANG (1)
  • JameyPMcDowell (1)
Pull Request Authors
  • mayer79 (53)
  • pdwaggoner (3)
  • olivroy (2)
  • thierrygosselin (1)
  • JameyPMcDowell (1)
  • andland (1)
Top Labels
Issue Labels
  • enhancement (7)
  • bug (4)
  • documentation (2)
  • wontfix (2)
  • question (1)
Pull Request Labels
  • bug (2)
  • enhancement (2)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 2,724 last-month
  • Total docker downloads: 42,128
  • Total dependent packages: 7
  • Total dependent repositories: 13
  • Total versions: 16
  • Total maintainers: 1
cran.r-project.org: missRanger

Fast Imputation of Missing Values

  • Versions: 16
  • Dependent Packages: 7
  • Dependent Repositories: 13
  • Downloads: 2,724 Last month
  • Docker Downloads: 42,128
Rankings
Docker downloads count: 0.6%
Downloads: 5.4%
Average: 5.5%
Stargazers count: 6.3%
Forks count: 6.3%
Dependent packages count: 6.6%
Dependent repos count: 8.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • FNN * imports
  • ranger * imports
  • stats * imports
  • utils * imports
  • dplyr * suggests
  • knitr * suggests
  • mice * suggests
  • rmarkdown * suggests
  • survival * suggests
  • testthat >= 3.0.0 suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite