ranger

A Fast Implementation of Random Forests

https://github.com/imbs-hl/ranger

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 20 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    6 of 35 committers (17.1%) from academic institutions
  • Institutional organization owner
    Organization imbs-hl has institutional domain (www.imbs.uni-luebeck.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.6%) to scientific vocabulary

Keywords from Contributors

econometrics causal-inference causal-forest stacking interactive tidy-data predictive-modeling parsing ecosystem-modeling latex
Last synced: 6 months ago

Repository

A Fast Implementation of Random Forests

Basic Info
Statistics
  • Stars: 792
  • Watchers: 40
  • Forks: 197
  • Open Issues: 92
  • Releases: 7
Created over 10 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog Funding

README.md


ranger: A Fast Implementation of Random Forests

Marvin N. Wright

Introduction

ranger is a fast implementation of random forests (Breiman 2001) or recursive partitioning, particularly suited for high-dimensional data. Classification, regression, and survival forests are supported. Classification and regression forests are implemented as in the original Random Forest (Breiman 2001), survival forests as in Random Survival Forests (Ishwaran et al. 2008). ranger also includes implementations of extremely randomized trees (Geurts et al. 2006) and quantile regression forests (Meinshausen 2006).

ranger is written in C++, but a version for R is available, too. We recommend using the R version: it is easy to install and use, and the results are readily available for further analysis. The R version is as fast as the standalone C++ version.

Installation

R version

To install the ranger R package from CRAN, just run

```R
install.packages("ranger")
```

R version >= 3.1 is required. With recent R versions, multithreading on Windows platforms should just work. If you compile yourself, the new RTools toolchain is required.

To install the development version from GitHub using devtools, run

```R
devtools::install_github("imbs-hl/ranger")
```

Standalone C++ version

To install the C++ version of ranger on Linux or Mac OS X, you need a compiler supporting C++14 (e.g. gcc >= 5 or Clang >= 3.4) and CMake. To build, open a terminal in the ranger main directory and run the following commands:

```bash
cd cpp_version
mkdir build
cd build
cmake ..
make
```

After compilation there should be an executable called "ranger" in the build directory.

To run the C++ version on Microsoft Windows, please cross-compile or ask for a binary.

Usage

R version

For usage of the R version see ?ranger in R. Most importantly, see the Examples section. As a first example you could try

```R
ranger(Species ~ ., data = iris)
```

Standalone C++ version

In the C++ version type

```bash
./ranger --help
```

for a list of commands. First, you need a training dataset in a file. The file should contain one header line with variable names, followed by one line of variable values per sample (numeric only). Variable names must not contain whitespace, commas, or semicolons. Values can be separated by whitespace, commas, or semicolons, but separators cannot be mixed within one file. A typical call of ranger would be, for example:

```bash
./ranger --verbose --file data.dat --depvarname Species --treetype 1 --ntree 1000 --nthreads 4
```
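The input file format described above can be illustrated with a small sketch. It writes a toy whitespace-separated file and checks that every line has as many fields as the header and that all values are numeric. The file name `toy_data.dat`, the variable names `x1`, `x2`, `y`, the values, and the helper `check_format` are assumptions made for illustration; they are not part of ranger.

```bash
#!/usr/bin/env bash
# Hypothetical toy dataset, written to a temporary directory.
workdir="$(mktemp -d)"
datafile="$workdir/toy_data.dat"

# One header line with variable names, then one line of numeric values per sample.
# A single separator (here: whitespace) is used throughout the file.
cat > "$datafile" <<'EOF'
x1 x2 y
0.5 1.2 0
1.7 0.3 1
2.1 2.2 1
0.2 0.9 0
EOF

# check_format FILE (illustrative helper, not a ranger command):
# succeeds if every line has the same number of whitespace-separated
# fields as the header and every value below the header is numeric.
check_format() {
  awk '
    NR == 1 { n = NF; next }
    NF != n { bad = 1 }
    {
      for (i = 1; i <= NF; i++)
        if ($i !~ /^-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/) bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

if check_format "$datafile"; then
  echo "format ok"
else
  echo "format problem"
fi
```

A file that passes this check could then be given to the executable with `--file` and the name of its outcome column with `--depvarname`, as in the call above.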

If you find any bugs or experience any crashes, please report them to us. If you have any questions, just ask; we won't bite.

Please cite our paper if you use ranger.

References

  • Wright, M. N. & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1-17. https://doi.org/10.18637/jss.v077.i01.
  • Schmid, M., Wright, M. N. & Ziegler, A. (2016). On the use of Harrell's C for clinical risk prediction via random survival forests. Expert Syst Appl 63:450-459. https://doi.org/10.1016/j.eswa.2016.07.018.
  • Wright, M. N., Dankowski, T. & Ziegler, A. (2017). Unbiased split variable selection for random survival forests using maximally selected rank statistics. Stat Med 36:1272-1284. https://doi.org/10.1002/sim.7212.
  • Nembrini, S., König, I. R. & Wright, M. N. (2018). The revival of the Gini Importance? Bioinformatics. https://doi.org/10.1093/bioinformatics/bty373.
  • Breiman, L. (2001). Random forests. Mach Learn, 45:5-32. https://doi.org/10.1023/A:1010933404324.
  • Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. Ann Appl Stat 2:841-860. https://doi.org/10.1214/08-AOAS169.
  • Malley, J. D., Kruppa, J., Dasgupta, A., Malley, K. G., & Ziegler, A. (2012). Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med 51:74-81. https://doi.org/10.3414/ME00-01-0052.
  • Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning. Springer, New York. 2nd edition.
  • Geurts, P., Ernst, D., Wehenkel, L. (2006). Extremely randomized trees. Mach Learn 63:3-42. https://doi.org/10.1007/s10994-006-6226-1.
  • Meinshausen, N. (2006). Quantile Regression Forests. J Mach Learn Res 7:983-999. http://www.jmlr.org/papers/v7/meinshausen06a.html.
  • Sandri, M. & Zuccolotto, P. (2008). A bias correction algorithm for the Gini variable importance measure in classification trees. J Comput Graph Stat, 17:611-628. https://doi.org/10.1198/106186008X344522.
  • Coppersmith D., Hong S. J., Hosking J. R. (1999). Partitioning nominal attributes in decision trees. Data Min Knowl Discov 3:197-217. https://doi.org/10.1023/A:1009869804967.

Owner

  • Name: IMBS
  • Login: imbs-hl
  • Kind: organization
  • Email: info@imbs.uni-luebeck.de
  • Location: Lübeck, Germany

Universität zu Lübeck

GitHub Events

Total
  • Issues event: 17
  • Watch event: 20
  • Delete event: 12
  • Issue comment event: 31
  • Push event: 22
  • Pull request event: 20
  • Fork event: 9
  • Create event: 8
Last Year
  • Issues event: 17
  • Watch event: 20
  • Delete event: 12
  • Issue comment event: 31
  • Push event: 22
  • Pull request event: 20
  • Fork event: 9
  • Create event: 8

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 1,178
  • Total Committers: 35
  • Avg Commits per committer: 33.657
  • Development Distribution Score (DDS): 0.4
Past Year
  • Commits: 54
  • Committers: 6
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.13
Top Committers
Name Email Commits
Marvin Wright w****t@i****e 707
Marvin Wright g****b@w****e 364
Daniel Cooke d****e@w****k 31
Bruna b****w@g****m 8
Robrecht Cannoodt r****d@g****m 8
Lukas Burk l****s@q****e 7
animusnaturae b****s@s****e 7
Christian Lorentzen l****h@g****m 5
Kirill Müller k****r@m****g 4
dependabot[bot] 4****] 3
Marvin N. Wright w****k@w****e 3
Ben Gorman b****9@g****m 2
Damian Gola g****a@i****e 2
Gregor de Cillia d****r@g****m 2
Michael Chirico c****m@g****m 2
Stanley E. Lazic s****c@c****t 2
Brandon Greenwell g****n@g****m 2
talegari s****h@g****m 2
Bernie Gray b****3@g****m 1
olivroy 5****y 1
SvenVw 3****w 1
Oliver Keyes I****s 1
Laurențiu Nicola l****a@d****o 1
Katrin Leinweber k****i@p****e 1
Drago Plecko d****o@s****h 1
Erik Doffagne e****e@g****m 1
Julie Tibshirani j****s@g****m 1
Kirill Sevastyanenko k****a@g****m 1
Kylen Solvik k****k@g****m 1
Marras Antoine a****s@s****r 1
and 5 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 158
  • Total pull requests: 105
  • Average time to close issues: over 1 year
  • Average time to close pull requests: 6 months
  • Total issue authors: 118
  • Total pull request authors: 22
  • Average comments per issue: 3.28
  • Average comments per pull request: 1.27
  • Merged pull requests: 77
  • Bot issues: 0
  • Bot pull requests: 16
Past Year
  • Issues: 15
  • Pull requests: 23
  • Average time to close issues: 18 days
  • Average time to close pull requests: 17 days
  • Issue authors: 14
  • Pull request authors: 5
  • Average comments per issue: 0.8
  • Average comments per pull request: 0.39
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 8
Top Authors
Issue Authors
  • DarioS (8)
  • nikosGeography (6)
  • A-Pai (5)
  • mnwright (4)
  • PhilippPro (4)
  • ghost (4)
  • brfitzpatrick (3)
  • UnixJunkie (2)
  • tomauer (2)
  • rtlprmft (2)
  • talegari (2)
  • YingxiaLiu (2)
  • katavuk (2)
  • kadyb (2)
  • Datou0718 (2)
Pull Request Authors
  • mnwright (57)
  • dependabot[bot] (16)
  • jschueller (5)
  • MichaelChirico (2)
  • barracuda156 (2)
  • 0x7f (2)
  • lorentzenchr (2)
  • novaktim (2)
  • jemus42 (2)
  • hyanworkspace (2)
  • sligocki (2)
  • tagteam (1)
  • stanlazic (1)
  • spineki (1)
  • bigerl (1)
Top Labels
Issue Labels
  • enhancement (14)
  • long-term (11)
  • documentation (6)
  • contributions welcome (5)
  • C++ version (4)
  • Survival (2)
  • runtime (2)
  • bug (1)
Pull Request Labels
  • dependencies (16)
  • Next release (8)
  • not to be merged (for now) (5)
  • github_actions (1)

Packages

  • Total packages: 3
  • Total downloads:
    • cran 72,107 last-month
  • Total docker downloads: 284,463
  • Total dependent packages: 200
    (may contain duplicates)
  • Total dependent repositories: 504
    (may contain duplicates)
  • Total versions: 32
  • Total maintainers: 1
cran.r-project.org: ranger

A Fast Implementation of Random Forests

  • Versions: 21
  • Dependent Packages: 191
  • Dependent Repositories: 499
  • Downloads: 72,107 Last month
  • Docker Downloads: 284,463
Rankings
Forks count: 0.3%
Stargazers count: 0.4%
Dependent packages count: 0.6%
Dependent repos count: 0.7%
Downloads: 1.7%
Average: 4.1%
Docker downloads count: 20.7%
Maintainers (1)
Last synced: 7 months ago
proxy.golang.org: github.com/imbs-hl/ranger
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 9.0%
Average: 9.6%
Dependent repos count: 10.2%
Last synced: 6 months ago
conda-forge.org: r-ranger
  • Versions: 8
  • Dependent Packages: 9
  • Dependent Repositories: 5
Rankings
Dependent packages count: 6.5%
Average: 12.4%
Forks count: 13.3%
Dependent repos count: 14.8%
Stargazers count: 14.9%
Last synced: 6 months ago