Confidence Intervals for Random Forests in Python

Published in JOSS (2017)

https://github.com/scikit-learn-contrib/forest-confidence-interval

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    3 of 18 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Artificial Intelligence and Machine Learning (Computer Science): 36% confidence
Last synced: 4 months ago

Repository

Confidence intervals for scikit-learn forest algorithms

Basic Info
Statistics
  • Stars: 289
  • Watchers: 16
  • Forks: 49
  • Open Issues: 5
  • Releases: 3
Created over 9 years ago · Last pushed 9 months ago
Metadata Files
Readme · Contributing · License

README.md

forestci: confidence intervals for Forest algorithms

[Badges: Travis status, Coveralls status, CircleCI status, JOSS status]

Forest algorithms are powerful ensemble methods for classification and regression. However, predictions from these algorithms contain some amount of error. Estimating prediction variability shows how strongly the training set influences the resulting random forest predictions.

forest-confidence-interval is a Python module that adds variance calculation and confidence intervals to the random forest regression and classification objects implemented in scikit-learn. Its core functions compute the in-bag sample counts and the error bars for a fitted random forest.

This module is based on R code by Stefan Wager (randomForestCI, since deprecated in favor of grf) and is licensed under the MIT open source license (see LICENSE). The present project makes the algorithm compatible with scikit-learn.

To get a reliable confidence interval, you need to use a large number of trees (estimators). The optional calibration routine tries to extrapolate the results to an infinite number of trees, but it can be unstable and cause numerical errors. If that happens, exclude it with calibrate=False and increase the number of trees in the model until the estimates converge, as sketched below.
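One rough way to check convergence is to refit with progressively more trees and compare the uncalibrated variance estimates. The helper below is a hypothetical sketch: it assumes X_train, y_train, and X_test are existing NumPy arrays and that random_forest_error follows the signature documented under Usage below.

```python
# Hypothetical convergence check: refit with more and more trees, with the
# calibration step disabled, and stop once the variance estimates stabilize.
import numpy as np
import forestci as fci
from sklearn.ensemble import RandomForestRegressor

def mean_uncalibrated_variance(n_trees, X_train, y_train, X_test):
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(X_train, y_train)
    variance = fci.random_forest_error(
        forest=model,
        X_train_shape=X_train.shape,
        X_test=X_test,
        calibrate=False,  # skip the unstable extrapolation step
    )
    return float(np.mean(variance))

# Illustrative loop (assumes X_train, y_train, X_test already exist):
# for n in (250, 500, 1000, 2000):
#     print(n, mean_uncalibrated_variance(n, X_train, y_train, X_test))
```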

Installation and Usage

Before installing the module, you will need numpy, scipy, and scikit-learn.

To install forest-confidence-interval, execute:

```shell
pip install forestci
```

If you would like to install the development version of the software, use:

```shell
pip install git+git://github.com/scikit-learn-contrib/forest-confidence-interval.git
```

Usage:

```python
import forestci as fci

ci = fci.random_forest_error(
    forest=model,                 # scikit-learn forest model fitted on X_train
    X_train_shape=X_train.shape,
    X_test=X,                     # the samples for which to compute the CI
    inbag=None,
    calibrate=True,
    memory_constrained=False,
    memory_limit=None,
    y_output=0,                   # in case of a multi-output model, use target 0
)
```
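As a fuller illustration, the sketch below fits a scikit-learn RandomForestRegressor on synthetic data and turns the variance returned by random_forest_error into approximate 95% error bars around the predictions. The dataset, variable names, and interval width are illustrative assumptions; the keyword arguments follow the call shown above and assume a forestci version that accepts X_train_shape.

```python
# Hypothetical end-to-end example: fit a forest, then attach error bars.
import numpy as np
import forestci as fci
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# A large ensemble keeps the variance estimate stable (see the note above).
model = RandomForestRegressor(n_estimators=1000, random_state=1)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
variance = fci.random_forest_error(
    forest=model,
    X_train_shape=X_train.shape,
    X_test=X_test,
)

# Approximate 95% interval per test sample: prediction +/- 1.96 * std.
std = np.sqrt(variance)
lower, upper = y_pred - 1.96 * std, y_pred + 1.96 * std
for pred, lo, hi in list(zip(y_pred, lower, upper))[:5]:
    print(f"{pred:8.2f}  [{lo:8.2f}, {hi:8.2f}]")
```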

Examples

The examples (gallery below) demonstrate the package's functionality with random forest classifiers and regression models. The regression example uses a popular UCI Machine Learning dataset on cars, while the classifier example simulates how to add measurements of uncertainty to tasks like predicting spam emails.

Examples gallery

Contributing

Contributions are very welcome, but we ask that contributors abide by the contributor covenant.

To report issues with the software, please post to the issue log. Bug reports are also appreciated; please add them to the issue log after verifying that the issue does not already exist. Comments on existing issues are also welcome.

Please submit improvements as pull requests against the repo after verifying that the existing tests pass and any new code is well covered by unit tests. Please write code that complies with the Python style guide, PEP8.

E-mail Ariel Rokem, Kivan Polimis, or Bryna Hazelton if you have any questions, suggestions or feedback.

Testing

Requires installation of the pytest package.

Tests are located in the forestci/tests folder and can be run with this command in the root directory:

```shell
pytest forestci --doctest-modules
```

Citation

Click on the JOSS status badge for the Journal of Open Source Software article on this project. The BibTeX citation for the JOSS article is below:

```bibtex
@article{polimisconfidence,
  title   = {Confidence Intervals for Random Forests in Python},
  author  = {Polimis, Kivan and Rokem, Ariel and Hazelton, Bryna},
  journal = {Journal of Open Source Software},
  volume  = {2},
  number  = {1},
  year    = {2017}
}
```

Owner

  • Name: scikit-learn-contrib
  • Login: scikit-learn-contrib
  • Kind: organization

scikit-learn compatible projects

JOSS Publication

Confidence Intervals for Random Forests in Python
Published
November 09, 2017
Volume 2, Issue 19, Page 124
Authors
  • Kivan Polimis, eScience Institute, University of Washington
  • Ariel Rokem, eScience Institute, University of Washington
  • Bryna Hazelton, eScience Institute, University of Washington
Editor
  • Jake Vanderplas
Tags
scikit-learn · random forest · confidence intervals

GitHub Events

Total
  • Watch event: 7
  • Delete event: 1
  • Member event: 1
  • Issue comment event: 1
  • Push event: 4
  • Pull request event: 5
  • Fork event: 2
  • Create event: 1
Last Year
  • Watch event: 7
  • Delete event: 1
  • Member event: 1
  • Issue comment event: 1
  • Push event: 4
  • Pull request event: 5
  • Fork event: 2
  • Create event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 267
  • Total Committers: 18
  • Avg Commits per committer: 14.833
  • Development Distribution Score (DDS): 0.566
Past Year
  • Commits: 7
  • Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
arokem a****m@g****m 116
kpolimis k****s@g****m 101
Daniele Ongari d****i@g****m 12
Ab2nour 6****r 7
Vighnesh Birodkar v****r@n****u 5
Adam Richie-Halford r****d@g****m 4
adamwlev a****4@m****m 4
Dominik Waurenschk d****k@p****e 3
Cedric Wagner c****r@r****e 3
Ludvig Hult l****t@i****e 3
MartinUrban M****o@g****m 2
Arfon Smith a****n 1
Boyuan Deng b****g@g****m 1
Eric Ma e****g@g****m 1
Max Ghenis m****s@g****m 1
owlas o****t@s****k 1
Lei Ma l****a@s****m 1
traims p****s@g****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 51
  • Total pull requests: 54
  • Average time to close issues: almost 2 years
  • Average time to close pull requests: about 1 month
  • Total issue authors: 37
  • Total pull request authors: 17
  • Average comments per issue: 2.76
  • Average comments per pull request: 1.2
  • Merged pull requests: 47
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: about 14 hours
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.33
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • DannyArends (10)
  • ericmjl (4)
  • tawe141 (2)
  • chahakmehta (2)
  • stasSajin (1)
  • JIAZHEN (1)
  • sq5rix (1)
  • BSharmi (1)
  • miranov25 (1)
  • joachimder (1)
  • AlCorreia (1)
  • finbarrtimbers (1)
  • CandyOates (1)
  • richford (1)
  • csanadpoda (1)
Pull Request Authors
  • arokem (20)
  • kpolimis (13)
  • Ab2nour (6)
  • danieleongari (4)
  • owlas (3)
  • el-hult (2)
  • arfon (1)
  • richford (1)
  • emptymalei (1)
  • DasCapschen (1)
  • ericmjl (1)
  • olp-cs (1)
  • MaxGhenis (1)
  • adamwlev (1)
  • hzhao16 (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi: 35,665 last month
  • Total docker downloads: 48
  • Total dependent packages: 4
    (may contain duplicates)
  • Total dependent repositories: 40
    (may contain duplicates)
  • Total versions: 12
  • Total maintainers: 3
pypi.org: forestci

forestci: confidence intervals for scikit-learn forest algorithms

  • Versions: 10
  • Dependent Packages: 4
  • Dependent Repositories: 39
  • Downloads: 35,665 Last month
  • Docker Downloads: 48
Rankings
Downloads: 1.8%
Dependent packages count: 1.9%
Dependent repos count: 2.3%
Average: 3.2%
Docker downloads count: 3.5%
Stargazers count: 3.8%
Forks count: 6.1%
Maintainers (3)
Last synced: 4 months ago
conda-forge.org: forestci

A Python module for calculating variance and adding confidence intervals to scikit-learn random forest regression or classification objects. The core functions compute the in-bag sample counts and the error bars for random forest objects.

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 1
Rankings
Stargazers count: 23.0%
Dependent repos count: 24.4%
Forks count: 26.6%
Average: 31.4%
Dependent packages count: 51.6%
Last synced: 4 months ago

Dependencies

.github/workflows/docbuild.yml actions
  • JamesIves/github-pages-deploy-action releases/v3 composite
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
  • actions/upload-artifact v1 composite
.github/workflows/pythonpackage.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
requirements-dev.txt pypi
  • flake8 * development
  • matplotlib * development
  • numpydoc * development
  • pandas * development
  • pillow * development
  • pytest ==5.2.2 development
  • pytest-cov ==2.8.1 development
  • sphinx * development
  • sphinx-autoapi * development
  • sphinx_gallery * development
  • sphinx_rtd_theme * development
requirements.txt pypi
  • numpy >=1.20
  • scikit-learn >=0.23.1
setup.py pypi