Confidence Intervals for Random Forests in Python
Confidence Intervals for Random Forests in Python - Published in JOSS (2017)
https://github.com/scikit-learn-contrib/forest-confidence-interval
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org
- ✓ Committers with academic emails: 3 of 18 committers (16.7%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
Confidence intervals for scikit-learn forest algorithms
Basic Info
- Host: GitHub
- Owner: scikit-learn-contrib
- License: MIT
- Language: HTML
- Default Branch: master
- Homepage: http://contrib.scikit-learn.org/forest-confidence-interval/
- Size: 11.5 MB
Statistics
- Stars: 289
- Watchers: 16
- Forks: 49
- Open Issues: 5
- Releases: 3
Metadata Files
README.md
forestci: confidence intervals for Forest algorithms
Forest algorithms are powerful ensemble methods for classification and regression. However, predictions from these algorithms contain some amount of error. Estimating the variability of a prediction shows how strongly it depends on the particular training set used to fit the random forest.
forest-confidence-interval is a Python module that adds variance
calculation and confidence intervals to the basic functionality
implemented in scikit-learn random forest regression and classification objects.
The core functions calculate an in-bag matrix and error bars for random forest
objects.
This module is based on R code from Stefan Wager
(randomForestCI, deprecated in favor of grf)
and is licensed under the MIT open source license (see LICENSE).
The present project makes the algorithm compatible with scikit-learn.
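To make the idea of in-bag counts and error bars concrete, here is a minimal, self-contained sketch of an infinitesimal-jackknife-style variance estimate, the kind of estimator associated with Wager's work. It is not forestci's implementation. The "forest" is a hypothetical stand-in in which each tree simply predicts the mean of its bootstrap sample, and the name `bagged_mean_with_ij_variance` and its parameters are made up for illustration.

```python
import random
from statistics import fmean


def bagged_mean_with_ij_variance(y, n_trees=2000, seed=0):
    """Bag a trivial 'tree' (the mean of a bootstrap sample) and estimate the
    variance of the bagged prediction with an infinitesimal-jackknife formula.

    Returns (bagged prediction, raw estimate, bias-corrected estimate).
    """
    rng = random.Random(seed)
    n = len(y)

    inbag = []  # inbag[b][i] = number of times sample i was drawn for tree b
    preds = []  # preds[b] = prediction of tree b
    for _ in range(n_trees):
        counts = [0] * n
        for _ in range(n):
            counts[rng.randrange(n)] += 1
        inbag.append(counts)
        preds.append(sum(c * v for c, v in zip(counts, y)) / n)

    bagged = fmean(preds)
    centered = [p - bagged for p in preds]

    # Raw estimate: sum over samples of the squared covariance (across trees)
    # between the in-bag count of sample i and the tree prediction.
    # The expected in-bag count under bootstrap sampling is 1.
    v_ij = 0.0
    for i in range(n):
        cov_i = sum((inbag[b][i] - 1) * centered[b] for b in range(n_trees)) / n_trees
        v_ij += cov_i ** 2

    # Monte Carlo bias correction: n / B^2 * sum_b (t_b - mean)^2.
    correction = n * sum(c * c for c in centered) / n_trees ** 2
    return bagged, v_ij, v_ij - correction


# Toy data, purely illustrative.
y = [float(v) for v in range(1, 11)]
bagged, v_ij, v_ij_unbiased = bagged_mean_with_ij_variance(y)
print(bagged, v_ij, v_ij_unbiased)
```

The corrected estimate subtracts a Monte Carlo term that shrinks as `n_trees` grows, so the raw and corrected values move closer together for larger ensembles.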
To get a reliable confidence interval, you need to use a large number of trees (estimators).
The calibration routine
(which can be included or excluded on top of the algorithm) tries to extrapolate
the results to an infinite number of trees, but it is unstable and can cause numerical errors.
If that happens, disable it with calibrate=False
and try increasing the number of trees in the model until the estimates converge.
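One simple way to judge whether the number of trees is large enough is to compare the variance estimates obtained from two forest sizes and check that they no longer change much. The helpers below are a hypothetical sketch, not part of forestci: the names `relative_change` and `has_converged`, the 5% tolerance, and the numbers in the usage lines are all assumptions chosen for illustration.

```python
def relative_change(previous, current, eps=1e-12):
    """Largest relative change between two sequences of variance estimates."""
    return max(abs(c - p) / max(abs(p), eps) for p, c in zip(previous, current))


def has_converged(previous, current, rtol=0.05):
    """True if no estimate moved by more than rtol (relative) between runs."""
    return relative_change(previous, current) <= rtol


# Made-up variance estimates for the same three test points,
# as they might come from forests with 500 and 1000 trees.
var_500_trees = [0.40, 1.10, 0.75]
var_1000_trees = [0.41, 1.08, 0.76]
print(has_converged(var_500_trees, var_1000_trees))
```

If the check fails, the forest can be refit with more trees and the comparison repeated.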
Installation and Usage
Before installing the module you will need numpy, scipy and scikit-learn.
To install forest-confidence-interval, run:
```shell
pip install forestci
```
If you would like to install the development version of the software, use:
```shell
pip install git+https://github.com/scikit-learn-contrib/forest-confidence-interval.git
```
Usage:
```python
import forestci as fci

ci = fci.random_forest_error(
    forest=model,  # scikit-learn forest model fitted on X_train
    X_train_shape=X_train.shape,
    X_test=X,  # the samples for which you want to compute the CI
    inbag=None,
    calibrate=True,
    memory_constrained=False,
    memory_limit=None,
    y_output=0,  # for a multi-output model, consider target 0
)
```
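Assuming the array returned by `random_forest_error` holds one variance estimate per test sample, it can be turned into interval bounds with a normal approximation. The helper below is a hypothetical sketch, not a forestci function: the name `normal_interval`, the 95% default, and the choice to clip negative estimates to zero are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def normal_interval(predictions, variances, level=0.95):
    """Return a list of (lower, upper) bounds: prediction +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    bounds = []
    for pred, var in zip(predictions, variances):
        # A bias-corrected variance estimate may come out negative; clip it to zero.
        half_width = z * sqrt(max(var, 0.0))
        bounds.append((pred - half_width, pred + half_width))
    return bounds


# Illustrative values only; in practice these would be model.predict(X) and ci.
print(normal_interval([10.0, 5.0], [4.0, -1.0]))
```

Clipping hides negative estimates rather than fixing them, so a large share of clipped values is a sign that the forest needs more trees.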
Examples
The examples (gallery below) demonstrate the package functionality with random forest classifiers and regression models. The regression example uses a popular UCI Machine Learning data set on cars, while the classifier example simulates how to add measurements of uncertainty to tasks like predicting spam emails.
Contributing
Contributions are very welcome, but we ask that contributors abide by the contributor covenant.
To report issues with the software, please post to the issue log. Bug reports are also appreciated; please add them to the issue log after verifying that the issue does not already exist. Comments on existing issues are also welcome.
Please submit improvements as pull requests against the repo after verifying that the existing tests pass and any new code is well covered by unit tests. Please write code that complies with the Python style guide, PEP8.
E-mail Ariel Rokem, Kivan Polimis, or Bryna Hazelton if you have any questions, suggestions or feedback.
Testing
Running the tests requires the pytest package.
Tests are located in the forestci/tests folder and can be run with this command in the root directory:
```shell
pytest forestci --doctest-modules
```
Citation
Click on the JOSS status badge for the Journal of Open Source Software article on this project. The BibTeX citation for the JOSS article is below:
@article{polimisconfidence,
title={Confidence Intervals for Random Forests in Python},
author={Polimis, Kivan and Rokem, Ariel and Hazelton, Bryna},
journal={Journal of Open Source Software},
volume={2},
number={1},
year={2017}
}
Owner
- Name: scikit-learn-contrib
- Login: scikit-learn-contrib
- Kind: organization
- Website: http://contrib.scikit-learn.org
- Repositories: 27
- Profile: https://github.com/scikit-learn-contrib
scikit-learn compatible projects
JOSS Publication
Confidence Intervals for Random Forests in Python
Tags
scikit-learn random forest confidence intervals
GitHub Events
Total
- Watch event: 7
- Delete event: 1
- Member event: 1
- Issue comment event: 1
- Push event: 4
- Pull request event: 5
- Fork event: 2
- Create event: 1
Last Year
- Watch event: 7
- Delete event: 1
- Member event: 1
- Issue comment event: 1
- Push event: 4
- Pull request event: 5
- Fork event: 2
- Create event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| arokem | a****m@g****m | 116 |
| kpolimis | k****s@g****m | 101 |
| Daniele Ongari | d****i@g****m | 12 |
| Ab2nour | 6****r | 7 |
| Vighnesh Birodkar | v****r@n****u | 5 |
| Adam Richie-Halford | r****d@g****m | 4 |
| adamwlev | a****4@m****m | 4 |
| Dominik Waurenschk | d****k@p****e | 3 |
| Cedric Wagner | c****r@r****e | 3 |
| Ludvig Hult | l****t@i****e | 3 |
| MartinUrban | M****o@g****m | 2 |
| Arfon Smith | a****n | 1 |
| Boyuan Deng | b****g@g****m | 1 |
| Eric Ma | e****g@g****m | 1 |
| Max Ghenis | m****s@g****m | 1 |
| owlas | o****t@s****k | 1 |
| Lei Ma | l****a@s****m | 1 |
| traims | p****s@g****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 51
- Total pull requests: 54
- Average time to close issues: almost 2 years
- Average time to close pull requests: about 1 month
- Total issue authors: 37
- Total pull request authors: 17
- Average comments per issue: 2.76
- Average comments per pull request: 1.2
- Merged pull requests: 47
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 14 hours
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.33
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- DannyArends (10)
- ericmjl (4)
- tawe141 (2)
- chahakmehta (2)
- stasSajin (1)
- JIAZHEN (1)
- sq5rix (1)
- BSharmi (1)
- miranov25 (1)
- joachimder (1)
- AlCorreia (1)
- finbarrtimbers (1)
- CandyOates (1)
- richford (1)
- csanadpoda (1)
Pull Request Authors
- arokem (20)
- kpolimis (13)
- Ab2nour (6)
- danieleongari (4)
- owlas (3)
- el-hult (2)
- arfon (1)
- richford (1)
- emptymalei (1)
- DasCapschen (1)
- ericmjl (1)
- olp-cs (1)
- MaxGhenis (1)
- adamwlev (1)
- hzhao16 (1)
Packages
- Total packages: 2
- Total downloads: 35,665 last month (PyPI)
- Total docker downloads: 48
- Total dependent packages: 4 (may contain duplicates)
- Total dependent repositories: 40 (may contain duplicates)
- Total versions: 12
- Total maintainers: 3
pypi.org: forestci
forestci: confidence intervals for scikit-learn forest algorithms
- Homepage: http://github.com/scikit-learn-contrib/forest-confidence-interval
- Documentation: https://forestci.readthedocs.io/
- License: MIT
- Latest release: 0.5.1 (published over 4 years ago)
Maintainers (3)
conda-forge.org: forestci
a Python module for calculating variance and adding confidence intervals to scikit-learn random forest regression or classification objects. The core functions calculate an in-bag and error bars for random forest objects
- Homepage: https://github.com/scikit-learn-contrib/forest-confidence-interval
- License: MIT
- Latest release: 0.3 (published about 3 years ago)
Dependencies
- JamesIves/github-pages-deploy-action releases/v3 composite
- actions/checkout v1 composite
- actions/setup-python v1 composite
- actions/upload-artifact v1 composite
- actions/checkout v1 composite
- actions/setup-python v1 composite
- flake8 * development
- matplotlib * development
- numpydoc * development
- pandas * development
- pillow * development
- pytest ==5.2.2 development
- pytest-cov ==2.8.1 development
- sphinx * development
- sphinx-autoapi * development
- sphinx_gallery * development
- sphinx_rtd_theme * development
- numpy >=1.20
- scikit-learn >=0.23.1
