treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn

https://github.com/neurodata/treeple

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary

Keywords

causal-inference causal-machine-learning cython decision-trees estimation machine-learning python random-forest scikit-learn

Keywords from Contributors

closember mesh energy-system-model standards names energy-system exoplanet name-generation dynamics connectivity
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: neurodata
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage: https://treeple.ai
  • Size: 183 MB
Statistics
  • Stars: 84
  • Watchers: 4
  • Forks: 21
  • Open Issues: 79
  • Releases: 15
Created almost 4 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing Funding License Citation

README.md

treeple

treeple is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.
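As a minimal sketch of what "scikit-learn compatible" means in practice, the snippet below fits an oblique forest with the usual `fit` and `score` calls. It assumes that treeple and scikit-learn are installed and that `ObliqueRandomForestClassifier` is importable from the top-level `treeple` namespace. The synthetic dataset and the hyperparameters are illustrative choices, not recommendations. If the imports fail, the helper returns `None` rather than raising.

```python
def fit_oblique_forest_demo(seed=0):
    """Fit an oblique random forest on synthetic data and return test accuracy.

    Returns None when treeple or scikit-learn cannot be imported.
    """
    try:
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split

        # Assumption: the class is exported at the top level of treeple.
        from treeple import ObliqueRandomForestClassifier
    except ImportError:
        return None

    # Small synthetic classification problem (illustrative only).
    X, y = make_classification(
        n_samples=300, n_features=10, n_informative=5, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    # Same estimator workflow as any scikit-learn classifier.
    clf = ObliqueRandomForestClassifier(n_estimators=50, random_state=seed)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


if __name__ == "__main__":
    print(fit_oblique_forest_demo())
```

Because the estimator follows the scikit-learn interface, it should also be usable with utilities such as pipelines and cross-validation helpers, although that is not shown here.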

Tree models have withstood the test of time and are widely used in modern data science and machine learning applications. They perform especially well when samples are limited, and they are flexible learners that can be applied to a wide variety of data, including tabular data, images, time series, genomics, and EEG data.

Note that this package was originally named scikit-tree and was renamed to treeple after version 0.8.0. Versions <0.8.0 are still available at .

Documentation

Documentation for the development version is available at: https://docs.neurodata.io/treeple/dev/index.html

Is treeple useful for me?

  1. If you use decision tree models (random forests, extra trees, isolation forests, etc.) in your work, treeple is a good package to try out. We provide a variety of tree models that are not available in scikit-learn, and we are always looking for new tree models to implement. For example, oblique decision trees can often outperform their axis-aligned counterparts.

  2. If you are interested in extending the decision tree API in scikit-learn, treeple is a good package to try out. We expose a variety of internal APIs that are not available in scikit-learn, which makes it easier to support new decision tree models.

Why oblique trees and why trees beyond those in scikit-learn?

In 2001, Leo Breiman proposed two types of random forests. The first, Forest-RI, is the traditional axis-aligned random forest. The second, Forest-RC, is the random oblique forest, which splits on random linear combinations of features. Manifold Oblique Random Forests (MORF) [1] build upon Forest-RC by proposing additional functions for combining features. Other modern tree variants, such as Canonical Correlation Forests (CCF), Extended Isolation Forests, Quantile Forests, and unsupervised random forests, are also important for solving real-world problems with robust decision tree models.
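The difference between the two split types can be sketched in a few lines of plain Python. This is an illustration of the idea only, not treeple's Cython implementation: the function names are hypothetical, and the choice of weights drawn uniformly from [-1, 1] follows the Forest-RC description rather than any treeple default.

```python
import random


def axis_aligned_split(x, feature, threshold):
    """Forest-RI style test: compare a single feature to a threshold."""
    return x[feature] <= threshold


def sample_projection(n_features, n_combined, rng):
    """Pick n_combined features at random and give each a weight in [-1, 1]."""
    chosen = rng.sample(range(n_features), n_combined)
    return {j: rng.uniform(-1.0, 1.0) for j in chosen}


def oblique_split(x, weights, threshold):
    """Forest-RC style test: compare a linear combination of features to a threshold."""
    projected = sum(w * x[j] for j, w in weights.items())
    return projected <= threshold


rng = random.Random(0)
sample = [0.5, -1.2, 3.0, 0.1]

# Axis-aligned: only feature 2 decides which side the sample falls on.
print(axis_aligned_split(sample, feature=2, threshold=1.0))

# Oblique: a random weighted sum of two features decides instead.
weights = sample_projection(n_features=4, n_combined=2, rng=rng)
print(oblique_split(sample, weights, threshold=0.0))
```

An axis-aligned split is the special case of an oblique split in which the projection has a single feature with weight 1.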

Installation

Our installation process follows the scikit-learn installation as closely as possible, because treeple contains Cython code that subclasses, or is inspired by, the scikit-learn tree submodule.

Dependencies

We minimally require:

* Python (>=3.9)
* numpy
* scipy
* scikit-learn

Installation with Pip (https://pypi.org/project/treeple/)

Installing with pip in a conda environment is the recommended route.

pip install treeple
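After installing, one way to confirm which release is present is to query the installed distribution metadata. The sketch below uses only the standard library. It assumes that releases before the rename were published under the distribution name `scikit-tree`; the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_tree_package():
    """Return (distribution_name, version) for treeple or its predecessor, else None."""
    # Assumption: pre-rename releases used the distribution name "scikit-tree".
    for dist_name in ("treeple", "scikit-tree"):
        try:
            return dist_name, version(dist_name)
        except PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tree_package()
    print("not installed" if found is None else f"{found[0]} {found[1]}")
```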

Development

We welcome contributions of modern tree-based algorithms. We use Cython to achieve C/C++-level speed while abiding by a scikit-learn compatible (and tested) API. We also welcome contributions in C/C++ if they improve the extensibility or runtime performance of the codebase. Our Cython internals are easily extensible because they follow scikit-learn's internal Cython API.

Due to the current state of scikit-learn's internal Cython code for trees, we instead rely on a fork of scikit-learn at https://github.com/neurodata/scikit-learn when extending scikit-learn's decision tree API. Specifically, we extend the Python and Cython API of scikit-learn's tree submodule in our submodule, so that we can introduce the tree models housed in this package. These models extend the functionality of decision-tree-based models in ways that are not yet possible in scikit-learn itself. As one example, we introduce an abstract API that allows users to implement their own oblique splits. Our plan is to benchmark these features and introduce them upstream to scikit-learn where applicable and where the inclusion criteria are met.
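To give a feel for what an abstract API for user-defined oblique splits could look like, here is a pure-Python sketch. It is purely hypothetical: the class and method names are invented for illustration, and the actual extension points in treeple live in Cython and will differ.

```python
import random
from abc import ABC, abstractmethod


class ObliqueProjection(ABC):
    """Hypothetical interface: a rule for combining features into one split value."""

    @abstractmethod
    def sample(self, n_features, rng):
        """Return a list of (feature_index, weight) pairs."""

    def project(self, x, projection):
        """Apply a sampled projection to one sample."""
        return sum(weight * x[j] for j, weight in projection)


class SignedPairProjection(ObliqueProjection):
    """Example user-defined rule: the sum or difference of two random features."""

    def sample(self, n_features, rng):
        i, j = rng.sample(range(n_features), 2)
        return [(i, 1.0), (j, rng.choice((-1.0, 1.0)))]


rule = SignedPairProjection()
projection = rule.sample(n_features=5, rng=random.Random(0))
value = rule.project([0.2, 1.5, -0.7, 3.1, 0.0], projection)
print(projection, value)
```

The design point is that the tree-building code only needs the `sample` and `project` behavior, so a new kind of oblique split becomes a new subclass rather than a change to the tree builder.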

References

[1]: Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks." SIAM Journal on Mathematics of Data Science, 5(1), 77-96, 2023.

Owner

  • Name: neurodata
  • Login: neurodata
  • Kind: organization
  • Email: admin@neurodata.io
  • Location: everywhere

Citation (CITATION.cff)

# YAML 1.2
---
# Metadata for citation of this software according to the CFF format (https://citation-file-format.github.io/)
cff-version: 1.2.0
title: "treeple: Modern decision-trees compatible with scikit-learn in Python."
abstract: "treeple is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees."
authors:
  - given-names: Adam
    family-names: Li
    affiliation: "Department of Computer Science, Columbia University, New York, NY, USA"
    orcid: "https://orcid.org/0000-0001-8421-365X"
  - given-names: Sambit
    family-names: Panda
    affiliation: "Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA"
    orcid: "https://orcid.org/0000-0001-8455-4243"
  - given-names: Haoyin
    family-names: Xu
    affiliation: "Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA"
    orcid: "https://orcid.org/0000-0001-8235-4950"
  - given-names: Itsuki
    family-names: Ogihara
    affiliation: "Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA"
type: software
repository-code: "https://github.com/neurodata/treeple"
license: 'PolyForm-Noncommercial-1.0.0'
keywords:
  - random forest
  - oblique trees
  - honest forests
  - statistical learning
  - machine learning
message: >-
  Please cite this software using the metadata from
  'preferred-citation' in the CITATION.cff file.

GitHub Events

Total
  • Fork event: 8
  • Create event: 23
  • Commit comment event: 2
  • Issues event: 8
  • Release event: 4
  • Watch event: 22
  • Delete event: 12
  • Member event: 1
  • Issue comment event: 26
  • Push event: 124
  • Pull request review comment event: 14
  • Pull request review event: 39
  • Pull request event: 36
Last Year
  • Fork event: 8
  • Create event: 23
  • Commit comment event: 2
  • Issues event: 8
  • Release event: 4
  • Watch event: 22
  • Delete event: 12
  • Member event: 1
  • Issue comment event: 26
  • Push event: 124
  • Pull request review comment event: 14
  • Pull request review event: 39
  • Pull request event: 36

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 376
  • Total Committers: 11
  • Avg Commits per committer: 34.182
  • Development Distribution Score (DDS): 0.386
Past Year
  • Commits: 64
  • Committers: 8
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.688
Top Committers
Name                  Email           Commits
Adam Li               a****2@g****m   231
Jong Shin             j****m@g****m   45
dependabot[bot]       4****]          30
Haoyin Xu             h****u@g****m   28
pre-commit-ci[bot]    6****]          12
YuxinB                y****u@g****m   11
Sambit Panda          3****1          7
SUKI                  I****2@g****m   7
Ryan Hausen           r****n@g****m   3
Stefan van der Walt   s****t@g****m   1
Jarrod Millman        j****n@g****m   1
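The all-time summary figures can be reproduced from the commit counts in the table above. The sketch assumes that the Development Distribution Score is defined as one minus the share of commits made by the most active committer; that definition is not stated on this page, but it matches the reported value.

```python
# Commit counts copied from the "Top Committers" table.
commits = [231, 45, 30, 28, 12, 11, 7, 7, 3, 1, 1]

total = sum(commits)
avg_per_committer = total / len(commits)

# Assumed definition: DDS = 1 - (commits by the top committer / total commits).
dds = 1 - max(commits) / total

print(total, round(avg_per_committer, 3), round(dds, 3))  # 376 34.182 0.386
```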

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 29
  • Total pull requests: 118
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 13
  • Total pull request authors: 17
  • Average comments per issue: 1.59
  • Average comments per pull request: 1.31
  • Merged pull requests: 58
  • Bot issues: 0
  • Bot pull requests: 21
Past Year
  • Issues: 11
  • Pull requests: 47
  • Average time to close issues: 4 days
  • Average time to close pull requests: 9 days
  • Issue authors: 7
  • Pull request authors: 12
  • Average comments per issue: 0.55
  • Average comments per pull request: 0.68
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 8
Top Authors
Issue Authors
  • adam2392 (7)
  • PSSF23 (4)
  • ogencoglu (2)
  • ryanhausen (2)
  • jovo (2)
  • bdpedigo (1)
  • alexarXu (1)
  • ritviksahajpal (1)
  • awyuan (1)
  • pre-commit-ci[bot] (1)
Pull Request Authors
  • adam2392 (43)
  • PSSF23 (23)
  • dependabot[bot] (22)
  • pre-commit-ci[bot] (20)
  • YuxinB (13)
  • ryanhausen (8)
  • SUKI-O (8)
  • ClarkXu0625 (4)
  • mamamum (2)
  • goraj (2)
  • CambridgeCat13 (2)
  • SamuelCarliles3 (2)
  • alexarXu (2)
  • j1c (1)
  • weioren (1)
Top Labels
Issue Labels
bug (5) Cython (1) research (1) enhancement (1)
Pull Request Labels
dependencies (22) No Changelog Needed (8)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 367 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 3
pypi.org: treeple

Modern decision trees in Python

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 367 Last month
Rankings
Dependent packages count: 10.6%
Stargazers count: 10.9%
Forks count: 11.6%
Average: 23.3%
Dependent repos count: 59.9%
Maintainers (3)
Last synced: 6 months ago

Dependencies

.github/workflows/circle_artifacts.yml actions
  • larsoner/circleci-artifacts-redirector-action master composite
.github/workflows/main.yml actions
  • abatilo/actions-poetry v2.2.0 composite
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • codecov/codecov-action v3 composite
  • softprops/action-gh-release v1 composite
.github/workflows/pr_checks.yml actions
  • actions/checkout v3 composite
poetry.lock pypi
  • 143 dependencies
.github/workflows/build_wheels.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4.5.0 composite
  • actions/upload-artifact v3 composite
  • pypa/cibuildwheel v2.12.0 composite
.github/workflows/pull_request_labeler.yml actions
  • thomasjpfan/labeler v2.5.0 composite
.github/workflows/style.yml actions
  • abatilo/actions-poetry v2.2.0 composite
  • actions/checkout v3 composite
  • actions/setup-python v4.5.0 composite
build_requirements.txt pypi
  • click *
  • cython *
  • doit *
  • meson *
  • ninja *
  • numpy *
  • pydevtool *
  • rich-click *
doc_requirements.txt pypi
  • ipython *
  • matplotlib *
  • memory_profiler *
  • nbsphinx *
  • numpydoc *
  • pandas *
  • portray *
  • pydata-sphinx-theme *
  • seaborn *
  • sphinx <6
  • sphinx-copybutton *
  • sphinx-gallery *
  • sphinx-issues *
  • sphinxcontrib-bibtex *
requirements.txt pypi
  • numpy *
  • scipy *
test_requirements.txt pypi
  • joblib * test
  • memory_profiler * test
  • pytest * test
  • pytest-cov * test
  • tqdm * test
.github/workflows/release.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4.6.1 composite
  • softprops/action-gh-release v1 composite
doc/_static/versions.json meteor
.github/workflows/send-slack.yml actions
pyproject.toml pypi
  • importlib-resources *
  • numpy ^1.23.0
  • numpy *
  • python >=3.9,<3.12
  • scikit-learn ^1.2.2
  • scikit-learn >=1.3
  • scipy ^1.9.0
  • scipy >=1.5.0