yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.

https://github.com/google/yggdrasil-decision-forests

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 35 committers (2.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary

Keywords

cart cli cpp decision-forest decision-trees distributed-computing go gradient-boosting interpretability javascript machine-learning ml pypi python random-forest tensorflow

Keywords from Contributors

distribution deep-neural-networks jax transformers interactive gpt-3 interface prompt-engineering mot multi-agents
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: google
  • License: apache-2.0
  • Language: C++
  • Default Branch: main
  • Homepage: https://ydf.readthedocs.io/
  • Size: 45.2 MB
Statistics
  • Stars: 603
  • Watchers: 16
  • Forks: 65
  • Open Issues: 43
  • Releases: 37
Created almost 5 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

YDF (Yggdrasil Decision Forests) is a library to train, evaluate, interpret, and serve Random Forest, Gradient Boosted Decision Trees, CART and Isolation forest models.

See the documentation for more information on YDF.

Installation

To install YDF from PyPI, run:

pip install ydf -U

Usage example

```python
import ydf
import pandas as pd

# Load dataset with Pandas
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset/"
train_ds = pd.read_csv(ds_path + "adult_train.csv")
test_ds = pd.read_csv(ds_path + "adult_test.csv")

# Train a Gradient Boosted Trees model
model = ydf.GradientBoostedTreesLearner(label="income").train(train_ds)

# Look at a model (input features, training logs, structure, etc.)
model.describe()

# Evaluate a model (e.g. roc, accuracy, confusion matrix, confidence intervals)
model.evaluate(test_ds)

# Generate predictions
model.predict(test_ds)

# Analyse a model (e.g. partial dependence plot, variable importance)
model.analyze(test_ds)

# Benchmark the inference speed of a model
model.benchmark(test_ds)

# Save the model
model.save("/tmp/my_model")
```
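The evaluation step above reports metrics such as accuracy and a confusion matrix. As a rough illustration of what those two quantities measure, here is a minimal sketch in plain Python. It does not use YDF and is not how YDF computes its evaluation; the function names `accuracy` and `confusion_matrix` and the toy labels are hypothetical, made up for this example.

```python
from collections import Counter


def accuracy(labels, predictions):
    """Return the fraction of examples whose predicted class equals the true class."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def confusion_matrix(labels, predictions):
    """Return a Counter mapping (true class, predicted class) to a count."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    return Counter(zip(labels, predictions))


# Toy, made-up data (not taken from the adult dataset).
toy_labels = ["<=50K", "<=50K", ">50K", ">50K", "<=50K"]
toy_predictions = ["<=50K", ">50K", ">50K", "<=50K", "<=50K"]

toy_accuracy = accuracy(toy_labels, toy_predictions)
toy_matrix = confusion_matrix(toy_labels, toy_predictions)

print(f"accuracy = {toy_accuracy:.2f}")
for (true_class, predicted_class), count in sorted(toy_matrix.items()):
    print(f"true={true_class!r} predicted={predicted_class!r}: {count}")
```

In practice, `model.evaluate(test_ds)` already returns these metrics, so a helper like this is only useful for building intuition or for sanity-checking predictions exported from the model.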

Example with the C++ API.

```c++
auto dataset_path = "csv:train.csv";

// List columns in training dataset
DataSpecification spec;
CreateDataSpec(dataset_path, false, {}, &spec);

// Create a training configuration
TrainingConfig train_config;
train_config.set_learner("RANDOM_FOREST");
train_config.set_task(Task::CLASSIFICATION);
train_config.set_label("my_label");

// Train model
std::unique_ptr<AbstractLearner> learner;
GetLearner(train_config, &learner);
auto model = learner->Train(dataset_path, spec);

// Export model
SaveModel("my_model", model.get());
```

(based on examples/beginner.cc)

Next steps

Check the Getting Started tutorial 🧭.

Citation

If you use Yggdrasil Decision Forests in a scientific publication, please cite the following paper: Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library.

Bibtex

@inproceedings{GBBSP23,
  author    = {Mathieu Guillame{-}Bert and Sebastian Bruch and Richard Stotz and Jan Pfeifer},
  title     = {Yggdrasil Decision Forests: {A} Fast and Extensible Decision Forests Library},
  booktitle = {Proceedings of the 29th {ACM} {SIGKDD} Conference on Knowledge Discovery and Data Mining, {KDD} 2023, Long Beach, CA, USA, August 6-10, 2023},
  pages     = {4068--4077},
  year      = {2023},
  url       = {https://doi.org/10.1145/3580305.3599933},
  doi       = {10.1145/3580305.3599933},
}

Raw

Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library, Guillame-Bert et al., KDD 2023: 4068-4077. doi:10.1145/3580305.3599933

Contact

You can contact the core development team at decision-forests-contact@google.com.

Credits

Yggdrasil Decision Forests and TensorFlow Decision Forests are developed by:

  • Mathieu Guillame-Bert (gbm AT google DOT com)
  • Richard Stotz (richardstotz AT google DOT com)
  • Jan Pfeifer (janpf AT google DOT com)
  • Sebastian Bruch (sebastian AT bruch DOT io)
  • Arvind Srinivasan (arvnd AT google DOT com)

Contributing

Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are welcome. If you want to contribute, check the contribution guidelines.

License

Apache License 2.0

Owner

  • Name: Google
  • Login: google
  • Kind: organization
  • Email: opensource@google.com
  • Location: United States of America

Google ❤️ Open Source

GitHub Events

Total
  • Create event: 9
  • Release event: 5
  • Issues event: 63
  • Watch event: 118
  • Delete event: 2
  • Issue comment event: 105
  • Push event: 230
  • Pull request review event: 11
  • Pull request review comment event: 13
  • Pull request event: 31
  • Fork event: 18
Last Year
  • Create event: 9
  • Release event: 5
  • Issues event: 63
  • Watch event: 118
  • Delete event: 2
  • Issue comment event: 105
  • Push event: 230
  • Pull request review event: 11
  • Pull request review comment event: 13
  • Pull request event: 31
  • Fork event: 18

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 1,259
  • Total Committers: 35
  • Avg Commits per committer: 35.971
  • Development Distribution Score (DDS): 0.46
Past Year
  • Commits: 374
  • Committers: 20
  • Avg Commits per committer: 18.7
  • Development Distribution Score (DDS): 0.481
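
The all-time figures above can be reproduced from the commit counts on this page. The sketch below assumes the common definition of the Development Distribution Score, namely one minus the share of commits made by the most active committer; this page does not state the formula, so treat that definition as an assumption. The helper name `development_distribution_score` is hypothetical. The top committer's count (680) comes from the Top Committers list.

```python
def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    if total_commits <= 0:
        raise ValueError("total_commits must be positive")
    return 1 - top_committer_commits / total_commits


# All-time figures reported on this page.
total_commits = 1259
total_committers = 35
top_committer_commits = 680  # Mathieu Guillame-Bert, per the Top Committers list

avg_commits = total_commits / total_committers
dds = development_distribution_score(top_committer_commits, total_commits)

# Past-year average, from 374 commits by 20 committers.
past_year_avg_commits = 374 / 20

print(f"All-time average commits per committer: {avg_commits:.3f}")
print(f"All-time DDS (assumed formula): {dds:.2f}")
print(f"Past-year average commits per committer: {past_year_avg_commits:.1f}")
```

Under that assumption the all-time values match the reported 35.971 and 0.46. The past-year DDS of 0.481 cannot be checked the same way, because the page does not give a per-committer breakdown for the past year.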
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Mathieu Guillame-Bert | g****m@g****m | 680 |
| Richard Stotz | r****z@g****m | 460 |
| TensorFlow Decision Forests Team | n****y@g****m | 58 |
| Damiano Amatruda | d****a@g****m | 9 |
| Jan Pfeifer | j****f@g****m | 8 |
| Arvind Srinivasan | a****d@g****m | 3 |
| Ariel Lubonja | a****l@c****u | 3 |
| Dmitry Tsarkov | t****r@g****m | 3 |
| Yggdrasil Decision Forests Team | d****t@g****m | 2 |
| Jake VanderPlas | v****s@g****m | 2 |
| Alejandro Barrachina Argudo | 4****2 | 2 |
| Emmanuel Ferdman | e****n@g****m | 2 |
| Howard Chiam | h****m | 2 |
| Ivo Ristovski List | i****t@g****m | 2 |
| Jean-Baptiste Lespiau | j****u@g****m | 2 |
| Peter Hawkins | p****s@g****m | 2 |
| Bogdan Graur | b****r@g****m | 1 |
| Alejandro Cruzado-Ruiz | l****m@g****m | 1 |
| Alex | b****z | 1 |
| Arno Eigenwillig | a****w@g****m | 1 |
| Chris Kennelly | c****y@g****m | 1 |
| David Dunleavy | d****y@g****m | 1 |
| Florian Mayer | f****r@g****m | 1 |
| Hana Joo | h****o@g****m | 1 |
| John Cater | j****r@g****m | 1 |
| John QiangZhang | j****g@g****m | 1 |
| Laramie Leavitt | l****r@g****m | 1 |
| Matthew Soulanille | m****w@s****t | 1 |
| Mehdi Amini | a****m@g****m | 1 |
| Michelangelo Conserva | m****a@g****m | 1 |

and 5 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 133
  • Total pull requests: 55
  • Average time to close issues: 2 months
  • Average time to close pull requests: 10 days
  • Total issue authors: 76
  • Total pull request authors: 20
  • Average comments per issue: 3.08
  • Average comments per pull request: 0.93
  • Merged pull requests: 35
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 54
  • Pull requests: 30
  • Average time to close issues: 16 days
  • Average time to close pull requests: 5 days
  • Issue authors: 42
  • Pull request authors: 10
  • Average comments per issue: 1.41
  • Average comments per pull request: 1.13
  • Merged pull requests: 16
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • JoseAF (9)
  • Arnold1 (6)
  • CodingDoug (5)
  • lusis-ai (5)
  • andrea-cassioli-maersk (5)
  • TonyCongqianWang (4)
  • jimidle (4)
  • rlcauvin (4)
  • alpetukhov (3)
  • marquisthunder (3)
  • stephen-up (3)
  • salamanders (3)
  • omit-ai (3)
  • PSSF23 (3)
  • patrickjedlicka (3)
Pull Request Authors
  • achoum (18)
  • rstz (13)
  • dependabot[bot] (4)
  • YueWan1 (3)
  • ariellubonja (3)
  • hchiam (3)
  • emmanuel-ferdman (3)
  • ALK222 (2)
  • fmayer (2)
  • copybara-service[bot] (1)
  • Neutrovertido (1)
  • janpfeifer (1)
  • stephen-up (1)
  • Willian-Zhang (1)
  • LarytheLord (1)
Top Labels
Issue Labels
enhancement (5) bug (3) solved-in-next-release (3) help wanted (2) documentation (1) good first issue (1) question (1)
Pull Request Labels
dependencies (4) javascript (2) cla: yes (1)

Packages

  • Total packages: 5
  • Total downloads:
    • npm 8,171 last-month
    • pypi 91,609 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 84
  • Total maintainers: 4
proxy.golang.org: github.com/google/yggdrasil-decision-forests/yggdrasil_decision_forests/port/go
  • Versions: 54
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 2.2%
Forks count: 3.1%
Average: 5.4%
Dependent packages count: 7.0%
Dependent repos count: 9.3%
Last synced: 6 months ago
pypi.org: ydf

YDF (short for Yggdrasil Decision Forests) is a library for training, serving, evaluating and analyzing decision forest models such as Random Forest and Gradient Boosted Trees.

  • Versions: 25
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 91,609 Last month
Rankings
Stargazers count: 3.3%
Forks count: 6.0%
Downloads: 7.6%
Average: 9.7%
Dependent packages count: 10.1%
Dependent repos count: 21.5%
Maintainers (2)
Last synced: 6 months ago
npmjs.org: ydf-training

Training YDF models in Javascript.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 11 Last month
Rankings
Stargazers count: 2.5%
Forks count: 3.3%
Average: 17.2%
Dependent repos count: 25.7%
Dependent packages count: 37.2%
Maintainers (2)
Last synced: 6 months ago
npmjs.org: yggdrasil-decision-forests

With this package, you can generate predictions of machine learning models trained with YDF in browser and with NodeJS.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 8,113 Last month
Rankings
Stargazers count: 2.8%
Forks count: 3.5%
Average: 19.0%
Dependent repos count: 28.4%
Dependent packages count: 41.3%
Maintainers (1)
Last synced: 6 months ago
npmjs.org: ydf-inference

With this package, you can generate predictions of machine learning models trained with YDF in browser and with NodeJS.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 47 Last month
Rankings
Dependent repos count: 25.7%
Average: 31.5%
Dependent packages count: 37.2%
Maintainers (1)
Last synced: 6 months ago
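
The summary totals at the top of the Packages section can be cross-checked against the per-package entries. The sketch below copies the version and download counts from the listings above into a small table and sums them per registry; the dictionary layout and field names are hypothetical, chosen only for this example, and the Go module is recorded with no download count because none is listed for it.

```python
# Per-package figures copied from the listings above.
packages = [
    {"registry": "proxy.golang.org", "name": "yggdrasil_decision_forests/port/go",
     "versions": 54, "downloads_last_month": None},
    {"registry": "pypi.org", "name": "ydf",
     "versions": 25, "downloads_last_month": 91_609},
    {"registry": "npmjs.org", "name": "ydf-training",
     "versions": 1, "downloads_last_month": 11},
    {"registry": "npmjs.org", "name": "yggdrasil-decision-forests",
     "versions": 3, "downloads_last_month": 8_113},
    {"registry": "npmjs.org", "name": "ydf-inference",
     "versions": 1, "downloads_last_month": 47},
]


def downloads_for(registry):
    """Sum last-month downloads for one registry, skipping missing values."""
    return sum(
        p["downloads_last_month"]
        for p in packages
        if p["registry"] == registry and p["downloads_last_month"] is not None
    )


total_packages = len(packages)
total_versions = sum(p["versions"] for p in packages)
npm_downloads = downloads_for("npmjs.org")
pypi_downloads = downloads_for("pypi.org")

print(f"Total packages: {total_packages}")
print(f"Total versions: {total_versions}")
print(f"npm downloads last month: {npm_downloads:,}")
print(f"pypi downloads last month: {pypi_downloads:,}")
```

The sums agree with the reported totals: 5 packages, 84 versions, 8,171 npm downloads, and 91,609 PyPI downloads last month.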

Dependencies

yggdrasil_decision_forests/port/go/go.mod go
  • github.com/google/go-cmp v0.5.8
  • google.golang.org/protobuf v1.28.1
yggdrasil_decision_forests/port/go/go.sum go
  • github.com/golang/protobuf v1.5.0
  • github.com/google/go-cmp v0.5.5
  • github.com/google/go-cmp v0.5.8
  • golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543
  • google.golang.org/protobuf v1.26.0-rc.1
  • google.golang.org/protobuf v1.28.1
documentation/rtd/requirements.txt pypi
  • myst-parser *
  • readthedocs-sphinx-search ==0.1.1
  • sphinx ==4.2.0
  • sphinx-autoapi *
  • sphinx-autodoc-typehints *
  • sphinx-book-theme >=0.3.3
  • sphinx-copybutton >=0.5.0
  • sphinx-remove-toctrees *
  • sphinx-sitemap *
  • sphinx_design *
  • sphinx_rtd_theme ==1.0.0