hgboost

hgboost is a Python package for hyperparameter optimization of xgboost, catboost, or lightboost models using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to both classification and regression tasks.

https://github.com/erdogant/hgboost

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.4%) to scientific vocabulary

Keywords

catboost crossvalidation gridsearch hyperoptimization lightboost machine-learning python xgboost
Last synced: 6 months ago

Repository

hgboost is a Python package for hyperparameter optimization of xgboost, catboost, or lightboost models using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to both classification and regression tasks.

Basic Info
Statistics
  • Stars: 64
  • Watchers: 3
  • Forks: 18
  • Open Issues: 6
  • Releases: 19
Topics
catboost crossvalidation gridsearch hyperoptimization lightboost machine-learning python xgboost
Created almost 6 years ago · Last pushed 12 months ago
Metadata Files
Readme Funding License Citation

README.md

hgboost - Hyperoptimized Gradient Boosting



hgboost is short for Hyperoptimized Gradient Boosting and is a Python package for hyperparameter optimization of xgboost, catboost, and lightboost models using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to both classification and regression tasks.

hgboost is fun because:

* 1. Hyperoptimizes the parameter space using a Bayesian approach.
* 2. Determines the best-scoring model(s) using k-fold cross-validation.
* 3. Evaluates the best model on an independent validation set.
* 4. Fits the final model on the entire input dataset using the best parameters.
* 5. Works for both classification and regression.
* 6. Can create a super-hyperoptimized model from an ensemble of all individually optimized models.
* 7. Returns the model, the search space, and the test/evaluation results.
* 8. Makes insightful plots.
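The train/validate/refit workflow in the first four points can be sketched in plain Python. This is a toy illustration of the data flow only, using a hypothetical constant-threshold "model" in place of a real gradient-boosting learner; it is not how hgboost is implemented internally:

```python
import random

random.seed(42)
X = [[random.random()] for _ in range(100)]
y = [x[0] > 0.5 for x in X]                 # label: feature above 0.5

# 1) Split off an independent validation set (val_size=0.2, as in hgboost's defaults)
val_size = 0.2
n_val = int(len(X) * val_size)
X_val, y_val = X[:n_val], y[:n_val]         # held out, never used for tuning
X_tr, y_tr = X[n_val:], y[n_val:]           # used for hyperparameter search + CV

def accuracy(threshold, Xs, ys):
    # Score of the toy "model": classify positive when feature > threshold.
    return sum((x[0] > threshold) == t for x, t in zip(Xs, ys)) / len(ys)

def cv_score(threshold, Xs, ys, cv=5):
    # 2) k-fold cross-validation: average score over the held-out folds.
    fold = len(Xs) // cv
    scores = [accuracy(threshold, Xs[k * fold:(k + 1) * fold],
                       ys[k * fold:(k + 1) * fold]) for k in range(cv)]
    return sum(scores) / cv

# The "search space" here is just a few candidate thresholds,
# standing in for the real boosting hyperparameters.
candidates = [0.3, 0.4, 0.5, 0.6]
best = max(candidates, key=lambda t: cv_score(t, X_tr, y_tr))

# 3) Evaluate the best candidate on the independent validation set.
print("best threshold:", best)
print("validation accuracy:", accuracy(best, X_val, y_val))
# 4) In hgboost, the final model is then refit on the entire dataset.
```

The point of the separate validation set is that the cross-validated score is optimistically biased by the search itself; the held-out set gives an unbiased estimate of the chosen model.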

⭐️ Star this repo if you like it ⭐️


Blogs

Medium Blog 1: The Best Boosting Model using Bayesian Hyperparameter Tuning but without Overfitting.

Medium Blog 2: Create Explainable Gradient Boosting Classification models using Bayesian Hyperparameter Optimization.


Documentation pages

On the documentation pages you can find detailed information about how hgboost works, together with many examples.


Colab Notebooks

  • Regression example: Open in Colab

  • Classification example: Open in Colab


Schematic overview of hgboost

Installation Environment

```bash
conda create -n env_hgboost python=3.8
conda activate env_hgboost
```

Install from pypi

```bash
pip install hgboost
pip install -U hgboost   # Force update
```

Import hgboost package

```python
import hgboost as hgboost
```

Examples

Classification example for xgboost, catboost and lightboost:

```python
# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)

# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')

# [hgboost] >Start hgboost classification..
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total dataset: (891, 204)
# [hgboost] >Hyperparameter optimization..
# 100%|----| 500/500 [04:39<05:21, 1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best performing [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00, 4.27s/it]
# [hgboost] >Evaluate best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameter settings.
```
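Note that the "best loss" in the trace above is the negative AUC: hyperopt minimizes a loss, while AUC is a greater-is-better metric, so hgboost negates it during the search. As a standalone reminder of what is being optimized (independent of hgboost), AUC is the probability that a randomly chosen positive sample is ranked above a randomly chosen negative one:

```python
def auc(y_true, y_score):
    # Probability that a randomly chosen positive is scored above a
    # randomly chosen negative; ties count as half a win.
    pos = [s for s, t in zip(y_score, y_true) if t]
    neg = [s for s, t in zip(y_score, y_true) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75: 3 of 4 pos/neg pairs ranked correctly
```

An AUC of 0.88, as found above, therefore means an 88% chance that the model scores a random positive above a random negative.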

```python
# Plot the ensemble classification validation results
hgb.plot_validation()
```


References

* http://hyperopt.github.io/hyperopt/
* https://github.com/dmlc/xgboost
* https://github.com/microsoft/LightGBM
* https://github.com/catboost/catboost

Maintainers

* Erdogan Taskesen, github: erdogant

Contribute

* Contributions are welcome.

Licence

* See LICENSE for details.

Coffee

* If you wish to buy me a coffee for this work, it is very much appreciated :)

Owner

  • Name: Erdogan
  • Login: erdogant
  • Kind: user
  • Location: Den Haag

Machine Learning | Statistics | Bayesian | D3js | Visualizations

Citation (CITATION.cff)

# YAML 1.2
---
authors: 
  -
    family-names: Taskesen
    given-names: Erdogan
    orcid: "https://orcid.org/0000-0002-3430-9618"
cff-version: "1.1.0"
date-released: 2020-10-07
keywords: 
  - "python"
  - "xgboost"
  - "catboost"
  - "lightboost"
  - "gridsearch"
  - "crossvalidation"
  - "hyperoptimization"
  - "two-class-classification"
  - "multi-class-classification"
  - "regression"
license: "MIT"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://erdogant.github.io/hgboost"
title: "hgboost is a python package for hyperparameter optimization for xgboost, catboost and lightboost for both classification and regression tasks."
version: "1.0.0"
...

GitHub Events

Total
  • Issues event: 1
  • Watch event: 9
  • Push event: 2
  • Fork event: 2
Last Year
  • Issues event: 1
  • Watch event: 9
  • Push event: 2
  • Fork event: 2

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 271
  • Total Committers: 2
  • Avg Commits per committer: 135.5
  • Development Distribution Score (DDS): 0.004
Past Year
  • Commits: 5
  • Committers: 1
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Erdogan Taskesen e****t@g****m 270
A. Bram Neijt b****m@n****l 1
Committer Domains (Top 20 + Academic)
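The DDS values above are consistent with defining the Development Distribution Score as one minus the top committer's share of all commits (an assumed definition, inferred from the numbers shown rather than stated on this page):

```python
def dds(commits):
    # Assumed: DDS = 1 - (top committer's commits / total commits).
    # 0.0 means one person wrote everything; values near 1 mean broad distribution.
    total = sum(commits.values())
    return 1 - max(commits.values()) / total

all_time = {"Erdogan Taskesen": 270, "A. Bram Neijt": 1}
print(round(dds(all_time), 3))               # 0.004, matching the "All Time" DDS
print(dds({"Erdogan Taskesen": 5}))          # 0.0, matching the past-year DDS
```

With 270 of 271 commits from one author, the score is 1 - 270/271 ≈ 0.004, i.e. development is almost entirely single-maintainer.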

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 21
  • Total pull requests: 1
  • Average time to close issues: 20 days
  • Average time to close pull requests: 24 minutes
  • Total issue authors: 15
  • Total pull request authors: 1
  • Average comments per issue: 1.43
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: 3 days
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • recherHE (3)
  • SSLPP (2)
  • chandrabhuma (2)
  • ninjit (2)
  • quancore (2)
  • ChandanVerma (1)
  • chennavc (1)
  • juanramonua (1)
  • twcult (1)
  • Mikki99 (1)
  • LAH19999 (1)
  • ChalktyGeo (1)
  • nicolasaldecoa (1)
  • 54dyourc (1)
  • CarterwoodAnalytics (1)
Pull Request Authors
  • bneijt (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 1,290 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 18
  • Total maintainers: 1
pypi.org: hgboost

hgboost is a python package for hyperparameter optimization for xgboost, catboost and lightboost for both classification and regression tasks.

  • Versions: 18
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 1,290 Last month
Rankings
Downloads: 6.7%
Forks count: 9.3%
Stargazers count: 9.4%
Dependent packages count: 9.8%
Average: 11.4%
Dependent repos count: 21.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/source/requirements.txt pypi
  • sphinx_rtd_theme *
requirements-dev.txt pypi
  • irelease * development
  • nbconvert * development
  • numpy * development
  • pytest * development
  • rst2pdf * development
  • sphinx * development
  • sphinx_rtd_theme * development
  • sphinxcontrib-fulltoc * development
  • spyder-kernels ==2.3.* development
requirements.txt pypi
  • catboost *
  • classeval *
  • colourmap *
  • df2onehot *
  • hyperopt *
  • lightgbm *
  • matplotlib *
  • numpy *
  • pandas *
  • pypickle *
  • seaborn *
  • tqdm *
  • treeplot *
  • wget *
  • xgboost *
.github/workflows/codeql-analysis.yml actions
  • actions/checkout v2 composite
  • github/codeql-action/analyze v1 composite
  • github/codeql-action/autobuild v1 composite
  • github/codeql-action/init v1 composite
pyproject.toml pypi
  • classeval *
  • colourmap *
  • datazets *
  • df2onehot *
  • hyperopt *
  • lightgbm >=4.1.0
  • matplotlib *
  • numpy *
  • pandas *
  • pypickle *
  • seaborn *
  • tqdm *
  • treeplot *
  • xgboost *