Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: rohankumardubey
  • License: other
  • Language: Python
  • Default Branch: master
  • Size: 11.2 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed almost 4 years ago
Metadata Files
  • Readme
  • Funding
  • License
  • Citation

README.md

hgboost - Hyperoptimized Gradient Boosting


hgboost is short for Hyperoptimized Gradient Boosting. It is a Python package for hyperparameter optimization of xgboost, catboost, and lightboost using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to both classification and regression tasks.

hgboost is fun because:

1. Hyperoptimization of the parameter space using a Bayesian approach.
2. Determines the best-scoring model(s) using k-fold cross-validation.
3. Evaluates the best model on an independent evaluation set.
4. Fits the final model on the entire input data using the best parameters.
5. Works for both classification and regression.
6. Creates a super-hyperoptimized model from an ensemble of all individually optimized models.
7. Returns the model, the search space, and the test/evaluation results.
8. Makes insightful plots.
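The steps above (hold out an evaluation set, search hyperparameters with k-fold cross-validation on the training portion, then score the winner on the untouched hold-out) can be sketched in miniature. This is a pure-NumPy illustration, not hgboost's code: the "model" is a deliberately trivial thresholded linear score whose single weight `w` stands in for xgboost's parameter space, and the grid of candidates stands in for the Bayesian (hyperopt) sampler.

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy data: hypothetical stand-ins for X and y
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Step 1: hold out an independent evaluation set (20%)
n_val = int(0.2 * len(X))
X_tr, y_tr = X[:-n_val], y[:-n_val]
X_val, y_val = X[-n_val:], y[-n_val:]

def fit_predict(X_eval, w):
    # Trivial "model": threshold a weighted score; w is the hyperparameter
    return (X_eval @ np.array([1.0, w, 0.0]) > 0).astype(int)

def accuracy(y_hat, y_ref):
    return (y_hat == y_ref).mean()

# Step 2: score each candidate hyperparameter with 5-fold CV on the training set
def cv_score(w, k=5):
    folds = np.array_split(np.arange(len(X_tr)), k)
    return np.mean([accuracy(fit_predict(X_tr[f], w), y_tr[f]) for f in folds])

candidates = np.linspace(-1, 1, 11)   # hgboost samples these with hyperopt instead
best_w = max(candidates, key=cv_score)

# Step 3: evaluate the best model on the untouched validation set
val_acc = accuracy(fit_predict(X_val, best_w), y_val)
print(f"best w: {best_w:.1f}, validation accuracy: {val_acc:.2f}")
```

The key design point mirrored here is that the validation set never influences the hyperparameter choice, so `val_acc` is an unbiased estimate of generalization.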

**Star this repo if you like it**

Documentation pages

On the documentation pages you can find detailed information about the workings of hgboost, together with many examples.

Colab Notebooks

  • Regression example (Open in Colab)

  • Classification example (Open in Colab)

Schematic overview of hgboost

Installation Environment

```bash
conda create -n env_hgboost python=3.8
conda activate env_hgboost
```

Install from pypi

```bash
pip install hgboost       # Install from PyPI
pip install -U hgboost    # Force update
```

Import hgboost package

```python
import hgboost as hgboost
```

Examples

Classification example for xgboost, catboost and lightboost:

```python
# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)

# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')

# [hgboost] >Start hgboost classification..
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total dataset: (891, 204)
# [hgboost] >Hyperparameter optimization..
# 100%|----| 500/500 [04:39<05:21, 1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best performing [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models. Total nr. tests: 50
# 100%|----| 10/10 [00:42<00:00, 4.27s/it]
# [hgboost] >Evaluate best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameter settings.
```

```python
# Plot the ensemble classification validation results
hgb.plot_validation()
```
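The console trace above reports AUC as the loss function on the independent validation set. For reference, ROC AUC can be computed directly from the rank statistic (the Mann-Whitney U formulation). This is a self-contained NumPy sketch, not hgboost's implementation:

```python
import numpy as np

def roc_auc(y_true, scores):
    """Rank-based ROC AUC: probability that a random positive outranks a random negative."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # Assign ranks 1..n by score
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # Average the ranks of tied scores
    for s in np.unique(scores):
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos = (y_true == 1).sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0 for a perfect ranking
```

Since AUC is "greater is better" (as the trace notes), hgboost negates it internally so the optimizer can minimize a loss, which is why the reported best loss and validation score appear with a minus sign.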


References

* http://hyperopt.github.io/hyperopt/
* https://github.com/dmlc/xgboost
* https://github.com/microsoft/LightGBM
* https://github.com/catboost/catboost

Maintainers

* Erdogan Taskesen, github: erdogant

Contribute

* Contributions are welcome.

Licence

* See LICENSE for details.

Coffee

* If you wish to buy me a coffee for this work, it is very much appreciated :)

Owner

  • Name: Rohan Dubey
  • Login: rohankumardubey
  • Kind: user
  • Location: India
  • Company: Pokerstars


Citation (CITATION.cff)

# YAML 1.2
---
authors: 
  -
    family-names: Taskesen
    given-names: Erdogan
    orcid: "https://orcid.org/0000-0002-3430-9618"
cff-version: "1.1.0"
date-released: 2020-10-07
keywords: 
  - "python"
  - "xgboost"
  - "catboost"
  - "lightboost"
  - "gridsearch"
  - "crossvalidation"
  - "hyperoptimization"
  - "two-class-classification"
  - "multi-class-classification"
  - "regression"
license: "MIT"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://erdogant.github.io/hgboost"
title: "hgboost is a python package for hyperparameter optimization for xgboost, catboost and lightboost for both classification and regression tasks."
version: "1.0.0"
...
