hgboost
hgboost is a Python package for hyperparameter optimization of xgboost, catboost, and lightboost models using cross-validation, with evaluation of the results on an independent validation set. hgboost can be applied to both classification and regression tasks.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.4%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: erdogant
- License: other
- Language: Python
- Default Branch: master
- Homepage: http://erdogant.github.io/hgboost
- Size: 24.4 MB
Statistics
- Stars: 64
- Watchers: 3
- Forks: 18
- Open Issues: 6
- Releases: 19
Topics
Metadata Files
README.md
hgboost - Hyperoptimized Gradient Boosting
hgboost is short for Hyperoptimized Gradient Boosting. It is a Python package for hyperparameter optimization of xgboost, catboost, and lightboost models using cross-validation, with evaluation of the results on an independent validation set.
hgboost can be applied to both classification and regression tasks.
hgboost is fun because:
1. Hyperoptimization of the parameter space using a Bayesian approach.
2. Determines the best scoring model(s) using k-fold cross-validation.
3. Evaluates the best model on an independent evaluation set.
4. Fits the model on the entire input data using the best parameters.
5. Works for classification and regression.
6. Creates a super-hyperoptimized model from an ensemble of all individually optimized models.
7. Returns the model, search space, and test/evaluation results.
8. Makes insightful plots.
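Point 1 above delegates the Bayesian search to the hyperopt library (see References). Its underlying contract (sample a candidate from a parameter space, evaluate a loss, keep the best over a fixed number of trials) can be illustrated with a stdlib-only random-search sketch. The space, loss function, and parameter names below are invented for illustration; hyperopt's actual Tree-structured Parzen Estimator is smarter than uniform sampling.

```python
import random

random.seed(42)

# Invented toy search space: (low, high) ranges per hyperparameter
space = {"learning_rate": (0.01, 0.3), "max_depth": (2, 10)}

def sample(space):
    """Draw one candidate configuration from the space."""
    return {
        "learning_rate": random.uniform(*space["learning_rate"]),
        "max_depth": random.randint(*space["max_depth"]),
    }

def loss(params):
    """Stand-in for 'negative validation AUC of a model trained with params'."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 6) ** 2 / 100

def minimize(space, max_eval=50):
    """Evaluate max_eval candidates and keep the one with the lowest loss."""
    best_params, best_loss = None, float("inf")
    for _ in range(max_eval):
        candidate = sample(space)
        candidate_loss = loss(candidate)
        if candidate_loss < best_loss:
            best_params, best_loss = candidate, candidate_loss
    return best_params, best_loss

best, best_loss = minimize(space, max_eval=50)
print(best, best_loss)
```

In hgboost the loss is the (negated, when greater is better) evaluation metric of a boosted model trained with the sampled parameters, and `max_eval` is exposed directly as an initialization parameter.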
⭐️ Star this repo if you like it ⭐️
Blogs
Medium Blog 1: The Best Boosting Model using Bayesian Hyperparameter Tuning but without Overfitting.
Medium Blog 2: Create Explainable Gradient Boosting Classification models using Bayesian Hyperparameter Optimization.
Documentation pages
On the documentation pages you can find detailed information about the working of the hgboost with many examples.
Colab Notebooks
Schematic overview of hgboost
Installation Environment
```bash
conda create -n env_hgboost python=3.8
conda activate env_hgboost
```
Install from pypi
```bash
pip install hgboost
pip install -U hgboost   # Force update
```
Import hgboost package
```python
import hgboost as hgboost
```
Examples
Classification example for xgboost, catboost and lightboost:
```python
# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)

# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')

# [hgboost] >Start hgboost classification..
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total dataset: (891, 204)
# [hgboost] >Hyperparameter optimization..
# 100% |----| 500/500 [04:39<05:21, 1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best performing [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00, 4.27s/it]
# [hgboost] >Evaluate best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameter settings.
```
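Two details in the log above are easy to misread. The independent validation set of 179 samples is the 20% fraction of the 891 input rows, with the held-out set size rounded up (179 = ceil(891 * 0.2), as in a scikit-learn style split), and the negative "best loss" arises because hyperopt minimizes its objective, so a greater-is-better metric such as AUC is negated. A quick stdlib check of both:

```python
import math

n_samples = 891   # rows in the example dataset, per the log above
val_size = 0.2    # fraction held out for independent validation

# The held-out set size is rounded up: ceil(891 * 0.2) = 179
n_val = math.ceil(n_samples * val_size)
print(n_val)  # 179

# hyperopt minimizes its objective, so greater-is-better metrics
# such as AUC are negated before optimization
auc = 0.8800619834710744
loss = -auc
print(loss)  # -0.8800619834710744
```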
```python
# Plot the ensemble classification validation results
hgb.plot_validation()
```
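The cross-validation stage in the log above ("5-fold cross validation for the top 10 scoring models, Total nr. tests: 50") is simply the product of the number of folds and the number of top-scoring trials kept for re-evaluation. A small sketch with invented trial scores:

```python
# Hypothetical (invented) trial results from the search stage: (trial_id, auc)
trials = [(i, 0.80 + 0.001 * (i % 37)) for i in range(500)]

cv = 5             # number of folds
top_cv_evals = 10  # number of best trials re-scored with k-fold CV

# Keep the highest-scoring trials for the cross-validation stage
top = sorted(trials, key=lambda t: t[1], reverse=True)[:top_cv_evals]

# Each kept trial is fitted once per fold
total_fits = cv * len(top)
print(total_fits)  # 50, matching "Total nr. tests: 50" in the log
```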
References
* http://hyperopt.github.io/hyperopt/
* https://github.com/dmlc/xgboost
* https://github.com/microsoft/LightGBM
* https://github.com/catboost/catboost
Maintainers
* Erdogan Taskesen, github: erdogant

Contribute
* Contributions are welcome.

Licence
* See LICENSE for details.

Coffee
* If you wish to buy me a coffee for this work, it is very much appreciated :)
Owner
- Name: Erdogan
- Login: erdogant
- Kind: user
- Location: Den Haag
- Website: https://erdogant.github.io/
- Repositories: 51
- Profile: https://github.com/erdogant
Machine Learning | Statistics | Bayesian | D3js | Visualizations
Citation (CITATION.cff)
# YAML 1.2
---
authors:
  -
    family-names: Taskesen
    given-names: Erdogan
    orcid: "https://orcid.org/0000-0002-3430-9618"
cff-version: "1.1.0"
date-released: 2020-10-07
keywords:
- "python"
- "xgboost"
- "catboost"
- "lightboost"
- "gridsearch"
- "crossvalidation"
- "hyperoptimization"
- "two-class-classification"
- "multi-class-classification"
- "regression"
license: "MIT"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://erdogant.github.io/hgboost"
title: "hgboost is a python package for hyperparameter optimization for xgboost, catboost and lightboost for both classification and regression tasks."
version: "1.0.0"
...
GitHub Events
Total
- Issues event: 1
- Watch event: 9
- Push event: 2
- Fork event: 2
Last Year
- Issues event: 1
- Watch event: 9
- Push event: 2
- Fork event: 2
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Erdogan Taskesen | e****t@g****m | 270 |
| A. Bram Neijt | b****m@n****l | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 21
- Total pull requests: 1
- Average time to close issues: 20 days
- Average time to close pull requests: 24 minutes
- Total issue authors: 15
- Total pull request authors: 1
- Average comments per issue: 1.43
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 0
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- recherHE (3)
- SSLPP (2)
- chandrabhuma (2)
- ninjit (2)
- quancore (2)
- ChandanVerma (1)
- chennavc (1)
- juanramonua (1)
- twcult (1)
- Mikki99 (1)
- LAH19999 (1)
- ChalktyGeo (1)
- nicolasaldecoa (1)
- 54dyourc (1)
- CarterwoodAnalytics (1)
Pull Request Authors
- bneijt (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads (pypi): 1,290 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 18
- Total maintainers: 1
pypi.org: hgboost
hgboost is a python package for hyperparameter optimization for xgboost, catboost and lightboost for both classification and regression tasks.
- Homepage: https://erdogant.github.io/hgboost
- Documentation: https://hgboost.readthedocs.io/
- License: MIT License
- Latest release: 1.1.6 (published over 1 year ago)
Rankings
Maintainers (1)
Dependencies
- sphinx_rtd_theme *
- irelease * development
- nbconvert * development
- numpy * development
- pytest * development
- rst2pdf * development
- sphinx * development
- sphinx_rtd_theme * development
- sphinxcontrib-fulltoc * development
- spyder-kernels ==2.3. development
- catboost *
- classeval *
- colourmap *
- df2onehot *
- hyperopt *
- lightgbm *
- matplotlib *
- numpy *
- pandas *
- pypickle *
- seaborn *
- tqdm *
- treeplot *
- wget *
- xgboost *
- actions/checkout v2 composite
- github/codeql-action/analyze v1 composite
- github/codeql-action/autobuild v1 composite
- github/codeql-action/init v1 composite
- classeval *
- colourmap *
- datazets *
- df2onehot *
- hyperopt *
- lightgbm >=4.1.0
- matplotlib *
- numpy *
- pandas *
- pypickle *
- seaborn *
- tqdm *
- treeplot *
- xgboost *