colboost

colboost: boosting ensembles based on column generation

https://github.com/frakkerman/colboost

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 7
  • Watchers: 1
  • Forks: 0
  • Open Issues: 2
  • Releases: 4
Created 9 months ago · Last pushed 7 months ago
Metadata Files
Readme · License · Citation · Codeowners

README.md

colboost: Ensemble Boosting with Column Generation


colboost is a Python library for training ensemble classifiers using mathematical-programming-based boosting methods such as LPBoost. Each iteration fits a weak learner and solves a mathematical program to determine optimal ensemble weights. The implementation is compatible with scikit-learn and supports any scikit-learn-compatible base learner. Currently, the library supports only binary classification.

Installation

The easiest way to install colboost is using pip:

```bash
pip install colboost
```

This project requires the Gurobi solver. Free academic licenses are available:

https://www.gurobi.com/academia/academic-program-and-licenses/

Available Parameters

| Parameter | Default | Description |
|---|---|---|
| `solver` | `"nm_boost"` | Which formulation to use. Options: `"nm_boost"`, `"cg_boost"`, `"erlp_boost"`, `"lp_boost"`, `"md_boost"`, `"qrlp_boost"`. |
| `base_estimator` | `None` | Optional base estimator (defaults to a CART decision tree if not provided). |
| `max_depth` | `1` | Maximum depth of individual trees (only relevant with the default `base_estimator=None`). |
| `max_iter` | `100` | Maximum number of boosting iterations. |
| `use_crb` | `False` | Whether to use confidence-rated boosting (soft voting; only applicable with a tree-based `base_estimator`). |
| `check_dual_const` | `True` | Whether to check dual feasibility in each iteration. |
| `early_stopping` | `True` | Stop boosting early if no improvement is observed. |
| `acc_eps` | `1e-4` | Tolerance for the accuracy-based stopping criterion. |
| `acc_check_interval` | `5` | How often (in iterations) to check accuracy for early stopping. |
| `gurobi_time_limit` | `60` | Time limit (in seconds) for each Gurobi solve. |
| `gurobi_num_threads` | `1` | Number of threads Gurobi uses. |
| `tradeoff_hyperparam` | `1e-2` | Trade-off parameter for regularization. |
| `seed` | `1` | Random seed for reproducibility. |

Example 1: Fitting an ensemble

```python
from sklearn.datasets import make_classification
from colboost.ensemble import EnsembleClassifier

# Create a synthetic binary classification problem
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
y = 2 * y - 1  # Convert labels from {0, 1} to {-1, +1}

# Train an NMBoost-based ensemble
model = EnsembleClassifier(solver="nm_boost", max_iter=50)
model.fit(X, y)
print("Training accuracy:", model.score(X, y))

# Obtain margin values y * f(x)
margins = model.compute_margins(X, y)
print("First 5 margins:", margins[:5])
```
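For intuition, the margin quantity y · f(x) returned by `compute_margins` can also be computed by hand for any weighted voting ensemble. The sketch below uses only numpy and scikit-learn (no colboost or Gurobi); the bootstrap stumps and the uniform weights are illustrative stand-ins for the learners and LP-optimized weights that colboost would produce:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
y = 2 * y - 1  # labels in {-1, +1}

# Fit a few depth-1 stumps on bootstrap samples (stand-ins for boosted weak learners)
rng = np.random.default_rng(0)
stumps = []
for _ in range(5):
    idx = rng.integers(0, len(X), len(X))
    stumps.append(DecisionTreeClassifier(max_depth=1, random_state=0).fit(X[idx], y[idx]))

# Illustrative convex ensemble weights (colboost obtains these from an LP/QP)
w = np.full(len(stumps), 1.0 / len(stumps))

# f(x) = sum_t w_t * h_t(x); the margin of sample i is y_i * f(x_i)
f = sum(w_t * h.predict(X) for w_t, h in zip(w, stumps))
margins = y * f
print("First 5 margins:", margins[:5])
print("Misclassified fraction:", np.mean(margins <= 0))
```

Since the stumps predict in {-1, +1} and the weights form a convex combination, every margin lies in [-1, 1]; a nonpositive margin marks a misclassified sample.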

Example 2: Reweighting an existing ensemble

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from colboost.ensemble import EnsembleClassifier

# Generate data
X, y = make_classification(n_samples=200, n_features=20, random_state=42)
y = 2 * y - 1  # Convert labels to {-1, +1}

# Train AdaBoost with sklearn
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
ada.fit(X, y)

# Reweight the AdaBoost base estimators using NMBoost
model = EnsembleClassifier(solver="nm_boost")
model.reweight_ensemble(X, y, learners=ada.estimators_)

print("Training accuracy after reweighting:", model.score(X, y))
print("Number of non-zero weights after reweighting:", np.count_nonzero(model.weights))
```

Inspecting model attributes after training

```python
# Assuming 'model' is a fitted colboost model
print("Learners:", model.learners)
print("Weights:", model.weights)
print("Objective values:", model.objective_values)
print("Solve times:", model.solve_times)
print("Training accuracy per iter:", model.train_accuracies)
print("Number of iterations:", model.n_iter)
print("Solver used:", model.model_name)

# Compute the margin distribution
margins = model.compute_margins(X, y)
print("First 5 margins (y * f(x)):", margins[:5])
```

Implemented Formulations

  • NMBoost
    Negative Margin Boosting, emphasizing both accuracy and penalization of negative margins.
    Introduced in our paper (2025)

  • QRLPBoost
    Quadratically Regularized LPBoost with second-order KL-divergence approximation.
    Introduced in our paper (2025)

  • LPBoost
    Linear Programming Boosting with slack variables (soft-margin).
    Demiriz, Bennett, Shawe-Taylor (2002)

  • MDBoost
    Margin Distribution Boosting, optimizing both margin mean and variance.
    Shen & Li (2009)

  • CGBoost
    Column Generation Boosting with L2-regularized margin formulation.
    Bi, Zhang, Bennett (2004)

  • ERLPBoost
    Entropy-Regularized LPBoost using KL-divergence between successive distributions.
    Warmuth, Glocer, Vishwanathan (2008)
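colboost solves these master problems with Gurobi. As a solver-free illustration of what one such formulation looks like, the sketch below sets up the soft-margin LPBoost master problem of Demiriz, Bennett, and Shawe-Taylor (2002) for a fixed pool of weak learners and solves it with `scipy.optimize.linprog`. This is not colboost's implementation; the stump pool and the ν = 0.1 slack penalty are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=120, n_features=10, random_state=0)
y = 2 * y - 1  # labels in {-1, +1}

# A small pool of depth-1 stumps plays the role of the generated "columns"
rng = np.random.default_rng(0)
learners = []
for _ in range(8):
    idx = rng.integers(0, len(X), len(X))
    learners.append(DecisionTreeClassifier(max_depth=1, random_state=0).fit(X[idx], y[idx]))

H = np.column_stack([h.predict(X) for h in learners])  # H[i, t] = h_t(x_i)
n, T = H.shape
D = 1.0 / (0.1 * n)  # slack penalty from nu = 0.1

# Soft-margin LPBoost master problem:
#   max  rho - D * sum(xi)
#   s.t. y_i * (H w)_i >= rho - xi_i,   sum(w) = 1,   w >= 0,   xi >= 0
# Variables stacked as [w (T), rho (1), xi (n)]; linprog minimizes, so negate rho.
c = np.concatenate([np.zeros(T), [-1.0], D * np.ones(n)])
A_ub = np.hstack([-(y[:, None] * H), np.ones((n, 1)), -np.eye(n)])
b_ub = np.zeros(n)
A_eq = np.concatenate([np.ones(T), [0.0], np.zeros(n)])[None, :]
b_eq = [1.0]
bounds = [(0, None)] * T + [(None, None)] + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
w = res.x[:T]
print("Ensemble weights:", np.round(w, 3))
print("Training accuracy:", np.mean(np.sign(H @ w) == y))
```

In the full column-generation loop, the dual values of the margin constraints define the sample weights used to fit the next weak learner; the LP above is just the master solve for one fixed pool.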

Installation (developers)

To install in development mode, clone this repo and:

```bash
python3 -m venv env
source env/bin/activate
pip install -e .
```

To verify the installation, execute in the repository root:

```bash
pytest
```

Note: the install requires recent versions of pip and setuptools. If needed, update both using:

```bash
pip install --upgrade pip setuptools
```

Contributing

If you have extensions to this codebase, feel free to open a pull request! If you experience problems, please open an issue on GitHub with a clear explanation.

Citation

When using the code or data in this repo, please cite the following work:

```bibtex
@misc{akkerman2025_lpboosting,
  title={Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods},
  author={Fabian Akkerman and Julien Ferry and Christian Artigues and Emmanuel Hébrard and Thibaut Vidal},
  year={2025},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

Note: This library is a clean reimplementation of the original code from the paper. While we have carefully validated the implementation, there may be minor discrepancies in results compared to those reported in the paper. For full reproducibility, we provide a separate repository containing the exact codebase used for the paper, along with all result files, including tested hyperparameter configurations and results not shown in the paper (URL will be added after acceptance).

  • MIT license
  • Copyright 2025 © Fabian Akkerman, Julien Ferry, Christian Artigues, Emmanuel Hébrard, Thibaut Vidal

Owner

  • Name: Fabian Akkerman
  • Login: frakkerman
  • Kind: user
  • Location: Enschede, the Netherlands
  • Company: University of Twente

PhD candidate

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the following work:"
title: "Boosting Revisited: Benchmarking and Advancing LP-Based Ensemble Methods"
authors:
  - family-names: Akkerman
    given-names: Fabian
  - family-names: Ferry
    given-names: Julien
  - family-names: Artigues
    given-names: Christian
  - family-names: Hébrard
    given-names: Emmanuel
  - family-names: Vidal
    given-names: Thibaut
version: "0.1.4"
date-released: 2025-07-24
repository-code: https://github.com/frakkerman/colboost

GitHub Events

Total
  • Watch event: 5
  • Public event: 1
  • Push event: 4
Last Year
  • Watch event: 5
  • Public event: 1
  • Push event: 4

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 29 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
pypi.org: colboost

LP-based ensemble learning with column generation

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 29 Last month
Rankings
  • Dependent packages count: 8.8%
  • Average: 29.1%
  • Dependent repos count: 49.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/publish.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • pypa/gh-action-pypi-publish v1.5.0 composite
.github/workflows/test.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
pyproject.toml pypi
  • gurobipy *
  • numpy *
  • pandas *
  • scikit-learn *
  • tqdm *