mcboost
mcboost: Multi-Calibration Boosting for R - Published in JOSS (2021)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 10 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to arxiv.org, joss.theoj.org
- ✓ Committers with academic emails: 1 of 8 committers (12.5%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Multi-Calibration & Multi-Accuracy Boosting for R
Statistics
- Stars: 32
- Watchers: 4
- Forks: 4
- Open Issues: 6
- Releases: 6
Metadata Files
README.md
mcboost
What does it do?
mcboost implements Multi-Calibration Boosting (Hebert-Johnson et al., 2018; Kim et al., 2019) for the multi-calibration of a machine learning model's prediction. Multi-Calibration works best in scenarios where the underlying data & labels are unbiased but a bias is introduced within the algorithm's fitting procedure. This is often the case, e.g. when an algorithm fits a majority population while ignoring or under-fitting minority populations.
For more information and examples, see the package's website.
More details with respect to usage and the procedures can be found in the package vignettes.
Installation
The current version can be downloaded from CRAN using:
```r
install.packages("mcboost")
```
You can install the development version of mcboost from GitHub with:
```r
remotes::install_github("mlr-org/mcboost")
```
Usage
Post-processing with mcboost needs three components. We start with an initial prediction model (1) and an auditing algorithm (2) that may be customized by the user. The auditing algorithm then runs Multi-Calibration-Boosting on a labeled auditing dataset (3). The resulting model can be used for obtaining multi-calibrated predictions.
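The interplay of the three components can be sketched in base R. This is a deliberately simplified, additive update loop on synthetic data, not mcboost's internal code: the linear-regression auditor, the step size `eta`, the stopping threshold, and all variable names are assumptions made for illustration.

```r
# Simplified boosting loop on residuals (illustration only, base R).
set.seed(1)
n = 500
x = data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Synthetic labels whose probability depends on both features
p_true = plogis(1.5 * x$x1 - x$x2)
y = rbinom(n, 1, p_true)

# (1) Initial predictor: a weak model that ignores the features
init_predictor = function(data) rep(0.5, nrow(data))

# (2) Auditor: a linear regression fitted on the residuals (hypothetical choice)
fit_auditor = function(data, resid) {
  lm(resid ~ ., data = cbind(data, resid = resid))
}

# (3) Boosting on the labeled auditing data
eta = 0.5  # step size (assumed value)
preds = init_predictor(x)
brier_before = mean((preds - y)^2)
for (i in 1:10) {
  resid = preds - y
  auditor = fit_auditor(x, resid)
  correction = predict(auditor, newdata = x)
  # Stop once the auditor no longer finds systematic residuals
  if (mean(abs(correction)) < 0.01) break
  # Move predictions against the detected residual and clip to [0, 1]
  preds = pmin(pmax(preds - eta * correction, 0), 1)
}
brier_after = mean((preds - y)^2)
print(c(before = brier_before, after = brier_after))
```

In this toy setting the auditor keeps finding structure in the residuals until the predictions reflect the features, which lowers the Brier score relative to the constant initial predictor.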
Example
In this simple example, our goal is to improve calibration
for an initial predictor, e.g. an ML algorithm trained on
an initial task.
Internally, mcboost often makes use of mlr3 and learners that come with mlr3learners.
```r
library(mcboost)
library(mlr3)
```
First we set up an example dataset.
```r
# Example Data: Sonar Task
tsk = tsk("sonar")
tid = sample(tsk$row_ids, 100) # 100 rows for training
train_data = tsk$data(cols = tsk$feature_names, rows = tid)
train_labels = tsk$data(cols = tsk$target_names, rows = tid)[[1]]
```
As an example, we assume that we already have a learner l, which we train below.
We can now wrap this initial learner's predict function for use with mcboost, since mcboost expects the initial model to be specified as a function with data as input.
```r
l = lrn("classif.rpart")
l$train(tsk$clone()$filter(tid))

init_predictor = function(data) {
  # Get response prediction from Learner
  p = l$predict_newdata(data)$response
  # One-hot encode and take first column
  one_hot(p)
}
```
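The same wrapping idea can be applied to models fitted outside of mlr3. The following is a minimal sketch using a base R `glm` on the built-in `mtcars` data; the name `glm_predictor` is hypothetical, and it assumes that a plain function mapping data to one numeric prediction per row is an acceptable initial predictor.

```r
# Fit a logistic regression on a built-in dataset (illustration only)
fit = glm(am ~ wt + hp, data = mtcars, family = binomial())

# Wrap the fitted model so that it is a plain function of the data
glm_predictor = function(data) {
  # type = "response" yields predicted probabilities in [0, 1]
  unname(predict(fit, newdata = data, type = "response"))
}

# One probability per input row
probs = glm_predictor(mtcars[1:5, c("wt", "hp")])
print(probs)
```

Keeping the model behind a function of the data means the post-processing step does not need to know how the model was trained.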
We can now run Multi-Calibration Boosting by instantiating the object and calling the multicalibrate method.
Note that we would typically run Multi-Calibration on a separate validation set!
We furthermore select the auditor model, a SubpopAuditorFitter,
in our case a Decision Tree:
```r
mc = MCBoost$new(
  init_predictor = init_predictor,
  auditor_fitter = "TreeAuditorFitter")
mc$multicalibrate(train_data, train_labels)
```
Lastly, we predict on new data.
```r
tstid = setdiff(tsk$row_ids, tid) # held-out data
test_data = tsk$data(cols = tsk$feature_names, rows = tstid)
mc$predict_probs(test_data)
```
Multi-Calibration
While mcboost in its defaults implements Multi-Accuracy (Kim et al., 2019),
it can also multi-calibrate predictors (Hebert-Johnson et al., 2018).
In order to achieve this, we have to set the following hyperparameters:
```r
mc = MCBoost$new(
  init_predictor = init_predictor,
  auditor_fitter = "TreeAuditorFitter",
  num_buckets = 10,
  multiplicative = FALSE
)
```
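To give a rough intuition for what bucketing predictions means, the following base R sketch splits synthetic predictions into 10 equal-width buckets over [0, 1] and reports the mean residual per bucket. The equal-width scheme, the synthetic data, and all variable names are assumptions for illustration; mcboost's own bucketing may differ in detail.

```r
# Toy predictions and binary labels (synthetic data, illustration only)
set.seed(42)
n = 1000
preds = runif(n)
# Labels follow a miscalibrated truth: the actual positive rate is preds^2
labels = rbinom(n, 1, preds^2)

# Split [0, 1] into 10 equal-width buckets
num_buckets = 10
breaks = seq(0, 1, length.out = num_buckets + 1)
bucket = cut(preds, breaks = breaks, include.lowest = TRUE)

# Mean residual (prediction minus label) per bucket; values far from 0
# indicate miscalibration within that range of predictions
calib = tapply(preds - labels, bucket, mean)
print(round(calib, 2))
```

A predictor can look accurate on average while being off within particular prediction ranges, which is why residuals are inspected per bucket here rather than only overall.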
MCBoost as a PipeOp
mcboost can also be used within an mlr3pipelines pipeline, so that the full end-to-end pipeline can be used as a single GraphLearner.
```r
library(mlr3)
library(mlr3pipelines)
gr = ppl_mcboost(lrn("classif.rpart"))
tsk = tsk("sonar")
tid = sample(1:208, 108)
gr$train(tsk$clone()$filter(tid))
gr$predict(tsk$clone()$filter(setdiff(1:208, tid)))
```
Further Examples
The mcboost vignettes Basics and Extensions and Health Survey Example demonstrate further use cases for mcboost.
Contributing
This R package is licensed under the LGPL-3. If you encounter problems using this software (lack of documentation, misleading or wrong documentation, unexpected behaviour, bugs, …) or just want to suggest features, please open an issue in the issue tracker. Pull requests are welcome and will be included at the discretion of the maintainers.
As this project is developed with mlr3's style guide in mind, the following resources can be helpful to individuals wishing to contribute: Please consult the wiki for a style guide, a roxygen guide and a pull request guide.
Code of Conduct
Please note that the mcboost project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Citing mcboost
If you use mcboost, please cite our package as well as the two papers it is based on:
@article{pfisterer2021,
author = {Pfisterer, Florian and Kern, Christoph and Dandl, Susanne and Sun, Matthew and
Kim, Michael P. and Bischl, Bernd},
title = {mcboost: Multi-Calibration Boosting for R},
journal = {Journal of Open Source Software},
doi = {10.21105/joss.03453},
url = {https://doi.org/10.21105/joss.03453},
year = {2021},
publisher = {The Open Journal},
volume = {6},
number = {64},
pages = {3453}
}
# Multi-Calibration
@inproceedings{hebert-johnson2018,
title = {Multicalibration: Calibration for the ({C}omputationally-Identifiable) Masses},
author = {Hebert-Johnson, Ursula and Kim, Michael P. and Reingold, Omer and Rothblum, Guy},
booktitle = {Proceedings of the 35th International Conference on Machine Learning},
pages = {1939--1948},
year = {2018},
editor = {Jennifer Dy and Andreas Krause},
volume = {80},
series = {Proceedings of Machine Learning Research},
address = {Stockholmsmässan, Stockholm Sweden},
publisher = {PMLR}
}
# Multi-Accuracy
@inproceedings{kim2019,
author = {Kim, Michael P. and Ghorbani, Amirata and Zou, James},
title = {Multiaccuracy: Black-Box Post-Processing for Fairness in Classification},
year = {2019},
isbn = {9781450363242},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3306618.3314287},
doi = {10.1145/3306618.3314287},
booktitle = {Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society},
pages = {247--254},
location = {Honolulu, HI, USA},
series = {AIES '19}
}
Owner
- Name: mlr-org
- Login: mlr-org
- Kind: organization
- Location: Munich, Germany
- Website: https://mlr-org.com
- Repositories: 80
- Profile: https://github.com/mlr-org
JOSS Publication
mcboost: Multi-Calibration Boosting for R
Authors
Princeton University
UC Berkeley
Tags
Multi-Calibration, Multi-Accuracy, Boosting, Post-Processing, Fair ML
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"identifier": "mcboost",
"description": "Implements 'Multi-Calibration Boosting' (2018) <https://proceedings.mlr.press/v80/hebert-johnson18a.html> and 'Multi-Accuracy Boosting' (2019) <arXiv:1805.12317> for the multi-calibration of a machine learning model's prediction. 'MCBoost' updates predictions for sub-groups in an iterative fashion in order to mitigate biases like poor calibration or large accuracy differences across subgroups. Multi-Calibration works best in scenarios where the underlying data & labels are unbiased, but resulting models are. This is often the case, e.g. when an algorithm fits a majority population while ignoring or under-fitting minority populations.",
"name": "mcboost: Multi-Calibration Boosting",
"codeRepository": "https://github.com/mlr-org/mcboost",
"issueTracker": "https://github.com/mlr-org/mcboost/issues",
"license": "https://spdx.org/licenses/LGPL-3.0",
"version": "0.4.0",
"programmingLanguage": {
"@type": "ComputerLanguage",
"name": "R",
"url": "https://r-project.org"
},
"runtimePlatform": "R version 4.1.2 (2021-11-01)",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"author": [
{
"@type": "Person",
"givenName": "Florian",
"familyName": "Pfisterer",
"email": "pfistererf@googlemail.com",
"@id": "https://orcid.org/0000-0001-8867-762X"
}
],
"contributor": [
{
"@type": "Person",
"givenName": "Susanne",
"familyName": "Dandl",
"email": "susanne.dandl@stat.uni-muenchen.de",
"@id": "https://orcid.org/0000-0003-4324-4163"
},
{
"@type": "Person",
"givenName": "Christoph",
"familyName": "Kern",
"email": "c.kern@uni-mannheim.de",
"@id": "https://orcid.org/0000-0001-7363-4299"
},
{
"@type": "Person",
"givenName": "Carolin",
"familyName": "Becker"
},
{
"@type": "Person",
"givenName": "Bernd",
"familyName": "Bischl",
"email": "bernd_bischl@gmx.net",
"@id": "https://orcid.org/0000-0001-6002-6980"
}
],
"maintainer": [
{
"@type": "Person",
"givenName": "Florian",
"familyName": "Pfisterer",
"email": "pfistererf@googlemail.com",
"@id": "https://orcid.org/0000-0001-8867-762X"
}
],
"softwareSuggestions": [
{
"@type": "SoftwareApplication",
"identifier": "curl",
"name": "curl",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=curl"
},
{
"@type": "SoftwareApplication",
"identifier": "lgr",
"name": "lgr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=lgr"
},
{
"@type": "SoftwareApplication",
"identifier": "formattable",
"name": "formattable",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=formattable"
},
{
"@type": "SoftwareApplication",
"identifier": "tidyverse",
"name": "tidyverse",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=tidyverse"
},
{
"@type": "SoftwareApplication",
"identifier": "PracTools",
"name": "PracTools",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=PracTools"
},
{
"@type": "SoftwareApplication",
"identifier": "mlr3learners",
"name": "mlr3learners",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=mlr3learners"
},
{
"@type": "SoftwareApplication",
"identifier": "mlr3oml",
"name": "mlr3oml",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=mlr3oml"
},
{
"@type": "SoftwareApplication",
"identifier": "neuralnet",
"name": "neuralnet",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=neuralnet"
},
{
"@type": "SoftwareApplication",
"identifier": "paradox",
"name": "paradox",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=paradox"
},
{
"@type": "SoftwareApplication",
"identifier": "testthat",
"name": "testthat",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=testthat"
},
{
"@type": "SoftwareApplication",
"identifier": "knitr",
"name": "knitr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=knitr"
},
{
"@type": "SoftwareApplication",
"identifier": "ranger",
"name": "ranger",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=ranger"
},
{
"@type": "SoftwareApplication",
"identifier": "rmarkdown",
"name": "rmarkdown",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rmarkdown"
},
{
"@type": "SoftwareApplication",
"identifier": "survival",
"name": "survival",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=survival"
},
{
"@type": "SoftwareApplication",
"identifier": "xgboost",
"name": "xgboost",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=xgboost"
},
{
"@type": "SoftwareApplication",
"identifier": "covr",
"name": "covr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=covr"
}
],
"softwareRequirements": {
"1": {
"@type": "SoftwareApplication",
"identifier": "R",
"name": "R",
"version": ">= 3.1.0"
},
"2": {
"@type": "SoftwareApplication",
"identifier": "backports",
"name": "backports",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=backports"
},
"3": {
"@type": "SoftwareApplication",
"identifier": "checkmate",
"name": "checkmate",
"version": ">= 2.0.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=checkmate"
},
"4": {
"@type": "SoftwareApplication",
"identifier": "data.table",
"name": "data.table",
"version": ">= 1.13.6",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=data.table"
},
"5": {
"@type": "SoftwareApplication",
"identifier": "mlr3",
"name": "mlr3",
"version": ">= 0.10",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=mlr3"
},
"6": {
"@type": "SoftwareApplication",
"identifier": "mlr3misc",
"name": "mlr3misc",
"version": ">= 0.8.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=mlr3misc"
},
"7": {
"@type": "SoftwareApplication",
"identifier": "mlr3pipelines",
"name": "mlr3pipelines",
"version": ">= 0.3.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=mlr3pipelines"
},
"8": {
"@type": "SoftwareApplication",
"identifier": "mlr3proba",
"name": "mlr3proba",
"version": ">= 0.4.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=mlr3proba"
},
"9": {
"@type": "SoftwareApplication",
"identifier": "R6",
"name": "R6",
"version": ">= 2.4.1",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=R6"
},
"10": {
"@type": "SoftwareApplication",
"identifier": "rpart",
"name": "rpart",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rpart"
},
"11": {
"@type": "SoftwareApplication",
"identifier": "glmnet",
"name": "glmnet",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=glmnet"
},
"SystemRequirements": null
},
"fileSize": "272.663KB",
"citation": [
{
"@type": "ScholarlyArticle",
"datePublished": "2021",
"author": [
{
"@type": "Person",
"givenName": "Florian",
"familyName": "Pfisterer"
},
{
"@type": "Person",
"givenName": "Christoph",
"familyName": "Kern"
},
{
"@type": "Person",
"givenName": "Susanne",
"familyName": "Dandl"
},
{
"@type": "Person",
"givenName": "Matthew",
"familyName": "Sun"
},
{
"@type": "Person",
"givenName": "Michael P.",
"familyName": "Kim"
},
{
"@type": "Person",
"givenName": "Bernd",
"familyName": "Bischl"
}
],
"name": "mcboost: Multi-Calibration Boosting for R",
"identifier": "10.21105/joss.03453",
"url": "https://joss.theoj.org/papers/10.21105/joss.03453",
"pagination": "3453",
"@id": "https://doi.org/10.21105/joss.03453",
"sameAs": "https://doi.org/10.21105/joss.03453",
"isPartOf": {
"@type": "PublicationIssue",
"issueNumber": "64",
"datePublished": "2021",
"isPartOf": {
"@type": [
"PublicationVolume",
"Periodical"
],
"volumeNumber": "6",
"name": "Journal of Open Source Software"
}
}
}
],
"releaseNotes": "https://github.com/mlr-org/mcboost/blob/master/NEWS.md",
"readme": "https://github.com/mlr-org/mcboost/blob/main/README.md",
"contIntegration": "https://github.com/mlr-org/mcboost/actions",
"developmentStatus": "https://lifecycle.r-lib.org/articles/stages.html#experimental",
"keywords": [
"machine-learning",
"classification",
"fairness",
"fairness-ml",
"fairness-ai",
"responsible-ai",
"bias-correction",
"bias-detection",
"post-processing",
"ethics"
],
"relatedLink": "https://CRAN.R-project.org/package=mcboost"
}
GitHub Events
Total
- Issues event: 1
- Watch event: 2
- Issue comment event: 2
- Push event: 1
- Create event: 1
Last Year
- Issues event: 1
- Watch event: 2
- Issue comment event: 2
- Push event: 1
- Create event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| pfistfl | p****f@g****m | 155 |
| Carolin | c****r@c****e | 56 |
| chkern | c****n@u****e | 50 |
| susanne-207 | d****e@g****m | 27 |
| Sebastian Fischer | s****r@g****m | 5 |
| mikekimbackward | m****d@g****m | 3 |
| Owen Ward | 3****d | 1 |
| Matthew Sun | m****8@g****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 38
- Total pull requests: 8
- Average time to close issues: 12 days
- Average time to close pull requests: about 2 months
- Total issue authors: 7
- Total pull request authors: 6
- Average comments per issue: 1.0
- Average comments per pull request: 0.5
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- pfistfl (29)
- chkern (3)
- dandls (2)
- flinder (1)
- be-marc (1)
- carobec (1)
- unai-fa (1)
Pull Request Authors
- mb706 (2)
- sebffischer (2)
- dandls (2)
- carobec (2)
- OwenWard (1)
- chkern (1)
Packages
- Total packages: 1
- Total downloads: 328 last month (CRAN)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 7
- Total maintainers: 1
cran.r-project.org: mcboost
Multi-Calibration Boosting
- Homepage: https://github.com/mlr-org/mcboost
- Documentation: http://cran.r-project.org/web/packages/mcboost/mcboost.pdf
- License: LGPL (≥ 3)
- Latest release: 0.4.4 (published 5 months ago)
Dependencies
- R >= 3.1.0 depends
- R6 >= 2.4.1 imports
- backports * imports
- checkmate >= 2.0.0 imports
- data.table >= 1.13.6 imports
- glmnet * imports
- mlr3 >= 0.10 imports
- mlr3misc >= 0.8.0 imports
- mlr3pipelines >= 0.3.0 imports
- rmarkdown * imports
- rpart * imports
- PracTools * suggests
- covr * suggests
- curl * suggests
- formattable * suggests
- knitr * suggests
- lgr * suggests
- mlr3learners * suggests
- mlr3oml * suggests
- neuralnet * suggests
- paradox * suggests
- ranger * suggests
- testthat >= 3.1.0 suggests
- tidyverse * suggests
- xgboost * suggests
- actions/checkout v2.3.4 composite
- actions/upload-artifact v2.2.1 composite
- pat-s/always-upload-cache v2.1.3 composite
- r-lib/actions/setup-pandoc master composite
- r-lib/actions/setup-r master composite
- r-lib/actions/setup-tinytex master composite
