wbacon

wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression - Published in JOSS (2021)

https://github.com/tobiasschoch/wbacon

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

outlier outlier-detection r-package robust-regression statistics
Last synced: 6 months ago

Repository

Weighted BACON algorithms

Basic Info
  • Host: GitHub
  • Owner: tobiasschoch
  • License: gpl-2.0
  • Language: C
  • Default Branch: master
  • Homepage:
  • Size: 2.74 MB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 1
Created almost 6 years ago · Last pushed 9 months ago
Metadata Files
Readme Contributing License

README.md

wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression

Summary

Billor et al. (2000) proposed the BACON (blocked adaptive computationally-efficient outlier nominators) algorithms for multivariate outlier nomination and robust linear regression. Béguin and Hulliger (2008) extended the outlier detection method to weighted and incomplete data problems. Both methods are implemented in the R statistical software (R Core Team, 2025), in the packages robustX (Mächler et al., 2023) and modi (Hulliger, 2023), respectively.

Our package offers a computationally efficient implementation in the C language with OpenMP support for parallelization (OpenMP Architecture Review Board, 2018). Efficiency is achieved by using a weighted quantile based on the Quicksort algorithm, partial sorting in place of full sorting, reuse of computed estimates, and, most importantly, an up-/downdating scheme for the Cholesky and QR factorizations. Up-/downdating a factorization costs far less than recomputing the entire decomposition at every iteration.

The details of the package are discussed in the accompanying paper:

Schoch, T. (2021). wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression, Journal of Open Source Software, 6, 3238. DOI 10.21105/joss.03238

Available methods

  • wBACON() is for multivariate outlier nomination and robust estimation of the location/center and the covariance matrix
  • wBACON_reg() is for robust linear regression (the method is robust against outliers in the response variable and the model's design matrix)

Assumptions

The BACON algorithms assume that the underlying model is an appropriate description of the non-outlying observations; see Billor et al. (2000). More precisely,

  • the outlier nomination method assumes that the "good" data have (roughly) an elliptically contoured distribution (this includes the Gaussian distribution as a special case);
  • the regression method assumes that the non-outlying ("good") data are described by a linear (homoscedastic) regression model and that the independent variables (having removed the regression intercept/constant, if there is a constant) follow (roughly) an elliptically contoured distribution.

"Although the algorithms will often do something reasonable even when these assumptions are violated, it is hard to say what the results mean." Billor et al. (2000, p. 289)

It is strongly recommended to examine the structure of the data and to check whether the assumptions made about the "good" observations are reasonable.
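
To make the elliptical-contour assumption more concrete: the BACON method of Billor et al. (2000) judges observations by their Mahalanobis distance from a center, and points at equal distance lie on an ellipsoid. The following sketch computes the squared distance from a Cholesky factor of the covariance matrix by forward substitution. It is an illustration only; the function name mahalanobis_sq, its signature, and the row-major layout are assumptions and do not reflect the package's internal code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of wbacon): squared Mahalanobis distance
 * (x - center)^T S^{-1} (x - center), where S = L L^T and L is the p x p
 * lower-triangular Cholesky factor in row-major order.
 * work must provide room for p doubles. */
double mahalanobis_sq(const double *x, const double *center,
                      const double *L, double *work, size_t p)
{
    double dist = 0.0;

    /* forward substitution: solve L z = (x - center) for z */
    for (size_t i = 0; i < p; i++) {
        double z = x[i] - center[i];
        for (size_t j = 0; j < i; j++)
            z -= L[i * p + j] * work[j];

        assert(L[i * p + i] > 0.0);     /* S must be positive definite */
        work[i] = z / L[i * p + i];

        /* the squared distance is the squared Euclidean norm of z */
        dist += work[i] * work[i];
    }
    return dist;
}
```

With L = {2, 0, 1, 1} (covariance rows (4, 2) and (2, 2)), center (0, 0), and x = (2, 3), the function returns 5. If the "good" data are far from elliptically contoured, such distances are a poor measure of outlyingness, which is why checking the assumptions matters.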

The role of the data analyst

In line with Billor et al. (2000, p. 290), we use the term outlier "nomination" rather than "detection" to highlight that algorithms should not go beyond nominating observations as potential outliers; see also Béguin and Hulliger (2008). It is left to the analyst to finally label outlying observations as such.

The software provides the analyst with tools and measures to study potentially outlying observations. It is strongly recommended to use these tools. See the package folders vignettes and doc for a vignette (guide) and further documentation.

Installation

The package can be installed from CRAN using install.packages("wbacon")

Building

Make sure that the R package devtools is installed. Then, the wbacon package can be pulled from this GitHub repository and installed by devtools::install_github("tobiasschoch/wbacon")

The package contains C code that needs to be compiled.

Community guidelines

Submitting an issue

If you have any suggestions for feature additions or any problems with the software that you would like addressed with the development community, please submit an issue on the Issues tab of the project GitHub repository. You may want to search the existing issues before submitting, to avoid asking a question or requesting a feature that has already been discussed.

How to contribute

If you are interested in modifying the code, you may fork the project for your own use, as detailed in the GNU GPL that we have adopted for the project. To contribute, please contact the developer by email at Tobias Schoch at gmail dot com (the names are separated by a dot) after making the desired changes.

Asking for help

If you have questions about how to use the software, or would like to seek out collaborations related to this project, you may contact Tobias Schoch (see contact details above).

References

Béguin, C., and Hulliger, B. (2008). The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data, Survey Methodology, 34, 91-103.

Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked adaptive computationally-efficient outlier nominators, Computational Statistics and Data Analysis, 34, 279-298. DOI 10.1016/S0167-9473(99)00101-2

Hulliger, B. (2023). modi: Multivariate Outlier Detection and Imputation for Incomplete Survey Data, R package version 0.1-2. URL https://CRAN.R-project.org/package=modi

Mächler, M., and Stahel, W. A. (2023). robustX: ’eXtra’ / ’eXperimental’ Functionality for Robust Statistics, R package version 1.2-7. URL https://CRAN.R-project.org/package=robustX

OpenMP Architecture Review Board (2018). OpenMP Application Program Interface Version 5.0. URL https://www.openmp.org

R Core Team (2025). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org

Schoch, T. (2021). wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression, Journal of Open Source Software, 6, 3238. DOI 10.21105/joss.03238

Owner

  • Name: Tobias Schoch
  • Login: tobiasschoch
  • Kind: user
  • Location: Switzerland
  • Company: University of Applied Sciences Northwestern Switzerland

Professor of Statistics, University of Applied Sciences Northwestern Switzerland

JOSS Publication

wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression
Published
June 06, 2021
Volume 6, Issue 62, Page 3238
Authors
Tobias Schoch ORCID
University of Applied Sciences and Arts Northwestern Switzerland, School of Business, Riggenbachstrasse 16, CH-4600 Olten, Switzerland
Editor
Frederick Boehm ORCID
Tags
outlier detection robustness survey linear regression bounded influence

GitHub Events

Total
  • Issues event: 2
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 1
  • Fork event: 1
  • Create event: 1
Last Year
  • Issues event: 2
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 1
  • Fork event: 1
  • Create event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 111
  • Total Committers: 1
  • Avg Commits per committer: 111.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Tobias Schoch t****h@g****m 111

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: 10 days
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: 17 days
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • AhmedThahir (1)
  • Beliavsky (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 579 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 1
  • Total versions: 7
  • Total maintainers: 1
cran.r-project.org: wbacon

Weighted BACON Algorithms

  • Versions: 7
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 579 Last month
Rankings
Dependent packages count: 18.1%
Forks count: 21.0%
Dependent repos count: 24.0%
Average: 30.4%
Stargazers count: 30.9%
Downloads: 57.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • grDevices * imports
  • graphics * imports
  • hexbin * imports
  • stats * imports
  • cellWise * suggests
  • knitr * suggests
  • modi * suggests
  • rmarkdown * suggests
  • robustX >= 1.2 suggests
  • robustbase * suggests
.github/workflows/draft-pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
inst/varia/Dockerfile docker
  • r-base latest build