UBayFS

UBayFS: An R Package for User Guided Feature Selection - Published in JOSS (2023)

https://github.com/annajenul/ubayfs

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 15 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: springer.com, joss.theoj.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

bayesian-statistics ensemble-models feature-selection r user-knowledge
Last synced: 6 months ago

Repository

UBayFS implements the UBayFS feature selection framework, together with an interactive Shiny dashboard.

Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Created over 5 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Code of conduct

README.md


UBayFS

The UBayFS package implements the framework proposed by Jenul et al. (2022), together with an interactive Shiny dashboard that makes UBayFS accessible to R users with different levels of expertise. UBayFS is an ensemble feature selection technique embedded in a Bayesian statistical framework. The method combines data and user knowledge, where the former is extracted via data-driven ensemble feature selection. The user can steer the feature selection by assigning prior weights to features and by penalizing specific feature combinations. In particular, the user can define a maximum number of selected features, must-link constraints (features must be selected together), and cannot-link constraints (features must not be selected together). When constraints are relaxed, a parameter $\rho$ regulates the shape of the penalty. A feature set may therefore violate constraints, but each violation lowers the target value of that feature set. UBayFS can be used for standard feature selection and also for block feature selection.
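The role of $\rho$ in a relaxed constraint can be illustrated with a small base-R sketch. The function `relaxed_admissibility` and its logistic-style penalty are assumptions made for illustration, not the package's API: a constraint is written as $a^T \boldsymbol{\delta} \le b$, a satisfied constraint scores 1, and a violated one scores less, with larger $\rho$ giving a steeper drop.

```r
# Hypothetical relaxed admissibility for a linear constraint a' delta <= b.
# Returns 1 when the constraint holds and a value in [0, 1) otherwise.
# Larger rho makes the drop steeper; rho = Inf acts as a hard constraint.
relaxed_admissibility <- function(delta, a, b, rho) {
  excess <- sum(a * delta) - b      # > 0 means the constraint is violated
  if (excess <= 0) return(1)
  if (is.infinite(rho)) return(0)
  xi <- exp(-rho * excess)
  2 * xi / (1 + xi)
}

# Max-size constraint on 5 features: select at most 2.
a <- rep(1, 5)
b <- 2
ok        <- c(1, 1, 0, 0, 0)   # 2 features selected, constraint satisfied
violating <- c(1, 1, 1, 1, 0)   # 4 features selected, violated by 2

adm_ok     <- relaxed_admissibility(ok, a, b, rho = 1)
adm_soft   <- relaxed_admissibility(violating, a, b, rho = 0.5)
adm_strict <- relaxed_admissibility(violating, a, b, rho = 5)
adm_hard   <- relaxed_admissibility(violating, a, b, rho = Inf)
```

In this sketch a cannot-link constraint between features 1 and 2 would use `a <- c(1, 1, 0, 0, 0)` and `b <- 1`, so the same function covers both constraint types.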

If you prefer Python, please check out our corresponding Python implementation.

Documentation and Structure

The documentation illustrates how UBayFS can be used for standard feature selection.

UBayFS is implemented via a core S3 class 'UBaymodel', along with helper functions. An overview of the 'UBaymodel' class and its main generic functions is shown in the following diagram:

Requirements and Dependencies

  • R (>= 3.5.0)
  • GA
  • matrixStats
  • shiny
  • mRMRe
  • Rdimtools
  • DirichletReg
  • ggplot2
  • gridExtra
  • utils
  • hyper2
  • methods
  • prettydoc

In addition, some functionality of the package (in particular, the interactive Shiny interface) requires the following dependencies:

  • shinyWidgets
  • shinyalert
  • DT
  • RColorBrewer
  • shinyjs
  • shinyBS
  • testthat (>= 3.0.0)
  • rmarkdown
  • dplyr
  • plyr
  • knitr
  • rpart
  • GSelection
  • caret
  • glmnet

Implementation Details

The original paper defines the following utility function $U(\boldsymbol{\delta},\boldsymbol{\theta})$ for optimization with respect to $\boldsymbol{\delta}\in \lbrace 0,1\rbrace ^N$: $$U(\boldsymbol{\delta},\boldsymbol{\theta}) = \boldsymbol{\delta}^T \boldsymbol{\theta}-\lambda \kappa(\boldsymbol{\delta}), $$ for fixed $\lambda>0$.

For practical reasons, the implementation in the UBayFS package uses a modified utility function $\tilde{U}(\boldsymbol{\delta},\boldsymbol{\theta})$ which adds an admissibility term $1-\kappa(\boldsymbol{\delta})$ rather than subtracting an inadmissibility term $\kappa(\boldsymbol{\delta})$ $$\tilde{U}(\boldsymbol{\delta},\boldsymbol{\theta}) = \boldsymbol{\delta}^T \boldsymbol{\theta}+\lambda (1-\kappa(\boldsymbol{\delta})) = \boldsymbol{\delta}^T \boldsymbol{\theta}-\lambda \kappa(\boldsymbol{\delta}) +\lambda.$$

Thus, the function values of $U(\boldsymbol{\delta},\boldsymbol{\theta})$ and $\tilde{U}(\boldsymbol{\delta},\boldsymbol{\theta})$ deviate by a constant $\lambda$; however, the optimal feature set $$\boldsymbol{\delta}^{\star} = \underset{\boldsymbol{\delta}\in\lbrace 0,1\rbrace ^N}{\text{arg max}}~ U(\boldsymbol{\delta},\boldsymbol{\theta}) = \underset{\boldsymbol{\delta}\in\lbrace 0,1\rbrace ^N}{\text{arg max}}~ \tilde{U}(\boldsymbol{\delta},\boldsymbol{\theta})$$ remains unaffected.
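The invariance of the optimum can be checked by brute force on a toy problem. The sketch below uses base R only; `theta`, `lambda`, and the hard max-size `kappa` are hypothetical values chosen for illustration, and the exhaustive enumeration is a stand-in for whatever optimizer is used in practice.

```r
# Toy setup (all values hypothetical): N = 4 features with importances theta.
theta  <- c(0.4, 0.3, 0.2, 0.1)
lambda <- 1

# Hypothetical inadmissibility kappa in {0, 1}: 0 if at most 2 features
# are selected, 1 otherwise (a hard max-size constraint).
kappa <- function(delta) as.numeric(sum(delta) > 2)

# Original utility U and the shifted utility U_tilde from the text above.
U       <- function(delta) sum(delta * theta) - lambda * kappa(delta)
U_tilde <- function(delta) sum(delta * theta) + lambda * (1 - kappa(delta))

# Enumerate all 2^N candidate feature sets, one per row.
candidates <- as.matrix(expand.grid(rep(list(0:1), length(theta))))

u_vals       <- apply(candidates, 1, U)
u_tilde_vals <- apply(candidates, 1, U_tilde)

# Both utilities pick the same row as the optimal feature set.
best       <- candidates[which.max(u_vals), ]
best_tilde <- candidates[which.max(u_tilde_vals), ]
```

Every entry of `u_tilde_vals - u_vals` equals `lambda` up to floating-point error, so the ranking of the candidates, and with it the maximizer, is the same for both utilities.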

Installation

The development version of the package can be installed with:

remotes::install_github("annajenul/UBayFS", build_manual = TRUE, build_vignettes = TRUE)

If you use macOS, make sure you have XQuartz installed.

To build the vignettes, Pandoc is required. If Pandoc is missing on your computer or its version is too old, the installation will fail with the error:

Pandoc is required to build R Markdown vignettes but not available. Please make sure it is installed.

An installation guide for Pandoc on different operating systems is provided on the Pandoc website.
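One way to avoid the Pandoc error is to check for Pandoc before requesting vignettes. The following is a minimal sketch, assuming the rmarkdown and remotes packages are installed; `install_ubayfs_dev` is a hypothetical helper, not part of UBayFS, and the installation call itself is left commented out.

```r
# TRUE only if rmarkdown is installed and it can locate a Pandoc binary.
pandoc_ok <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

# Hypothetical helper (not part of UBayFS): request vignettes only when
# Pandoc was found, so a missing Pandoc does not abort the installation.
install_ubayfs_dev <- function(build_vignettes = pandoc_ok) {
  remotes::install_github(
    "annajenul/UBayFS",
    build_manual = TRUE,
    build_vignettes = build_vignettes
  )
}

# install_ubayfs_dev()   # uncomment to run the installation
```

If `pandoc_ok` is `FALSE`, the package is installed without vignettes, and they can be built later once Pandoc is available.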

Contributing

Your contribution to UBayFS is very welcome!

Contributing to the package requires agreement to the terms of the Contributor Code of Conduct.

To implement a new feature or fix a bug, we encourage you to send a pull request to the repository. Please add a detailed yet concise description of the new feature or the bug. When fixing a bug, include comments about your solution. To improve UBayFS further, feel free to open issues for bugs you are unsure about. We are thankful for any kind of constructive criticism and suggestions.

Citation

If you use UBayFS in a report or scientific publication, we would appreciate citations to the following papers:

Jenul, A. and Schrunner, S., (2023). UBayFS: An R Package for User Guided Feature Selection. Journal of Open Source Software, 8(81), 4848, https://doi.org/10.21105/joss.04848

Bibtex entry:

@article{Jenul2023,
  doi = {10.21105/joss.04848},
  url = {https://doi.org/10.21105/joss.04848},
  year = {2023},
  month = jan,
  publisher = {The Open Journal},
  volume = {8},
  number = {81},
  pages = {4848},
  author = {Anna Jenul and Stefan Schrunner},
  title = {{UBayFS}: An R Package for User Guided Feature
        Selection},
  journal = {Journal of Open Source Software}
}

Jenul, A., Schrunner, S. et al. A user-guided Bayesian framework for ensemble feature selection in life science applications (UBayFS). Mach Learn (2022). https://doi.org/10.1007/s10994-022-06221-9

Bibtex entry:

@article{Jenul2022,
  doi = {10.1007/s10994-022-06221-9},
  url = {https://doi.org/10.1007/s10994-022-06221-9},
  year = {2022},
  month = aug,
  publisher = {Springer Science and Business Media {LLC}},
  volume = {111},
  number = {10},
  pages = {3897--3923},
  author = {Anna Jenul and Stefan Schrunner and J\"{u}rgen Pilz and Oliver Tomic},
  title = {A user-guided Bayesian framework for ensemble feature selection in life science applications ({UBayFS})},
  journal = {Machine Learning}
}

Owner

  • Login: annajenul
  • Kind: user
  • Location: Ås, Norway
  • Company: Norwegian University of Life Sciences

PhD Student in Data Science. PhD related software implementations are stored in NMBU/DataScience (https://github.com/NMBU-Data-Science).

JOSS Publication

UBayFS: An R Package for User Guided Feature Selection
Published
January 27, 2023
Volume 8, Issue 81, Page 4848
Authors
Anna Jenul ORCID
Norwegian University of Life Sciences, Ås, Norway
Stefan Schrunner ORCID
Norwegian University of Life Sciences, Ås, Norway
Editor
Øystein Sørensen ORCID
Tags
feature selection

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 334
  • Total Committers: 4
  • Avg Commits per committer: 83.5
  • Development Distribution Score (DDS): 0.329
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
annajenul 6****l 224
sschrunner s****r@f****o 99
sschrunner s****r@p****m 8
Aaron Peikert a****t@p****e 3

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 20
  • Total pull requests: 10
  • Average time to close issues: 4 days
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 4
  • Total pull request authors: 3
  • Average comments per issue: 1.25
  • Average comments per pull request: 0.4
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • aaronpeikert (10)
  • osorensen (5)
  • dhvalden (4)
  • EugeneHao (1)
Pull Request Authors
  • annajenul (7)
  • aaronpeikert (2)
  • sschrunner (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 150 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
cran.r-project.org: UBayFS

A User-Guided Bayesian Framework for Ensemble Feature Selection

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 150 Last month
Rankings
Stargazers count: 21.1%
Forks count: 21.9%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Average: 36.8%
Downloads: 75.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • DirichletReg * imports
  • GA * imports
  • GSelection * imports
  • Rdimtools * imports
  • caret * imports
  • ggplot2 * imports
  • ggpubr * imports
  • glmnet * imports
  • hyper2 * imports
  • knitr * imports
  • mRMRe * imports
  • matrixStats * imports
  • rpart * imports
  • shiny * imports
  • utils * imports
  • DT * suggests
  • RColorBrewer * suggests
  • dplyr * suggests
  • plyr * suggests
  • prettydoc * suggests
  • rmarkdown * suggests
  • shinyBS * suggests
  • shinyWidgets * suggests
  • shinyalert * suggests
  • shinyjs * suggests
  • shinythemes * suggests
  • tcltk * suggests
  • testthat >= 3.0.0 suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/cache v1 composite
  • actions/checkout v3 composite
  • actions/upload-artifact master composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite