collinear
R package to manage multicollinearity in modeling data frames.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (17.4%) to scientific vocabulary
Keywords
machine-learning
multicollinearity
r-package
statistics
Last synced: 10 months ago
Repository
R package to manage multicollinearity in modeling data frames.
Basic Info
- Host: GitHub
- Owner: BlasBenito
- License: other
- Language: R
- Default Branch: main
- Homepage: https://blasbenito.github.io/collinear/
- Size: 20.2 MB
Statistics
- Stars: 12
- Watchers: 1
- Forks: 1
- Open Issues: 3
- Releases: 3
Topics
machine-learning
multicollinearity
r-package
statistics
Created almost 3 years ago · Last pushed 10 months ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  eval = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
# options(tibble.print_min = 5, tibble.print_max = 5)
```
# `collinear`: Seamless Multicollinearity Management
[DOI](https://doi.org/10.5281/zenodo.10039489)
[CRAN](https://CRAN.R-project.org/package=collinear)
[R-CMD-check](https://github.com/BlasBenito/collinear/actions/workflows/R-CMD-check.yaml)
## Warning
Version 2.0.0 of `collinear` includes changes that may disrupt existing workflows, and results from previous versions may not be reproducible due to enhancements in the automated selection algorithms. Please refer to the Changelog for details.
## Summary
[Multicollinearity hinders the interpretability](https://www.blasbenito.com/post/multicollinearity-model-interpretability/) of linear and machine learning models.
The `collinear` package combines four methods for easy management of multicollinearity in modelling data frames with numeric and categorical variables:
- **Target Encoding**: Transforms categorical predictors to numeric using a numeric response as reference.
- **Preference Order**: Ranks predictors by their association with a response variable to preserve important ones in multicollinearity filtering.
- **Pairwise Correlation Filtering**: Automated multicollinearity filtering of numeric and categorical predictors based on pairwise correlations.
- **Variance Inflation Factor Filtering**: Automated multicollinearity filtering of numeric predictors based on Variance Inflation Factors.
These methods are combined in the function `collinear()`, which serves as a single entry point for most of the functionality in the package. The article [How It Works](https://blasbenito.github.io/collinear/articles/how_it_works.html) explains how `collinear()` works in detail.
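To make the idea of preference-driven filtering concrete, here is a minimal base R sketch of pairwise correlation filtering that respects a preference order. It is not the package's implementation: the helper `cor_filter_sketch()`, the `mtcars` example data, and the ranking by absolute Pearson correlation are assumptions chosen for illustration, and the sketch handles numeric predictors only.

```r
# Hypothetical helper, not part of collinear: greedy pairwise correlation filter.
# `preference` lists predictor names from most to least preferred.
cor_filter_sketch <- function(df, preference, max_cor = 0.75) {
  cor_abs <- abs(
    stats::cor(df[, preference, drop = FALSE], use = "pairwise.complete.obs")
  )
  selected <- character(0)
  for (candidate in preference) {
    # keep the candidate only if its absolute correlation with every
    # predictor already kept is at or below the threshold
    if (all(cor_abs[candidate, selected] <= max_cor)) {
      selected <- c(selected, candidate)
    }
  }
  selected
}

# assumed ranking rule: absolute correlation with the response (mpg)
toy_predictors <- c("disp", "hp", "wt", "qsec", "drat")
association <- abs(stats::cor(mtcars[, toy_predictors], mtcars$mpg))[, 1]
preference <- names(sort(association, decreasing = TRUE))

selected <- cor_filter_sketch(mtcars, preference, max_cor = 0.75)
selected
```

Because candidates are visited in preference order, the predictor most associated with the response is always kept, and a less preferred predictor is dropped only when it is too correlated with one that was kept before it.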
## Citation
If you find this package useful, please cite it as:
*Blas M. Benito (2024). collinear: R Package for Seamless Multicollinearity Management. Version 2.0.0. doi: 10.5281/zenodo.10039489*
## Main Improvements in Version 2.0.0
1. **Expanded Functionality**: Functions `collinear()` and `preference_order()` support both categorical and numeric responses and predictors, and can handle several responses at once.
2. **Robust Selection Algorithms**: Enhanced selection in `vif_select()` and `cor_select()`.
3. **Enhanced Functionality to Rank Predictors**: New functions to compute association between response and predictors covering most use-cases, and automated function selection depending on data features.
4. **Simplified Target Encoding**: Streamlined and parallelized for better efficiency; the new default is "loo" (leave-one-out).
5. **Parallelization and Progress Bars**: Utilizes `future` and `progressr` for enhanced performance and user experience.
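As a rough illustration of the leave-one-out idea behind the "loo" default, the base R sketch below encodes each row of a categorical predictor as the mean of the response over the other rows in the same group. The helper `loo_encode_sketch()`, the toy data, and the global-mean fallback for single-row groups are assumptions for this example, not the package's code, and the sketch assumes there are no missing values.

```r
# Hypothetical helper, not part of collinear: leave-one-out target encoding.
loo_encode_sketch <- function(category, response) {
  group_sum <- stats::ave(response, category, FUN = sum)
  group_n <- stats::ave(response, category, FUN = length)
  # mean of the response over the other rows of the same group
  encoded <- (group_sum - response) / (group_n - 1)
  # assumption: single-row groups have no other rows, so use the global mean
  encoded[group_n == 1] <- mean(response)
  encoded
}

toy <- data.frame(
  soil = c("clay", "clay", "clay", "sand", "sand", "silt"),
  y = c(1, 2, 3, 10, 20, 5)
)
toy$soil_encoded <- loo_encode_sketch(toy$soil, toy$y)
toy
```

Excluding a row's own response from its encoded value keeps the encoded predictor from simply echoing the response it is later used to explain.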
## Install
The package `collinear` can be installed from CRAN.
```{r, eval = FALSE}
install.packages("collinear")
```
The development version can be installed from GitHub.
```{r, eval = FALSE}
remotes::install_github(
  repo = "blasbenito/collinear",
  ref = "development"
)
```
Previous versions are in the “archive_xxx” branches of the GitHub repository.
```{r, eval = FALSE}
remotes::install_github(
  repo = "blasbenito/collinear",
  ref = "archive_v1.1.1"
)
```
```{r packages, message = FALSE, warning = FALSE, include = FALSE}
library(collinear)
library(future)
library(parallelly)
```
## Getting Started
The function `collinear()` provides all tools required for a fully fledged multicollinearity filtering workflow. The code below shows a small example workflow.
```{r}
#parallelization setup
future::plan(
  future::multisession,
  #keep at least one worker on single-core machines
  workers = max(1, parallelly::availableCores() - 1)
)
#progress bar (does not work in Rmarkdown)
#progressr::handlers(global = TRUE)
#example data frame
df <- collinear::vi[1:5000, ]
#there are many NA cases in this data frame
sum(is.na(df))
```
```{r}
#numeric and categorical predictors
predictors <- collinear::vi_predictors
collinear::identify_predictors(
  df = df,
  predictors = predictors
)
```
```{r}
#multicollinearity filtering
selection <- collinear::collinear(
  df = df,
  response = c(
    "vi_numeric", #numeric response
    "vi_categorical" #categorical response
  ),
  predictors = predictors,
  max_cor = 0.75,
  max_vif = 5,
  quiet = TRUE
)
```
The output is a named list of vectors with selected predictor names when more than one response is provided, and a character vector otherwise.
```{r}
selection
```
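For readers unfamiliar with the quantity behind a threshold such as `max_vif = 5`, the base R sketch below computes Variance Inflation Factors as 1 / (1 - R²), where R² comes from regressing each predictor on all the others, and then drops the highest-VIF predictor until none exceeds the threshold. The helpers `vif_sketch()` and `vif_filter_sketch()` and the `mtcars` data are assumptions for illustration. The package's own selection also takes the preference order into account, so its results will generally differ from this simplified rule.

```r
# Hypothetical helper, not part of collinear: VIF of every column in a
# numeric data frame with at least two columns.
vif_sketch <- function(df) {
  vapply(
    names(df),
    function(x) {
      # regress predictor x on all the other predictors
      f <- stats::reformulate(setdiff(names(df), x), response = x)
      r_squared <- summary(stats::lm(f, data = df))$r.squared
      1 / (1 - r_squared)
    },
    numeric(1)
  )
}

# Hypothetical helper: drop the highest-VIF predictor until all VIFs
# are at or below max_vif (simplified rule, ignores preference order).
vif_filter_sketch <- function(df, max_vif = 5) {
  while (ncol(df) > 1) {
    vif <- vif_sketch(df)
    if (max(vif) <= max_vif) break
    df <- df[, names(df) != names(which.max(vif)), drop = FALSE]
  }
  names(df)
}

toy_df <- mtcars[, c("disp", "hp", "wt", "qsec", "drat", "cyl")]
round(vif_sketch(toy_df), 2)
vif_filter_sketch(toy_df, max_vif = 5)
```

A VIF of 5 corresponds to an R² of 0.8, meaning 80% of a predictor's variance is explained by the other predictors.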
The output of `collinear()` can be easily converted into model formulas.
```{r}
formulas <- collinear::model_formula(
  predictors = selection
)
formulas
```
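For comparison, a plain additive formula can also be assembled from a character vector of predictor names with base R's `stats::reformulate()`. The sketch below uses a made-up selection (`selection_sketch`, with placeholder predictor names) standing in for the output of `collinear()`. It only shows the general idea and is not a description of what `model_formula()` does internally.

```r
# hypothetical selection standing in for the output of collinear()
selection_sketch <- list(
  vi_numeric = c("predictor_a", "predictor_b"),
  vi_categorical = c("predictor_a", "predictor_c")
)

# one additive formula per response, named after the response
formulas_sketch <- Map(
  function(response, predictors) {
    stats::reformulate(termlabels = predictors, response = response)
  },
  names(selection_sketch),
  selection_sketch
)

formulas_sketch
```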
These formulas can be used to fit models right away.
```{r, eval = FALSE}
#linear model
m_vi_numeric <- stats::glm(
  formula = formulas[["vi_numeric"]],
  data = df,
  na.action = na.omit
)
#random forest model
m_vi_categorical <- ranger::ranger(
  formula = formulas[["vi_categorical"]],
  data = na.omit(df)
)
```
## Getting help
If you encounter bugs or issues with the documentation, please [file an issue on GitHub](https://github.com/BlasBenito/collinear/issues).
Owner
- Name: Blas Benito
- Login: BlasBenito
- Kind: user
- Location: Somewhere in the beach
- Website: www.blasbenito.com
- Twitter: blasbenito
- Repositories: 5
- Profile: https://github.com/BlasBenito
PhD in Quantitative Ecology, Master in Geographic Information Systems, R developer, data scientist and data engineer at @BiomeMakers.
GitHub Events
Total
- Create event: 1
- Release event: 1
- Issues event: 4
- Watch event: 8
- Issue comment event: 3
- Push event: 76
- Pull request event: 4
Last Year
- Create event: 1
- Release event: 1
- Issues event: 4
- Watch event: 8
- Issue comment event: 3
- Push event: 76
- Pull request event: 4
Issues and Pull Requests
All Time
- Total issues: 5
- Total pull requests: 14
- Average time to close issues: 27 days
- Average time to close pull requests: about 2 months
- Total issue authors: 4
- Total pull request authors: 2
- Average comments per issue: 1.4
- Average comments per pull request: 0.36
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 8
- Average time to close issues: 22 days
- Average time to close pull requests: 19 days
- Issue authors: 3
- Pull request authors: 1
- Average comments per issue: 1.75
- Average comments per pull request: 0.63
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- AMBarbosa (2)
- BlasBenito (1)
- marineReg (1)
- RockEco (1)
Pull Request Authors
- AMBarbosa (10)
- olivroy (4)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 486 last month (CRAN)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 4
- Total maintainers: 1
cran.r-project.org: collinear
Automated Multicollinearity Management
- Homepage: https://blasbenito.github.io/collinear/
- Documentation: http://cran.r-project.org/web/packages/collinear/collinear.pdf
- License: MIT + file LICENSE
- Latest release: 2.0.0 (published over 1 year ago)
Rankings
Forks count: 28.0%
Dependent packages count: 29.0%
Stargazers count: 34.7%
Dependent repos count: 37.1%
Average: 43.1%
Downloads: 86.8%
Maintainers (1)
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action v4.4.1 composite
- actions/checkout v3 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 4.0 depends
- dplyr * imports
- rlang * imports
- tibble * imports
- tidyr * imports
- roxyglobals * suggests
- testthat >= 3.0.0 suggests