https://github.com/certara/mlcov
R package for selection of covariate effects using ML
Repository
R package for selection of covariate effects using ML
Basic Info
- Host: GitHub
- Owner: certara
- License: gpl-3.0
- Language: R
- Default Branch: main
- Size: 403 KB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 0
Created over 2 years ago
· Last pushed over 1 year ago
Metadata Files
Readme
License
README.Rmd
---
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
```
# mlcov
R package for selection of covariate effects using ML.
The methodology implemented in the `mlcov` R package consists of four key steps:
1.) Data splitting (step 1): the dataset, comprising empirical Bayes estimates (EBEs) of individual parameters and the candidate covariates, is randomly split into five folds.
2.) Covariate selection (step 2): the Lasso algorithm is applied first to remove covariates that are irrelevant or redundant because of correlation, followed by the Boruta algorithm, which iteratively identifies relevant covariates based on their importance scores.
3.) Voting (step 3): a voting mechanism across folds determines the final selected covariates based on their robustness. These first three steps are run with a single call to the function `ml_cov_search`.
4.) Residual plots (step 4): finally, residual plots are used to evaluate the covariate-parameter relationships. After covariate selection with the proposed ML method, an XGBoost model is trained on the selected covariates, and any remaining trends between the residuals (the difference between the actual target values and the model's predicted values) and the unselected covariates are examined. The primary goal is to check that the ML method did not overlook significant trends or relationships that additional covariates could capture. This step is implemented in a separate function, `generate_residuals_plot`.
Visit the [PAGE Abstract](https://www.page-meeting.org/?abstract=10996) to learn more.
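The splitting and voting steps above can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: `select_in_fold` is a hypothetical placeholder that uses a simple correlation threshold instead of Lasso and Boruta, and both the hold-one-fold-out scheme and the majority-vote threshold are assumptions made for the example.

```r
# Illustrative sketch of steps 1 and 3 (data splitting and voting); base R only.
set.seed(42)

n <- 200
toy <- data.frame(
  WT  = rnorm(n, mean = 70, sd = 12),
  AGE = rnorm(n, mean = 45, sd = 10),
  ALB = rnorm(n, mean = 4,  sd = 0.4)
)
# Simulated individual parameter that depends on WT only
toy$CL <- 0.1 * toy$WT + rnorm(n, sd = 0.5)

covariates <- c("WT", "AGE", "ALB")

# Step 1: randomly assign each row to one of five folds
n_folds <- 5
fold_id <- sample(rep(seq_len(n_folds), length.out = n))

# Hypothetical placeholder selector: keep covariates whose absolute
# correlation with the target exceeds a threshold (NOT Lasso + Boruta)
select_in_fold <- function(d, target, covs, threshold = 0.3) {
  r <- vapply(covs, function(v) abs(cor(d[[v]], d[[target]])), numeric(1))
  covs[r > threshold]
}

# Run the selection with each fold held out in turn (assumed scheme)
selected_per_fold <- lapply(seq_len(n_folds), function(k) {
  select_in_fold(toy[fold_id != k, ], "CL", covariates)
})

# Step 3: count votes and keep covariates chosen in a majority of folds
votes <- table(factor(unlist(selected_per_fold), levels = covariates))
final_covariates <- names(votes)[votes > n_folds / 2]

print(votes)
print(final_covariates)
```

Counting votes on a factor with fixed levels keeps covariates that were never selected in the tally with a count of zero, which makes the vote table easier to read.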
## Installation
```{r, eval = FALSE}
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("certara/mlcov")
```
## Usage
Import data file:
```{r, message=FALSE}
library(mlcov)
data_file <- system.file(package = "mlcov", "supplementary", "tab33")
data <- read.table(data_file, skip = 1, header = TRUE)
```
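The `skip = 1` argument suggests that the example file follows the NONMEM table layout, in which a `TABLE NO.` title line precedes the column header. That is an assumption about `tab33`, not something stated here. Under that assumption, the sketch below writes a tiny file in the same layout (with made-up values) and reads it back the same way:

```r
# Write a small file in the assumed layout: one title line, then the header
tmp <- tempfile()
writeLines(c(
  "TABLE NO.  1",
  " ID          CL          V1          WT",
  "  1.0000E+00  4.2000E+00  3.1000E+01  7.0000E+01",
  "  2.0000E+00  5.1000E+00  2.8500E+01  8.2000E+01"
), tmp)

# skip = 1 drops the title line so that header = TRUE picks up the column names
demo <- read.table(tmp, skip = 1, header = TRUE)
str(demo)

# Clean up the temporary file
unlink(tmp)
```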
Perform covariate search:
```{r}
result <- ml_cov_search(
  data = data,
  pop_param = c("V1", "CL"),
  cov_continuous = c("AGE", "WT", "HT", "BMI", "ALB", "CRT",
                     "FER", "CHOL", "WBC", "LYPCT", "RBC",
                     "HGB", "HCT", "PLT"),
  cov_factors = c("SEX", "RACE", "DIAB", "ALQ", "WACT", "SMQ")
)
print(result)
```
Generate SHAP plots:
```{r}
generate_shap_summary_plot(
  result,
  x_bound = NULL,
  dilute = FALSE,
  scientific = FALSE,
  my_format = NULL,
  title = NULL,
  title.position = 0.5,
  ylab = NULL,
  xlab = NULL
)
```
Generate residual plots:
```{r}
generate_residuals_plot(data = data, result, pop_param = 'CL')
```
```{r}
generate_residuals_plot(data = data, result, pop_param = 'V1')
```
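The idea behind these residual plots can be illustrated without the package. The sketch below uses simulated data and deliberately swaps the XGBoost model for a plain linear model (`lm`) to stay dependency-free; the data frame `sim` and its columns are hypothetical, and the code is not how `generate_residuals_plot` is implemented.

```r
# Sketch of the residual check in step 4, with lm() standing in for XGBoost
set.seed(1)

n <- 300
sim <- data.frame(
  WT  = rnorm(n, mean = 70, sd = 12),
  AGE = rnorm(n, mean = 45, sd = 10)
)
# Simulated CL depends on both WT and AGE
sim$CL <- 0.1 * sim$WT - 0.05 * sim$AGE + rnorm(n, sd = 0.3)

# Suppose only WT was selected; fit the stand-in model on it
fit <- lm(CL ~ WT, data = sim)

# Residual = actual target value minus the model's prediction
res <- sim$CL - predict(fit, newdata = sim)

# A clear trend against an unselected covariate flags a missed effect
trend <- cor(res, sim$AGE)
print(trend)
```

A strong correlation between the residuals and `AGE` indicates that the fitted model left an `AGE` effect unexplained. Plotting `res` against `sim$AGE` would show the same trend visually.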
Owner
- Name: Certara USA, Inc.
- Login: certara
- Kind: organization
- Email: github-admins@certara.com
- Website: https://www.certara.com/
- Repositories: 8
- Profile: https://github.com/certara
GitHub Events
Total
- Watch event: 2
- Fork event: 1
Last Year
- Watch event: 2
- Fork event: 1
Dependencies
.github/workflows/check-standard.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 3.5.0 depends
- SHAPforxgboost * depends
- caret * depends
- Boruta * imports
- Metrics * imports
- dplyr * imports
- ggplot2 * imports
- glmnet * imports
- gridExtra * imports
- xgboost * imports
- knitr * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests
- vdiffr >= 1.0.0 suggests