lightsnip
Hard fork of curso-r/treesnip specifically for CCAO LightGBM regressions
Repository
Basic Info
- Host: GitHub
- Owner: ccao-data
- License: agpl-3.0
- Language: R
- Default Branch: master
- Homepage: https://ccao-data.github.io/lightsnip/
- Size: 21.9 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 3
- Releases: 1
Topics
lightgbm
machine-learning
r
r-package
Created over 2 years ago
· Last pushed 12 months ago
Metadata Files
Readme
License
Citation
Codeowners
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# Lightsnip
[R-CMD-check](https://github.com/ccao-data/lightsnip/actions/workflows/R-CMD-check.yaml)
[test-coverage](https://github.com/ccao-data/lightsnip/actions/workflows/test-coverage.yaml)
[pre-commit](https://github.com/ccao-data/lightsnip/actions/workflows/pre-commit.yaml)
[codecov](https://codecov.io/gh/ccao-data/lightsnip)
Lightsnip is a hard fork of [curso-r/treesnip](https://github.com/curso-r/treesnip). It adds LightGBM bindings for parsnip and enables more advanced LightGBM features, such as early stopping. It is not intended for general use, only as a dependency for CCAO regression models.
For detailed documentation on included functions, [**visit the full reference list**](https://ccao-data.github.io/lightsnip/reference/index.html).
## Installation
You can install the released version of `lightsnip` directly from GitHub with one of the following commands:
```{r, eval=FALSE}
# Using remotes
remotes::install_github("ccao-data/lightsnip")
# Using renv
renv::install("ccao-data/lightsnip")
# Using pak
pak::pak("ccao-data/lightsnip")
# Append @ followed by a version tag to install a specific version
remotes::install_github("ccao-data/lightsnip@0.0.5")
```
Once it is installed, you can use it just like any other package. Simply call `library(lightsnip)` at the beginning of your script.
## Differences compared to [treesnip](https://github.com/curso-r/treesnip)
- Removed support for `tree` and `catboost` (LightGBM only)
- Removed classification support for LightGBM (regression only)
- Removed treesnip caps and warnings on `max_depth` and other parameters
- Removed vignettes and samples
- Remapped parameters to engine args instead of parsnip model args
- Added LightGBM-specific hyperparameter functions
- Added LightGBM-specific save/load helpers
- Added recipe/fit cleaning helpers
- Forces the user to specify categorical columns by name; does _not_ implicitly convert factors to categoricals
- Added early stopping from xgboost
- Added more unit tests
- Fixed a number of bugs
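
Two of the items above, naming categorical columns explicitly and early stopping, correspond to features of LightGBM itself. The sketch below shows what they look like when calling the `lightgbm` R package directly. It does not use `lightsnip`'s own wrappers, and the validation split, seed, and parameter values are assumptions chosen only for illustration.

```{r, eval=FALSE}
library(lightgbm)

# Integer-encode the categorical column (zero-based); nothing is converted
# implicitly, so the column must be prepared and named by the user
df <- mtcars
df$cyl <- as.integer(as.factor(df$cyl)) - 1L

# Named feature matrix, so categoricals can be referenced by column name
x <- as.matrix(df[, setdiff(names(df), "mpg")])
y <- df$mpg

# Hold out a few rows as a validation set (illustrative split)
set.seed(42)
val_idx <- sample(nrow(df), 8)

dtrain <- lgb.Dataset(
  data = x[-val_idx, ],
  label = y[-val_idx],
  categorical_feature = "cyl"
)
dvalid <- lgb.Dataset.create.valid(
  dtrain,
  data = x[val_idx, ],
  label = y[val_idx]
)

# Stop training once validation RMSE has not improved for 10 rounds
fit <- lgb.train(
  params = list(
    objective = "regression",
    metric = "rmse",
    learning_rate = 0.1,
    min_data_in_leaf = 2L
  ),
  data = dtrain,
  nrounds = 200L,
  valids = list(valid = dvalid),
  early_stopping_rounds = 10L,
  verbose = -1L
)

# Iteration with the best validation score, and held-out predictions
best_iter <- fit$best_iter
pred <- predict(fit, x[val_idx, ])
```

With a dataset this small LightGBM may never choose a categorical split, so treat this as a demonstration of the interface rather than of model quality.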
## Basic usage with Tidymodels
Here is a quick example using `lightsnip` with a Tidymodels cross-validation workflow:
```{r message=FALSE, results='asis'}
library(dplyr)
library(lightgbm)
library(lightsnip)
library(parsnip)
library(recipes)
library(workflows)

# Create a dataset for training
mtcars_train <- mtcars %>%
  dplyr::slice(1:28) %>%
  sample_n(size = 500, replace = TRUE) %>%
  mutate(cyl = as.factor(cyl), vs = as.factor(vs))

# Create a test set
mtcars_test <- mtcars %>%
  dplyr::slice(29:32) %>%
  mutate(cyl = as.factor(cyl), vs = as.factor(vs))

# Recipe to convert factors to categorical integers
rec <- recipe(mpg ~ ., mtcars_train) %>%
  step_integer(all_nominal(), zero_based = TRUE)

# Split data into V-folds
resamples <- rsample::vfold_cv(mtcars_train, v = 2)

# Create a model specification. LightGBM-specific parameters are passed to
# set_engine, NOT to boost_tree
model <- parsnip::boost_tree(
  trees = tune::tune()
) %>%
  parsnip::set_engine(
    engine = "lightgbm",
    verbose = -1,
    learning_rate = tune::tune(),
    min_gain_to_split = tune::tune(),
    feature_fraction = tune::tune(),
    min_data_in_leaf = tune::tune(),
    max_depth = tune::tune()
  )

# Run grid search
search <- tune::tune_grid(
  parsnip::set_mode(model, "regression"),
  preprocessor = rec,
  resamples = resamples,
  param_info = model %>%
    hardhat::extract_parameter_set_dials() %>%
    stats::update(
      learning_rate = learning_rate(),
      min_gain_to_split = min_gain_to_split(),
      feature_fraction = feature_fraction(),
      min_data_in_leaf = min_data_in_leaf(c(1L, 2L)),
      max_depth = max_depth(c(3L, 6L))
    ),
  grid = 2,
  metrics = yardstick::metric_set(yardstick::rmse)
)

# Finalize model
final <- model %>%
  tune::finalize_model(tune::select_best(search)) %>%
  parsnip::set_mode("regression") %>%
  parsnip::fit(mpg ~ ., bake(prep(rec), mtcars_train))

# Predict on test set
mtcars_test %>%
  mutate(pred_mpg = predict(final, bake(prep(rec), .))$.pred) %>%
  select(actual_mpg = mpg, pred_mpg) %>%
  knitr::kable(digits = 2)
```
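
The recipe in the example above uses `step_integer()` with `zero_based = TRUE` to turn factor columns into integer codes. As a minimal sketch of what that step produces, the toy data frame below (an assumption made up for illustration, not part of the package) is baked on its own:

```{r, eval=FALSE}
library(recipes)

# Toy data with a single factor column; levels sort as "4", "6", "8"
df <- data.frame(
  mpg = c(21, 22.8, 18.7, 14.3),
  cyl = factor(c("6", "4", "8", "8"))
)

rec <- recipe(mpg ~ ., data = df) %>%
  step_integer(all_nominal(), zero_based = TRUE)

# Each factor level is replaced by its zero-based position in the level order
baked <- bake(prep(rec), new_data = df)
baked$cyl
#> [1] 1 0 2 2
```

The encoded column is still an ordinary numeric column at this point; whether the model treats it as categorical depends on naming it explicitly when fitting.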
Owner
- Name: Cook County Assessor's Office
- Login: ccao-data
- Kind: organization
- Email: assessor.data@cookcountyil.gov
- Website: https://www.cookcountyassessor.com
- Twitter: AssessorCook
- Repositories: 1
- Profile: https://github.com/ccao-data
Citation (CITATION.cff)
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Cook County Assessor's Office"
title: "Lightsnip"
version: 0.0.6
date-released: 2022-01-24
url: "https://github.com/ccao-data/lightsnip"
GitHub Events
Total
- Delete event: 2
- Issue comment event: 1
- Push event: 9
- Pull request review event: 1
- Pull request event: 6
- Create event: 3
Last Year
- Delete event: 2
- Issue comment event: 1
- Push event: 9
- Pull request review event: 1
- Pull request event: 6
- Create event: 3
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 5
- Total pull requests: 14
- Average time to close issues: 9 days
- Average time to close pull requests: 1 day
- Total issue authors: 1
- Total pull request authors: 4
- Average comments per issue: 0.4
- Average comments per pull request: 1.07
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 7
- Average time to close issues: N/A
- Average time to close pull requests: 3 days
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.57
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- dfsnow (4)
Pull Request Authors
- jeancochrane (14)
- dfsnow (5)
- wagnerlmichael (2)
Top Labels
Issue Labels
enhancement (2)