lightsnip

Hard fork of curso-r/treesnip specifically for CCAO LightGBM regressions

https://github.com/ccao-data/lightsnip

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (19.7%) to scientific vocabulary

Keywords

lightgbm machine-learning r r-package
Last synced: 6 months ago

Repository

Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 3
  • Releases: 1
Created over 2 years ago · Last pushed 12 months ago
Metadata Files
Readme License Citation Codeowners

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# Lightsnip 

[![R-CMD-check](https://github.com/ccao-data/lightsnip/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/R-CMD-check.yaml)
[![test-coverage](https://github.com/ccao-data/lightsnip/actions/workflows/test-coverage.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/test-coverage.yaml)
[![pre-commit](https://github.com/ccao-data/lightsnip/actions/workflows/pre-commit.yaml/badge.svg)](https://github.com/ccao-data/lightsnip/actions/workflows/pre-commit.yaml)
[![codecov](https://codecov.io/gh/ccao-data/lightsnip/branch/master/graph/badge.svg)](https://codecov.io/gh/ccao-data/lightsnip)

Lightsnip is a hard fork of [curso-r/treesnip](https://github.com/curso-r/treesnip). It adds LightGBM bindings for parsnip and enables more advanced LightGBM features, such as early stopping. It is not intended for general use; it exists only as a dependency for Cook County Assessor's Office (CCAO) regression models.

For detailed documentation on included functions, [**visit the full reference list**](https://ccao-data.github.io/lightsnip/reference/index.html).

## Installation

You can install the released version of `lightsnip` directly from GitHub with one of the following commands:

```{r, eval=FALSE}
# Using remotes
remotes::install_github("ccao-data/lightsnip")

# Using renv
renv::install("ccao-data/lightsnip")

# Using pak
pak::pak("ccao-data/lightsnip")

# Append @ followed by a version tag to install a specific version
remotes::install_github("ccao-data/lightsnip@0.0.5")
```

Once it is installed, you can use it like any other package: call `library(lightsnip)` at the beginning of your script.
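To confirm that the installation succeeded and that a recent enough version was picked up, a check along these lines may help. This is a minimal sketch: `has_min_version` is a hypothetical helper name used for illustration and is not part of `lightsnip`.

```r
# Hypothetical helper (not part of lightsnip): returns TRUE if `pkg` is
# installed at version `min_version` or newer, FALSE otherwise.
has_min_version <- function(pkg, min_version) {
  # requireNamespace() returns FALSE instead of erroring when pkg is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Check for the version pinned in the install command above
has_min_version("lightsnip", "0.0.5")
```

The call returns `FALSE` both when the package is missing and when it is older than requested, so it can guard the top of a script.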

## Differences compared to [treesnip](https://github.com/curso-r/treesnip)

- Removed support for `tree` and `catboost` (LightGBM only)
- Removed classification support for LightGBM (regression only)
- Removed treesnip caps and warnings on `max_depth` and other parameters
- Removed vignettes and samples
- Remapped parameters to engine args instead of parsnip model args
- Added LightGBM-specific hyperparameter functions
- Added LightGBM-specific save/load helpers
- Added recipe/fit cleaning helpers
- Forced the user to specify categorical columns by name; factors are _not_ implicitly converted to categoricals
- Added early stopping from xgboost
- Added more unit tests
- Fixed a number of bugs
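
Because factors are not converted implicitly, categorical columns need to be encoded and named explicitly before fitting. The sketch below uses base R only to show the idea: it keeps an explicit vector of categorical column names and replaces each factor with zero-based integer codes, which is roughly what `step_integer(zero_based = TRUE)` does in the usage example further down. The toy data and the `cat_cols` name are assumptions for illustration; how the names are handed to the engine is not shown here, so consult the package reference for that.

```r
# Toy data with two factor columns (illustrative values only)
df <- data.frame(
  mpg = c(21.0, 22.8, 18.7, 14.3),
  cyl = factor(c(6, 4, 8, 8)),
  vs  = factor(c(0, 1, 0, 0))
)

# Names of the columns to treat as categorical, listed explicitly
cat_cols <- c("cyl", "vs")

# Replace each factor with zero-based integer codes, following level order
encoded <- df
encoded[cat_cols] <- lapply(df[cat_cols], function(x) as.integer(x) - 1L)

str(encoded)
```

Keeping the names in one vector means the same list can drive both the encoding step and whatever argument tells the model which columns are categorical.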

## Basic usage with Tidymodels

Here is a quick example using `lightsnip` with a Tidymodels cross-validation workflow: 

```{r message=FALSE, results='asis'}
library(dplyr)
library(lightgbm)
library(lightsnip)
library(parsnip)
library(recipes)
library(workflows)

# Create a dataset for training
mtcars_train <- mtcars %>%
  dplyr::slice(1:28) %>%
  sample_n(size = 500, replace = TRUE) %>%
  mutate(cyl = as.factor(cyl), vs = as.factor(vs))

# Create a test set
mtcars_test <- mtcars %>%
  dplyr::slice(29:32) %>%
  mutate(cyl = as.factor(cyl), vs = as.factor(vs))

# Recipe to convert factors to categorical integers
rec <- recipe(mpg ~ ., mtcars_train) %>%
  step_integer(all_nominal(), zero_based = TRUE)

# Split data into V-folds
resamples <- rsample::vfold_cv(mtcars_train, v = 2)

# Create a model specification. LightGBM-specific parameters are passed to
# set_engine, NOT to boost_tree
model <- parsnip::boost_tree(
  trees = tune::tune()
) %>%
  parsnip::set_engine(
    engine = "lightgbm",
    verbose = -1,
    learning_rate = tune::tune(),
    min_gain_to_split = tune::tune(),
    feature_fraction = tune::tune(),
    min_data_in_leaf = tune::tune(),
    max_depth = tune::tune()
  )

# Run grid search
search <- tune::tune_grid(
  parsnip::set_mode(model, "regression"),
  preprocessor = rec,
  resamples = resamples,
  param_info = model %>%
    hardhat::extract_parameter_set_dials() %>%
    stats::update(
      learning_rate = learning_rate(),
      min_gain_to_split = min_gain_to_split(),
      feature_fraction = feature_fraction(),
      min_data_in_leaf = min_data_in_leaf(c(1L, 2L)),
      max_depth = max_depth(c(3L, 6L))
    ),
  grid = 2,
  metrics = yardstick::metric_set(yardstick::rmse)
)

# Finalize model
final <- model %>%
  tune::finalize_model(tune::select_best(search)) %>%
  parsnip::set_mode("regression") %>%
  parsnip::fit(mpg ~ ., bake(prep(rec), mtcars_train))

# Predict on test set
mtcars_test %>%
  mutate(pred_mpg = predict(final, bake(prep(rec), .))$.pred) %>%
  select(actual_mpg = mpg, pred_mpg) %>%
  knitr::kable(digits = 2)
```
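
The grid search above ranks candidates by RMSE via `yardstick::rmse`. For a quick sanity check on held-out predictions, the same quantity can be computed by hand. This is a minimal sketch: `rmse_manual` is a hypothetical helper, and the predictions are made up by shifting the actual values rather than taken from the fitted model.

```r
# Hypothetical helper: root mean squared error of predictions
rmse_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Actual mpg for the four test rows used above
actual <- mtcars$mpg[29:32]

# Made-up predictions, each off by exactly one mpg (illustration only)
predicted <- actual + c(1, -1, 1, -1)

rmse_manual(actual, predicted)
```

Since every made-up prediction misses by one unit, the result should be 1 up to floating-point error.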

Owner

  • Name: Cook County Assessor's Office
  • Login: ccao-data
  • Kind: organization
  • Email: assessor.data@cookcountyil.gov

Citation (CITATION.cff)

message: "If you use this software, please cite it as below."
authors:
- family-names: "Cook County Assessor's Office"
title: "Lightsnip"
version: 0.0.6
date-released: 2022-01-24
url: "https://github.com/ccao-data/lightsnip"

GitHub Events

Total
  • Delete event: 2
  • Issue comment event: 1
  • Push event: 9
  • Pull request review event: 1
  • Pull request event: 6
  • Create event: 3
Last Year
  • Delete event: 2
  • Issue comment event: 1
  • Push event: 9
  • Pull request review event: 1
  • Pull request event: 6
  • Create event: 3

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 5
  • Total pull requests: 14
  • Average time to close issues: 9 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 1
  • Total pull request authors: 4
  • Average comments per issue: 0.4
  • Average comments per pull request: 1.07
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 7
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 days
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.57
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dfsnow (4)
Pull Request Authors
  • jeancochrane (14)
  • dfsnow (5)
  • wagnerlmichael (2)
Top Labels
Issue Labels
enhancement (2)
Pull Request Labels