blockCV

The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments. See

https://github.com/rvalavi/blockcv

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.4%) to scientific vocabulary

Keywords

cross-validation r r-package rstats spatial spatial-cross-validation spatial-modelling species-distribution-modelling
Last synced: 6 months ago · JSON representation

Repository

Basic Info
Statistics
  • Stars: 115
  • Watchers: 6
  • Forks: 23
  • Open Issues: 5
  • Releases: 0
Topics
cross-validation r r-package rstats spatial spatial-cross-validation spatial-modelling species-distribution-modelling
Created about 8 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License

README.md

blockCV

Spatial and environmental blocking for k-fold and LOO cross-validation

The package blockCV offers a range of functions for generating train and test folds for k-fold and leave-one-out (LOO) cross-validation (CV). It allows for separation of data spatially and environmentally, with various options for block construction. Additionally, it includes a function for assessing the level of spatial autocorrelation in response or raster covariates, to aid in selecting an appropriate distance band for data separation. The blockCV package is suitable for the evaluation of a variety of spatial modelling applications, including classification of remote sensing imagery, soil mapping, and species distribution modelling (SDM). It also provides support for different SDM scenarios, including presence-absence and presence-background species data, rare and common species, and raster data for predictor variables.

Main features

  • There are four blocking methods: spatial, clustering, buffers, and NNDM (Nearest Neighbour Distance Matching) blocks
  • Several ways to construct spatial blocks
  • The assignment of the spatial blocks to cross-validation folds can be done in three different ways: random, systematic and checkerboard pattern
  • The spatial blocks can be assigned to cross-validation folds to have evenly distributed records for binary (e.g. species presence-absence/background) or multi-class responses (e.g. land cover classes for remote sensing image classification)
  • The buffering and NNDM functions can account for presence-absence and presence-background data types
  • Geostatistical techniques can be used to inform the choice of a suitable distance band by which to separate the data sets
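As a rough sketch of the block-to-fold assignment options listed above, the calls below build square blocks and assign them to folds systematically and in a checkerboard pattern. It assumes the example data shipped with the package; the 450 km block size is only illustrative, and square blocks (`hexagon = FALSE`) are used on the assumption that the checkerboard pattern requires them.

``` r
library(blockCV)

# example rasters and presence-absence points shipped with blockCV
rasters <- terra::rast(list.files(system.file("extdata/au/", package = "blockCV"),
                                  full.names = TRUE))
points <- sf::st_as_sf(read.csv(system.file("extdata/", "species.csv", package = "blockCV")),
                       coords = c("x", "y"), crs = 7845)

# systematic assignment of square blocks to 5 folds
sys <- cv_spatial(x = points, column = "occ", size = 450000, k = 5,
                  hexagon = FALSE, selection = "systematic",
                  plot = FALSE, progress = FALSE)

# checkerboard assignment (always two folds)
chk <- cv_spatial(x = points, column = "occ", size = 450000,
                  hexagon = FALSE, selection = "checkerboard",
                  plot = FALSE, progress = FALSE)

# number of training and testing records per fold
sys$records
chk$records
```

Comparing the `records` tables is a quick way to see whether a given assignment leaves any fold with too few records of one class.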

New updates in version 3.0

The latest major version of blockCV (v3.0) features significant updates and changes. All function names have been revised to more general names, beginning with cv_*. Although the previous functions (version 2.x) will continue to work, they will be removed in future updates after being available for an extended period. It is highly recommended to update your code with the new functions provided below.

Some new updates:

  • Function names have been changed, with all functions now starting with cv_
  • The CV blocking functions are now: cv_spatial, cv_cluster, cv_buffer, and cv_nndm
  • Spatial blocks now support hexagonal (now, default), rectangular, and user-defined blocks
  • A fast C++ implementation of the Nearest Neighbour Distance Matching (NNDM) algorithm (Milà et al. 2022) has been added
  • The NNDM algorithm can handle species presence-background data and other types of data
  • The cv_cluster function generates blocks based on kmeans clustering. It now works on both environmental rasters and the spatial coordinates of sample points
  • The cv_spatial_autocor function now calculates the spatial autocorrelation range for both the response (i.e. binary or continuous data) and a set of continuous raster covariates
  • The new cv_plot function allows for visualization of folds from all blocking strategies using ggplot facets
  • The terra package is now used for all raster processing; stars and raster objects, as well as files on disk, are also supported
  • The new cv_similarity function provides measures of possible extrapolation to testing folds
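The two leave-one-out style functions named above can be sketched as follows. This is a minimal illustration, assuming the example data shipped with the package; the 350 km distance is an arbitrary value chosen for the sketch rather than a recommended setting.

``` r
library(blockCV)

# example rasters and presence-absence points shipped with blockCV
rasters <- terra::rast(list.files(system.file("extdata/au/", package = "blockCV"),
                                  full.names = TRUE))
points <- sf::st_as_sf(read.csv(system.file("extdata/", "species.csv", package = "blockCV")),
                       coords = c("x", "y"), crs = 7845)

# buffering: each point is tested in turn, and training points
# within 350 km of it are excluded
bloo <- cv_buffer(x = points, column = "occ", size = 350000,
                  presence_bg = FALSE, progress = FALSE, report = FALSE)

# NNDM: r supplies the prediction area, size is the autocorrelation range
nndm <- cv_nndm(x = points, column = "occ", r = rasters, size = 350000,
                num_sample = 5000, sampling = "regular", min_train = 0.1,
                plot = FALSE, report = FALSE)

# one train/test pair per sample point
length(bloo$folds_list)
```

For presence-background data, `presence_bg = TRUE` would be the relevant switch in both functions.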

Installation

To install the latest update of the package from GitHub use:

remotes::install_github("rvalavi/blockCV", dependencies = TRUE)

Or install from CRAN:

install.packages("blockCV", dependencies = TRUE)

Vignettes

For practical examples of the package, see:

  1. blockCV introduction: how to create block cross-validation folds
  2. Block cross-validation for species distribution modelling
  3. Using blockCV with caret and tidymodels

Basic usage

The code below showcases some of the package's functionality. For more comprehensive tutorials, refer to the vignettes included with the package (listed above).

``` r
# loading the package
library(blockCV)
library(sf) # working with spatial vector data
library(terra) # working with spatial raster data
```

``` r
# load raster data; the pipe operator |> is available for R v4.1 or higher
myrasters <- system.file("extdata/au/", package = "blockCV") |>
  list.files(full.names = TRUE) |>
  terra::rast()

# load species presence-absence data and convert to sf
padata <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 7845)
```

``` r
# spatial blocking by specified range and random assignment
sb <- cv_spatial(x = padata, # sf or SpatialPoints of sample data (e.g. species data)
                 column = "occ", # the response column (binary or multi-class)
                 r = myrasters, # a raster for background (optional)
                 size = 450000, # size of the blocks in metres
                 k = 5, # number of folds
                 hexagon = TRUE, # use hexagonal blocks - default
                 selection = "random", # random blocks-to-fold
                 iteration = 100, # to find evenly dispersed folds
                 biomod2 = TRUE) # also create folds for biomod2
```
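The folds returned by the blocking functions can then drive a cross-validation loop. The sketch below is one possible way to do this, assuming the example data shipped with the package; the logistic regression and the Brier score are stand-ins picked for illustration and are not part of blockCV.

``` r
library(blockCV)

# example rasters and presence-absence points shipped with blockCV
rasters <- terra::rast(list.files(system.file("extdata/au/", package = "blockCV"),
                                  full.names = TRUE))
points <- sf::st_as_sf(read.csv(system.file("extdata/", "species.csv", package = "blockCV")),
                       coords = c("x", "y"), crs = 7845)

sb <- cv_spatial(x = points, column = "occ", size = 450000, k = 5,
                 selection = "random", iteration = 50,
                 plot = FALSE, progress = FALSE)

# covariate values at the sample points, plus the response
model_data <- terra::extract(rasters, terra::vect(points), ID = FALSE)
model_data$occ <- points$occ

# each element of folds_list holds training indices, then testing indices
folds <- sb$folds_list
brier <- numeric(length(folds))
for (i in seq_along(folds)) {
  train_idx <- folds[[i]][[1]]
  test_idx <- folds[[i]][[2]]
  fit <- glm(occ ~ ., data = model_data[train_idx, ], family = binomial())
  pred <- predict(fit, newdata = model_data[test_idx, ], type = "response")
  # mean squared difference between observed 0/1 and predicted probability
  brier[i] <- mean((model_data$occ[test_idx] - pred)^2, na.rm = TRUE)
}
brier
```

Any model and metric can be substituted inside the loop; only the indexing by `folds_list` comes from the package.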

Or create spatial clusters for k-fold cross-validation:

``` r
# create spatial clusters
set.seed(6)
sc <- cv_cluster(x = padata,
                 column = "occ", # optionally count data in folds (binary or multi-class)
                 k = 5)
```

``` r
# now plot the created folds
cv_plot(cv = sc, # a blockCV object
        x = padata, # sample points
        r = myrasters[[1]], # optionally add a raster background
        points_alpha = 0.5,
        nrow = 2)
```

Investigate spatial autocorrelation in the landscape to choose a suitable size for spatial blocks:

``` r
# exploring the effective range of spatial autocorrelation in raster covariates or sample data
cv_spatial_autocor(r = myrasters, # a SpatRaster object or path to files
                   num_sample = 5000, # number of cells to be used
                   plot = TRUE)
```
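One way to use this result is to pass the estimated range on as the block size. The sketch below assumes the returned object exposes the estimated range as `$range` (in metres for projected data) and that the suggested automap package is installed; it uses the example data shipped with the package.

``` r
library(blockCV)

# example rasters and presence-absence points shipped with blockCV
rasters <- terra::rast(list.files(system.file("extdata/au/", package = "blockCV"),
                                  full.names = TRUE))
points <- sf::st_as_sf(read.csv(system.file("extdata/", "species.csv", package = "blockCV")),
                       coords = c("x", "y"), crs = 7845)

# estimate the spatial autocorrelation range of the raster covariates
sac <- cv_spatial_autocor(r = rasters, num_sample = 5000,
                          plot = FALSE, progress = FALSE)
sac$range

# use the estimated range as the size of the spatial blocks
sb_range <- cv_spatial(x = points, column = "occ", size = sac$range, k = 5,
                       plot = FALSE, progress = FALSE)
```

The estimate depends on the random sample of cells, so repeated runs will give slightly different block sizes.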

Alternatively, you can manually choose the size of spatial blocks in an interactive session using a Shiny app.

``` r
# shiny app to aid selecting a size for spatial blocks
cv_block_size(r = myrasters[[1]],
              x = padata, # optionally add sample points
              column = "occ",
              min_size = 2e5,
              max_size = 9e5)
```

Reporting issues

Please report issues at: https://github.com/rvalavi/blockCV/issues

Citation

To cite package blockCV in publications, please use:

Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods Ecol Evol. 2019; 10:225–232. https://doi.org/10.1111/2041-210X.13107

Owner

  • Name: Roozbeh Valavi
  • Login: rvalavi
  • Kind: user
  • Location: Melbourne, Australia
  • Company: CSIRO Environment

Data science and spatial ecology

GitHub Events

Total
  • Issues event: 9
  • Watch event: 6
  • Issue comment event: 12
  • Push event: 17
  • Pull request event: 1
Last Year
  • Issues event: 9
  • Watch event: 6
  • Issue comment event: 12
  • Push event: 17
  • Pull request event: 1

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 269
  • Total Committers: 4
  • Avg Commits per committer: 67.25
  • Development Distribution Score (DDS): 0.019
Past Year
  • Commits: 92
  • Committers: 3
  • Avg Commits per committer: 30.667
  • Development Distribution Score (DDS): 0.043
Top Committers
Name Email Commits
Roozbeh Valavi v****r@g****m 264
Ian Flint i****t@2****u 2
Ian Flint i****t@u****u 2
Roozbeh Valavi 3****i 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 48
  • Total pull requests: 7
  • Average time to close issues: 3 months
  • Average time to close pull requests: 7 days
  • Total issue authors: 32
  • Total pull request authors: 4
  • Average comments per issue: 4.06
  • Average comments per pull request: 0.71
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 1
  • Average time to close issues: 25 days
  • Average time to close pull requests: 1 minute
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 1.6
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • pat-s (6)
  • AMBarbosa (6)
  • ozgurhsyndgn (3)
  • Cam-in (2)
  • immaryw (2)
  • bcknr (1)
  • Moncef-Boukhecheba (1)
  • Navvie2019 (1)
  • rudeboybert (1)
  • Geethen (1)
  • zhangzhixin1102 (1)
  • topepo (1)
  • anackr (1)
  • bfakos (1)
  • thomasp85 (1)
Pull Request Authors
  • rvalavi (3)
  • MayaGueguen (2)
  • iflint1 (2)
  • be-marc (1)
Top Labels
Issue Labels
enhancement (2) invalid (2) good first issue (2) bug (1) help wanted (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 700 last-month
  • Total docker downloads: 8
  • Total dependent packages: 8
  • Total dependent repositories: 10
  • Total versions: 10
  • Total maintainers: 1
cran.r-project.org: blockCV

Spatial and Environmental Blocking for K-Fold and LOO Cross-Validation

  • Versions: 10
  • Dependent Packages: 8
  • Dependent Repositories: 10
  • Downloads: 700 Last month
  • Docker Downloads: 8
Rankings
Forks count: 3.6%
Stargazers count: 3.9%
Dependent packages count: 6.6%
Dependent repos count: 9.2%
Average: 10.8%
Downloads: 13.9%
Docker downloads count: 27.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/R-CMD-check.yml actions
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-tinytex v2 composite
DESCRIPTION cran
  • R >= 3.5.0 depends
  • progress * imports
  • raster >= 2.5 imports
  • sf >= 0.8 imports
  • automap >= 1.0 suggests
  • covr * suggests
  • cowplot * suggests
  • future * suggests
  • future.apply * suggests
  • geosphere * suggests
  • ggplot2 >= 3.2.1 suggests
  • knitr * suggests
  • methods * suggests
  • rgdal * suggests
  • rgeos * suggests
  • rmarkdown * suggests
  • shiny >= 1.0.3 suggests
  • shinydashboard * suggests
  • testthat * suggests