virtual-suitability-cube

Scripts to build suitability data cubes and assess the impact of sampling bias when measuring ecological niche.

https://github.com/b-cubed-eu/virtual-suitability-cube

Last synced: 10 months ago · JSON representation ·

Repository

Scripts to build suitability data cubes and assess the impact of sampling bias when measuring ecological niche.

Basic Info

Host: GitHub
Owner: b-cubed-eu
License: mit
Language: R
Default Branch: main
Homepage:
Size: 9.91 MB

Statistics

Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

B-Cubed/Suitability Cube

Description

Scripts to build suitability data cubes

Milestone MS20: Code development for predictive habitat suitability modelling
Milestone MS25: Code testing for predictive habitat suitability

Authors

Rocìo Beatriz Cortès Lobos (Unibo), Matilde Martini (Unibo), Michele Di Musciano (UnivAQ), Duccio Rocchini (Unibo)

Next steps (Nov. 2025)

Add temporal dimension ⏲️
Add an uncertainty measure (dubicube R package)
Add the Dissimilarity Index to measure the distance between training points and predictors
Incorporate DeepMaxEnt

Suitability Cube

Species suitability refers to how favorable an environment is for a species to survive, reproduce, and grow in a specific area and time. It takes into account factors like climate, landscape, and resource availability.

Species Distribution Models (SDMs), also known as Environmental Niche Models (ENMs), are tools that use environmental and species occurrence data to study and predict the distribution of species across time and space. SDMs help identify suitable habitats, forecast the movements of invasive species, and illustrate how species distributions might change due to factors like climate change. They are essential for conservation, allowing us to study how species interact with their environment and make informed decisions to protect biodiversity.

Studying species suitability under different environmental conditions is important for understanding population dynamics, planning conservation actions, and monitoring the effects of climate change and human activities on ecosystems. With this knowledge, we can make better decisions on how to protect habitats and species sustainably.

To facilitate the observation of suitability for multiple species over time and space, we developed a framework that uses Data Cubes, multidimensional arrays that organize data in a structured way. In this tutorial, we outline the steps to create a stars object, which includes three dimensions: time, space (represented as grid cells), and species, with suitability as the main attribute. Stars objects can be sliced, aggregated along one of the dimensions, and analyzed, making them ideal for studying species suitability.

Virtual Suitability Cube (Code Development)

For demonstrating the structure of the data cube, we use virtual species, which are artificially generated species with known suitability maps based on climate data. The main steps include combining climate data to calculate the suitability of two different species over time and in the same area, then merging these species into a single stars object.

Starting with a time series of climate variables, we combine them to create the suitability for two different virtual species, whose trends over time we want to observe.

The two data cubes of species suitability are treated as separate entities, which we can then combine by aggregating them over polygons, creating a vector data cube.

This approach makes it easy to visualize and analyze species suitability across time and space for multiple species at once.

Climatic Data

``` r

load packages

library(sf) library(raster) library(viridis) library(virtualspecies) library(ggplot2) library(tidyverse) library(terra) library(stars) library(geodata) library(purrr) library(reshape2) ```

To begin, we will download climate data from WorldClim. Climate data are among the datasets from which the suitability of a species can be calculated.

In this simple example, we will download only 3 variables: minimum temperature, maximum temperature, and precipitation. The study area is Luxembourg.

``` r

Download climatic data

worldclim_country function.

The res paramater is the resolution, with valid values as 10,5,2.5 and 0.5

The var parameter is the variable name, with valid values as tmin, tmax, tavg, prec, wind, vapr and bio.

The worldclim_country function will return 12 layers of raster (in a SpatRaster), for each variable.

tmin <- worldclimcountry("Lux", "tmin", path=tempdir(), res = 0.5, version = "2.1") tmax <- worldclimcountry("Lux", "tmax", path=tempdir(), res = 0.5, version = "2.1") prec <- worldclim_country("Lux", "prec", path=tempdir(), res = 0.5, version = "2.1")

12 months of precipitation in Luxembourg. Each layer represents the average value for a month.

plot(tmin, col = plasma(500, alpha = 1, begin = 0, end = 1, direction = 1)) ```

Even if there are 12 rasters, each corresponding to a month, the temporal dimension is not expressed in the RasterStack and needs to be added by time function

``` r

here, we will simply use numbers from 1 to 12 to represent # the months.

time(tmin) <- time(tmax) <- time(prec) <- 1:12 climate_vars <- c("tmin", "tmax", "prec")

print(tmin)

class : SpatRaster

dimensions : 180, 180, 12 (nrow, ncol, nlyr)

resolution : 0.008333333, 0.008333333 (x, y)

extent : 5.5, 7, 49, 50.5 (xmin, xmax, ymin, ymax)

coord. ref. : lon/lat WGS 84 (EPSG:4326)

source : LUXwc2.130s_tmin.tif

names : LUXw~min1, LUXw~min2, LUXw~min3, LUXw~min4, LUXw~min5, LUXw~min6, ...

min values : -3.1, -3.2, -0.8, 1.2, 5.6, 8.2, ...

max values : 0.0, -0.1, 2.6, 4.8, 9.0, 12.1, ...

time (raw) : 1 to 12

```

Stars Data Cube

The stars R package provides an infrastructure for managing data cubes. Data cubes are multi-dimensional arrays that enable the organization and analysis of large datasets across multiple dimensions, such as time, space, and various environmental variables.

We will create a stars object that contains the climatic variables as attributes and x and y as dimensions. This way, we have everything we need to proceed into a single object. ```r

first, we create a list that contains all the stacks, to which we apply stasstars.

starsclima <- list(tmin, tmax, prec) %>% lapply(stasstars) %>% do.call("c", .) %>% setNames(climatevars)

print(stars_clima)

stars object with 3 dimensions and 3 attributes

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max.

tmin -3.2 0.4 4.4 4.917869 9.4 14.1

tmax 0.9 5.9 12.5 12.636610 18.9 25.1

prec 46.0 67.0 75.0 76.737418 84.0 143.0

dimension(s):

from to offset delta refsys point x/y

x 1 180 5.5 0.008333 WGS 84 FALSE [x]

y 1 180 50.5 -0.008333 WGS 84 FALSE [y]

time 1 12 1 1 NA NA

```

Virtual Species

The purpose of this brief tutorial is to construct a data structure that allows for the comparison of suitability between two species occupying the same area over time.

As an example, we will consider only two species and two months to observe changes in suitability. The suitability of a real species can be obtained by applying Species Distribution Models (SDMs). But in our case, since we are interested in creating the structure, we will use virtualspecies, which are artificial species randomly generated.

During the creation of these random virtual species, the first step involves generating suitability based on the initial environmental data: in our case, the climatic data.

We will calculate the suitability for each species over the first two months, january and february.

```r

Let's choose to work with the first two months. For both months, we will create a stack containing the 3 climatic variables.

number of months

num_months <- 2

list of RasterStacks for each month

rasterstacks <- map(1:nummonths, function(month) {

# extraction and convertion of the climatic variables layers <- map(climatevars, function(var) { as(starsclima[var,,,month], "Raster")
# convertion into rasters })

# layer stack of the current month stackmonth <- stack(layers) # names names(stackmonth) <- climatevars return(stackmonth) })

january <- rasterstacks[[1]] february <- rasterstacks[[2]]

print(january)

class : RasterStack

dimensions : 180, 180, 32400, 3 (nrow, ncol, ncell, nlayers)

resolution : 0.008333333, 0.008333333 (x, y)

extent : 5.5, 7, 49, 50.5 (xmin, xmax, ymin, ymax)

crs : +proj=longlat +datum=WGS84 +no_defs

names : tmin, tmax, prec

min values : -3.1, 0.9, 53.0

max values : 0.0, 4.9, 135.0

plot(january, col = plasma(500, alpha = 1, begin = 0, end = 1, direction = 1))

```

The following function uses the generateRandomSp function from the virtualspecies , which creates random species based on the stack of climatic variables and allows us to set some parameters.

Each time the function is executed, a new random species is generated. Therefore, to ensure that we obtain the same species, we need to set a fixed seed using set.seed() ```r generatesuitability <- function(climatestack, seedvalue) { set.seed(seedvalue) randomsp <- generateRandomSp(raster.stack = climatestack, convert.to.PA = FALSE, species.type = "multiplicative", approach = "response", relations = "gaussian", realistic.sp = TRUE, plot = FALSE)

return(random_sp$suitab.raster) }

generate the suitability for species 1 in January and February and create a single object for species 1 that contains the suitability for both months

suitsp1jan <- generatesuitability(january, seedvalue = 121) suitsp1feb <- generatesuitability(february, seedvalue = 121) suitsp1 <- c(suitsp1feb, suitsp1_jan)

same for species 2

suitsp2jan <- generatesuitability(january, seedvalue = 456) suitsp2feb <- generatesuitability(february, seedvalue = 456) suitsp2 <- c(suitsp2feb, suitsp2_jan)

add time dimension

time(suitsp2) <- time(suitsp1) <- 1:2

suitability for Species 1

plot(suit_sp1, col = viridis(500, alpha = 1, begin = 0, end = 1, direction = 1)) title("Suitability of Species 1 in Jan-Feb", outer=TRUE, line=-0.9) ```

```r

suitability for Species 2

plot(suit_sp2, col = viridis(500, alpha = 1, begin = 0, end = 1, direction = 1)) title("Suitability of Species 2 in Jan-Feb", outer=TRUE, line=-0.9) ```

For each species, we will create a data cube: the dimensions remain x, y, and time, but this time the attribute is the suitability only, with values ranging from 0 to 1, which characterizes each species. ```r

stars cube for Species 1

suitsp1 <- list(suitsp1) suitcubesp1 <- do.call("c", lapply(suitsp1, stars::stas_stars)) %>% setNames(., c("suit"))

stars cube for Species 2

suitsp2 <- list(suitsp2) suitcubesp2 <- do.call("c", lapply(suitsp2, stars::stasstars)) %>% setNames(., c("suit")) r print(suitcube_sp1)

stars object with 3 dimensions and 1 attribute

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max.

suit 0 0.1493433 0.5079458 0.4706163 0.7610167 1

dimension(s):

from to offset delta refsys x/y

x 1 180 5.5 0.008333 WGS 84 [x]

y 1 180 50.5 -0.008333 WGS 84 [y]

time 1 2 1 1 NA

print(suitcubesp2)

stars object with 3 dimensions and 1 attribute

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max.

suit 0 0.009498931 0.06620133 0.1808423 0.3206807 1

dimension(s):

from to offset delta refsys x/y

x 1 180 5.5 0.008333 WGS 84 [x]

y 1 180 50.5 -0.008333 WGS 84 [y]

time 1 2 1 1 NA

```

Virtual Data Cube

We want to incorporate both species into the same object.

To combine the two species into a single object, we need to collapse the x and y dimensions into one, creating a grid that represents individual cells within the study area. This grid allows us to reduce the dimensions of the stars object and facilitates the transition to a vector data cube.

In this process, pixels in the raster are grouped based on their spatial intersection with a set of vector geometries. Each group is then reduced to a single value using an aggregation function, such as the mean or maximum. The result is a one-dimensional sequence of feature geometries defined in space.

By creating the grid, it is possible to reduce two dimensions (x and y) into a single one (cell): this way, we can observe the suitability over time for both species within the defined polygons. As a result, "species" becomes an additional dimension in our data structure. ``` r

bounding box

bblux <- stbbox(tmin)

from bounding box to polygon

sflux <- stassfc(bblux) %>% st_sf()

grid

luxgrid <- stmakegrid(sflux, cellsize = .1, n = c(100, 100), what = "polygons", square = FALSE, offset = stbbox(sflux)[c("xmin", "ymin")]) %>% stassf() %>% mutate(id = 1:nrow(.))

plot raster with grid

Convert SpatRaster to dataframe

tmindf <- as.data.frame(tmin$LUXwc2.130stmin_1, xy = TRUE, na.rm = TRUE)

ggplot() + # Add raster layer geomraster(data = tmindf, aes(x = x, y = y, fill = LUXwc2.130stmin1)) + scalefillviridisc(alpha = 1, begin = 0, end = 1, option = "viridis") + # Add grid geomsf(data = luxgrid, color = "black", size = 0.5, fill = NA) + themevoid() + theme( legend.position = "none", panel.border = element_blank(), plot.margin = margin(0, 0, 0, 0) ) ```

```r

aggregate by cells, calculating the average of the suitability values within each cell

aggsp1 <- aggregate(suitcubesp1, luxgrid, mean, as_points = TRUE, na.action = na.omit)

aggsp2 <- aggregate(suitcubesp2, luxgrid, mean, as_points = TRUE, na.action = na.omit)

print(agg_sp1)

stars object with 2 dimensions and 1 attribute

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

suit 0.0009594404 0.1817593 0.5260862 0.4765746 0.7482857 0.9380438 36

dimension(s):

from to offset delta refsys point values

x 1 297 NA NA WGS 84 FALSE POLYGON ((5.45 49.02887, ...

time 1 2 1 1 NA NA NULL

print(agg_sp2)

stars object with 2 dimensions and 1 attribute

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

suit 2.707365e-05 0.01614575 0.07359824 0.1784444 0.3089694 0.9104424 36

dimension(s):

from to offset delta refsys point values

x 1 297 NA NA WGS 84 FALSE POLYGON ((5.45 49.02887, ...

time 1 2 1 1 NA NA NULL

``` Finally, we merge the two cubes. Now the dimensions are 3 again: cell, time and species. The attribute represents the species' suitability over time and space.

``` r cubesp1sp2 <- c(aggsp1, aggsp2) %>%
stredimension() %>% stsetdimensions(., which = "newdim", values = c("specie1","specie2"), names = "species") %>% stset_dimensions(., which = "time", values = c("jan","feb"), names = "month") %>%

setNames(.,"suitability")

print(cube_sp1sp2)

stars object with 3 dimensions and 1 attribute

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

suitability 2.707365e-05 0.0440898 0.2516058 0.3275095 0.5842543 0.9380438 72

dimension(s):

from to refsys point values

x 1 297 WGS 84 FALSE POLYGON ((5.45 49.02887, ...,...,POLYGON ((7.05 50.41451, ...

month 1 2 NA NA jan, feb

species 1 2 NA NA specie1, specie2

```

Let's suppose we have a specific point in space: we want to investigate the suitability of the two species at that location With pull function you can extract the attributes in a specific point of space and time. ```r

first, we need to identify the corresponding cell for that location.

whichcell <- stsf(geometry = stsfc(stpoint(c(5.5, 49.0)), crs = 4326)) %>% stjoin(., luxgrid) print(which_cell$id)

[1] 10

this object provides, for each species (,,1) and (,,2), the suitability in January and February in cell 10

in the first case, it decreased; in the second, it increased.

pull(cube_sp1sp2[,10,,], "suitability")

, , 1

[,1] [,2]

[1,] 0.9036776 0.8826196

, , 2

[,1] [,2]

[1,] 0.001298286 0.007460382

we can better visualize this information this way:

valuessuit <- pull(cubesp1sp2[,10,,], "suitability")

transform values_suit into a long format without rewriting it

dflong <- melt(valuessuit) colnames(df_long) <- c("cell", "time", "species", "suitability")

convert time and species into factors

dflong$time <- factor(dflong$time, labels = c("Jan", "Feb")) dflong$species <- factor(dflong$species, labels = c("Species 1", "Species 2"))

plot

ggplot(dflong, aes(x = species, y = suitability, color = species)) + geompoint(size = 3) + facetwrap(~time, scales = "freex", ncol = 2, strip.position = "bottom") + labs(title = "Suitability for species in January and February", x = "Month", y = "Suitability") + scalecolormanual(values = c("Species 1" = "purple2", "Species 2" = "orange")) + thememinimal() + theme( axis.text.x = elementblank(), axis.ticks.x = elementblank(),
panel.spacing = unit(2, "lines"),
axis.line = elementline(color = "black", size = 0.8), strip.placement = "outside", legend.position = "right",
panel.border = elementrect(color = "black", fill = NA, size = 0.)) ```

<img width="550" height="330" src="https://github.com/rociobeatrizc/virtualsuitability_cube/blob/main/images/boxplot.png">

The following is an example of suitability over the course of a whole year (12 months) for 5 different species in two distinct locations in the Luxembourg area.

We can imagine the structure of the cube in this way, where the temporal layers are 12

Suitability Cube (Code testing)

Code testing with real data from Dutch Vegetation Database. Here, we want to test the stars data cube structure for multiple real species. The goal is to predict the suitability of those species in a different area from the training one.

Steps are: * Collecting climatic data of The Netherlands (one month) and creating a stars data cube with them * Collecting only-presence data for 5 species in Netherlands from 2000 to 2017 and splitting them by species name * Using the environmental information and the occurrences, training a MaxEnt model for each species * Collecting climatic data of Belgium (one month) and creating a stars data cube with them * Predicting the suitability of the 5 species in the new area using models trained before * Aggregating the suitability maps over polygons to synthetize all the informations into a single data cube

```r

imports

library(ggplot2) library(terra) library(sf) library(raster) library(viridis) library(tidyverse) library(stars) library(geodata) library(purrr) library(reshape2) library(dplyr) library(enmSdmX) ```

Climatic data for the training area (The Netherlands)

Data collection and aggregation in a data cube that contains the climatic variables as attributes and x, y and time.

``` r

Download climatic data for the Netherlands

worldclim_country function

The res paramater is the resolution, with valid values as 10,5,2.5 and 0.5

The var parameter is the variable name, with valid values as tmin, tmax, tavg, prec, wind, vapr and bio.

The worldclim_country function will return 12 layers of raster (in a SpatRaster), for each variable.

tmin <- worldclimcountry("Nld", "tmin", path=tempdir(), res = 0.5, version = "2.1") tmax <- worldclimcountry("Nld", "tmax", path=tempdir(), res = 0.5, version = "2.1") prec <- worldclimcountry("Nld", "prec", path=tempdir(), res = 0.5, version = "2.1") tavg <- worldclimcountry("Nld", "tavg", path=tempdir(), res = 0.5, version = "2.1") wind <- worldclim_country("Nld", "wind", path=tempdir(), res = 0.5, version = "2.1")

add time

time(tmin) <- time(tmax) <- time(prec) <- time(tavg) <- time(wind) <- 1:12

climate_vars <- c("tmin", "tmax", "prec", "tavg", "wind")

first, we create a list that contains all the stacks, to which we apply stasstars.

starsclima <- list(tmin, tmax, prec, tavg, wind) %>% lapply(stasstars) %>% do.call("c", .) %>% setNames(climatevars)

print(stars_clima)

stars object with 3 dimensions and 5 attributes

attribute(s), summary of first 1e+05 cells:

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

tmin -0.8 -0.4 -0.1 0.007868939 0.2 2.6 63985

tmax 3.7 4.3 4.6 4.718736640 5.0 6.4 63985

prec 58.0 66.0 68.0 67.980036096 70.0 80.0 63985

tavg 1.6 1.9 2.2 2.363004291 2.6 4.5 63985

wind 4.0 4.6 5.2 5.363057057 6.0 7.8 63985

dimension(s):

from to offset delta refsys point x/y

x 1 540 3 0.008333 WGS 84 FALSE [x]

y 1 420 54 -0.008333 WGS 84 FALSE [y]

time 1 12 1 1 NA NA

training month

climamay <- starsclima %>% slice("time", 5)

predictors

climatrain <- rast(climamay) %>% setNames(climate_vars)

plot(clima_train) ```

Occurrences

From Dutch Vegetation Database: GBIF Occurrence Download Occurrences of 5 species, collected from 2000 to 2017 * Galium verum L. * Ophrys apifera Huds. * Paris quadrifolia L. * Chrysosplenium alternifolium L. * Anemone nemorosa L.

They will be use for training MaxEnt model ``` r

occurrences

upload the dataset

read txt

occ <- read.delim("occurrence.txt", sep = "\t", header = TRUE, quote = "", stringsAsFactors = FALSE) %>% select(scientificName, decimalLatitude, decimalLongitude, year)

plot

Filter years from 2000 to 2017

occcounts <- occ %>% filter(year >= 2000 & year <= 2017) %>% groupby(scientificName, year) %>% summarise(count = n(), .groups = "drop")

Heatmap to see the occurrences of each species per year

ggplot(occcounts, aes(x = year, y = scientificName, fill = count)) + geomtile() + scalefillgradientn(colors = mako(10, direction = -1), breaks = seq(0, max(occcounts$count, na.rm = TRUE), by = 150), labels = seq(0, max(occcounts$count, na.rm = TRUE), by = 150)) + labs(title = "occurrences per species per year", x = "year", y = "species", fill = "occurrences") + thememinimal(basesize = 12) + theme(plot.title = elementtext(hjust = 0.5), legend.position = "right", axis.text.x = elementtext(angle = 45, hjust = 1)) + coord_fixed(ratio = 1.5) ```

``` r

filter year

occ <- occ %>% filter(year >= 2000 & year <= 2017) %>% filter(!is.na(decimalLatitude) & !is.na(decimalLongitude))

head(occ)

scientificName decimalLatitude decimalLongitude year

1 Galium verum L. 51.82729 3.95035 2017

2 Galium verum L. 52.87226 4.71202 2017

3 Galium verum L. 50.84668 5.77765 2015

4 Galium verum L. 52.53752 6.47736 2016

5 Anemone nemorosa L. 52.01382 6.07912 2013

6 Anemone nemorosa L. 50.88266 5.82037 2014

split dataset by species with the function splitspeciesdata

species <- splitspeciesdata(occ)

typeof(species)

[1] "list"

names(species)

[1] "Anemone nemorosa L." "Chrysosplenium alternifolium L." "Galium verum L."

[4] "Ophrys apifera Huds." "Paris quadrifolia L."

```

MaxEnt model for each species

The function takes predictors and occurrences to build a MaxEnt model (enmSdmX R package) for each species. Models are then saved and can be used for predicting in a new area. The output contains also prediction maps.

``` r

creating SDMs with MaxEnt for many species in the same area, with the same predictors

sdms <- createsdmforspecieslist(species, climatrain, backgroundpoints = 10000, predictors = names(clima_train))

plot suitability map for Anemone nemorosa L. in The Netherlands

plot(sdms$predictions$Anemone nemorosa L.) ```

New area: Belgium

Based on the models previously trained, let's check the suitability of our species in a different area, i.e. Belgium.

```r

predictions for the same species but in another area (Belgium)

same climatic variables

tminb <- worldclimcountry("Bel", "tmin", path=tempdir(), res = 0.5, version = "2.1") tmaxb <- worldclimcountry("Bel", "tmax", path=tempdir(), res = 0.5, version = "2.1") precb <- worldclimcountry("Bel", "prec", path=tempdir(), res = 0.5, version = "2.1") tavgb <- worldclimcountry("Bel", "tavg", path=tempdir(), res = 0.5, version = "2.1") windb <- worldclimcountry("Bel", "wind", path=tempdir(), res = 0.5, version = "2.1")

time

time(tminb) <- time(tmaxb) <- time(precb) <- time(tavgb) <- time(wind_b) <- 1:12

stars object of Belgium

starsclimabel <- list(tminb, tmaxb, precb, tavgb, windb) %>% lapply(stasstars) %>% do.call("c", .) %>% setNames(climatevars)

may for Belgium (we want to see suitability predictions for the same month)

climamaybel <- starsclimabel %>% slice("time", 5)

predictors in may

climatrainbelmay <- rast(climamaybel) %>% setNames(climatevars)

predicting suitability in a new area for the same species using the models trained previously

newpredictionsmay <- predictsdmfornewarea(sdms$models, climatrainbel_may)

the output is a data cube with suitability as attribute

print(newpredictionsmay)

stars object with 3 dimensions and 1 attribute

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

suit 5.867529e-13 0.03843691 0.1433858 0.2354462 0.3725311 0.9999935 67380

dimension(s):

from to offset delta refsys values x/y

x 1 480 2.5 0.008333 WGS 84 NULL [x]

y 1 360 52 -0.008333 WGS 84 NULL [y]

species 1 5 NA NA NA Anemone nemorosa L.,...,Paris quadrifolia L.

```

Aggregation

The last step is the polygon-based aggregation. In line with the structure proposed by the B-Cubed framework, we aim to summarize species suitability information within a grid covering the study area.

```r

aggregation steps

bounding box

bbox <- stbbox(tminb) sfbel <- stassfc(bbox) %>% stsf()

grid

belgrid <- stmakegrid(sfbel, cellsize = .1, n = c(50, 50), what = "polygons", square = FALSE, offset = stbbox(sfbel)[c("xmin", "ymin")]) %>% stassf() %>% mutate(id = 1:nrow(.))

plot raster with grid

convert SpatRaster to dataframe

tmindf <- as.data.frame(tminb$BELwc2.130stmin1, xy = TRUE, na.rm = TRUE) tmindf p <- ggplot() + geomraster(data = tmindf, aes(x = x, y = y, fill = BELwc2.130stmin1)) + scalefillviridisc(alpha = 1, begin = 0, end = 1, option = "viridis") + geomsf(data = belgrid, color = "black", size = 0.5, fill = NA, alpha = 0.2) + themevoid() + theme( legend.position = "none", panel.border = elementblank(), plot.margin = margin(0, 0, 0, 0) ) plot(p) ```

```r

aggregating suitability values of the species over polygons for a given area

starspredictionsaggregated <- aggregatesuitability(newpredictionsmay, belgrid)

output: data cube

print(starspredictionsaggregated)

stars object with 2 dimensions and 1 attribute

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

suitability 1.540887e-07 0.05111451 0.1581302 0.234773 0.370952 0.9726111 930

dimension(s):

from to refsys point values

x 1 1494 WGS 84 FALSE POLYGON ((2.45 49.02887, ...,...,POLYGON ((6.55 51.97335, ...

species 1 5 NA NA Anemone nemorosa L.,...,Paris quadrifolia L

first, we need to identify the corresponding cell for that location.

whichcell <- stsf(geometry = stsfc(stpoint(c(4.3517, 50.8503)), crs = 4326)) %>% stjoin(., belgrid) print(which_cell$id)

[1] 695

extract suitability values

pull(starspredictionsaggregated[,695,], "suitability")

[,1] [,2] [,3] [,4] [,5]

[1,] 0.00129283 0.03917683 0.02307574 0.3141483 0.07493015

```

References

virtualspecies, an R package to generate virtual species distributions (2018), Leroy B. et al., DOI
Meyer H, Milà C, Ludwig M, Linnenbrink J, Schumacher F (2024). CAST: 'caret' Applications for Spatial-Temporal Models. R package version 1.0.2, https://hannameyer.github.io/CAST/
Gilardi A, Lovelace R (2024). osmextract: Download and Import Open Street Map Data Extracts. R package version 0.5.1.900, https://github.com/ropensci/osmextract.
A practical overview of transferability in species distribution modeling, Werkowska W., Marquez A., Real R., Acevedo P., 2017, DOI
Effect of roadside bias on the accuracy of predictive maps produced by bioclimatic models, Kadmon R., Farber O., Danin A., 2004, DOI
The virtual ecologist approach: simulating data and observers, Zurell et al., 2010, DOI
The n-dimensional hypervolume, Blonder et al., 2014, DOI
The cumulative niche approach: a framework to assess the performance of ecological niche model projections, Arlè et al., 2024, DOI
Smith, A.B., Murphy, S.J., Henderson, D., and Erickson, K.D. 2023. Including imprecisely georeferenced specimens improves accuracy of species distribution models and estimates of niche breadth. Global Ecology and Biogeography In press
Pebesma E, Bivand R (2023). Spatial Data Science: With applications in R. Chapman and Hall/CRC, London. doi:10.1201/9780429459016, https://r-spatial.org/book/.
Meyer, Hanna, and Edzer Pebesma. "Predicting into unknown space? Estimating the area of applicability of spatial prediction models." Methods in Ecology and Evolution 12.9 (2021): 1620-1633.
Steven J. Phillips, Robert P. Anderson, Robert E. Schapire, Maximum entropy modeling of species geographic distributions, Ecological Modelling, Volume 190, Issues 3–4, 2006, Pages 231-259, ISSN 0304-3800, https://doi.org/10.1016/j.ecolmodel.2005.03.026.
Guisan, Antoine, and Wilfried Thuiller. "Predicting species distribution: offering more than simple habitat models." Ecology letters 8.9 (2005): 993-1009.

Owner

Name: B-Cubed
Login: b-cubed-eu
Kind: organization

Website: https://b-cubed.eu
Repositories: 1
Profile: https://github.com/b-cubed-eu

Biodiversity Building Blocks for Policy

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Cortes Lobos"
  given-names: "Rocio Beatriz"
  orcid: "https://orcid.org/0009-0004-8440-958X"
- family-names: "Di Musciano"
  given-names: "Michele"
  orcid: "https://orcid.org/0000-0002-3130-7270"
- family-names: "Matilde"
  given-names: "Martini"
  orcid: "https://orcid.org/0009-0003-5612-925X"
- family-names: "Rocchini"
  given-names: "Duccio"
  orcid: "https://orcid.org/0000-0003-0087-0594"
title: "virtual-suitability-cube"
version: 1.0.0
date-released: 2024-10-24
url: "https://github.com/b-cubed-eu/virtual-suitability-cube"

GitHub Events

Total

Watch event: 1
Push event: 60
Fork event: 1

Last Year

Watch event: 1
Push event: 60
Fork event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

virtual-suitability-cube

Science Score: 67.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

B-Cubed/Suitability Cube

Description

Authors

Next steps (Nov. 2025)

Suitability Cube

Virtual Suitability Cube (Code Development)

Climatic Data

load packages

Download climatic data

worldclim_country function.

The res paramater is the resolution, with valid values as 10,5,2.5 and 0.5

The var parameter is the variable name, with valid values as tmin, tmax, tavg, prec, wind, vapr and bio.

The worldclim_country function will return 12 layers of raster (in a SpatRaster), for each variable.

12 months of precipitation in Luxembourg. Each layer represents the average value for a month.

here, we will simply use numbers from 1 to 12 to represent # the months.

class : SpatRaster

dimensions : 180, 180, 12 (nrow, ncol, nlyr)

resolution : 0.008333333, 0.008333333 (x, y)

extent : 5.5, 7, 49, 50.5 (xmin, xmax, ymin, ymax)

coord. ref. : lon/lat WGS 84 (EPSG:4326)

source : LUXwc2.130s_tmin.tif

names : LUXw~min1, LUXw~min2, LUXw~min3, LUXw~min4, LUXw~min5, LUXw~min6, ...

min values : -3.1, -3.2, -0.8, 1.2, 5.6, 8.2, ...

max values : 0.0, -0.1, 2.6, 4.8, 9.0, 12.1, ...

time (raw) : 1 to 12

Stars Data Cube

first, we create a list that contains all the stacks, to which we apply stasstars.

stars object with 3 dimensions and 3 attributes

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max.

tmin -3.2 0.4 4.4 4.917869 9.4 14.1

tmax 0.9 5.9 12.5 12.636610 18.9 25.1

prec 46.0 67.0 75.0 76.737418 84.0 143.0

dimension(s):

from to offset delta refsys point x/y

x 1 180 5.5 0.008333 WGS 84 FALSE [x]

y 1 180 50.5 -0.008333 WGS 84 FALSE [y]

time 1 12 1 1 NA NA

Virtual Species

Let's choose to work with the first two months. For both months, we will create a stack containing the 3 climatic variables.

number of months

list of RasterStacks for each month

class : RasterStack

dimensions : 180, 180, 32400, 3 (nrow, ncol, ncell, nlayers)

resolution : 0.008333333, 0.008333333 (x, y)

extent : 5.5, 7, 49, 50.5 (xmin, xmax, ymin, ymax)

crs : +proj=longlat +datum=WGS84 +no_defs

names : tmin, tmax, prec

min values : -3.1, 0.9, 53.0

max values : 0.0, 4.9, 135.0

generate the suitability for species 1 in January and February and create a single object for species 1 that contains the suitability for both months

same for species 2

add time dimension

suitability for Species 1

suitability for Species 2

stars cube for Species 1

stars cube for Species 2

stars object with 3 dimensions and 1 attribute

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max.

suit 0 0.1493433 0.5079458 0.4706163 0.7610167 1

dimension(s):

from to offset delta refsys x/y

x 1 180 5.5 0.008333 WGS 84 [x]

y 1 180 50.5 -0.008333 WGS 84 [y]

time 1 2 1 1 NA

stars object with 3 dimensions and 1 attribute

attribute(s):

Min. 1st Qu. Median Mean 3rd Qu. Max.

suit 0 0.009498931 0.06620133 0.1808423 0.3206807 1

dimension(s):

from to offset delta refsys x/y

x 1 180 5.5 0.008333 WGS 84 [x]