Repository
Basic Info
- Host: GitHub
- Owner: galenholt
- License: other
- Language: R
- Default Branch: master
- Size: 188 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 2 years ago
· Last pushed over 1 year ago
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# peeler
[R-CMD-check](https://github.com/galenholt/peeler/actions/workflows/R-CMD-check.yaml)
[Codecov test coverage](https://app.codecov.io/gh/galenholt/peeler?branch=master)
Peeler implements the bvstep algorithm from Clarke and Warwick 1998 and uses it to 'peel' a dataset to find structural redundancy. It also provides a way to randomly start bvstep many times to assess consistency and avoid local optima.
## Installation
You can install the development version of peeler from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("galenholt/peeler")
```
## Example
```{r example}
library(peeler)
library(vegan)
data(varespec)
```
You can use `bvstep` alone for a single realisation.
```{r}
bvout <- bvstep(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rand_start = TRUE, nrand = 5
)
bvout
```
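To make the quantity being optimised concrete, here is a minimal base-R sketch of the correlation between the dissimilarities of a full matrix and those of a column subset. It is not the package's implementation: the helper names `bray_curtis` and `subset_rho` are hypothetical, the data are simulated, and Spearman correlation is assumed (it is the `corr_method` used in the `peel` example in this README).

```r
# Bray-Curtis dissimilarity between all pairs of rows (sites), base R only.
bray_curtis <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# Spearman correlation between the dissimilarities of the reference matrix
# and those of a column subset of the comparison matrix.
subset_rho <- function(ref_mat, comp_mat, cols) {
  ref_d <- bray_curtis(ref_mat)
  sub_d <- bray_curtis(comp_mat[, cols, drop = FALSE])
  cor(as.vector(ref_d), as.vector(sub_d), method = "spearman")
}

# Simulated abundance matrix: 6 sites (rows) by 5 species (columns).
# Adding 1 keeps every row sum positive so the denominator is never zero.
set.seed(1)
toy <- matrix(rpois(30, lambda = 5) + 1, nrow = 6,
              dimnames = list(NULL, paste0("sp", 1:5)))

# Two species only: typically below 1.
rho_two <- subset_rho(toy, toy, cols = c("sp1", "sp3"))

# All species: the same data on both sides, so 1 up to floating point.
rho_all <- subset_rho(toy, toy, cols = colnames(toy))
```

The stepwise search is then a matter of choosing which columns to pass as `cols` so that this correlation passes `rho_threshold` with as few columns as possible.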
Although a correlation of 1 is always achievable when `ref_mat` and `comp_mat` are the same, local optima can cause the algorithm to stop before `rho_threshold` is reached. If you want to *always* get at least one result, use a negative `min_delta_rho` (ideally `min_delta_rho = -Inf`). We demonstrate this here with `rho_threshold = 1` and `min_delta_rho = -Inf`.
```{r}
bvout_force <- bvstep(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rand_start = TRUE, nrand = 5,
  rho_threshold = 1,
  min_delta_rho = -Inf
)
bvout_force
```
If the two matrices are not identical (e.g. species and environment), this will keep adding columns until either `rho_threshold` is met or all columns are included. In that case, the result is not guaranteed to be the highest correlation possible.
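The stopping behaviour described above can be sketched with a simplified, forward-only loop in base R. This is an illustration under stated assumptions, not the package's code: `forward_select` and `subset_rho` are hypothetical names, the real `bvstep` also has a backward step, Euclidean distance from `stats::dist` stands in for Bray-Curtis, and the default values shown for `rho_threshold` and `min_delta_rho` are placeholders.

```r
# Spearman correlation between distances of the reference matrix and
# distances of a column subset of the comparison matrix.
# Euclidean distance stands in for Bray-Curtis to stay in base R.
subset_rho <- function(ref_mat, comp_mat, cols) {
  cor(as.vector(dist(ref_mat)),
      as.vector(dist(comp_mat[, cols, drop = FALSE])),
      method = "spearman")
}

# Hypothetical forward-only stepping (assumed defaults, for illustration).
forward_select <- function(ref_mat, comp_mat,
                           rho_threshold = 0.95, min_delta_rho = 0.001) {
  chosen <- character(0)
  remaining <- colnames(comp_mat)
  rho <- -Inf
  while (length(remaining) > 0 && rho < rho_threshold) {
    # Correlation obtained by adding each remaining column in turn.
    trial <- vapply(remaining,
                    function(v) subset_rho(ref_mat, comp_mat, c(chosen, v)),
                    numeric(1))
    best <- which.max(trial)
    # Stop early if the best addition does not improve rho enough.
    # With min_delta_rho = -Inf this condition is never TRUE.
    if (trial[[best]] - rho < min_delta_rho) break
    rho <- trial[[best]]
    chosen <- c(chosen, remaining[best])
    remaining <- remaining[-best]
  }
  list(columns = chosen, rho = rho)
}

# Simulated data: 8 sites, 3 environmental variables, 5 species.
set.seed(42)
env <- matrix(rnorm(24), nrow = 8, dimnames = list(NULL, paste0("env", 1:3)))
spp <- matrix(rnorm(40), nrow = 8, dimnames = list(NULL, paste0("sp", 1:5)))

# Different matrices: with rho_threshold = 1 and min_delta_rho = -Inf the
# loop only stops once every column has been added.
forced <- forward_select(env, spp, rho_threshold = 1, min_delta_rho = -Inf)

# Identical matrices: a correlation of 1 is reachable.
same <- forward_select(spp, spp, rho_threshold = 1, min_delta_rho = -Inf)
```

In the `forced` case the final `rho` is simply whatever the full set of columns gives. A smaller subset may have scored higher along the way, which is why the result is not guaranteed to be the maximum.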
## Random starts to explore the space
The `bv_multi` function runs `bvstep` for a number of random starts to avoid local optima, here with 5 species to start, iterated 10 times. The `num_best_results` argument sets how many results to return. Above `rho_threshold`, 'best' is determined first by the minimum number of species and then by correlation; below `rho_threshold`, it is determined first by correlation and then by number of species. This is in keeping with the idea of finding the fewest species that meet the threshold. To return all of them, set `num_best_results` to the same value as `num_restarts`.
```{r}
bv_m <- bv_multi(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rho_threshold = 0.95,
  return_type = "final",
  rand_start = TRUE, nrand = 5, num_restarts = 10
)
bv_m
```
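The ordering rule for 'best' can be illustrated with a small base-R sketch. Everything here is hypothetical: the column names `start`, `num_vars` and `rho`, the values, and the function `rank_results` are made up for illustration. Treating a result exactly at the threshold as meeting it, and ranking all above-threshold results ahead of below-threshold ones, are also assumptions.

```r
# Hypothetical results table: one row per random start.
results <- data.frame(
  start = 1:5,
  num_vars = c(6, 4, 4, 7, 3),
  rho = c(0.97, 0.96, 0.98, 0.94, 0.90)
)
rho_threshold <- 0.95

rank_results <- function(res, rho_threshold) {
  above <- res[res$rho >= rho_threshold, ]
  below <- res[res$rho < rho_threshold, ]
  # Above the threshold: fewest variables first, ties broken by higher rho.
  above <- above[order(above$num_vars, -above$rho), ]
  # Below the threshold: highest rho first, ties broken by fewer variables.
  below <- below[order(-below$rho, below$num_vars), ]
  rbind(above, below)
}

ranked <- rank_results(results, rho_threshold)
ranked$start
#> [1] 3 2 1 4 5

# Keeping the top rows mimics the role of `num_best_results`.
head(ranked, 2)
```

Start 5 uses the fewest variables overall but ranks last because it does not reach the threshold, which reflects the goal of finding the smallest set that does.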
The default `return_type = 'final'` gives the final outcome of each random start, for the best `num_best_results`. If we want the full steps of each of the `num_restarts` random starts, set `return_type = 'steps'`. The output can also be returned as a list by setting `returndf = FALSE`.
```{r}
bv_steps <- bv_multi(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rho_threshold = 0.95,
  return_type = "steps",
  rand_start = TRUE, nrand = 5, num_restarts = 10
)
bv_steps
```
With `return_type = 'unique'`, we return the best `num_best_results` from all steps in all random starts. The first line of this should match the first line of `return_type = 'final'`, since that is the best result overall. After that, they may differ as the penultimate set from a particular random start might be better than the final of some others.
```{r}
bv_unique <- bv_multi(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rho_threshold = 0.95,
  return_type = "unique",
  rand_start = TRUE, nrand = 5, num_restarts = 10
)
bv_unique
```
## Peels
The `peel` function runs `bv_multi` iteratively, removing the best set each time.
```{r}
peels <- peel(
  ref_mat = varespec,
  comp_mat = varespec,
  nrand = 6,
  num_restarts = 10,
  corr_method = "spearman"
)
peels
```
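The idea of peeling can be sketched as a loop that finds a subset, records it, removes its columns from the comparison matrix, and repeats. The code below is a simplified stand-in rather than the package's algorithm: `find_subset` and `peel_sketch` are hypothetical, the search is a single forward-only pass instead of `bv_multi`, Euclidean distance replaces Bray-Curtis, and stopping once the remaining columns cannot reach `rho_threshold` is an assumption.

```r
# Spearman correlation between distances of the reference matrix and
# distances of a column subset (Euclidean distance as a base-R stand-in).
subset_rho <- function(ref_mat, comp_mat, cols) {
  cor(as.vector(dist(ref_mat)),
      as.vector(dist(comp_mat[, cols, drop = FALSE])),
      method = "spearman")
}

# Hypothetical forward-only subset search.
find_subset <- function(ref_mat, comp_mat, rho_threshold) {
  chosen <- character(0)
  remaining <- colnames(comp_mat)
  rho <- -Inf
  while (length(remaining) > 0 && rho < rho_threshold) {
    trial <- vapply(remaining,
                    function(v) subset_rho(ref_mat, comp_mat, c(chosen, v)),
                    numeric(1))
    best <- which.max(trial)
    rho <- trial[[best]]
    chosen <- c(chosen, remaining[best])
    remaining <- remaining[-best]
  }
  list(columns = chosen, rho = rho)
}

# Hypothetical peeling loop: the reference stays fixed, while each peel
# removes the selected columns from the comparison matrix.
peel_sketch <- function(ref_mat, comp_mat, rho_threshold = 0.95) {
  peels <- list()
  while (ncol(comp_mat) > 0) {
    found <- find_subset(ref_mat, comp_mat, rho_threshold)
    # Assumed stopping rule: quit when the threshold is out of reach.
    if (found$rho < rho_threshold) break
    peels[[length(peels) + 1]] <- found
    keep <- setdiff(colnames(comp_mat), found$columns)
    comp_mat <- comp_mat[, keep, drop = FALSE]
  }
  peels
}

# Simulated data with redundancy: 8 species all tracking one gradient.
set.seed(7)
grad <- rnorm(10)
toy <- sapply(1:8, function(i) grad + rnorm(10, sd = 0.3))
colnames(toy) <- paste0("sp", 1:8)

toy_peels <- peel_sketch(toy, toy, rho_threshold = 0.9)
peeled_cols <- unlist(lapply(toy_peels, function(p) p$columns))
```

Each element of `toy_peels` holds the columns removed in that peel and the correlation they achieved. The number of peels before the threshold becomes unreachable gives a rough sense of how much structural redundancy the data contain.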
There are a number of user-definable options in each of those functions; see their documentation.
There are also two potentially useful helper functions: `extract_final`, which gets the last step in `bvstep` output, and `extract_names`. The `bvstep` output has names in a single string with comma-separated species names, and we often want them as a character vector. The `extract_names` function parses this.
```{r}
best_bv <- extract_names(bvout, step = "last")
best_bv
```
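For readers curious about what this parsing involves, a base-R equivalent might look like the following. It is only a sketch: the input string is a hypothetical example of the comma-separated format (using three `varespec` column names), and the exact separator used in `bvstep` output is an assumption, which is why whitespace is trimmed after splitting.

```r
# Hypothetical example of a comma-separated names string.
vars_string <- "Callvulg, Cladrang, Pleuschr"

# Split on commas, then trim any surrounding whitespace.
vars <- trimws(strsplit(vars_string, ",", fixed = TRUE)[[1]])
vars
#> [1] "Callvulg" "Cladrang" "Pleuschr"
```

A character vector like this can be used directly to subset the original data, for example `varespec[, vars]`.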
Owner
- Name: Galen Holt
- Login: galenholt
- Kind: user
- Repositories: 2
- Profile: https://github.com/galenholt
Citation (CITATION.cff)
# -----------------------------------------------------------
# CITATION file created with {cffr} R package, v1.0.0
# See also: https://docs.ropensci.org/cffr/
# -----------------------------------------------------------
cff-version: 1.2.0
message: 'To cite package "peeler" in publications use:'
type: software
license: MIT
title: 'peeler: Implements peeling community datasets using sequential bvstep algorithms,
  as in Clarke and Warwick 1998'
version: 0.1.0
abstract: This package provides implementations of the bvstep forward/backward algorithm
  to find the best subset of the data that matches the full community. It further
  provides functions to iterate that algorithm over some number of random starts to
  assess consistency. The `peel` function then uses these to iteratively find the
  best subset and remove it.
authors:
- family-names: Holt
  given-names: Galen
  email: g.holt@deakin.edu.au
  orcid: https://orcid.org/0000-0002-7455-9275
contact:
- family-names: Holt
  given-names: Galen
  email: g.holt@deakin.edu.au
  orcid: https://orcid.org/0000-0002-7455-9275
references:
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  institution:
    name: R Foundation for Statistical Computing
    address: Vienna, Austria
  year: '2024'
  version: '>= 2.10'
- type: software
  title: dplyr
  abstract: 'dplyr: A Grammar of Data Manipulation'
  notes: Imports
  url: https://dplyr.tidyverse.org
  repository: https://CRAN.R-project.org/package=dplyr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
    orcid: https://orcid.org/0000-0003-4757-117X
  - family-names: François
    given-names: Romain
    orcid: https://orcid.org/0000-0002-2444-4226
  - family-names: Henry
    given-names: Lionel
  - family-names: Müller
    given-names: Kirill
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Vaughan
    given-names: Davis
    email: davis@posit.co
    orcid: https://orcid.org/0000-0003-4777-038X
  year: '2024'
- type: software
  title: glue
  abstract: 'glue: Interpreted String Literals'
  notes: Imports
  url: https://glue.tidyverse.org/
  repository: https://CRAN.R-project.org/package=glue
  authors:
  - family-names: Hester
    given-names: Jim
    orcid: https://orcid.org/0000-0002-2739-7082
  - family-names: Bryan
    given-names: Jennifer
    email: jenny@rstudio.com
    orcid: https://orcid.org/0000-0002-6983-2759
  year: '2024'
- type: software
  title: purrr
  abstract: 'purrr: Functional Programming Tools'
  notes: Imports
  url: https://purrr.tidyverse.org/
  repository: https://CRAN.R-project.org/package=purrr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
    orcid: https://orcid.org/0000-0003-4757-117X
  - family-names: Henry
    given-names: Lionel
    email: lionel@rstudio.com
  year: '2024'
- type: software
  title: rlang
  abstract: 'rlang: Functions for Base Types and Core R and ''Tidyverse'' Features'
  notes: Imports
  url: https://rlang.r-lib.org
  repository: https://CRAN.R-project.org/package=rlang
  authors:
  - family-names: Henry
    given-names: Lionel
    email: lionel@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
- type: software
  title: stringr
  abstract: 'stringr: Simple, Consistent Wrappers for Common String Operations'
  notes: Imports
  url: https://stringr.tidyverse.org
  repository: https://CRAN.R-project.org/package=stringr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2024'
- type: software
  title: tibble
  abstract: 'tibble: Simple Data Frames'
  notes: Imports
  url: https://tibble.tidyverse.org/
  repository: https://CRAN.R-project.org/package=tibble
  authors:
  - family-names: Müller
    given-names: Kirill
    email: kirill@cynkra.com
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2024'
- type: software
  title: vegan
  abstract: 'vegan: Community Ecology Package'
  notes: Imports
  url: https://github.com/vegandevs/vegan
  repository: https://CRAN.R-project.org/package=vegan
  authors:
  - family-names: Oksanen
    given-names: Jari
    email: jhoksane@gmail.com
  - family-names: Simpson
    given-names: Gavin L.
    email: ucfagls@gmail.com
  - family-names: Blanchet
    given-names: F. Guillaume
  - family-names: Kindt
    given-names: Roeland
  - family-names: Legendre
    given-names: Pierre
  - family-names: Minchin
    given-names: Peter R.
  - family-names: O'Hara
    given-names: R.B.
  - family-names: Solymos
    given-names: Peter
  - family-names: Stevens
    given-names: M. Henry H.
  - family-names: Szoecs
    given-names: Eduard
  - family-names: Wagner
    given-names: Helene
  - family-names: Barbour
    given-names: Matt
  - family-names: Bedward
    given-names: Michael
  - family-names: Bolker
    given-names: Ben
  - family-names: Borcard
    given-names: Daniel
  - family-names: Carvalho
    given-names: Gustavo
  - family-names: Chirico
    given-names: Michael
  - family-names: De Caceres
    given-names: Miquel
  - family-names: Durand
    given-names: Sebastien
  - family-names: Evangelista
    given-names: Heloisa Beatriz Antoniazi
  - family-names: FitzJohn
    given-names: Rich
  - family-names: Friendly
    given-names: Michael
  - family-names: Furneaux
    given-names: Brendan
  - family-names: Hannigan
    given-names: Geoffrey
  - family-names: Hill
    given-names: Mark O.
  - family-names: Lahti
    given-names: Leo
  - family-names: McGlinn
    given-names: Dan
  - family-names: Ouellette
    given-names: Marie-Helene
  - family-names: Ribeiro Cunha
    given-names: Eduardo
  - family-names: Smith
    given-names: Tyler
  - family-names: Stier
    given-names: Adrian
  - family-names: Ter Braak
    given-names: Cajo J.F.
  - family-names: Weedon
    given-names: James
  year: '2024'
- type: software
  title: furrr
  abstract: 'furrr: Apply Mapping Functions in Parallel using Futures'
  notes: Suggests
  url: https://furrr.futureverse.org/
  repository: https://CRAN.R-project.org/package=furrr
  authors:
  - family-names: Vaughan
    given-names: Davis
    email: davis@rstudio.com
  - family-names: Dancho
    given-names: Matt
    email: mdancho@business-science.io
  year: '2024'
- type: software
  title: future
  abstract: 'future: Unified Parallel and Distributed Processing in R for Everyone'
  notes: Suggests
  url: https://future.futureverse.org
  repository: https://CRAN.R-project.org/package=future
  authors:
  - family-names: Bengtsson
    given-names: Henrik
    email: henrikb@braju.com
  year: '2024'
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  url: https://testthat.r-lib.org
  repository: https://CRAN.R-project.org/package=testthat
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
  version: '>= 3.0.0'