Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: galenholt
  • License: other
  • Language: R
  • Default Branch: master
  • Size: 188 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# peeler


[![R-CMD-check](https://github.com/galenholt/peeler/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/galenholt/peeler/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/galenholt/peeler/branch/master/graph/badge.svg)](https://app.codecov.io/gh/galenholt/peeler?branch=master)


Peeler implements the bvstep algorithm from Clarke and Warwick 1998 and uses it to 'peel' a dataset to find structural redundancy. It also provides a way to randomly start bvstep many times to assess consistency and avoid local optima.
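
As a rough illustration of the quantity bvstep works with, the sketch below computes the Spearman rank correlation between the Bray-Curtis distances of a full matrix and those of a column subset. It uses base R only. The helpers `bray_dist` and `subset_rho` and the toy matrix are assumptions made for this example; they are not part of peeler and may differ from how the package computes things internally.

``` r
# Bray-Curtis dissimilarity between all pairs of rows (sites), base R only.
# Hypothetical helper for illustration; peeler itself relies on vegan.
bray_dist <- function(mat) {
  mat <- as.matrix(mat)
  n <- nrow(mat)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(mat[i, ] - mat[j, ])) / sum(mat[i, ] + mat[j, ])
    }
  }
  as.dist(d)
}

# Spearman correlation between the distances of the full matrix and the
# distances computed from a subset of its columns.
subset_rho <- function(full, cols) {
  cor(
    as.vector(bray_dist(full)),
    as.vector(bray_dist(full[, cols, drop = FALSE])),
    method = "spearman"
  )
}

# Toy abundance matrix: 8 sites by 5 species, all counts strictly positive.
set.seed(1)
toy <- matrix(
  rpois(40, lambda = 5) + 1,
  nrow = 8,
  dimnames = list(NULL, paste0("sp", 1:5))
)

rho_two <- subset_rho(toy, c("sp1", "sp2")) # two species only
rho_all <- subset_rho(toy, colnames(toy))   # every species, so rho is 1
```

A subset that reproduces the full set of distances well has a correlation close to 1; the search is for a small subset that does so.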

## Installation

You can install the development version of peeler from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("galenholt/peeler")
```

## Example



```{r example}
library(peeler)
library(vegan)
data(varespec)
```

You can use `bvstep` alone for a single realisation.

```{r}
bvout <- bvstep(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rand_start = TRUE, nrand = 5
)

bvout
```

When `ref_mat` and `comp_mat` are the same, a correlation of 1 is always achievable, but local optima can cause the algorithm to stop before `rho_threshold` is reached. If you want to *always* get at least one result, use a negative `min_delta_rho` (ideally `min_delta_rho = -Inf`). Here we force the search to continue to a correlation of 1 by setting `rho_threshold = 1` together with `min_delta_rho = -Inf`.

```{r}
bvout_force <- bvstep(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rand_start = TRUE, nrand = 5,
  rho_threshold = 1,
  min_delta_rho = -Inf
)

bvout_force
```
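
To make the role of the two stopping parameters concrete, here is a minimal forward-only sketch in base R. It is not peeler's `bvstep`, which the package describes as a forward/backward algorithm; `forward_select` and `subset_rho` are hypothetical helpers, the data are random, and Euclidean distance from `stats::dist()` stands in for Bray-Curtis to keep the example short.

``` r
# Spearman correlation between distances of the full matrix and a column subset.
# Euclidean distance stands in for Bray-Curtis (an assumption of this sketch).
subset_rho <- function(full, cols) {
  cor(
    as.vector(dist(full)),
    as.vector(dist(full[, cols, drop = FALSE])),
    method = "spearman"
  )
}

# Forward-only greedy selection (hypothetical, for illustration only).
forward_select <- function(full, start, rho_threshold = 0.95,
                           min_delta_rho = 0.001) {
  chosen <- start
  rho <- subset_rho(full, chosen)
  while (rho < rho_threshold && length(chosen) < ncol(full)) {
    candidates <- setdiff(colnames(full), chosen)
    cand_rho <- vapply(
      candidates,
      function(x) subset_rho(full, c(chosen, x)),
      numeric(1)
    )
    best <- which.max(cand_rho)
    # Stop early if the best available addition does not improve rho enough.
    if (cand_rho[[best]] - rho < min_delta_rho) break
    chosen <- c(chosen, candidates[best])
    rho <- cand_rho[[best]]
  }
  list(species = chosen, rho = rho)
}

# Toy data: 8 sites by 6 species.
set.seed(42)
toy <- matrix(rnorm(48), nrow = 8, dimnames = list(NULL, paste0("sp", 1:6)))

# With min_delta_rho = -Inf the early-stop test can never trigger, so the
# loop only ends when rho_threshold is met or every column has been added.
forced <- forward_select(toy, start = "sp1", rho_threshold = 1,
                         min_delta_rho = -Inf)
```

In this sketch, a positive `min_delta_rho` can end the search below `rho_threshold` when no single addition helps enough, which is the local-optimum behaviour described above.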

If the two matrices are not identical (e.g. species and environment), this will keep adding columns until either `rho_threshold` is met or all columns are included. In that case, the result is not guaranteed to be the highest correlation possible.

## Random starts to explore the space

The `bv_multi` function runs bvstep from a number of random starts to avoid local optima, here with 5 species to start, iterated 10 times. The `num_best_results` argument sets how many results to return. Above `rho_threshold`, 'best' is determined first by the minimum number of species and then by correlation; below `rho_threshold`, it is determined first by correlation and then by number of species. This is in keeping with the idea of finding the fewest species to meet the threshold. To return the result of every random start, set `num_best_results` to the same value as `num_restarts`.

```{r}
bv_m <- bv_multi(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rho_threshold = 0.95,
  return_type = "final",
  rand_start = TRUE, nrand = 5, num_restarts = 10
)

bv_m
```
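
The ranking rule described above can be sketched in base R as follows. The `results` table is made up for illustration, and two details are assumptions rather than documented behaviour: that meeting the threshold means `rho >= rho_threshold`, and that sets meeting the threshold rank ahead of those that do not.

``` r
# Hypothetical results table: one row per candidate species set.
results <- data.frame(
  set = c("a", "b", "c", "d", "e"),
  num_species = c(6, 4, 4, 3, 7),
  rho = c(0.97, 0.96, 0.98, 0.90, 0.93),
  stringsAsFactors = FALSE
)
rho_threshold <- 0.95

# Assumption: a set meets the threshold when rho >= rho_threshold.
meets <- results$rho >= rho_threshold

# At or above the threshold: fewest species first, ties broken by higher rho.
above <- results[meets, ]
above <- above[order(above$num_species, -above$rho), ]

# Below the threshold: highest rho first, ties broken by fewer species.
below <- results[!meets, ]
below <- below[order(-below$rho, below$num_species), ]

# Assumption: sets that meet the threshold rank ahead of those that do not.
ranked <- rbind(above, below)
ranked$set
#> [1] "c" "b" "a" "e" "d"
```

Set `d` has the fewest species overall but ranks last, because it does not reach the threshold and has the lowest correlation among the sets that fall short.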

The default `return_type = 'final'` gives the best outcome of each random start, for the best `num_best_results`. If we want the full steps of each of the `num_restarts` random starts, set `return_type = 'steps'`. This can also be returned as a list, if `returndf = FALSE`.

```{r}
bv_steps <- bv_multi(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rho_threshold = 0.95,
  return_type = "steps",
  rand_start = TRUE, nrand = 5, num_restarts = 10
)
bv_steps
```

With `return_type = 'unique'`, we return the best `num_best_results` from all steps in all random starts. The first line of this should match the first line of `return_type = 'final'`, since that is the best result overall. After that, they may differ as the penultimate set from a particular random start might be better than the final of some others.

```{r}
bv_unique <- bv_multi(
  ref_mat = varespec, comp_mat = varespec,
  ref_dist = "bray", comp_dist = "bray",
  rho_threshold = 0.95,
  return_type = "unique",
  rand_start = TRUE, nrand = 5, num_restarts = 10
)
bv_unique
```

## Peels

The `peel` function runs `bv_multi` iteratively, removing the best set each time.

```{r}
peels <- peel(
  ref_mat = varespec,
  comp_mat = varespec,
  nrand = 6,
  num_restarts = 10,
  corr_method = "spearman"
)
peels
```
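
The peeling idea itself can be sketched as a loop in base R: find a subset, record it, drop its columns from the comparison matrix, and repeat. Everything here is an assumption for illustration and not peeler's implementation: `greedy_subset` is a toy forward-only selector standing in for `bv_multi`, Euclidean distance stands in for Bray-Curtis, the reference matrix is kept whole while only the comparison matrix shrinks, and the loop stops once the remaining columns cannot reach the threshold.

``` r
# Spearman correlation between distances of the reference matrix and a
# column subset of the comparison matrix (Euclidean stands in for Bray-Curtis).
subset_rho <- function(ref, comp, cols) {
  cor(
    as.vector(dist(ref)),
    as.vector(dist(comp[, cols, drop = FALSE])),
    method = "spearman"
  )
}

# Toy selector: greedily add columns until rho_threshold is met or none remain.
greedy_subset <- function(ref, comp, rho_threshold) {
  chosen <- character(0)
  rho <- -Inf
  while (rho < rho_threshold && length(chosen) < ncol(comp)) {
    candidates <- setdiff(colnames(comp), chosen)
    cand_rho <- vapply(
      candidates,
      function(x) subset_rho(ref, comp, c(chosen, x)),
      numeric(1)
    )
    best <- which.max(cand_rho)
    chosen <- c(chosen, candidates[best])
    rho <- cand_rho[[best]]
  }
  list(species = chosen, rho = rho)
}

# Peel loop: find a subset, record it, drop its columns, repeat.
peel_sketch <- function(ref, rho_threshold = 0.8) {
  comp <- ref
  peels <- list()
  while (ncol(comp) > 0) {
    found <- greedy_subset(ref, comp, rho_threshold)
    # Assumed stopping rule: quit once the remaining columns fall short.
    if (found$rho < rho_threshold) break
    peels[[length(peels) + 1]] <- found
    comp <- comp[, setdiff(colnames(comp), found$species), drop = FALSE]
  }
  peels
}

# Toy data: 8 sites by 10 species.
set.seed(7)
toy <- matrix(rnorm(80), nrow = 8, dimnames = list(NULL, paste0("sp", 1:10)))
toy_peels <- peel_sketch(toy, rho_threshold = 0.8)
peeled_species <- unlist(lapply(toy_peels, function(p) p$species))
```

Each peel uses only species that earlier peels left behind, so the number of peels gives a rough sense of how much structural redundancy the data contain.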

There are a number of user-definable options in each of those functions; see their documentation.

There are also two potentially useful helper functions: `extract_final`, which gets the last step in `bvstep` output, and `extract_names`. The `bvstep` output has names in a single string with comma-separated species names, and we often want them as a character vector. The `extract_names` function parses this.

```{r}
best_bv <- extract_names(bvout, step = "last")
best_bv
```
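
For a sense of what that parsing involves, the base R sketch below splits a comma-separated string into a character vector. The string is a made-up stand-in that uses three `varespec` column names, not actual `bvstep` output, and the exact separator the package uses is an assumption; `trimws()` makes the result the same with or without a space after each comma.

``` r
# Hypothetical stand-in for the species names of one bvstep step, stored
# as a single comma-separated string.
species_string <- "Callvulg, Cladrang, Pleuschr"

# Split on commas and strip any surrounding whitespace.
species_vec <- trimws(strsplit(species_string, ",", fixed = TRUE)[[1]])
species_vec
#> [1] "Callvulg" "Cladrang" "Pleuschr"
```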



Owner

  • Name: Galen Holt
  • Login: galenholt
  • Kind: user

Citation (CITATION.cff)

# -----------------------------------------------------------
# CITATION file created with {cffr} R package, v1.0.0
# See also: https://docs.ropensci.org/cffr/
# -----------------------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "peeler" in publications use:'
type: software
license: MIT
title: 'peeler: Implements peeling community datasets using sequential bvstep algorithms,
  as in Clarke and Warwick 1998'
version: 0.1.0
abstract: This package provides implementations of the bvstep forward/backward algorithm
  to find the best subset of the data that matches the full community. It further
  provides functions to iterate that algorithm over some number of random starts to
  assess consistency. The `peel` function then uses these to iteratively find the
  best subset and remove it.
authors:
- family-names: Holt
  given-names: Galen
  email: g.holt@deakin.edu.au
  orcid: https://orcid.org/0000-0002-7455-9275
contact:
- family-names: Holt
  given-names: Galen
  email: g.holt@deakin.edu.au
  orcid: https://orcid.org/0000-0002-7455-9275
references:
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  institution:
    name: R Foundation for Statistical Computing
    address: Vienna, Austria
  year: '2024'
  version: '>= 2.10'
- type: software
  title: dplyr
  abstract: 'dplyr: A Grammar of Data Manipulation'
  notes: Imports
  url: https://dplyr.tidyverse.org
  repository: https://CRAN.R-project.org/package=dplyr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
    orcid: https://orcid.org/0000-0003-4757-117X
  - family-names: François
    given-names: Romain
    orcid: https://orcid.org/0000-0002-2444-4226
  - family-names: Henry
    given-names: Lionel
  - family-names: Müller
    given-names: Kirill
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Vaughan
    given-names: Davis
    email: davis@posit.co
    orcid: https://orcid.org/0000-0003-4777-038X
  year: '2024'
- type: software
  title: glue
  abstract: 'glue: Interpreted String Literals'
  notes: Imports
  url: https://glue.tidyverse.org/
  repository: https://CRAN.R-project.org/package=glue
  authors:
  - family-names: Hester
    given-names: Jim
    orcid: https://orcid.org/0000-0002-2739-7082
  - family-names: Bryan
    given-names: Jennifer
    email: jenny@rstudio.com
    orcid: https://orcid.org/0000-0002-6983-2759
  year: '2024'
- type: software
  title: purrr
  abstract: 'purrr: Functional Programming Tools'
  notes: Imports
  url: https://purrr.tidyverse.org/
  repository: https://CRAN.R-project.org/package=purrr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
    orcid: https://orcid.org/0000-0003-4757-117X
  - family-names: Henry
    given-names: Lionel
    email: lionel@rstudio.com
  year: '2024'
- type: software
  title: rlang
  abstract: 'rlang: Functions for Base Types and Core R and ''Tidyverse'' Features'
  notes: Imports
  url: https://rlang.r-lib.org
  repository: https://CRAN.R-project.org/package=rlang
  authors:
  - family-names: Henry
    given-names: Lionel
    email: lionel@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
- type: software
  title: stringr
  abstract: 'stringr: Simple, Consistent Wrappers for Common String Operations'
  notes: Imports
  url: https://stringr.tidyverse.org
  repository: https://CRAN.R-project.org/package=stringr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2024'
- type: software
  title: tibble
  abstract: 'tibble: Simple Data Frames'
  notes: Imports
  url: https://tibble.tidyverse.org/
  repository: https://CRAN.R-project.org/package=tibble
  authors:
  - family-names: Müller
    given-names: Kirill
    email: kirill@cynkra.com
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2024'
- type: software
  title: vegan
  abstract: 'vegan: Community Ecology Package'
  notes: Imports
  url: https://github.com/vegandevs/vegan
  repository: https://CRAN.R-project.org/package=vegan
  authors:
  - family-names: Oksanen
    given-names: Jari
    email: jhoksane@gmail.com
  - family-names: Simpson
    given-names: Gavin L.
    email: ucfagls@gmail.com
  - family-names: Blanchet
    given-names: F. Guillaume
  - family-names: Kindt
    given-names: Roeland
  - family-names: Legendre
    given-names: Pierre
  - family-names: Minchin
    given-names: Peter R.
  - family-names: O'Hara
    given-names: R.B.
  - family-names: Solymos
    given-names: Peter
  - family-names: Stevens
    given-names: M. Henry H.
  - family-names: Szoecs
    given-names: Eduard
  - family-names: Wagner
    given-names: Helene
  - family-names: Barbour
    given-names: Matt
  - family-names: Bedward
    given-names: Michael
  - family-names: Bolker
    given-names: Ben
  - family-names: Borcard
    given-names: Daniel
  - family-names: Carvalho
    given-names: Gustavo
  - family-names: Chirico
    given-names: Michael
  - family-names: De Caceres
    given-names: Miquel
  - family-names: Durand
    given-names: Sebastien
  - family-names: Evangelista
    given-names: Heloisa Beatriz Antoniazi
  - family-names: FitzJohn
    given-names: Rich
  - family-names: Friendly
    given-names: Michael
  - family-names: Furneaux
    given-names: Brendan
  - family-names: Hannigan
    given-names: Geoffrey
  - family-names: Hill
    given-names: Mark O.
  - family-names: Lahti
    given-names: Leo
  - family-names: McGlinn
    given-names: Dan
  - family-names: Ouellette
    given-names: Marie-Helene
  - family-names: Ribeiro Cunha
    given-names: Eduardo
  - family-names: Smith
    given-names: Tyler
  - family-names: Stier
    given-names: Adrian
  - family-names: Ter Braak
    given-names: Cajo J.F.
  - family-names: Weedon
    given-names: James
  year: '2024'
- type: software
  title: furrr
  abstract: 'furrr: Apply Mapping Functions in Parallel using Futures'
  notes: Suggests
  url: https://furrr.futureverse.org/
  repository: https://CRAN.R-project.org/package=furrr
  authors:
  - family-names: Vaughan
    given-names: Davis
    email: davis@rstudio.com
  - family-names: Dancho
    given-names: Matt
    email: mdancho@business-science.io
  year: '2024'
- type: software
  title: future
  abstract: 'future: Unified Parallel and Distributed Processing in R for Everyone'
  notes: Suggests
  url: https://future.futureverse.org
  repository: https://CRAN.R-project.org/package=future
  authors:
  - family-names: Bengtsson
    given-names: Henrik
    email: henrikb@braju.com
  year: '2024'
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  url: https://testthat.r-lib.org
  repository: https://CRAN.R-project.org/package=testthat
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
  version: '>= 3.0.0'

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1