bdc
Check out the vignettes with detailed documentation on each module of the bdc package
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 7 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (20.9%) to scientific vocabulary
Keywords
bdc
biodiversity-data
workflow
Last synced: 7 months ago
Repository
Basic Info
- Host: GitHub
- Owner: brunobrr
- License: gpl-3.0
- Language: R
- Default Branch: master
- Homepage: https://brunobrr.github.io/bdc
- Size: 179 MB
Statistics
- Stars: 24
- Watchers: 3
- Forks: 9
- Open Issues: 7
- Releases: 7
Created over 5 years ago
· Last pushed 11 months ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
editor_options:
  markdown:
    wrap: 80
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# ***bdc***
## **A toolkit for standardizing, integrating, and cleaning biodiversity data**
[CRAN status](https://CRAN.R-project.org/package=bdc)
[CRAN downloads](https://cranlogs.r-pkg.org:443/badges/grand-total/bdc)
[R-CMD-check](https://github.com/brunobrr/bdc/actions/workflows/R-CMD-check.yaml)
[Codecov test coverage](https://app.codecov.io/gh/brunobrr/bdc)
[DOI](https://doi.org/10.5281/zenodo.6450390)
[License: GPL-3.0](http://www.gnu.org/licenses/gpl-3.0.html)
#### **Overview**
Handling biodiversity data from several different sources is not an easy task.
Here, we present **B**iodiversity **D**ata **C**leaning (*bdc*), an R
package to address quality issues and improve the fitness-for-use of biodiversity
datasets. *bdc* contains functions to harmonize and integrate data from
different sources following common standards and protocols, and implements
various tests and tools to flag, document, clean, and correct taxonomic,
spatial, and temporal data.
Compared to other available R packages, the main strength of the *bdc* package
is that it brings together available tools – and a series of new ones – to
assess the quality of different dimensions of biodiversity data into a single
and flexible toolkit. The functions can be applied to a multitude of taxonomic
groups, datasets (including regional or local repositories), countries, or
worldwide.
#### **Structure of *bdc***
The *bdc* toolkit is organized in thematic modules related to different
biodiversity dimensions.
--------------------------------------------------------------------------------
> :warning: The modules illustrated, and **functions** within, **were linked to
> form** a proposed reproducible **workflow** (see
> [**vignettes**](https://brunobrr.github.io/bdc/)). However, all functions
> **can also be executed independently**.
--------------------------------------------------------------------------------
#### 1. [**Merge databases**](https://brunobrr.github.io/bdc/articles/integrate_datasets.html)
Standardization and integration of different datasets into a standard database.
- `bdc_standardize_datasets()` Standardization and integration of different
datasets into a new dataset with column names following Darwin Core
terminology
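The idea behind this module can be sketched in base R as follows. This is only an illustration of renaming source columns to Darwin Core terms and stacking the results; it is not how `bdc_standardize_datasets()` is implemented, and the data frames, the `mapping` list, and the `to_dwc()` helper are hypothetical names introduced for the example.

```r
# Two hypothetical source datasets that use different column names
src_a <- data.frame(
  species = c("Puma concolor", "Panthera onca"),
  lat = c(-15.6, -3.1),
  lon = c(-47.9, -60.0),
  stringsAsFactors = FALSE
)
src_b <- data.frame(
  taxon = "Tapirus terrestris",
  y = -10.2,
  x = -55.4,
  stringsAsFactors = FALSE
)

# Hypothetical mapping per dataset: Darwin Core term = original column name
mapping <- list(
  a = c(scientificName = "species", decimalLatitude = "lat", decimalLongitude = "lon"),
  b = c(scientificName = "taxon", decimalLatitude = "y", decimalLongitude = "x")
)

# Select the mapped columns, rename them, and record the source dataset
to_dwc <- function(df, map, dataset_name) {
  out <- df[, unname(map), drop = FALSE]
  names(out) <- names(map)
  out$datasetName <- dataset_name
  out
}

merged <- rbind(
  to_dwc(src_a, mapping$a, "a"),
  to_dwc(src_b, mapping$b, "b")
)
```

After this step every record shares the same column names, so the later modules can address columns such as `scientificName` or `decimalLatitude` regardless of where a record came from.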
#### 2. [**Pre-filter**](https://brunobrr.github.io/bdc/articles/prefilter.html)
Flagging and removal of invalid or non-interpretable information, followed by
data amendments (e.g., correct transposed coordinates and standardize country
names).
- `bdc_scientificName_empty()` Identification of records lacking names or with
names not interpretable
- `bdc_coordinates_empty()` Identification of records lacking information on
latitude or longitude
- `bdc_coordinates_outOfRange()` Identification of records with out-of-range
coordinates (latitude \> 90 or \< -90; longitude \> 180 or \< -180)
- `bdc_basisOfRecords_notStandard()` Identification of records from doubtful
sources (e.g., fossil or machine observation) impossible to interpret and
not compatible with Darwin Core recommended vocabulary
- `bdc_country_from_coordinates()` Derive country name from valid geographic
coordinates
- `bdc_country_standardized()` Standardization of country names and retrieval
of country codes
- `bdc_coordinates_transposed()` Identification of records with potentially
transposed latitude and longitude
- `bdc_coordinates_country_inconsistent()` Identification of coordinates that
fall in other countries or farther than a specified distance from the coast
of a reference country (i.e., in the ocean)
- `bdc_coordinates_from_locality()` Identification of records lacking
coordinates but with a detailed locality description from which coordinates
can be derived
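Three of the simpler pre-filter tests can be sketched in base R to show the kind of logic involved. This is a minimal illustration, not the *bdc* implementation. It assumes that flag columns start with "." and that `TRUE` means the record passed the test; the helper functions and the example data are hypothetical.

```r
# Hypothetical records using Darwin Core column names
records <- data.frame(
  scientificName = c("Puma concolor", "", NA, "Panthera onca"),
  decimalLatitude = c(-15.6, 95, NA, -3.1),
  decimalLongitude = c(-47.9, -60, -55.4, 200),
  stringsAsFactors = FALSE
)

# TRUE when a name is present (not NA and not blank)
name_present <- function(x) !(is.na(x) | trimws(x) == "")

# TRUE when both coordinates are present
coords_present <- function(lat, lon) !(is.na(lat) | is.na(lon))

# TRUE when latitude is within [-90, 90] and longitude within [-180, 180];
# in this sketch, missing coordinates count as failing the test
coords_in_range <- function(lat, lon) {
  ok <- lat >= -90 & lat <= 90 & lon >= -180 & lon <= 180
  ok[is.na(ok)] <- FALSE
  ok
}

records$.scientificName_empty <- name_present(records$scientificName)
records$.coordinates_empty <- coords_present(
  records$decimalLatitude, records$decimalLongitude
)
records$.coordinates_outOfRange <- coords_in_range(
  records$decimalLatitude, records$decimalLongitude
)
```

Keeping each test in its own logical column means no record is dropped at this stage; records can be inspected or removed later based on any combination of flags.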
#### 3. [**Taxonomy**](https://brunobrr.github.io/bdc/articles/taxonomy.html)
Cleaning, parsing, and harmonization of scientific names against multiple
taxonomic references.
- `bdc_clean_names()` Name-checking routines to clean and split a taxonomic
name into its binomial and authority components
- `bdc_query_names_taxadb()` Harmonization of scientific names by correcting
spelling errors and converting nomenclatural synonyms to currently accepted
names.
- `bdc_filter_out_names()` Function used to filter out records according to
their taxonomic status present in the column "notes". For example, it can be
used to keep only valid names categorized as "accepted"
#### 4. [**Space**](https://brunobrr.github.io/bdc/articles/space.html)
Flagging of erroneous, suspicious, and low-precision geographic coordinates.
- `bdc_coordinates_precision()` Identification of records with a coordinate
precision below a specified number of decimal places
- `clean_coordinates()` (from the *CoordinateCleaner* package and part of the
data-cleaning workflow) Identification of potentially problematic
geographic coordinates based on geographic gazetteers and metadata. Includes
tests for flagging records around country capitals or country or province
centroids, duplicated records, records with equal coordinates, records around
biodiversity institutions or within urban areas, plain zeros in the
coordinates, and suspect geographic outliers
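A coordinate-precision test of the kind described above can be sketched in base R by counting decimal places. This is an illustration under stated assumptions, not the code behind `bdc_coordinates_precision()`: the `count_decimals()` helper, the `.precision_ok` column name, and the choice to require the minimum precision in both latitude and longitude are hypothetical. It also relies on the default text representation of numbers, so it would not handle values printed in scientific notation.

```r
# Count the digits after the decimal point in the printed form of a number
count_decimals <- function(x) {
  txt <- as.character(x)
  decimals <- ifelse(grepl(".", txt, fixed = TRUE), sub("^[^.]*\\.", "", txt), "")
  n <- nchar(decimals)
  n[is.na(x)] <- NA_integer_  # missing coordinates have no defined precision
  n
}

coords <- data.frame(
  decimalLatitude = c(-15.7801, -3, -10.25),
  decimalLongitude = c(-47.9292, -60, -55.4)
)

# Minimum number of decimal places required (hypothetical threshold)
ndec <- 2

# TRUE when both coordinates reach the required precision
coords$.precision_ok <-
  count_decimals(coords$decimalLatitude) >= ndec &
  count_decimals(coords$decimalLongitude) >= ndec
```

With this threshold, only the first record passes: the second has whole-degree coordinates and the third has a longitude with a single decimal place.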
#### 5. [**Time**](https://brunobrr.github.io/bdc/articles/time.html)
Flagging and, whenever possible, correction of inconsistent collection dates.
- `bdc_eventDate_empty()` Identification of records lacking information on
event date (i.e., when a record was collected or observed)
- `bdc_year_outOfRange()` Identification of records with illegitimate or
potentially imprecise collecting years. The year can be out of range
(e.g., in the future) or earlier than a threshold year supplied by the
user (e.g., 1900)
- `bdc_year_from_eventDate()` Extraction of a four-digit year from
unambiguously interpretable collecting dates
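The three temporal tests can be sketched in base R as follows. This is a simplified illustration, not the *bdc* implementation: it assumes that an interpretable date starts with a four-digit year, that `TRUE` means the record passed the test, and that records without a year are left as `NA` in the range test. The helper `extract_year()` and the example dates are hypothetical.

```r
# Hypothetical event dates, including empty and uninterpretable values
event_date <- c("1985-03-12", "2150-01-01", "", "1850/07/04", "unknown", NA)

# TRUE when an event date is present (not NA and not blank)
date_present <- !(is.na(event_date) | trimws(event_date) == "")

# Extract the year only when the string starts with four digits
extract_year <- function(x) {
  has_year <- !is.na(x) & grepl("^[0-9]{4}", x)
  year <- rep(NA_integer_, length(x))
  year[has_year] <- as.integer(substr(x[has_year], 1, 4))
  year
}

year <- extract_year(event_date)

# TRUE when the year is not in the future and not before the threshold
year_threshold <- 1900
current_year <- as.integer(format(Sys.Date(), "%Y"))
year_in_range <- year >= year_threshold & year <= current_year
```

Here "1985-03-12" passes the range test, "2150-01-01" fails because it lies in the future, and "1850/07/04" fails because it predates the threshold.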
#### [**Other functions**](https://brunobrr.github.io/bdc/reference/index.html)
To facilitate the **documentation, visualization, and interpretation** of the
results of data-quality tests, the package contains functions for documenting
the results of the data-cleaning tests, including functions for saving i)
records needing further inspection, ii) figures, and iii) data-quality reports.
- `bdc_create_report()` Creation of data-quality reports documenting the
results of data-quality tests and the taxonomic harmonization process
- `bdc_create_figures()` Creation of figures (i.e., bar plots and maps)
reporting the results of data-quality tests
- `bdc_filter_out_flags()` Removal of columns containing the results of data
quality tests (i.e., columns starting with ".") or other columns specified
- `bdc_quickmap()` Creation of a map of points using ggplot2. Helpful in
inspecting the results of data-cleaning tests
- `bdc_summary_col()` This function creates or updates the column summarizing
the results of data quality tests (i.e., the column ".summary")
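How a summary column relates to the individual flag columns can be sketched in base R. This is an illustration only, not the code of `bdc_summary_col()` or `bdc_filter_out_flags()`. It assumes that flag columns start with "." and that `TRUE` means the record passed the test; the example data and the row-filtering step are hypothetical.

```r
# Hypothetical dataset that already carries two flag columns
flagged <- data.frame(
  scientificName = c("Puma concolor", "Panthera onca", "Tapirus terrestris"),
  .coordinates_empty = c(TRUE, TRUE, FALSE),
  .year_outOfRange = c(TRUE, FALSE, TRUE),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Flag columns are identified by their leading dot
flag_cols <- grep("^\\.", names(flagged), value = TRUE)

# A record passes overall only if it passed every individual test
flagged$.summary <- Reduce(`&`, flagged[flag_cols])

# Keep records that passed all tests, then drop every dot-prefixed column
cleaned <- flagged[flagged$.summary, !grepl("^\\.", names(flagged)), drop = FALSE]
```

Only the first record survives in `cleaned`, and the result contains just the original data columns.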
#### **Installation**
```{r eval=FALSE}
install.packages("bdc")
library(bdc)
```
Or install the development version from [GitHub](https://github.com/brunobrr/bdc) using:
```{r, message=FALSE, warning=FALSE,echo=TRUE,eval=FALSE}
install.packages("remotes")
remotes::install_github("brunobrr/bdc")
```
Load the package with:
```{r, message=FALSE, warning=FALSE,echo=TRUE,eval=TRUE}
library(bdc)
```
#### **Package website**
See the *bdc* package website (<https://brunobrr.github.io/bdc/>) for a detailed
explanation of each module.
#### **Getting help**
> If you encounter a clear bug, please file an issue
> [**here**](https://github.com/brunobrr/bdc/issues). For questions or
> suggestions, please send us an email (ribeiro.brr\@gmail.com).
#### **Citation**
Ribeiro, BR; Velazco, SJE; Guidoni-Martins, K; Tessarolo, G; Jardim, L;
Bachman, SP; Loyola, R (2022). bdc: A toolkit for standardizing, integrating,
and cleaning biodiversity data. Methods in Ecology and Evolution.
[doi.org/10.1111/2041-210X.13868](https://doi.org/10.1111/2041-210X.13868)
Owner
- Name: Bruno R Ribeiro
- Login: brunobrr
- Kind: user
- Twitter: ribeiro_brr
- Repositories: 1
- Profile: https://github.com/brunobrr
PhD Brazilian Foundation for Sustainable Development, Brazil.
GitHub Events
Total
- Create event: 2
- Issues event: 3
- Release event: 1
- Delete event: 1
- Issue comment event: 12
- Push event: 67
- Pull request event: 5
Last Year
- Create event: 2
- Issues event: 3
- Release event: 1
- Delete event: 1
- Issue comment event: 12
- Push event: 67
- Pull request event: 5
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Bruno R. Ribeiro | r****r@g****m | 459 |
| Karlo Guidoni Martins | k****s@g****m | 249 |
| lucas-jardim | l****9@g****m | 86 |
| Santiago Velazco | s****o@g****m | 45 |
| sjevelazco | s****c@g****m | 37 |
| Geiziane | g****s@g****m | 26 |
| Zander | z****o@g****m | 2 |
| brunobrr | b****o@M****l | 2 |
| Your Namebrunobrr | y****u@e****m | 1 |
| Ronald Bergmann | i****o@b****t | 1 |
Issues and Pull Requests
All Time
- Total issues: 32
- Total pull requests: 86
- Average time to close issues: 4 months
- Average time to close pull requests: 3 days
- Total issue authors: 20
- Total pull request authors: 7
- Average comments per issue: 3.31
- Average comments per pull request: 0.03
- Merged pull requests: 81
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 3
- Average time to close issues: 44 minutes
- Average time to close pull requests: 4 months
- Issue authors: 3
- Pull request authors: 3
- Average comments per issue: 1.67
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- black-snow (4)
- kguidonimartins (3)
- GilbertAlarcon-Cruz (3)
- lucas-jardim (2)
- brunobrr (2)
- fredtaka (2)
- jt-tbc (1)
- max-sfeeri (1)
- sjevelazco (1)
- oliveirab (1)
- rsbivand (1)
- brunomioto (1)
- paschatz (1)
- peake-and-troughs (1)
- SEveringham (1)
Pull Request Authors
- sjevelazco (39)
- brunobrr (23)
- kguidonimartins (20)
- black-snow (3)
- andrew-1234 (2)
- Geiziane (2)
- matthewsrogan (1)
Top Labels
Issue Labels
bug (3)
dependency (3)
faq (3)
Packages
- Total packages: 2
- Total downloads: cran 291 last-month
- Total dependent packages: 1 (may contain duplicates)
- Total dependent repositories: 1 (may contain duplicates)
- Total versions: 7
- Total maintainers: 1
proxy.golang.org: github.com/brunobrr/bdc
- Documentation: https://pkg.go.dev/github.com/brunobrr/bdc#section-documentation
- License: gpl-3.0
- Latest release: v1.0.0, published almost 4 years ago
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
cran.r-project.org: bdc
Biodiversity Data Cleaning
- Homepage: https://brunobrr.github.io/bdc/ (website) https://github.com/brunobrr/bdc
- Documentation: http://cran.r-project.org/web/packages/bdc/bdc.pdf
- License: GPL (≥ 3)
- Latest release: 1.1.5, published over 1 year ago
Rankings
Forks count: 8.0%
Stargazers count: 11.9%
Average: 18.8%
Downloads: 21.9%
Dependent repos count: 24.3%
Dependent packages count: 27.9%
Maintainers (1)
Dependencies
DESCRIPTION
cran
- CoordinateCleaner * imports
- DT * imports
- dplyr * imports
- foreach * imports
- fs * imports
- ggplot2 * imports
- here * imports
- magrittr * imports
- purrr * imports
- qs * imports
- readr * imports
- rgnparser * imports
- rnaturalearth * imports
- sf >= 1.0.5 imports
- stringdist * imports
- stringi * imports
- stringr * imports
- taxadb >= 0.1.3 imports
- tibble * imports
- tidyselect * imports
- DBI * suggests
- contentid >= 0.0.15 suggests
- countrycode * suggests
- covr * suggests
- cowplot * suggests
- doParallel * suggests
- duckdb >= 0.3.2 suggests
- knitr >= 1.31 suggests
- maps * suggests
- markdown * suggests
- rangeBuilder * suggests
- rappdirs * suggests
- raster * suggests
- remotes * suggests
- rlang >= 1.0.1 suggests
- rmarkdown * suggests
- rnaturalearthdata * suggests
- rvest * suggests
- sp * suggests
- testthat >= 3.0.0 suggests
- xml2 * suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/upload-artifact v2 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite
.github/workflows/pkgdown.yaml
actions
- actions/cache v2 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite