CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for use in conservation, ecology and palaeontology.

https://github.com/ropensci/coordinatecleaner

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.8%) to scientific vocabulary

Keywords

r r-package rstats

Keywords from Contributors

landscape-ecology genome point-pattern-analysis biodiversity neutral-landscape-model ecology reproducibility drake makefile ropensci
Last synced: 7 months ago

Repository


Basic Info
Statistics
  • Stars: 84
  • Watchers: 14
  • Forks: 23
  • Open Issues: 30
  • Releases: 4
Topics
r r-package rstats
Created over 9 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing Codemeta

README.md

CoordinateCleaner v3.0

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

CoordinateCleaner has been updated to version 3.0 on GitHub and on CRAN to adapt to the retirement of sp and raster. The update may not be compatible with analysis pipelines built with version 2.x.
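Because pipelines written for 2.x may not run unchanged, it can help to check the installed version before an analysis starts. The following is a minimal sketch; `is_cc_v3()` is a hypothetical helper name and is not part of the package.

```r
# Hypothetical helper (not part of CoordinateCleaner): TRUE if a version
# string or version object is at least 3.0.0.
is_cc_v3 <- function(installed) {
  package_version(as.character(installed)) >= package_version("3.0.0")
}

# Report on the installed package, if there is one
if (requireNamespace("CoordinateCleaner", quietly = TRUE)) {
  v <- utils::packageVersion("CoordinateCleaner")
  message("CoordinateCleaner ", as.character(v),
          if (is_cc_v3(v)) " (3.x)" else " (pre-3.0, pipelines may differ)")
}
```

A pipeline written for 3.x could call `stopifnot(is_cc_v3(utils::packageVersion("CoordinateCleaner")))` at the top to fail early.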

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for use in conservation, ecology and palaeontology. It includes tests for:

  • General coordinate validity
  • Country and province centroids
  • Capital coordinates
  • Coordinates of biodiversity institutions
  • Spatial outliers
  • Temporal outliers
  • Coordinate-country discordance
  • Duplicated coordinates per species
  • Assignment to the location of the GBIF headquarters
  • Urban areas
  • Seas
  • Plain zeros
  • Equal longitude and latitude
  • Rounded coordinates
  • DDMM to DD.DD coordinate conversion errors
  • Large temporal uncertainty (fossils)
  • Equal minimum and maximum ages (fossils)
  • Spatio-temporal outliers (fossils)
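To give a feel for what the simpler tests in this list check, here is a base-R sketch of three of them (coordinate validity, plain zeros, equal longitude and latitude). It is a simplified illustration under stated assumptions, not the package's implementation: `flag_simple()` is a hypothetical name, only exact 0/0 coordinates count as plain zeros, and equality is compared on absolute values.

```r
# Illustrative only: simplified base-R versions of three checks.
# TRUE means the record passes, FALSE means it is flagged.
flag_simple <- function(lon, lat) {
  # Validity: both coordinates present and within the possible range
  valid <- !is.na(lon) & !is.na(lat) &
    lon >= -180 & lon <= 180 & lat >= -90 & lat <= 90
  # Plain zeros: longitude and latitude both exactly zero (an assumption)
  not_zero <- !((lon == 0 & lat == 0) %in% TRUE)
  # Equal coordinates: same absolute longitude and latitude (an assumption)
  not_equal <- !((abs(lon) == abs(lat)) %in% TRUE)
  data.frame(valid = valid, not_zero = not_zero, not_equal = not_equal,
             summary = valid & not_zero & not_equal)
}

# Made-up coordinates: one plausible record and four problematic ones
lon <- c(45.2, 0, 12.5, 200, NA)
lat <- c(-18.9, 0, 12.5, 10, 5)
flags <- flag_simple(lon, lat)
print(flags)
```

Only the first record passes all three checks here. The remaining tests in the list (centroids, institutions, outliers, and so on) depend on reference data or per-species statistics and are not covered by this sketch.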

CoordinateCleaner can be particularly useful for improving data quality when using data from GBIF (e.g. obtained with rgbif) or the Paleobiology Database (e.g. obtained with paleobioDB) for historical biogeography (e.g. with BioGeoBEARS or phytools), automated conservation assessment (e.g. with speciesgeocodeR or conR) or species distribution modelling (e.g. with dismo or sdm). See scrubr and taxize for complementary taxonomic cleaning, or biogeo for correcting spatial coordinate errors.

See News for update information.

Installation

Stable from CRAN

```r
install.packages("CoordinateCleaner")
library(CoordinateCleaner)
```

Developmental from GitHub

```r
devtools::install_github("ropensci/CoordinateCleaner")
library(CoordinateCleaner)
```

Usage

A simple example:

```r
library(CoordinateCleaner)

# Simulate example data
minages <- runif(250, 0, 65)
exmpl <- data.frame(species = sample(letters, size = 250, replace = TRUE),
                    decimalLongitude = runif(250, min = 42, max = 51),
                    decimalLatitude = runif(250, min = -26, max = -11),
                    min_ma = minages,
                    max_ma = minages + runif(250, 0.1, 65),
                    dataset = "clean")

# Run record-level tests
rl <- clean_coordinates(x = exmpl)
summary(rl)
plot(rl)

# Dataset level
dsl <- clean_dataset(exmpl)

# For fossils
fl <- clean_fossils(x = exmpl,
                    taxon = "species",
                    lon = "decimalLongitude",
                    lat = "decimalLatitude")
summary(fl)

# Alternative example using the pipe
library(tidyverse)

cl <- exmpl %>%
  cc_val() %>%
  cc_cap() %>%
  cd_ddmm() %>%
  cf_range(lon = "decimalLongitude",
           lat = "decimalLatitude",
           taxon = "species")
```
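After flagging, a common next step is to separate the records that passed from those that were flagged. The sketch below assumes the flagged output is a data frame with a logical `.summary` column in which `TRUE` means the record passed every test. `split_flagged()` is a hypothetical helper, and the `mock` data frame is a made-up stand-in so that the example runs without the package.

```r
# Hypothetical helper: split a flagged data frame into clean and flagged rows.
# Assumes a logical column (default ".summary") where TRUE means "passed".
split_flagged <- function(flagged, summary_col = ".summary") {
  passed <- flagged[[summary_col]] %in% TRUE
  list(clean = flagged[passed, , drop = FALSE],
       flagged = flagged[!passed, , drop = FALSE])
}

# Made-up stand-in for flagged output, for illustration only
mock <- data.frame(
  species = c("a", "b", "c", "d"),
  decimalLongitude = c(45.1, 0, 47.3, 50.2),
  decimalLatitude = c(-20.4, 0, -15.8, -12.1),
  .summary = c(TRUE, FALSE, TRUE, TRUE)
)

parts <- split_flagged(mock)
nrow(parts$clean)    # records that passed
nrow(parts$flagged)  # records to inspect before discarding
```

Keeping the flagged rows in a separate object, rather than dropping them immediately, makes it easier to inspect what was flagged and why.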

Documentation

Pipelines for cleaning data from the Global Biodiversity Information Facility (GBIF) and the Paleobiology Database (PaleobioDB) are available in the package documentation at https://ropensci.github.io/CoordinateCleaner/.

Contributing

See the CONTRIBUTING document.

Citation

Zizka A, Silvestro D, Andermann T, Azevedo J, Duarte Ritter C, Edler D, Farooq H, Herdean A, Ariza M, Scharn R, Svanteson S, Wengstrom N, Zizka V & Antonelli A (2019) CoordinateCleaner: standardized cleaning of occurrence records from biological collection databases. Methods in Ecology and Evolution, 10(5):744-751, doi:10.1111/2041-210X.13152, https://github.com/ropensci/CoordinateCleaner


Owner

  • Name: rOpenSci
  • Login: ropensci
  • Kind: organization
  • Email: info@ropensci.org
  • Location: Berkeley, CA

CodeMeta (codemeta.json)

{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "http://schema.org"
  ],
  "@type": "SoftwareSourceCode",
  "identifier": "CoordinateCleaner",
  "description": "Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for the use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroid, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the location of biodiversity institutions (museums, zoos, botanical gardens, universities). Furthermore identifies per species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates. Also implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) <doi:10.1111/2041-210X.13152>.",
  "name": "CoordinateCleaner: Automated Cleaning of Occurrence Records from Biological Collections",
  "codeRepository": "https://github.com/ropensci/CoordinateCleaner",
  "issueTracker": "https://github.com/ropensci/CoordinateCleaner/issues",
  "license": "https://spdx.org/licenses/GPL-3.0",
  "version": "2.0.20",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R Under development (unstable) (2021-10-19 r81077)",
  "provider": {
    "@id": "https://cran.r-project.org",
    "@type": "Organization",
    "name": "Comprehensive R Archive Network (CRAN)",
    "url": "https://cran.r-project.org"
  },
  "author": [
    {
      "@type": "Person",
      "givenName": "Alexander",
      "familyName": "Zizka",
      "email": "zizka.alexander@gmail.com"
    }
  ],
  "contributor": [
    {
      "@type": "Person",
      "givenName": "Daniele",
      "familyName": "Silvestro"
    },
    {
      "@type": "Person",
      "givenName": "Tobias",
      "familyName": "Andermann"
    },
    {
      "@type": "Person",
      "givenName": "Josue",
      "familyName": "Azevedo"
    },
    {
      "@type": "Person",
      "givenName": "Camila",
      "familyName": "Duarte Ritter"
    },
    {
      "@type": "Person",
      "givenName": "Daniel",
      "familyName": "Edler"
    },
    {
      "@type": "Person",
      "givenName": "Harith",
      "familyName": "Farooq"
    },
    {
      "@type": "Person",
      "givenName": "Andrei",
      "familyName": "Herdean"
    },
    {
      "@type": "Person",
      "givenName": "Maria",
      "familyName": "Ariza"
    },
    {
      "@type": "Person",
      "givenName": "Ruud",
      "familyName": "Scharn"
    },
    {
      "@type": "Person",
      "givenName": "Sten",
      "familyName": "Svanteson"
    },
    {
      "@type": "Person",
      "givenName": "Niklas",
      "familyName": "Wengstrom"
    },
    {
      "@type": "Person",
      "givenName": "Vera",
      "familyName": "Zizka"
    },
    {
      "@type": "Person",
      "givenName": "Alexandre",
      "familyName": "Antonelli"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Alexander",
      "familyName": "Zizka",
      "email": "zizka.alexander@gmail.com"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "covr",
      "name": "covr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=covr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "countrycode",
      "name": "countrycode",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=countrycode"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=knitr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "maps",
      "name": "maps",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=maps"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "magrittr",
      "name": "magrittr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=magrittr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "paleobioDB",
      "name": "paleobioDB",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=paleobioDB"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rmarkdown"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rnaturalearthdata",
      "name": "rnaturalearthdata",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rnaturalearthdata"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "viridis",
      "name": "viridis",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=viridis"
    }
  ],
  "softwareRequirements": [
    {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 3.5.0"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "dplyr",
      "name": "dplyr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=dplyr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "geosphere",
      "name": "geosphere",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=geosphere"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "ggplot2",
      "name": "ggplot2",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=ggplot2"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "graphics",
      "name": "graphics"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "grDevices",
      "name": "grDevices"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "methods",
      "name": "methods"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "raster",
      "name": "raster",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=raster"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rgbif",
      "name": "rgbif",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rgbif"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rgeos",
      "name": "rgeos",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rgeos"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rgdal",
      "name": "rgdal",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rgdal"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rnaturalearth",
      "name": "rnaturalearth",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rnaturalearth"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "stats",
      "name": "stats"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "sp",
      "name": "sp",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=sp"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "tidyselect",
      "name": "tidyselect",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=tidyselect"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "utils",
      "name": "utils"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "https://sysreqs.r-hub.io/get/gdal"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "https://sysreqs.r-hub.io/get/libgdal"
    }
  ],
  "releaseNotes": "https://github.com/ropensci/CoordinateCleaner/blob/master/NEWS.md",
  "readme": "https://github.com/ropensci/CoordinateCleaner/blob/master/README.md",
  "fileSize": "130599.254KB",
  "relatedLink": "https://ropensci.github.io/CoordinateCleaner/",
  "developmentStatus": "https://www.repostatus.org/#active",
  "citation": [
    {
      "@type": "ScholarlyArticle",
      "datePublished": "2019",
      "author": [
        {
          "@type": "Person",
          "givenName": "Alexander",
          "familyName": "Zizka"
        },
        {
          "@type": "Person",
          "givenName": "Daniele",
          "familyName": "Silvestro"
        },
        {
          "@type": "Person",
          "givenName": "Tobias",
          "familyName": "Andermann"
        },
        {
          "@type": "Person",
          "givenName": "Josue",
          "familyName": "Azevedo"
        },
        {
          "@type": "Person",
          "givenName": "Camila",
          "familyName": "Duarte Ritter"
        },
        {
          "@type": "Person",
          "givenName": "Daniel",
          "familyName": "Edler"
        },
        {
          "@type": "Person",
          "givenName": "Harith",
          "familyName": "Farooq"
        },
        {
          "@type": "Person",
          "givenName": "Andrei",
          "familyName": "Herdean"
        },
        {
          "@type": "Person",
          "givenName": "Maria",
          "familyName": "Ariza"
        },
        {
          "@type": "Person",
          "givenName": "Ruud",
          "familyName": "Scharn"
        },
        {
          "@type": "Person",
          "givenName": "Sten",
          "familyName": "Svanteson"
        },
        {
          "@type": "Person",
          "givenName": "Niklas",
          "familyName": "Wengstrom"
        },
        {
          "@type": "Person",
          "givenName": "Vera",
          "familyName": "Zizka"
        },
        {
          "@type": "Person",
          "givenName": "Alexandre",
          "familyName": "Antonelli"
        }
      ],
      "name": "CoordinateCleaner: standardized cleaning of occurrence records from biological collection databases",
      "identifier": "10.1111/2041-210X.13152",
      "url": "https://github.com/ropensci/CoordinateCleaner",
      "pagination": "-7",
      "@id": "https://doi.org/10.1111/2041-210X.13152",
      "sameAs": "https://doi.org/10.1111/2041-210X.13152",
      "isPartOf": {
        "@type": "PublicationIssue",
        "issueNumber": "10",
        "datePublished": "2019",
        "isPartOf": {
          "@type": [
            "PublicationVolume",
            "Periodical"
          ],
          "name": "Methods in Ecology and Evolution"
        }
      }
    }
  ],
  "copyrightHolder": {},
  "funder": {},
  "keywords": [
    "r",
    "r-package",
    "rstats"
  ],
  "review": {
    "@type": "Review",
    "url": "https://github.com/ropensci/software-review/issues/210",
    "provider": "https://ropensci.org"
  }
}

GitHub Events

Total
  • Issues event: 3
  • Watch event: 6
  • Issue comment event: 2
  • Push event: 4
  • Fork event: 1
Last Year
  • Issues event: 3
  • Watch event: 6
  • Issue comment event: 2
  • Push event: 4
  • Fork event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 730
  • Total Committers: 17
  • Avg Commits per committer: 42.941
  • Development Distribution Score (DDS): 0.46
Past Year
  • Commits: 5
  • Committers: 2
  • Avg Commits per committer: 2.5
  • Development Distribution Score (DDS): 0.4
Top Committers
| Name | Email | Commits |
|---|---|---|
| azizka | a****a@b****e | 394 |
| azizka | z****r@g****m | 187 |
| Zizka | a****y@i****e | 51 |
| BrunoVilela | b****a@h****m | 29 |
| Pakillo | f****c@g****m | 28 |
| Irene | j****e@e****m | 20 |
| Maëlle Salmon | m****n@y****e | 6 |
| plantarum | t****r@p****a | 3 |
| Hugo Gruson | B****o | 2 |
| Jeroen Ooms | j****s@g****m | 2 |
| mhesselbarth | m****h@g****m | 2 |
| AMBarbosa | A****a | 1 |
| John Baumgartner | j****s@g****m | 1 |
| Michael Sumner | m****r@g****m | 1 |
| John Waller | f****2@s****n | 1 |
| Shawn Laffan | s****n@g****m | 1 |
| Vince Buffalo | v****A@g****m | 1 |
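The all-time figures above can be reproduced from the commit counts. The sketch below assumes the usual definition of the Development Distribution Score, one minus the top committer's share of all commits. This page does not state the definition, so treat it as an assumption that happens to match the reported values.

```r
# Values taken from the all-time statistics and the committer table
total_commits <- 730
n_committers <- 17
top_committer_commits <- 394

# Average commits per committer
avg_commits <- total_commits / n_committers

# Assumed DDS definition: 1 - (top committer's commits / total commits)
dds <- 1 - top_committer_commits / total_commits

round(avg_commits, 3)  # 42.941
round(dds, 2)          # 0.46
```

Under this definition a score near 0 means one committer made nearly all commits, and a score near 1 means commits are spread widely.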

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 93
  • Total pull requests: 15
  • Average time to close issues: 4 months
  • Average time to close pull requests: 5 months
  • Total issue authors: 58
  • Total pull request authors: 13
  • Average comments per issue: 2.14
  • Average comments per pull request: 0.33
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 18 hours
  • Issue authors: 3
  • Pull request authors: 1
  • Average comments per issue: 0.5
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jhnwllr (8)
  • azizka (6)
  • jivelasquezt (5)
  • AMBarbosa (4)
  • jaum20 (3)
  • HMB3 (3)
  • maelle (3)
  • wcornwell (3)
  • rvosa (2)
  • rsbivand (2)
  • damariszurell (2)
  • jpstevenson2018 (2)
  • sandro-unibe (2)
  • CyanBC (2)
  • pepbioalerts (2)
Pull Request Authors
  • maelle (2)
  • jhnwllr (2)
  • plantarum (1)
  • shawnlaffan (1)
  • Pakillo (1)
  • vsbuffalo (1)
  • Bisaloo (1)
  • isteves (1)
  • joelnitta (1)
  • mdsumner (1)
  • johnbaums (1)
  • mhesselbarth (1)
  • AMBarbosa (1)
Top Labels
Issue Labels
enhancement (6)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 1,651 last-month
  • Total dependent packages: 5
  • Total dependent repositories: 7
  • Total versions: 15
  • Total maintainers: 1
cran.r-project.org: CoordinateCleaner

Automated Cleaning of Occurrence Records from Biological Collections

  • Versions: 15
  • Dependent Packages: 5
  • Dependent Repositories: 7
  • Downloads: 1,651 Last month
Rankings
Forks count: 3.6%
Stargazers count: 5.3%
Average: 8.6%
Dependent packages count: 10.7%
Dependent repos count: 11.3%
Downloads: 12.2%
Maintainers (1)
Last synced: 7 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • dplyr * imports
  • geosphere * imports
  • ggplot2 * imports
  • grDevices * imports
  • graphics * imports
  • methods * imports
  • raster * imports
  • rgbif * imports
  • rgdal * imports
  • rgeos * imports
  • rnaturalearth * imports
  • sp * imports
  • stats * imports
  • tidyselect * imports
  • utils * imports
  • countrycode * suggests
  • covr * suggests
  • knitr * suggests
  • magrittr * suggests
  • maps * suggests
  • paleobioDB * suggests
  • rmarkdown * suggests
  • rnaturalearthdata * suggests
  • testthat * suggests
  • viridis * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/check-r-package v1 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
  • r-lib/actions/setup-r-dependencies v1 composite
.github/workflows/pkgdown.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
  • r-lib/actions/setup-r-dependencies v1 composite