zoraptera-occurrence-dataset

Zoraptera Occurrence Dataset - curated dataset of global occurrence records of Zoraptera

https://github.com/kalab-oto/zoraptera-occurrence-dataset

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: kalab-oto
  • License: mit
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 5.42 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation Zenodo

README.md

Zoraptera Occurrence Dataset

The dataset is stored in the zoraptera_occs.csv file. It is based on the Darwin Core standard and is updated manually or by the semi-automated workflow described below.

How to cite:

If you use this repository, please cite both the associated paper and the version of the data/repository that you used.

Published data descriptor:

Kaláb, O., Hoffmannova, J., Packova, G. et al. Curated global occurrence dataset of the insect order Zoraptera. Sci Data 12, 360 (2025). https://doi.org/10.1038/s41597-025-04696-4

Latest version of the dataset (1.1.0):

DOI

Note on fields:

Besides the fields defined by the Darwin Core standard (https://dwc.tdwg.org/list/), we added five custom fields:

| Field name | Description |
|------------|-------------------------------------------|
| zodID | unique id of the record in the dataset |
| osmID | id of related OSM geometry |
| polygon_fid | id of related polygon in geom.gpkg file |
| gbifID | id of related GBIF record |
| inatID | id of related iNaturalist record |
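As a minimal sketch of how the custom fields might be used, the following reads a Darwin Core style CSV with Python's standard library and lists which records are linked to GBIF or iNaturalist. The inline sample rows and their id values are hypothetical and only stand in for zoraptera_occs.csv; the helper name `load_records` is likewise an assumption made for illustration.

```python
import csv
import io

# Hypothetical two-row excerpt; the real file has many more Darwin Core columns.
SAMPLE = """zodID,scientificName,osmID,polygon_fid,gbifID,inatID
1,Zorotypus hubbardi,,12,1234567,
2,Zorotypus hubbardi,,,,7654321
"""

CUSTOM_FIELDS = ("zodID", "osmID", "polygon_fid", "gbifID", "inatID")


def load_records(handle):
    """Return all rows as dicts, failing early if a custom field is missing."""
    reader = csv.DictReader(handle)
    missing = [f for f in CUSTOM_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing custom fields: {missing}")
    return list(reader)


records = load_records(io.StringIO(SAMPLE))
# Empty cells are read as empty strings, so truthiness marks a linked record.
from_gbif = [r["zodID"] for r in records if r["gbifID"]]
from_inat = [r["zodID"] for r in records if r["inatID"]]
print(from_gbif, from_inat)
```

To run this against the real file, pass `open("zoraptera_occs.csv", newline="", encoding="utf-8")` instead of the in-memory sample.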

[!NOTE] Notice on WKT geometries in footprintWKT

Be aware that spreadsheet processors may limit the number of characters per cell, and thus may trim values that are too long. This can cause problems with the footprintWKT column in particular, when the user opens the data in software with such a limit, then edits the data and saves the file. In that case, longer WKT text may be truncated and the geometries invalidated. However, this will not affect the rest of the dataset, and the footprintWKT column can easily be restored from the original file, or recalculated by running geom_calc.r, which retrieves the WKT geometries from geom.gpkg.
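One way to spot a trimmed footprintWKT value after a round trip through a spreadsheet is to check that its parentheses are balanced and that the text ends as a complete WKT string would. The sketch below is a heuristic, not a full WKT validator, and the function name `wkt_looks_truncated` is hypothetical.

```python
def wkt_looks_truncated(wkt):
    """Heuristic check for a WKT string that was cut off mid-geometry."""
    text = wkt.strip()
    if not text:
        return False  # empty cell: no footprint recorded, nothing to truncate
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return True  # closing parenthesis without a matching opener
    # A complete WKT value ends with ")" or with the keyword EMPTY.
    ends_ok = text.endswith(")") or text.upper().endswith("EMPTY")
    return depth != 0 or not ends_ok


print(wkt_looks_truncated("POLYGON ((0 0, 1 0, 1 1, 0 0))"))
print(wkt_looks_truncated("POLYGON ((0 0, 1 0, 1 1"))
```

Rows flagged this way can then be restored from the original file or recalculated as described in the note.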

Graphical data summary:

Map of Zoraptera subfamilies: geographical distribution of Zoraptera records in the dataset by subfamily.

Histogram of Zoraptera records across years by families: count of Zoraptera records in the dataset across years by family.

Dataset update workflow

All updates can be tracked in the history of the file, or in the commit history in general. Semi-automated updates are also recorded in the update.log file, including the date, source, and DOI if applicable.

Manual updates

Direct manual editing of zoraptera_occs.csv.

Semi-automated updates

iNaturalist

  • a designated person revises identifications directly on iNaturalist
  • run the script scripts/inat.r, which looks up current iNaturalist data and checks whether the designated person (currently only Petr Kočárek) has made any new identifications; if so, they are automatically written to zoraptera_occs.csv, and information about the update (date, source) is written to the logfile update.log
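The check in the second step can be sketched as a filter over observation records. This is not a port of scripts/inat.r: it assumes each observation is a dict shaped like the iNaturalist API's JSON, with an `identifications` list whose entries carry a `current` flag and a `user.login`, and the login `reviewer`, the ids, and the function names are all hypothetical.

```python
def has_current_id_by(observation, login):
    """True if the given user holds a current identification on the observation."""
    for ident in observation.get("identifications", []):
        user = ident.get("user") or {}
        if ident.get("current") and user.get("login") == login:
            return True
    return False


def newly_reviewed(observations, login, known_inat_ids):
    """Observations reviewed by `login` whose id is not yet in the dataset."""
    known = {str(i) for i in known_inat_ids}
    return [
        obs for obs in observations
        if str(obs["id"]) not in known and has_current_id_by(obs, login)
    ]


# Hypothetical observations; ids and logins are placeholders.
observations = [
    {"id": 101, "identifications": [{"current": True, "user": {"login": "reviewer"}}]},
    {"id": 102, "identifications": [{"current": True, "user": {"login": "someone_else"}}]},
    {"id": 103, "identifications": [{"current": False, "user": {"login": "reviewer"}}]},
    {"id": 104, "identifications": [{"current": True, "user": {"login": "reviewer"}}]},
]
fresh = newly_reviewed(observations, "reviewer", known_inat_ids={"104"})
print([obs["id"] for obs in fresh])
```

Here `known_inat_ids` would come from the inatID column of zoraptera_occs.csv, so only observations not yet in the dataset are returned.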

GBIF

  • run the script scripts/gbif.r, which downloads the most recently used GBIF dataset, identified by the DOI read from the logfile update.log
  • any new data found will be written to a csv file, and information about the update (date, source, dataset DOI) will be written to the logfile update.log
  • the csv file with the new data has to be checked manually and incorporated into zoraptera_occs.csv according to the methods published in the paper.
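The "any new data found" step amounts to a set difference on gbifID. The following sketch shows that idea only; it is not a port of scripts/gbif.r, it does not download anything, and the row contents and function names are hypothetical.

```python
import csv
import io


def new_gbif_rows(dataset_rows, download_rows):
    """Rows from a GBIF download whose gbifID is not yet in the dataset."""
    known = {r["gbifID"] for r in dataset_rows if r.get("gbifID")}
    return [r for r in download_rows if r["gbifID"] not in known]


def to_csv(rows, fieldnames):
    """Serialise candidate rows to CSV text for manual review."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical rows standing in for zoraptera_occs.csv and a GBIF download.
dataset = [{"zodID": "1", "gbifID": "111"}, {"zodID": "2", "gbifID": ""}]
download = [
    {"gbifID": "111", "scientificName": "Zorotypus hubbardi"},
    {"gbifID": "222", "scientificName": "Zorotypus hubbardi"},
]
candidates = new_gbif_rows(dataset, download)
print(to_csv(candidates, ["gbifID", "scientificName"]))
```

The resulting CSV is only a list of candidates; as the last step says, each row still needs a manual check before it enters the dataset.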

Geometry (coordinates) updates

If any new record without coordinates is added to the dataset, the coordinates and positional uncertainty will be obtained following this workflow:

QGIS

  • use the OSM place search plugin to find the locality by name, and copy the appropriate features to the layer geom/geom.gpkg
  • edit the geometry to represent the locality as closely as possible; if the desired place is not the feature itself but is related to it, add a new polygon while keeping the original feature's attributes. Remove sea or ocean areas using OSM features tagged natural=coastline
  • if the locality is not present in OSM, draw the polygon manually
  • simplify the polygon with the QGIS Simplify algorithm from the geoprocessing toolbox (Visvalingam algorithm, tolerance 100)
  • fill the feature_origin attribute with categories:
    • manual - not related to any OSM feature, manually digitized from the description
    • osm_related - features related to OSM features, not intersecting them but manually digitized based on them
    • osm_derived - features derived from OSM features, features intersecting each other
    • osm_exact - features that are exact copies of OSM features
  • polygon geometry can be edited in geom/geom.gpkg (e.g. polygon site improvement or adding new polygons)
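To illustrate what the Visvalingam simplification step does, here is a minimal pure-Python sketch of the Visvalingam-Whyatt idea: repeatedly drop the vertex whose triangle with its two neighbours has the smallest area, until every remaining triangle is at least a threshold. It treats the threshold as an area in squared map units, which is an assumption about how the QGIS tolerance is interpreted, and it is a simple O(n²) version rather than the QGIS implementation.

```python
def triangle_area(a, b, c):
    """Area of the triangle spanned by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def visvalingam(ring, min_area):
    """Simplify a closed ring (first point == last point) by effective area."""
    pts = list(ring[:-1])  # drop the closing duplicate, treat the ring as cyclic
    while len(pts) > 3:  # always keep at least a triangle
        n = len(pts)
        areas = [
            triangle_area(pts[i - 1], pts[i], pts[(i + 1) % n]) for i in range(n)
        ]
        i_min = min(range(n), key=areas.__getitem__)
        if areas[i_min] >= min_area:
            break  # every remaining vertex is significant enough
        del pts[i_min]
    return pts + [pts[0]]  # close the ring again


# A square with one nearly collinear extra vertex on its bottom edge.
ring = [(0, 0), (5, 0.1), (10, 0), (10, 10), (0, 10), (0, 0)]
print(visvalingam(ring, min_area=1.0))
```

The extra vertex spans a triangle of area 0.5, below the threshold of 1.0, so it is removed while the four corners (each spanning an area of 50) are kept.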

R

  • after any edit of geom/geom.gpkg, the coordinates and positional uncertainty should be recalculated with scripts/geom_calc.r to write the changes to zoraptera_occs.csv. Running scripts/geom_calc.r also recalculates the WKT geometries for the footprintWKT column of the zoraptera_occs.csv dataset.
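As an illustration of the kind of values such a recalculation produces, the sketch below derives a representative point, a positional uncertainty, and a WKT string from one polygon ring. It uses one common point-radius convention (area-weighted centroid, with uncertainty as the largest centroid-to-vertex distance) in planar coordinates; this is an assumption and not necessarily what scripts/geom_calc.r does, and it skips any reprojection between metres and geographic degrees.

```python
import math


def ring_centroid(ring):
    """Area-weighted centroid of a closed ring (shoelace formula)."""
    a2 = cx = cy = 0.0  # a2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if a2 == 0:
        raise ValueError("degenerate ring with zero area")
    return cx / (3 * a2), cy / (3 * a2)


def point_radius(ring):
    """Centroid plus the largest distance from it to any vertex."""
    centre = ring_centroid(ring)
    radius = max(math.dist(centre, p) for p in ring)
    return centre, radius


def ring_to_wkt(ring):
    """Serialise a single closed ring as a WKT POLYGON."""
    coords = ", ".join(f"{x:g} {y:g}" for x, y in ring)
    return f"POLYGON (({coords}))"


# Hypothetical 10 x 10 square in a projected (metre-based) coordinate system.
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
centre, radius = point_radius(square)
print(centre, round(radius, 3), ring_to_wkt(square))
```

In Darwin Core terms, the centre would feed decimalLongitude and decimalLatitude after conversion to geographic coordinates, the radius would feed coordinateUncertaintyInMeters, and the WKT string would feed footprintWKT.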

[!NOTE] The MIT licence applies to the repository, excluding the individual data records in the dataset. The licence of each record (if applicable) is listed in the licence column of the zoraptera_occs.csv dataset and may be incompatible with MIT.

This research was supported by the Grant Agency of the Czech Republic (project No. 22-05024S; Evolution of angel insects (Zoraptera): from fossils and comparative morphology to cytogenetics and transcriptomes).

Owner

  • Name: Oto Kaláb
  • Login: kalab-oto
  • Kind: user
  • Location: Czechia
  • Company: Department of Physical Geography and Geoecology / University of Ostrava & @GISMentors / @OpenGeoLabs

ecology - GIS - spatial ecology - biogeography - orthoptera

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this repository, please cite both the article from preferred-citation and this dataset repository itself."
authors:
- family-names: Kaláb
  given-names: Oto
  affiliation: Department of Physical Geography and Geoecology, Faculty of Science, University of Ostrava,
  orcid: 0000-0003-3485-9377
- family-names: Hoffmannova
  given-names: Johana
  affiliation: Department of Zoology, Faculty of Science, Palacky University,
  orcid: 0000-0003-0216-6031
- family-names: Packova
  given-names: Gabriela
  affiliation: Department of Zoology, Faculty of Science, Palacky University,
  orcid: 0000-0001-7949-619X
- family-names: Kočárková
  given-names: Ivona
  affiliation: Department of Biology and Ecology, Faculty of Science, University of Ostrava,
  orcid: 0000-0002-8942-9481
- family-names: Kundrata
  given-names: Robin
  affiliation: Department of Zoology, Faculty of Science, Palacky University,
  orcid: 0000-0001-9397-1030
- family-names: Kočárek
  given-names: Petr
  affiliation: Department of Biology and Ecology, Faculty of Science, University of Ostrava,
  orcid: 0000-0002-1739-0143
preferred-citation:
  authors:
    - family-names: Kaláb
      given-names: Oto
      affiliation: Department of Physical Geography and Geoecology, Faculty of Science, University of Ostrava,
      orcid: 0000-0003-3485-9377
    - family-names: Hoffmannova
      given-names: Johana
      affiliation: Department of Zoology, Faculty of Science, Palacky University,
      orcid: 0000-0003-0216-6031
    - family-names: Packova
      given-names: Gabriela
      affiliation: Department of Zoology, Faculty of Science, Palacky University,
      orcid: 0000-0001-7949-619X
    - family-names: Kočárková
      given-names: Ivona
      affiliation: Department of Biology and Ecology, Faculty of Science, University of Ostrava,
      orcid: 0000-0002-8942-9481
    - family-names: Kundrata
      given-names: Robin
      affiliation: Department of Zoology, Faculty of Science, Palacky University,
      orcid: 0000-0001-9397-1030
    - family-names: Kočárek
      given-names: Petr
      affiliation: Department of Biology and Ecology, Faculty of Science, University of Ostrava,
      orcid: 0000-0002-1739-0143
  title: "Curated global occurrence dataset of the insect order Zoraptera"
  type: article
  database: DOI.org (Crossref)
  issn: 2052-4463
  issue: 1
  journal: Sci Data
  languages: en
  pages: 360
  volume: 12
  url: https://www.nature.com/articles/s41597-025-04696-4
  date-published: 2025-02-28
  identifiers: 
    - type: doi
      value: 10.1038/s41597-025-04696-4
title: "Zoraptera Occurrence Dataset"
version: 1.1.0
doi: 
date-released: 
type: dataset

GitHub Events

Total
  • Release event: 1
  • Watch event: 2
  • Push event: 25
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 2
  • Push event: 25
  • Create event: 1