FishGlob_data

Database and methods related to the manuscript "An integrated database of fish biodiversity sampled with scientific bottom trawl surveys"

https://github.com/fishglob/FishGlob_data

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com, zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization fishglob has institutional domain (fishglob.sites.ucsc.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: fishglob
  • License: cc-by-4.0
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 3.23 GB
Statistics
  • Stars: 23
  • Watchers: 5
  • Forks: 8
  • Open Issues: 6
  • Releases: 4
Created over 3 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

FishGlob_data

This repository contains the FishGlob database, including the methods to load, clean, and process 29 publicly available bottom trawl surveys from Europe and North America. This database is a product of the CESAB working group FishGlob: Fish biodiversity under global change, a worldwide assessment from scientific trawl surveys. For more information, please contact fishglobconsortium@gmail.com.

Credit and citation

Our full citation policy is described in the Fishglob_data disclaimer. Briefly, users should cite Maureaud et al. 2021, Maureaud et al. 2024, and relevant primary SBTS sources referenced in the FISHGLOB data files and source data tables of the two Maureaud et al. papers. Users integrating multiple surveys are encouraged to cite additional studies on data integration.

Anyone interested in reusing this data or its outputs should read this readme, our Data Disclaimer, and all survey specific metadata.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Accessing final data products

Users can either:

  • Use the single-survey data products in outputs/Cleaned_data/ and work with survey .RData files excluding standardization flags (SURVEYCODE.RData) or including standardization flags (SURVEYCODEstdclean.RData; see Survey data standardization and flags below for more information on flagging); or
  • Generate a compiled version of the data by running cleaning_codes/merge.R, which writes local versions of the database to outputs/Compiled_data/
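As a minimal sketch of the first option, the R code below loads an .RData file into its own environment, so the objects it contains cannot overwrite anything already in the workspace. The helper name load_survey() and the toy data frame are illustrative assumptions, not part of the repository; the temporary file only stands in for a real file in outputs/Cleaned_data/.

```r
# Load an .RData file into a separate environment and return its objects
# as a named list. load_survey() is a hypothetical helper, not a FishGlob function.
load_survey <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the names it restored
  mget(obj_names, envir = env)
}

# Self-contained demonstration with a temporary stand-in file.
# In the repository, the path would point into outputs/Cleaned_data/.
demo_path <- file.path(tempdir(), "SURVEYCODE.RData")
toy <- data.frame(haul_id = c("h1", "h2"), num = c(3, 7))  # toy data, not real
save(toy, file = demo_path)
rm(toy)

loaded <- load_survey(demo_path)
names(loaded)     # names of the objects stored in the file
str(loaded[[1]])  # structure of the first object
```

Returning a list rather than loading into the global environment makes it easier to inspect what a file contains before using it.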

Structure of the FishGlob_data repository

  • cleaning_codes includes all scripts to process and perform quality control on the trawl surveys.
  • datadescriptorfigures contains the R script to construct figures 2-4 for the data descriptor manuscript.
  • functions contains useful functions used in other scripts
  • length_weight contains the length-weight relationships for surveys where weights have to be calculated from abundance at length data (including NOR-BTS and DATRAS)
  • metadata_docs has a README with notes about each survey. This is a place to document changes in survey methods, quirks, etc. It is a growing list. If you have information to add, please open an Issue.
  • outputs contains all survey data processed .RData files and flagging outputs
  • QAQC contains the additional QAQC performed on surveys that required supplementary checks (DATRAS-sourced surveys)
  • standard_formats includes definitions of file formats in the FishGlob database, including survey ID codes.
  • standardization_steps contains the R codes to run a full survey standardization and a cross-survey summary of flagging methods
  • summary contains QAQC plots for each survey

Survey data processing steps

Data processing and cleaning is done on a per survey basis unless formats are similar across a group of surveys. The current repository can process 29 scientific bottom-trawl surveys, according to the following steps.

  1. Merge the data files for one survey
  2. Clean & homogenize column names following the format described in standard_formats/fishglobdata_columns.xlsx
  3. Create missing columns and standardize units using the standard format standard_formats/fishglobdata_columns.xlsx
  4. Integrate the cleaned taxonomy by applying the function clean_taxa() and apply expert knowledge on taxonomic treatments
  5. Perform quality checks, including the output in the summary folder and specific QAQC for other surveys detailed in the QAQC folder
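Steps 2 and 3 can be sketched in base R as follows. This is only an illustration under stated assumptions: the raw column names, the mapping, the set of standard columns, and the grams-to-kilograms conversion are all hypothetical, and the actual definitions are those in the standard format file.

```r
# Toy raw survey table with source-specific column names (hypothetical).
raw <- data.frame(HaulID = c("a1", "a2"),
                  Lat = c(56.1, 57.3),
                  weight_g = c(1200, 450),
                  stringsAsFactors = FALSE)

# Hypothetical mapping, written as standard_name = "source_name".
col_map <- c(haul_id = "HaulID", latitude = "Lat", wgt = "weight_g")

# Hypothetical set of columns required by the standard format.
standard_cols <- c("haul_id", "latitude", "longitude", "wgt")

standardize_columns <- function(df, col_map, standard_cols) {
  # Rename only the columns that appear in the mapping.
  idx <- match(names(df), col_map)
  names(df)[!is.na(idx)] <- names(col_map)[idx[!is.na(idx)]]
  # Create any missing standard columns, filled with NA.
  for (col in setdiff(standard_cols, names(df))) df[[col]] <- NA
  # Return the columns in the standard order.
  df[, standard_cols, drop = FALSE]
}

clean <- standardize_columns(raw, col_map, standard_cols)
clean$wgt <- clean$wgt / 1000  # assumed unit conversion: grams to kilograms
```

Keeping the mapping in a named vector means each survey only needs its own col_map, while the renaming logic stays shared.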

Survey data standardization and flags

Data standardization and flagging are done per survey and per survey_unit (integrating seasons and quarters). Flags are applied both to the temporal occurrence of taxa and to the spatio-temporal sampling footprint, according to the following steps.

  1. Taxonomic quality control: run flagspp() for each survey region
  2. Apply methods to identify a standard spatial footprint through time for each survey-season/quarter (the survey_unit column). Use the functions applytrimmingpersurveyunitmethod1() and applytrimmingpersurveyunitmethod2()
  3. Display and integrate results in the summary files
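To illustrate what flagging on the temporal occurrence of taxa can look like, the sketch below marks taxa recorded in only a small share of the years sampled by a survey unit. It is not the logic of flagspp(), whose implementation lives in the repository; the function flag_rare_taxa(), the column name taxon, and the 50% threshold are assumptions made for this example.

```r
# Toy observations: one survey unit sampled in four years (hypothetical data).
obs <- data.frame(
  survey_unit = "SURVEY-1",
  year = c(2001, 2002, 2003, 2004, 2001, 2004),
  taxon = c("A", "A", "A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Flag taxa present in less than min_prop of the years sampled in a survey unit.
# Hypothetical helper for illustration only.
flag_rare_taxa <- function(df, min_prop = 0.5) {
  df <- unique(df[, c("survey_unit", "year", "taxon")])
  # Number of distinct years sampled per survey unit.
  n_years <- tapply(df$year, df$survey_unit, function(y) length(unique(y)))
  # Number of years in which each taxon was recorded.
  counts <- aggregate(year ~ survey_unit + taxon, data = df, FUN = length)
  names(counts)[names(counts) == "year"] <- "n_years_present"
  total <- as.numeric(n_years[as.character(counts$survey_unit)])
  counts$prop_years <- counts$n_years_present / total
  counts$flagged <- counts$prop_years < min_prop
  counts
}

flags <- flag_rare_taxa(obs)
flags
```

In this toy case taxon A is present in all four years and is not flagged, while B and C each appear in one year out of four and are flagged.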

Author contributions

We thank (in alphabetical order) Esther Beukhof, Daniël van Denderen, Daniel Forrest, Alexa Fredston, Zoë Kitchel, Laura Mannocci, Aurore Maureaud, Juliano Palacios-Abrantes, Laurene Pecuchet, Malin Pinsky, and Michelle Stuart for their work cleaning, summarizing, merging, standardizing, and providing QAQC on survey data.

Updates policy

The FISHGLOB Steering Committee updates this database approximately once a year to incorporate additional data from included surveys and to continually improve the data pipeline. Each large yearly update is published as a new Release (listed on our releases page; currently #4). If critical errors are discovered, the Steering Committee will update the database as quickly as is logistically feasible. Anyone re-using the FISHGLOB database who wants to request specific changes in future updates is welcome to open a GitHub Issue.

:warning: Important updates :warning:

29/01/2025: We are aware that some surveys currently have 0 values in wgt- and num-based columns where they should have NAs, as described in issue #47. We recommend that you look closely at the metadata for the surveys you are using to see whether a 0 value in a column means 0 or means NA. We are currently working to resolve this issue.
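Where the metadata for a survey indicates that 0 stands for a missing value, a conversion along the following lines may help. This is a sketch only: the helper zeros_to_na() and the column names wgt and num are assumptions, and the conversion should only be applied to surveys and columns where the metadata confirms that 0 means NA.

```r
# Replace zeros with NA in selected columns (hypothetical helper).
# Apply only where survey metadata confirms that 0 encodes a missing value.
zeros_to_na <- function(df, cols = c("wgt", "num")) {
  cols <- intersect(cols, names(df))  # ignore columns the table does not have
  for (col in cols) {
    is_zero <- !is.na(df[[col]]) & df[[col]] == 0
    df[[col]][is_zero] <- NA
  }
  df
}

# Toy example (not real survey data).
toy <- data.frame(haul_id = c("h1", "h2", "h3"),
                  wgt = c(0, 2.5, NA),
                  num = c(4, 0, 1))
fixed <- zeros_to_na(toy)
fixed
```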

06/05/2024: A warning about CSVs. Datasets are available for download in outputs/Cleaned_data/ as .RData files. We do not recommend saving FishGlob data in .csv format. For at least some surveys, the haul_id column consists of long numeric strings, which are incorrectly rounded when loaded from a .csv programmatically in R (with read_csv() or read.csv()). As documented in issue #49, this leads to errors in the haul_id column and may occur regardless of the "class" assigned to this column. The most robust way to prevent this error is to write to and read from other file formats such as .RData or .rds. Packages exist for users to import these into Python and other programming languages.
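The underlying problem can be reproduced in a few lines of R. The identifiers below are made up for illustration: once a long numeric string is converted to a double, two distinct identifiers can become indistinguishable, whereas a round trip through .rds keeps the character values intact.

```r
# Two distinct, made-up identifiers that differ only in the last digit.
id_a <- "12345678901234567890"
id_b <- "12345678901234567891"

# Converted to doubles, they collapse to the same value because a double
# cannot represent integers of this size exactly.
collapsed <- as.numeric(id_a) == as.numeric(id_b)
collapsed  # TRUE

# Stored as character and written to .rds, the identifiers survive unchanged.
hauls <- data.frame(haul_id = c(id_a, id_b), num = c(3, 7),
                    stringsAsFactors = FALSE)
rds_path <- tempfile(fileext = ".rds")
saveRDS(hauls, rds_path)
restored <- readRDS(rds_path)
identical(restored$haul_id, hauls$haul_id)  # TRUE
```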

23/11/2023: FishGlob_data v2.0. This fixes issue #29.

05/09/2023: The Norwegian survey is erroneous and will be replaced with a Barents Sea centered survey covering 2004 onwards, which will change the spatio-temporal coverage of the region (coordinated by Laurene Pecuchet with IMR); see issue #29.

Owner

  • Name: FISHGLOB
  • Login: fishglob
  • Kind: organization
  • Email: fishglobconsortium@gmail.com

GitHub Events

Total
  • Issues event: 1
  • Issue comment event: 3
  • Push event: 3
  • Pull request review event: 1
  • Pull request event: 3
  • Create event: 4
Last Year
  • Issues event: 1
  • Issue comment event: 3
  • Push event: 3
  • Pull request review event: 1
  • Pull request event: 3
  • Create event: 4

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 5
  • Average time to close issues: 19 days
  • Average time to close pull requests: about 21 hours
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.4
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 5
  • Average time to close issues: 19 days
  • Average time to close pull requests: about 21 hours
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.4
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jepa (1)
Pull Request Authors
  • afredston (2)
  • zoekitchel (2)
  • jepa (1)
Top Labels
Issue Labels
function (1)
Pull Request Labels
function (1)