recotox

REcoTox is a semi-automated, interactive R workflow to process US EPA ECOTOX Knowledgebase entire database ASCII files

https://github.com/tsufz/recotox

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

data-aggregation data-retrieval ecotoxicology ecotoxicology-knowlegdebase enviromental-chemistry hazard-assessment toxic-unit
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: tsufz
  • License: other
  • Language: HTML
  • Default Branch: main
  • Homepage:
  • Size: 1.88 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 1
  • Open Issues: 16
  • Releases: 3
Created almost 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog License Citation

README.md

License: AGPL v3

Background

Searching for and extracting experimental ecotoxicological information is often tedious. A good and comprehensive data source is the US EPA ECOTOX Knowledgebase. It contains about 1 million data points for more than 12,000 chemicals and 13,000 species. However, for high-throughput hazard assessment, it is not possible to extract all relevant data from the online database. The purpose of REcoTox is to extract the relevant information from the entire database ASCII files and to aggregate the data according to user-defined criteria.

Introduction

REcoTox is a semi-automated, interactive workflow to process the US EPA ECOTOX Knowledgebase entire database ASCII files. It extracts and processes ecotoxicological data relevant (but not restricted) to the ecotoxicity groups algae, crustaceans, and fish in the aquatic domain. The latest version of the ASCII files is available from the US EPA ECOTOX Knowledgebase. The focus is aquatic ecotoxicity, and the unit of the retrieved data is mg/L.

Requirements

REcoTox expects R version >= 4.3.0. Please also install the R packages tidyverse, data.table, EnvStats, and webchem.
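The requirements above can be checked before installation. The following is a minimal sketch, not part of REcoTox itself: the helper name `missing_packages` is hypothetical, and `remotes` is added to the list only because the installation commands use it.

```r
# Packages named in the Requirements section, plus remotes for installation
required <- c("tidyverse", "data.table", "EnvStats", "webchem", "remotes")

# Hypothetical helper: return the subset of `pkgs` that is not installed
missing_packages <- function(pkgs) {
  installed <- rownames(utils::installed.packages())
  setdiff(pkgs, installed)
}

# TRUE if the running R version satisfies the stated minimum
r_version_ok <- getRversion() >= "4.3.0"
if (!r_version_ok) {
  message("R >= 4.3.0 is expected, found ", as.character(getRversion()))
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

The `install.packages()` call is left commented out so that the check itself has no side effects.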

Installation

To use REcoTox, install it from GitHub.

To install the latest stable version (0.4.0):

remotes::install_github("tsufz/recotox@0.4.0", build = TRUE, build_manual = TRUE)

To install the latest beta version (main):

remotes::install_github("tsufz/recotox@main", build = TRUE, build_manual = TRUE)

To install the latest development version (dev):

remotes::install_github("tsufz/recotox@dev", build = TRUE, build_manual = TRUE)

Workflow

The file Query_Ecotox_DB.R contains the workflow and loads all relevant packages and functions. The workflow allows filtering by endpoints, measurements, and species. The ecotoxicity data is interactively enriched with chemical information (e.g. the average mass), ideally with data linked to the US EPA CompTox Chemicals Dashboard, for example by using the output of the batch search as shown in Figure 1 and Figure 2.

Figure 1: US EPA CompTox Chemicals Dashboard Batch Search - Enter Identifiers to Search

Figure 2: US EPA CompTox Chemicals Dashboard Batch Search - Recommended selection of identifiers and properties

At a minimum, the molecular weight or average mass is required to convert water concentrations from molar units to mg/L. The main purpose of this workflow is to generate data for the hazard assessment of chemical pressures on aquatic organisms. Thus, only relevant data is aggregated and all data is converted to mg/L.
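The conversion mentioned above follows from mg/L = mol/L × g/mol × 1000. The following sketch illustrates the arithmetic only. The function name and arguments are hypothetical and are not taken from the REcoTox code.

```r
# Hypothetical helper: convert a molar concentration to mg/L.
# conc_mol_per_l:       concentration in mol/L
# molar_mass_g_per_mol: molecular weight or average mass in g/mol
molar_to_mg_per_l <- function(conc_mol_per_l, molar_mass_g_per_mol) {
  stopifnot(is.numeric(conc_mol_per_l), is.numeric(molar_mass_g_per_mol))
  # mol/L * g/mol gives g/L; multiply by 1000 to obtain mg/L
  conc_mol_per_l * molar_mass_g_per_mol * 1000
}

# Example: 1 umol/L of a chemical with an average mass of 200 g/mol
example_mg_per_l <- molar_to_mg_per_l(1e-6, 200)
```

Without the average mass the conversion cannot be carried out, which is why the enrichment step is needed for records reported in molar units.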

The data output consists of long pivot tables containing all filtered datasets, which serve as the basis for further processing and aggregation by the user. It also includes a further pivoting step to wide pivot tables containing aggregated information, e.g. the geometric mean and the 5th percentile of the extracted data for each chemical, endpoint, and species.
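The aggregation described above can be illustrated with base R. This is a sketch on made-up toy data, not the REcoTox implementation. The column names (`cas`, `endpoint`, `species`, `conc_mg_l`) and helper names are assumptions chosen for the example.

```r
# Toy long-format table (values are invented for illustration only)
toy <- data.frame(
  cas       = c("A", "A", "A", "B", "B"),
  endpoint  = "EC50",
  species   = "Daphnia magna",
  conc_mg_l = c(1, 10, 100, 2, 8)
)

# Geometric mean of positive values
geomean <- function(x) exp(mean(log(x)))

# Summary statistics for one group: geometric mean and 5th percentile
summarise_group <- function(x) {
  c(geomean = geomean(x),
    p05     = unname(stats::quantile(x, probs = 0.05)))
}

# Aggregate per chemical, endpoint, and species
agg <- aggregate(conc_mg_l ~ cas + endpoint + species,
                 data = toy, FUN = summarise_group)

# Flatten the matrix column into conc_mg_l.geomean and conc_mg_l.p05
agg <- do.call(data.frame, agg)
```

The result has one row per chemical, endpoint, and species combination. The EnvStats package listed in the requirements also provides a `geoMean()` function that could replace the hand-written helper.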

Note

This workflow is under ongoing development. Contributions and suggestions are welcome. Please create an issue to start the discussion.

Owner

  • Name: Tobias Schulze
  • Login: tsufz
  • Kind: user
  • Location: Leipzig (Germany)
  • Company: Helmholtz Centre for Environmental Research - UFZ

Facilitating open mass spectral information exchange and doing research at Helmholtz Centre for Environmental Research in Leipzig, Germany.

Citation (CITATION.cff)

# -----------------------------------------------------------
# CITATION file created with {cffr} R package, v0.5.0
# See also: https://docs.ropensci.org/cffr/
# -----------------------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "REcoTox" in publications use:'
type: software
title: 'REcoTox: REcoTox - a workflow to process US EPA ECOTOX Knowledgebase ASCII
  files'
version: 0.4.1
abstract: REcoTox is a semi-automated, interactive workflow to process US EPA ECOTOX
  Knowledgebase entire database ASCII files to extract and process ecotoxicological
  data relevant (but not restricted) to the ecotoxicity groups algae, crustaceans,
  and fish in the aquatic domain. The latest version of the ASCII files is available
  on US EPA ECOTOX Knowledgebase. The focus is aquatic ecotoxicity and the unit of
  the retrieved data is mg/L.
authors:
- family-names: Schulze
  given-names: Tobias
  email: tsufz1@gmail.com
  orcid: https://orcid.org/0000-0002-9744-8914
repository: https://bioconductor.org/
date-released: '2023-05-02'
contact:
- family-names: Schulze
  given-names: Tobias
  email: tsufz1@gmail.com
  orcid: https://orcid.org/0000-0002-9744-8914
references:
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2023'
  institution:
    name: R Foundation for Statistical Computing
  version: '>= 4.3.0'
- type: software
  title: data.table
  abstract: 'data.table: Extension of `data.frame`'
  notes: Imports
  url: https://r-datatable.com
  repository: https://CRAN.R-project.org/package=data.table
  authors:
  - family-names: Dowle
    given-names: Matt
    email: mattjdowle@gmail.com
  - family-names: Srinivasan
    given-names: Arun
    email: asrini@pm.me
  year: '2023'
- type: software
  title: dplyr
  abstract: 'dplyr: A Grammar of Data Manipulation'
  notes: Imports
  url: https://dplyr.tidyverse.org
  repository: https://CRAN.R-project.org/package=dplyr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
    orcid: https://orcid.org/0000-0003-4757-117X
  - family-names: François
    given-names: Romain
    orcid: https://orcid.org/0000-0002-2444-4226
  - family-names: Henry
    given-names: Lionel
  - family-names: Müller
    given-names: Kirill
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Vaughan
    given-names: Davis
    email: davis@posit.co
    orcid: https://orcid.org/0000-0003-4777-038X
  year: '2023'
- type: software
  title: progress
  abstract: 'progress: Terminal Progress Bars'
  notes: Imports
  url: https://github.com/r-lib/progress#readme
  repository: https://CRAN.R-project.org/package=progress
  authors:
  - family-names: Csárdi
    given-names: Gábor
  - family-names: FitzJohn
    given-names: Rich
  year: '2023'
- type: software
  title: purrr
  abstract: 'purrr: Functional Programming Tools'
  notes: Imports
  url: https://purrr.tidyverse.org/
  repository: https://CRAN.R-project.org/package=purrr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
    orcid: https://orcid.org/0000-0003-4757-117X
  - family-names: Henry
    given-names: Lionel
    email: lionel@rstudio.com
  year: '2023'
- type: software
  title: Rdpack
  abstract: 'Rdpack: Update and Manipulate Rd Documentation Objects'
  notes: Imports
  url: https://geobosh.github.io/Rdpack/
  repository: https://CRAN.R-project.org/package=Rdpack
  authors:
  - family-names: Boshnakov
    given-names: Georgi N.
    email: georgi.boshnakov@manchester.ac.uk
  year: '2023'
- type: software
  title: readr
  abstract: 'readr: Read Rectangular Text Data'
  notes: Imports
  url: https://readr.tidyverse.org
  repository: https://CRAN.R-project.org/package=readr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  - family-names: Hester
    given-names: Jim
  - family-names: Bryan
    given-names: Jennifer
    email: jenny@posit.co
    orcid: https://orcid.org/0000-0002-6983-2759
  year: '2023'
- type: software
  title: tibble
  abstract: 'tibble: Simple Data Frames'
  notes: Imports
  url: https://tibble.tidyverse.org/
  repository: https://CRAN.R-project.org/package=tibble
  authors:
  - family-names: Müller
    given-names: Kirill
    email: kirill@cynkra.com
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2023'
- type: software
  title: tidyr
  abstract: 'tidyr: Tidy Messy Data'
  notes: Imports
  url: https://tidyr.tidyverse.org
  repository: https://CRAN.R-project.org/package=tidyr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  - family-names: Vaughan
    given-names: Davis
    email: davis@posit.co
  - family-names: Girlich
    given-names: Maximilian
  year: '2023'
- type: software
  title: utils
  abstract: 'R: A Language and Environment for Statistical Computing'
  notes: Imports
  authors:
  - name: R Core Team
  location:
    name: Vienna, Austria
  year: '2023'
  institution:
    name: R Foundation for Statistical Computing
- type: software
  title: webchem
  abstract: 'webchem: Chemical Information from the Web'
  notes: Imports
  url: https://docs.ropensci.org/webchem/
  repository: https://CRAN.R-project.org/package=webchem
  authors:
  - family-names: Szöcs
    given-names: Eduard
  year: '2023'
- type: software
  title: BiocStyle
  abstract: 'BiocStyle: Standard styles for vignettes and other Bioconductor documents'
  notes: Suggests
  url: https://github.com/Bioconductor/BiocStyle
  repository: https://bioconductor.org/
  authors:
  - family-names: Oleś
    given-names: Andrzej
    orcid: https://orcid.org/0000-0003-0285-2787
  year: '2023'
  doi: 10.18129/B9.bioc.BiocStyle
- type: software
  title: desc
  abstract: 'desc: Manipulate DESCRIPTION Files'
  notes: Suggests
  url: https://github.com/r-lib/desc#readme
  repository: https://CRAN.R-project.org/package=desc
  authors:
  - family-names: Csárdi
    given-names: Gábor
    email: csardi.gabor@gmail.com
  - family-names: Müller
    given-names: Kirill
  - family-names: Hester
    given-names: Jim
    email: james.f.hester@gmail.com
  year: '2023'
- type: software
  title: knitr
  abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
  notes: Suggests
  url: https://yihui.org/knitr/
  repository: https://CRAN.R-project.org/package=knitr
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  year: '2023'
- type: software
  title: markdown
  abstract: 'markdown: Render Markdown with ''commonmark'''
  notes: Suggests
  url: https://github.com/rstudio/markdown
  repository: https://CRAN.R-project.org/package=markdown
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: Allaire
    given-names: JJ
  - family-names: Horner
    given-names: Jeffrey
  year: '2023'
- type: software
  title: rmarkdown
  abstract: 'rmarkdown: Dynamic Documents for R'
  notes: Suggests
  url: https://pkgs.rstudio.com/rmarkdown/
  repository: https://CRAN.R-project.org/package=rmarkdown
  authors:
  - family-names: Allaire
    given-names: JJ
    email: jj@posit.co
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: Dervieux
    given-names: Christophe
    email: cderv@posit.co
    orcid: https://orcid.org/0000-0003-4474-2498
  - family-names: McPherson
    given-names: Jonathan
    email: jonathan@posit.co
  - family-names: Luraschi
    given-names: Javier
  - family-names: Ushey
    given-names: Kevin
    email: kevin@posit.co
  - family-names: Atkins
    given-names: Aron
    email: aron@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  - family-names: Cheng
    given-names: Joe
    email: joe@posit.co
  - family-names: Chang
    given-names: Winston
    email: winston@posit.co
  - family-names: Iannone
    given-names: Richard
    email: rich@posit.co
    orcid: https://orcid.org/0000-0003-3925-190X
  year: '2023'
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  url: https://testthat.r-lib.org
  repository: https://CRAN.R-project.org/package=testthat
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2023'

GitHub Events

Total
  • Fork event: 1
Last Year
  • Fork event: 1

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 157
  • Total Committers: 2
  • Avg Commits per committer: 78.5
  • Development Distribution Score (DDS): 0.057
Past Year
  • Commits: 58
  • Committers: 2
  • Avg Commits per committer: 29.0
  • Development Distribution Score (DDS): 0.155
Top Committers
Name Email Commits
Tobias Schulze t****e@u****e 148
Tobias Schulze t****1@g****m 9
Committer Domains (Top 20 + Academic)
ufz.de: 1

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 21
  • Total pull requests: 5
  • Average time to close issues: 21 days
  • Average time to close pull requests: less than a minute
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.57
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 21
  • Pull requests: 5
  • Average time to close issues: 21 days
  • Average time to close pull requests: less than a minute
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.57
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tsufz (21)
  • pepijn-devries (1)
Pull Request Authors
  • tsufz (4)
Top Labels
Issue Labels
enhancement (6) bug (3) UX (2) build system (1) documentation (1)
Pull Request Labels