Ecological Metadata as Linked Data

Ecological Metadata as Linked Data - Published in JOSS (2019)

https://github.com/ropensci/emld

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords from Contributors

data-store git-lfs weather-data
Last synced: 4 months ago

Repository

:package: JSON-LD representation of EML

Basic Info
Statistics
  • Stars: 14
  • Watchers: 7
  • Forks: 6
  • Open Issues: 10
  • Releases: 4
Created about 8 years ago · Last pushed almost 5 years ago
Metadata Files
Readme Changelog Contributing License Code of conduct Codemeta

README.Rmd

---
output: github_document
---

[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
[![Travis-CI Build Status](https://travis-ci.org/ropensci/emld.svg?branch=master)](https://travis-ci.org/ropensci/emld) 
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/cboettig/emld?branch=master&svg=true)](https://ci.appveyor.com/project/cboettig/emld)
[![Coverage Status](https://img.shields.io/codecov/c/github/ropensci/emld/master.svg)](https://codecov.io/github/ropensci/emld?branch=master)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/emld)](https://cran.r-project.org/package=emld)
[![](https://badges.ropensci.org/269_status.svg)](https://github.com/ropensci/software-review/issues/269)
[![DOI](https://zenodo.org/badge/108223439.svg)](https://zenodo.org/badge/latestdoi/108223439)
[![DOI](http://joss.theoj.org/papers/10.21105/joss.01276/status.svg)](https://doi.org/10.21105/joss.01276)



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# emld

The goal of emld is to provide a way to work with EML metadata in the JSON-LD format. At its heart, the package is simply a way to translate an EML XML document into JSON-LD and to reverse the translation, so that any semantically equivalent JSON-LD file can be serialized into EML-schema valid XML.  The package has only three core functions:

- `as_emld()`: Convert EML's `xml` files (or the `json` version created by this package) into a native R object (an S3 class called `emld`, essentially just a `list`).
- `as_xml()`: Convert the native R format, `emld`, back into XML-schema valid EML.
- `as_json()`: Convert the native R format, `emld`, into `json`(LD).
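
As a rough illustration of how the three functions fit together, the sketch below round-trips the bundled `example.xml` through JSON and back to XML. It assumes that `as_emld()` recognizes the JSON file written by `as_json()` from its `.json` extension; the temporary file names are arbitrary.

```r
library(emld)

# Parse the example EML document shipped with the package
f <- system.file("extdata/example.xml", package = "emld")
eml <- as_emld(f)

# Write the JSON-LD serialization to a temporary file
json_file <- tempfile("example", fileext = ".json")
as_json(eml, json_file)

# Read the JSON back in (assumed to be detected from the .json extension)
eml2 <- as_emld(json_file)

# Serialize back to EML XML and check it against the schema
xml_file <- tempfile("example", fileext = ".xml")
as_xml(eml2, xml_file)
valid <- eml_validate(xml_file)
```

The value of `valid` reports whether the regenerated XML passed schema validation.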


## Installation

You can install emld from GitHub with:

```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("ropensci/emld")
```



## Motivation


In contrast to the existing [EML package](https://docs.ropensci.org/EML/), this package aims to be a very lightweight implementation that provides an intuitive data format and makes maximum use of existing technology to work with that format.  In particular, this package emphasizes tools for working with linked data through the JSON-LD format.  This package is not meant to replace `EML`, as it does not support the more complex operations found in that package.  Rather, it provides a minimalist but powerful way of working with EML documents that can be used by itself or as a backend for those complex operations.  Version 2.0 of the EML R package uses `emld` under the hood.  

Note that the JSON-LD format is considerably less rigid than the EML schema.  This means that there are many valid, semantically equivalent representations on the JSON-LD side that must all map into the same or nearly the same XML format.  At the extreme end, the JSON-LD format can be serialized into RDF, where everything is a flat set of triples (essentially a tabular representation), which we can query directly with semantic tools like SPARQL and also automatically coerce back into the rigid nesting and ordering structure required by EML.  This ability to "flatten" EML files can be particularly convenient for applications consuming and parsing large numbers of EML files.  This package may also make it easier for other developers to build on EML, since the S3/list and JSON formats used here have proven more appealing to many R developers than S4 and XML serializations.


```{r}
library(emld)
library(jsonlite)
library(magrittr) # for pipes
library(jqr)      # for JQ examples only
library(rdflib)   # for RDF examples only

```


## Reading EML

The `EML` package can get particularly cumbersome when it comes to extracting and manipulating existing metadata in highly nested EML files.  The `emld` approach can leverage a rich array of tools for reading, extracting, and manipulating existing EML files. 

We can parse a simple example and manipulate it as a familiar list object (S3 object):

```{r}
f <- system.file("extdata/example.xml", package="emld")
eml <- as_emld(f)
eml$dataset$title
```


## Writing EML

Because `emld` objects are just nested lists, we can create EML just by writing lists: 

```{r}

me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))

eml <- list(
  dataset = list(
    title = "dataset title",
    contact = me,
    creator = me),
  system = "doi",
  packageId = "10.xxx")

ex.xml <- tempfile("ex", fileext = ".xml") # use your preferred file path

as_xml(eml, ex.xml)
eml_validate(ex.xml)
```

Note that we don't have to worry about the order of the elements here: `as_xml` will re-order them if necessary to validate. (For instance, in valid EML the `creator` is listed before `contact`.)   Of course, this is a very low-level interface that does not help the user know what an EML document looks like. Creating EML from scratch without knowledge of the schema is a job for the `EML` package and beyond the scope of the lightweight `emld`.  
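
Because the metadata is an ordinary nested list, base R is enough to revise it before serializing. The following is a minimal sketch that rebuilds a list shaped like the one above and then edits it; the new title and the `pubDate` value are illustrative assumptions, not values taken from any real record.

```r
# Rebuild a small list shaped like the example above
me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
eml <- list(
  dataset = list(
    title = "dataset title",
    contact = me,
    creator = me),
  system = "doi",
  packageId = "10.xxx")

# Overwrite an existing element with ordinary list assignment
eml$dataset$title <- "A revised dataset title"

# Add a new element; modifyList() appends names that are not yet present
eml$dataset <- modifyList(eml$dataset, list(pubDate = "2019"))

# Extract a deeply nested value with chained `$`
creator_surname <- eml$dataset$creator$individualName$surName
```

The new `pubDate` element lands after `creator` in the list, which is harmless here given that element order is handled at serialization time.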


# Working with EML as JSON-LD 

For many applications, it is useful to merely treat EML as a list object, as seen above, allowing the R user to leverage standard tools and intuition in working with these files.  However, `emld` also opens the door to new directions by thinking of EML data in terms of a JSON-LD serialization rather than an XML serialization.  First, owing to its comparative simplicity and native data typing (e.g. of Boolean/string/numeric data), JSON is often easier for many developers to work with than EML's native XML format.  


## As JSON: Query with JQ 

For example, JSON can be queried with JQ, a [simple and powerful query language](https://stedolan.github.io/jq/manual/) that also gives us a lot of flexibility over the return structure of our results.  JQ syntax is both intuitive and well documented, and often easier than the typical munging of JSON/list data using `purrr`.  Here's an example query that turns EML to JSON and then extracts the north and south bounding coordinates:


```{r}
hf205 <- system.file("extdata/hf205.xml", package="emld")

as_emld(hf205) %>% 
  as_json() %>% 
  jq('.dataset.coverage.geographicCoverage.boundingCoordinates | 
       { northLat: .northBoundingCoordinate, 
         southLat: .southBoundingCoordinate }') %>%
  fromJSON()
```
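
JQ's recursive descent operator `..` can find an element without spelling out the full path to it. The sketch below runs on a small hand-written JSON string rather than a real EML record, so the nesting and coordinate values are illustrative assumptions only.

```r
library(jqr)
library(jsonlite)

# A tiny stand-in document (hypothetical values, not a real EML record)
doc <- '{"dataset": {"coverage": {"geographicCoverage": {"boundingCoordinates":
  {"northBoundingCoordinate": "42.55", "southBoundingCoordinate": "42.42"}}}}}'

# `..` visits every value in the document; `?` suppresses errors on
# non-objects, and select() drops the nulls from objects lacking the key
res <- jq(doc, '.. | .boundingCoordinates? | select(. != null) |
                 { northLat: .northBoundingCoordinate,
                   southLat: .southBoundingCoordinate }')

coords <- fromJSON(res)
```

Here `coords` is a list with elements `northLat` and `southLat`, found without naming `dataset`, `coverage`, or `geographicCoverage` in the query.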

Nice features of JQ include the ability to do recursive descent (common to XPATH but not possible in `purrr`) and specify the shape of the return object.  Some prototype examples of how we can use this to translate between EML and  representations of the same metadata can be found in 



## As semantic data: SPARQL queries


Another side-effect of the JSON-LD representation is that we can treat EML as "semantic" data.  This can provide a way to integrate EML records with other data sources, and means we can query the EML using semantic SPARQL queries.  One nice thing about SPARQL queries is that, in contrast to XPATH, JQ, or other graph queries, SPARQL always returns a `data.frame`, which is a particularly convenient format. SPARQL queries look like SQL queries in that we name the columns we want with a `SELECT` command.  Unlike SQL, these names act as variables.  We then use a `WHERE` block to define how these variables relate to each other.  


```{r}
f <- system.file("extdata/hf205.xml", package="emld")
hf205.json <- tempfile("hf205", fileext = ".json") # Use your preferred filepath

as_emld(f) %>%
  as_json(hf205.json)

prefix <- paste0("PREFIX eml: \n")
sparql <- paste0(prefix, '

  SELECT ?genus ?species ?northLat ?southLat ?eastLong ?westLong 

  WHERE { 
    ?y eml:taxonRankName "genus" .
    ?y eml:taxonRankValue ?genus .
    ?y eml:taxonomicClassification ?s .
    ?s eml:taxonRankName "species" .
    ?s eml:taxonRankValue ?species .
    ?x eml:northBoundingCoordinate ?northLat .
    ?x eml:southBoundingCoordinate ?southLat .
    ?x eml:eastBoundingCoordinate ?eastLong .
    ?x eml:westBoundingCoordinate ?westLong .
  }
')
  
rdf <- rdf_parse(hf205.json, "jsonld")
df <- rdf_query(rdf, sparql)
df
```
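
To see how the `PREFIX`, `SELECT` variables, and `WHERE` patterns fit together in isolation, here is a minimal sketch on a hand-written JSON-LD document. The `http://example.org/` vocabulary, the `coverage1` identifier, and the coordinate values are hypothetical placeholders and are not the EML namespace or real data.

```r
library(rdflib)

# Write a tiny JSON-LD document; @vocab expands each key to a full IRI
doc <- tempfile("toy", fileext = ".json")
writeLines('{
  "@context": {"@vocab": "http://example.org/"},
  "@id": "http://example.org/coverage1",
  "northBoundingCoordinate": "42.55",
  "southBoundingCoordinate": "42.42"
}', doc)

# The PREFIX must expand to the same IRI that the JSON-LD context uses
sparql <- '
  PREFIX ex: <http://example.org/>

  SELECT ?northLat ?southLat

  WHERE {
    ?x ex:northBoundingCoordinate ?northLat .
    ?x ex:southBoundingCoordinate ?southLat .
  }
'

rdf <- rdf_parse(doc, "jsonld")
df <- rdf_query(rdf, sparql)
```

Reusing `?x` in both patterns constrains the two coordinates to come from the same subject, which is how SPARQL expresses what nesting expresses in XML or JSON.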






----

Please note that the `emld` project is released with a [Contributor Code of Conduct](https://docs.ropensci.org/emld/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.


[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Owner

  • Name: rOpenSci
  • Login: ropensci
  • Kind: organization
  • Email: info@ropensci.org
  • Location: Berkeley, CA

JOSS Publication

Ecological Metadata as Linked Data
Published
February 26, 2019
Volume 4, Issue 34, Page 1276
Authors
Carl Boettiger ORCID
University of California, Berkeley
Editor
Bruce E. Wilson ORCID
Tags
linked data EML JSON-LD RDF Ecology

CodeMeta (codemeta.json)

{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "http://schema.org"
  ],
  "@type": "SoftwareSourceCode",
  "identifier": "emld",
  "description": "This is a utility for transforming Ecological Metadata Language\n        ('EML') files into 'JSON-LD' and back into 'EML.'  Doing so creates a\n        list-based representation of 'EML' in R, so that 'EML' data can easily\n        be manipulated using standard 'R' tools. This makes this package an\n        effective backend for other 'R'-based tools  working with 'EML.' By\n        abstracting away the complexity of 'XML' Schema, developers can\n        build around native 'R' list objects and not have to worry about satisfying\n        many of the additional constraints of set by the schema (such as element\n        ordering, which is handled automatically). Additionally, the 'JSON-LD' \n        representation enables the use of developer-friendly 'JSON' parsing and\n        serialization that may facilitate the use of 'EML' in contexts outside of 'R,'\n        as well as the informatics-friendly serializations such as 'RDF' and\n        'SPARQL' queries.",
  "name": "emld: Ecological Metadata as Linked Data",
  "codeRepository": "https://github.com/ropensci/emld",
  "issueTracker": "https://github.com/ropensci/emld/issues",
  "license": "https://spdx.org/licenses/MIT",
  "version": "0.2.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "version": "3.5.2",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 3.5.2 (2018-12-20)",
  "author": [
    {
      "@type": "Person",
      "givenName": "Carl",
      "familyName": "Boettiger",
      "email": "cboettig@gmail.com",
      "@id": "http://orcid.org/0000-0002-1642-628X"
    }
  ],
  "copyrightHolder": [
    {
      "@type": "Person",
      "givenName": "Carl",
      "familyName": "Boettiger",
      "email": "cboettig@gmail.com",
      "@id": "http://orcid.org/0000-0002-1642-628X"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Carl",
      "familyName": "Boettiger",
      "email": "cboettig@gmail.com",
      "@id": "http://orcid.org/0000-0002-1642-628X"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "spelling",
      "name": "spelling",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=spelling"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "magrittr",
      "name": "magrittr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=magrittr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rmarkdown"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "covr",
      "name": "covr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=covr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=knitr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rdflib",
      "name": "rdflib",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rdflib"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "jqr",
      "name": "jqr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=jqr"
    }
  ],
  "softwareRequirements": [
    {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 3.1.0"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "xml2",
      "name": "xml2",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=xml2"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "jsonlite",
      "name": "jsonlite",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=jsonlite"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "jsonld",
      "name": "jsonld",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=jsonld"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "methods",
      "name": "methods"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "yaml",
      "name": "yaml",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=yaml"
    }
  ],
  "releaseNotes": "https://github.com/cboettig/emld/blob/master/NEWS.md",
  "readme": "https://github.com/ropensci/emld/blob/master/README.md",
  "fileSize": "652.48KB",
  "contIntegration": [
    "https://travis-ci.org/ropensci/emld",
    "https://ci.appveyor.com/project/cboettig/emld",
    "https://codecov.io/github/ropensci/emld?branch=master"
  ],
  "developmentStatus": "https://www.tidyverse.org/lifecycle/#maturing",
  "citation": [
    {
      "@type": "ScholarlyArticle",
      "datePublished": "2019",
      "author": [
        {
          "@type": "Person",
          "givenName": "Carl",
          "familyName": "Boettiger",
          "email": "cboettig@gmail.com",
          "@id": "http://orcid.org/0000-0002-1642-628X"
        }
      ],
      "name": "Ecological Metadata as Linked Data. Journal of Open Source Software",
      "identifier": "10.21105/joss.01276",
      "url": "https://doi.org/10.21105/joss.01276",
      "paginiation": "1276",
      "@id": "https://doi.org/10.21105/joss.01276",
      "sameAs": "https://doi.org/10.21105/joss.01276",
      "isPartOf": {
        "@type": "PublicationIssue",
        "issueNumber": "34",
        "datePublished": "2019",
        "isPartOf": {
          "@type": [
            "PublicationVolume",
            "Periodical"
          ],
          "volumeNumber": "4",
          "name": "The Journal of Open Source Software"
        }
      }
    }
  ]
}

GitHub Events

Total
  • Issues event: 1
Last Year
  • Issues event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 203
  • Total Committers: 6
  • Avg Commits per committer: 33.833
  • Development Distribution Score (DDS): 0.148
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Carl Boettiger c****g@g****m 173
Bryce Mecum p****h@g****m 20
Matt Jones g****e@m****g 6
Jeroen Ooms j****s@g****m 2
Jeanette j****k@n****u 1
ropenscibot m****t@g****m 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 43
  • Total pull requests: 29
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 2 days
  • Total issue authors: 8
  • Total pull request authors: 5
  • Average comments per issue: 1.93
  • Average comments per pull request: 1.72
  • Merged pull requests: 27
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • cboettig (20)
  • amoeba (13)
  • jeanetteclark (3)
  • isteves (2)
  • mbjones (2)
  • peterdesmet (1)
  • thesadie (1)
  • ajpelu (1)
Pull Request Authors
  • amoeba (13)
  • cboettig (11)
  • mbjones (3)
  • isteves (1)
  • jeanetteclark (1)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 758 last-month
  • Total dependent packages: 6
    (may contain duplicates)
  • Total dependent repositories: 14
    (may contain duplicates)
  • Total versions: 10
  • Total maintainers: 1
cran.r-project.org: emld

Ecological Metadata as Linked Data

  • Versions: 5
  • Dependent Packages: 5
  • Dependent Repositories: 14
  • Downloads: 758 Last month
Rankings
Dependent repos count: 7.7%
Dependent packages count: 8.2%
Forks count: 9.6%
Average: 11.1%
Stargazers count: 15.1%
Downloads: 15.1%
Maintainers (1)
Last synced: 4 months ago
conda-forge.org: r-emld
  • Versions: 5
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Dependent packages count: 28.8%
Average: 31.4%
Dependent repos count: 34.0%
Last synced: 4 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.1.0 depends
  • jsonld * imports
  • jsonlite * imports
  • methods * imports
  • xml2 * imports
  • yaml * imports
  • covr * suggests
  • jqr * suggests
  • knitr * suggests
  • magrittr * suggests
  • rdflib * suggests
  • rmarkdown * suggests
  • spelling * suggests
  • testthat * suggests