https://github.com/cthoyt/biocontext

JSON-LD Contexts for Bioinformatics Data

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
✓
DOI references
Found 4 DOI reference(s) in README
✓
Academic publication links
Links to: ncbi.nlm.nih.gov, science.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (9.2%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

JSON-LD Contexts for Bioinformatics Data

Basic Info

Host: GitHub
Owner: cthoyt
Default Branch: master
Size: 184 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of prefixcommons/biocontext

Created about 4 years ago · Last pushed over 4 years ago

https://github.com/cthoyt/biocontext/blob/master/

[![Build Status](https://travis-ci.org/cmungall/biocontext.svg?branch=master)](https://travis-ci.org/cmungall/biocontext)

## BioContext: JSON-LD Contexts for Bioinformatics Data

The goal is to provide a modular set of [JSON-LD](http://json-ld.org/)
contexts for mapping abbreviated names of biological objects onto URIs
for use in semantic web tool chains. Here, "abbreviated name" usually
means a [CURIE](https://en.wikipedia.org/wiki/CURIE) but optionally
human-friendly symbolic names (e.g. `gene`) can also be used as
abbrevations for complete URIs (although this is more dangerous).

A CURIE is a bipartite identifier of the form `Prefix:LocalID`, in
which the prefix is a convenient abbreviation of a URI prefix. CURIEs
in JSON-LD documents are expanded to URIs, if that prefix is defined
in a `@context` object.

Note that you don't need to be using JSON-LD to find this
useful. Bipartite unique IDs are common in bioinformatics, and
mandated in formats such as OBO (the most common way of consuming
ontologies in bioinformatics toolchains).

There are many situations where it's necessary to translate a
bioinformatics ID to URI for use in the semantic web stack. This
includes the [SciGraph](https://github.com/SciGraph/SciGraph) Neo4j
application as well as triplestores, OWL tooling (ROBOT), standard
prefixes for SPARQL queries, etc.

Here are some examples of expansions from abbreviated names to URIs
using these contexts:

 * Ontology class CURIEs
    * GO:0006915 ==> http://purl.obolibrary.org/obo/GO_0006915
    * CHEBI:26619 ==> http://purl.obolibrary.org/obo/CHEBI_26619
    * UBERON:0001685 ==> http://purl.obolibrary.org/obo/UBERON_0001685
    * NCBITaxon:9606 ==> http://purl.obolibrary.org/obo/NCBITaxon_9606
 * Databases CURIEs
    * ENSEMBL:ENSG00000123374 ==> http://identifiers.org/ensembl/ENSG00000123374
    * FlyBase:FBgn0011293 ==> http://identifiers.org/flybase/FBgn0011293
 * Literature CURIEs
    * PMID:16516152 ==> http://www.ncbi.nlm.nih.gov/pubmed/16516152
    * PMCID:PMC3178059 ==> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178059
    * DOI:10.1371/journal.pone.0015506 ==> http://dx.doi.org/10.1371/journal.pone.0015506
 * Metadata CURIEs
    * dc:title ==> http://purl.org/dc/terms/title
    * xsd:Int ==> http://www.w3.org/2001/XMLSchema#Int
    * owl:Class ==> http://www.w3.org/2002/07/owl#Class
    * foaf:is_about ==> http://xmlns.com/foaf/0.1/is_about
 * Abbreviated non-CURIE names
    * is_about ==> http://xmlns.com/foaf/0.1/is_about
    * part_of ==> http://purl.obolibrary.org/obo/BFO_0000050
    * assay ==> http://purl.obolibrary.org/obo/OBI_0000070
    * Association ==> http://semanticscience.org/resource/SIO_000897

The contexts are modular and remixable; for example, if you want to
use the [OBO Library](http://obofoundry.org) purls for ontology class
CURIEs you can reference [obo_context.json](registry/obo_context.jsonld), but you are free to ignore
the commitment to map ENSMEBL etc to `identifiers.org` URIs.

## Organization

The project is organized as a set of JSON-LD context files in the
[registry/](registry) directory. The current set is preliminary and
unstable.

Different contexts can be concatenated together. Warning: there is a
possibility that combinations can confict. See the scripts in
[bin](bin/) for concatenating and subtracting.

The current list is:

 * [obo](registry/obo_context.jsonld) : derived from the OBO registry
 * [idot](registry/idot_context.jsonld) : derived from identifiers-org/MIRIAM registry
 * [idot_nr](registry/idot_nr_context.jsonld) : idot minus OBO
 * [goxrefs](registry/goxrefs_context.jsonld) : derived from http://amigo.geneontology.org/xrefs
 * [semweb](registry/semweb_context.jsonld) : Standard semantic web prefixes
 * [commons](registry/commons_context.jsonld) : The commons set: OBO + idot_nr + monarch

## Clash Reporting

 * [reports/clashes.txt](reports/clashes.txt)

## Use in JSON-LD documents

You can simply copy the portions of the contexts files here to use in
your own JSON-LD documents.

When this project is more stable, you can reference any of the
contexts over the web.

For testing purposes you can do this for now:

```
 {
   "@context", "https://raw.githubusercontent.com/cmungall/biocontext/master/registry/obo_context.jsonld"
   ...
```

*but this is not stable*

## Examples

TODO

## Remixing your own contexts

TODO - provide links to JSON-LD scripts

## Philosophy

When mapping an OBO-style ID there is no ambiguity as to what to map
to. The ID "CHEBI:26619" corresponds to the OWL class with IRI
"http://purl.obolibrary.org/obo/CHEBI_26619". 

However, when presented with something like "OMIM:224050" or
"ENSEMBL:ENSG00000123374", what should the interpretation of these be
when we refer to them from within the semantic web? Are these
information artefacts *about* a biological entity, or are they
biological entities themselves? If they are biological entities, is a
gene an individual or a class?

This registry provides a pluralistic approach. The default is to map a
database ID to an identifiers.org URI, which makes no ontological
commitments to the nature of the entity. This does not preclude the
possibility of including separate mappings to ontologically committed
OWL objects. For example, one group may use to use CURIEs of the form
"OMIM:224050" as an abbreviation for an OWL Class URI. There is no
mandate in the semantic web that all groups must use the same
CURIEs. However, to avoid confusion groups should in general
coordinate for example through obo-discuss regarding different
ontological interpretations of database objects.

Note this is already built in to some extent with some databases such
as the NCBITaxonomy. The OBO Library uses the NCBITaxon prefix for a
class-based mirror of the ncbi taxonomy database. For example
"NCBITaxon:9606" is a shorthand for
http://purl.obolibrary.org/obo/NCBITaxon_9606

## What this does not do

The scope of biocontext is limited to mapping of prefixes and short
names to URIs. It is not a general purpose registry. It stores no
metadata about the prefixes used. It reuses mappings from other
registries such as identifiers.org and the OBO library when possible.

PrefixCommons separately harmonizes additional identifier metadata (beyond the mappings alone); this metadata harmonization is instead done in the [data ingest repository](https://github.com/prefixcommons/data-ingest). The sources for prefix metadata are primarily Identifiers.org, Bio2RDF registry, the OBO foundry, and BioPortal.

## Contributing

 * [new issue](issues/new)
 * Fork, branch, make a pull request
 * Edit any file directly via github web interface and make a pull request

Owner

Name: Charles Tapley Hoyt
Login: cthoyt
Kind: user
Location: Bonn, Germany
Company: RWTH Aachen University

Website: https://cthoyt.com
Repositories: 489
Profile: https://github.com/cthoyt

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

https://github.com/cthoyt/biocontext

Science Score: 23.0%

Repository

Basic Info

Statistics

https://github.com/cthoyt/biocontext/blob/master/

Owner

GitHub Events

Total

Last Year