string2vocabulary

Substitute literals in RDF graphs with URIs from SKOS vocabularies

https://github.com/doremus-anr/string2vocabulary

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.3%) to scientific vocabulary

Keywords

rdf semantic-web skos string-matching uri vocabularies

Repository


Basic Info
  • Host: GitHub
  • Owner: DOREMUS-ANR
  • License: apache-2.0
  • Language: Java
  • Default Branch: master
  • Homepage:
  • Size: 1.17 MB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 1
  • Open Issues: 1
  • Releases: 10
Topics
rdf semantic-web skos string-matching uri vocabularies
Created over 8 years ago · Last pushed about 3 years ago
Metadata Files
Readme License Citation

README.md

String2Vocabulary

Look for literals in an RDF graph and substitute them with URIs from controlled vocabularies. Built with Gradle and Apache Jena.

Vocabularies are grouped into families based on their filenames. For example, city-italy.ttl and city-france.ttl both belong to the family city.
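As a rough illustration of the filename-based grouping, the sketch below derives a family name from a vocabulary file name. It is not the library's code: the class `VocabularyFamily` and the method `familyOf` are hypothetical, and the rule used (the family is the part of the base name before the first hyphen) is an assumption inferred from the city-italy.ttl example.

```java
import java.nio.file.Path;

/** Hypothetical helper, not part of string2vocabulary. */
class VocabularyFamily {

    /**
     * Assumption: the family is the part of the file name before the first
     * hyphen, or the whole base name when there is no hyphen.
     */
    static String familyOf(String fileName) {
        // Keep only the last path segment, e.g. "vocab/city-italy.ttl" -> "city-italy.ttl"
        String name = Path.of(fileName).getFileName().toString();
        // Drop the extension, if any
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        // Cut at the first hyphen, if any
        int hyphen = base.indexOf('-');
        return hyphen > 0 ? base.substring(0, hyphen) : base;
    }

    public static void main(String[] args) {
        System.out.println(familyOf("city-italy.ttl"));  // city
        System.out.println(familyOf("city-france.ttl")); // city
        System.out.println(familyOf("key.ttl"));         // key
    }
}
```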

Input

The library needs as input:

  • a folder containing the vocabularies
  • for a full graph replacement, a configuration file in CSV that declares the property to match, the corresponding vocabulary family, and whether the singular form of the label should also be checked.

Example:

```csv
http://data.doremus.org/ontology#U2_foresees_use_of_medium_of_performance,mop,singular
http://data.doremus.org/ontology#U11_has_key,key,
```

Given as input:

```turtle
ns:myMusicWork mus:U11_has_key [ a mus:M4_Key ;
        rdfs:label "Ré majeur"@fr ] ;
    mus:U2_foresees_use_of_medium_of_performance "mezzosoprano" .
```

... this produces as output:

```turtle
ns:myMusicWork mus:U11_has_key <http://data.doremus.org/vocabulary/key/d> ;
    mus:U2_foresees_use_of_medium_of_performance <http://data.doremus.org/vocabulary/iaml/mop/vms> .
```
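To make the configuration format more concrete, here is a minimal sketch of how the three CSV columns could be read. It is an illustration only: the class `PropertyMapping` and its methods are hypothetical, and it assumes that the literal value `singular` in the third column enables the singular check while an empty third column disables it, as the example rows suggest.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one configuration row, not part of string2vocabulary. */
class PropertyMapping {
    final String property;   // URI of the property to match
    final String family;     // vocabulary family to search in
    final boolean singular;  // whether to also try the singular form of the label

    PropertyMapping(String property, String family, boolean singular) {
        this.property = property;
        this.family = family;
        this.singular = singular;
    }

    /** Parses one "property,family,singular" row; the third column may be empty. */
    static PropertyMapping parseLine(String line) {
        // limit -1 keeps the trailing empty column of rows such as "...,key,"
        String[] cols = line.split(",", -1);
        if (cols.length < 2) {
            throw new IllegalArgumentException("Expected at least 2 columns: " + line);
        }
        boolean singular = cols.length > 2 && "singular".equals(cols[2].trim());
        return new PropertyMapping(cols[0].trim(), cols[1].trim(), singular);
    }

    /** Parses a whole CSV document, skipping blank lines. */
    static List<PropertyMapping> parse(String csv) {
        List<PropertyMapping> rows = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            if (!line.isBlank()) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }
}
```

Applied to the two example rows, this sketch yields one mapping for the family mop with the singular check enabled and one for the family key with it disabled.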

Features

  • Vocabulary syntax supported:
    • SKOS
    • MODS
  • Support for families of vocabularies
  • Replace literals that match the given label
  • Replace objects that have an rdfs:label or ecrm:P1_is_identified_by that matches the given label
  • Strict mode: match both label and language
  • Normalise the labels by removing punctuation, decoding to ASCII, using lowercase
  • Search also for the singular version of the word with Stanford CoreNLP
  • Support for RDF Dataset:
    • replace content at the default graph level
    • replace content at a given named graph level
  • Supported textual syntax for RDF (serialization):
    • Turtle
    • TriG

Dependencies:

  • Build tool: Gradle 7+
  • See the dependencies section in the build.gradle file for project dependencies.
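The label normalisation listed under Features (removing punctuation, decoding to ASCII, using lowercase) can be sketched with the Java standard library alone. This is not the library's implementation: the class `LabelNormalizer` is hypothetical, and replacing punctuation with a space (rather than deleting it) is an assumption. Characters that have no Unicode decomposition, such as "ø", are left untouched by this approach.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper, not part of string2vocabulary. */
class LabelNormalizer {

    static String normalize(String label) {
        // Decompose accented characters: "é" becomes "e" plus a combining accent
        String decomposed = Normalizer.normalize(label, Normalizer.Form.NFD);
        // Drop the combining marks, leaving the base letters
        String ascii = decomposed.replaceAll("\\p{M}+", "");
        // Assumption: ASCII punctuation is replaced by a space, then whitespace is collapsed
        return ascii.replaceAll("\\p{Punct}+", " ")
                    .trim()
                    .replaceAll("\\s+", " ")
                    .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Ré majeur"));     // re majeur
        System.out.println(normalize("Mezzo-Soprano")); // mezzo soprano
    }
}
```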

Usage

As a module

  1. Add it as dependency. E.g. in build.gradle:

    dependencies {
        implementation 'com.github.DOREMUS-ANR:string2vocabulary:0.7'
    }

  2. Import and init in your Java class

```java
import org.doremus.string2vocabulary.VocabularyManager;

// ...

// print full logs
VocabularyManager.setVerbose(true);

// set the folder where to find the vocabularies
VocabularyManager.setVocabularyFolder("/location/to/vocabularyFolder");
// set the path of the config csv
VocabularyManager.init("/location/to/property2family.csv");
// set the language to be used for singularising the words
VocabularyManager.setLang("fr");
```

  3. Use it :)

```java
// Search for a term in a given family
// this performs a normal full search and one in strict mode
VocabularyManager.searchInCategory("violin", "en", "mop");
// --> http://www.mimo-db.eu/InstrumentsKeywords/3573

// or
// Search for a term in a given vocabulary
VocabularyManager.getVocabulary("mop-iaml").findConcept("violin", false);
// --> http://data.doremus.org/vocabulary/iaml/mop/svl
// strict mode
VocabularyManager.getVocabulary("mop-iaml").findConcept("violin@it", true);
// --> null

// or
// Get the URI by code (what is written after the namespace)
VocabularyManager.getVocabulary("key").getConcept("dm");
// --> http://data.doremus.org/vocabulary/key/dm

// or
// Full graph replacement
// search and substitute in the whole Jena Model
// (following the csv configuration)
VocabularyManager.string2uri(model);
```

See the test folder for another example of usage.
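The difference between a normal and a strict search can be illustrated with a small standalone sketch. It does not reproduce the library's internals: the class `StrictLabelMatch`, its nested `Label` type, and the way the `@lang` suffix is split off the query are all assumptions made for illustration. In strict mode both the text and the language must match; otherwise the language is ignored.

```java
import java.util.List;

/** Hypothetical illustration of strict matching, not part of string2vocabulary. */
class StrictLabelMatch {

    /** A vocabulary label with its language tag. */
    static final class Label {
        final String text;
        final String lang;

        Label(String text, String lang) {
            this.text = text;
            this.lang = lang;
        }
    }

    /** Assumption: "violin@it" splits into {"violin", "it"}; the language is null when absent. */
    static String[] splitLanguage(String query) {
        int at = query.lastIndexOf('@');
        return at > 0
                ? new String[] { query.substring(0, at), query.substring(at + 1) }
                : new String[] { query, null };
    }

    /** Non-strict: the text must match. Strict: the text and the language must match. */
    static boolean matches(Label label, String query, boolean strict) {
        String[] parts = splitLanguage(query);
        if (!label.text.equalsIgnoreCase(parts[0])) {
            return false;
        }
        if (!strict) {
            return true;
        }
        return parts[1] != null && parts[1].equalsIgnoreCase(label.lang);
    }

    public static void main(String[] args) {
        List<Label> labels = List.of(new Label("violin", "en"), new Label("violino", "it"));
        // No label is both "violin" and Italian, so the strict search finds nothing
        System.out.println(labels.stream().anyMatch(l -> matches(l, "violin@it", true)));  // false
        // Ignoring the language, the English label matches
        System.out.println(labels.stream().anyMatch(l -> matches(l, "violin@it", false))); // true
    }
}
```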

Command Line

Run the library from the CLI with gradle run:

```shell
# Canonical form
gradle run -Pmap="/location/to/property2family.csv" \
           -Pinput="/location/to/input.ttl" \
           -Pvocabularies="/location/to/vocabularyFolder"
```

Available CLI parameters:

| param | example | comment |
| ----- | ------- | ------- |
| map | /location/to/property2family.csv | A table with mapping property-vocabulary |
| vocabularies | /location/to/vocabularyFolder | Folder containing the vocabularies in turtle format |
| input | /location/to/input.ttl | The input file (Turtle or TriG syntax) |
| output (Optional) | /location/to/output.ttl | The output turtle file. Default: <inputPath/inputName>_output.<inputFileExt> |
| lang (Optional) | fr | Language to be used for singularising the words. Default: en. |
| graph (Optional) | http://example.org/graph/object/ | The named graph to process. Default: `` (i.e. the default graph) |
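The default output pattern in the table, <inputPath/inputName>_output.<inputFileExt>, can be read as "insert _output before the file extension". The sketch below shows one way to compute it; the class `DefaultOutput` is hypothetical, and the handling of inputs without an extension (the suffix is simply appended) is an assumption, since the table does not cover that case.

```java
/** Hypothetical helper, not part of string2vocabulary. */
class DefaultOutput {

    static String defaultOutputPath(String inputPath) {
        // Position of the last path separator (Unix or Windows style), -1 if none
        int slash = Math.max(inputPath.lastIndexOf('/'), inputPath.lastIndexOf('\\'));
        int dot = inputPath.lastIndexOf('.');
        // Assumption: without an extension in the file name, just append the suffix
        if (dot <= slash + 1) {
            return inputPath + "_output";
        }
        // Insert "_output" between the base name and the extension
        return inputPath.substring(0, dot) + "_output" + inputPath.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(defaultOutputPath("/location/to/input.ttl"));
        // /location/to/input_output.ttl
        System.out.println(defaultOutputPath("src/test/resources/input.trig"));
        // src/test/resources/input_output.trig
    }
}
```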

The default gradle run behavior relies on project properties set in the gradle.properties file. See the following resources for details about properties in Gradle:

  • Passing Command Line Arguments in Gradle
  • Gradle project properties best practices
  • Configuring Gradle with "gradle.properties"

CLI examples with provided test files:

```shell
# Example: Turtle syntax
gradle run -Pmap="src/test/resources/property2family.csv" \
           -Pinput="src/test/resources/input.ttl" \
           -Pvocabularies="src/test/resources/vocabulary"

# Example: TriG syntax, replace at the default graph level
gradle run -Pmap="src/test/resources/property2family.csv" \
           -Pinput="src/test/resources/input.trig" \
           -Poutput="src/test/resources/output.trig" \
           -Pvocabularies="src/test/resources/vocabulary"

# Example: TriG syntax, replace at the default graph level (alternative)
gradle run -Pmap="src/test/resources/property2family.csv" \
           -Pinput="src/test/resources/input.trig" \
           -Poutput="src/test/resources/output.trig" \
           -Pvocabularies="src/test/resources/vocabulary" \
           -Pgraph=""

# Example: TriG syntax, replace at a given named graph level
gradle run -Pmap="src/test/resources/property2family.csv" \
           -Pinput="src/test/resources/input.trig" \
           -Poutput="src/test/resources/output.trig" \
           -Pvocabularies="src/test/resources/vocabulary" \
           -Pgraph="http://example.org/graph/object/"
```

Documentation

Generating local code documentation:

```shell
javadoc -d doc/ ./org/doremus/string2vocabulary/VocabularyManager.java
```

References:

  • https://www.tutorialspoint.com/java/java_documentation.htm

Contribute

In the general case, please:

  • fork the repository and create a merge request, OR
  • raise an issue in the project's space.

Citation

If you use this software in a scientific publication, please cite:

```
Pasquale Lisena, Konstantin Todorov, Cécile Cecconi, Françoise Leresche, Isabelle Canno, Frédéric Puyrenier, Martine Voisin, Thierry Le Meur, & Raphaël Troncy. (2018). Controlled Vocabularies for Music Metadata. Proceedings of the 19th International Society for Music Information Retrieval Conference, 424–430. https://doi.org/10.5281/zenodo.1492441
```

In BibTeX:

    @inproceedings{lisena2018vocabularies,
      author    = {Pasquale Lisena and Konstantin Todorov and Cécile Cecconi and Françoise Leresche and Isabelle Canno and Frédéric Puyrenier and Martine Voisin and Thierry Le Meur and Raphaël Troncy},
      title     = {Controlled Vocabularies for Music Metadata},
      booktitle = {{19th International Society for Music Information Retrieval Conference}},
      year      = 2018,
      pages     = {424-430},
      publisher = {ISMIR},
      address   = {Paris, France},
      month     = sep,
      venue     = {Paris, France},
      doi       = {10.5281/zenodo.1492441},
      url       = {https://doi.org/10.5281/zenodo.1492441}
    }

Owner

  • Name: DOREMUS ANR Project
  • Login: DOREMUS-ANR
  • Kind: organization
  • Email: doremus@googlegroups.com
  • Location: France

This is the organization corresponding to the ANR-funded project named DOREMUS (2014-2017).

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "String2Vocabulary"
url: https://github.com/DOREMUS-ANR/string2vocabulary
date-released: 2018-09-23
type: software
preferred-citation:
  type: conference-paper
  authors:
  - family-names: "Lisena"
    given-names: "Pasquale"
    orcid: "https://orcid.org/0000-0003-3094-5585"
    email: pasquale.lisena@eurecom.fr
  - family-names: "Todorov"
    given-names: "Konstantin"
    orcid: "https://orcid.org/0000-0002-9116-6692"
  - family-names: "Cecconi"
    given-names: "Cécile"
  - family-names: "Leresche"
    given-names: "Françoise"
  - family-names: "Canno"
    given-names: "Isabelle"
  - family-names: "Puyrenier"
    given-names: "Frédéric"
  - family-names: "Voisin"
    given-names: "Martine"
  - family-names: "Le Meur"
    given-names: "Thierry"
  - family-names: "Troncy"
    given-names: "Raphaël"
    orcid: "https://orcid.org/0000-0003-0457-1436"
  doi: "10.5281/zenodo.1492441"
  title: "Controlled Vocabularies for Music Metadata"
  collection-title: "19th International Society for Music Information Retrieval Conference (ISMIR)"
  city: Paris
  country: FR
  start: 424 # First page number
  end: 430 # Last page number
  month: 9
  year: 2018

Dependencies

build.gradle maven
  • xml-apis:xml-apis 1.4.01 implementation
  • junit:junit 4.11 testImplementation