shexml

A heterogeneous data mapping language based on Shape Expressions

https://github.com/herminiogg/shexml

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

A heterogeneous data mapping language based on Shape Expressions

Statistics
  • Stars: 17
  • Watchers: 3
  • Forks: 3
  • Open Issues: 7
  • Releases: 21
Created about 8 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation Codemeta

README.md

ShExML


Shape Expressions Mapping Language (ShExML) is a DSL that offers a solution for mapping and merging heterogeneous data sources. Because it is based on ShEx, shapes are the main foundation for defining the transformations.

Example

```
PREFIX : <http://example.com/>
SOURCE filmsxmlfile <https://rawgit.com/herminiogg/ShExML/master/src/test/resources/films.xml>
SOURCE filmsjsonfile <https://rawgit.com/herminiogg/ShExML/master/src/test/resources/films.json>
ITERATOR filmxml {
    FIELD id <@id>
    FIELD name
    FIELD year
    FIELD country
    FIELD directors
}
ITERATOR filmjson {
    FIELD id
    FIELD name
    FIELD year
    FIELD country
    FIELD directors
}
EXPRESSION films

:Films :[films.id] {
    :name [films.name] ;
    :year [films.year] ;
    :country [films.country] ;
    :director [films.directors] ;
}
```

This example shows how to map and merge two files (one in JSON and one in XML) that contain different films. The first part, the declarations, defines 'variables' that can be used inside the shapes: the prefixes used in the resulting RDF, the sources pointing to the files, the iterators and fields (queries) to be applied over the files, and the expressions that merge and transform the query results. The shapes are then defined as in ShEx, but they use the previously defined expressions, or compositions of them, inside the square brackets. A more complex example can be found in the films.shexml file.
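To make the effect of the shape more concrete, the following Scala sketch mimics in plain code what the `:Films` shape does with the merged `films` expression. Every record contributes one subject IRI built from `films.id` under the `:` prefix and one triple per mapped field. The multi-valued `directors` field yields one `:director` triple per value. This is a toy model and not the ShExML engine. The `Film` case class, the sample values, the concatenation used for merging and the N-Triples-like output strings are all assumptions made for illustration. Literals are kept as plain strings, which matches the CLI help's description of the default behaviour without datatype inference.

```scala
// Toy model of the :Films shape (illustrative only, not the ShExML engine).
final case class Film(id: String, name: String, year: String, country: String, directors: List[String])

object FilmsShapeSketch {
  // Corresponds to `PREFIX : <http://example.com/>` in the mapping.
  val base = "http://example.com/"

  // Merge step: assumed here to be a plain concatenation of both sources.
  def merge(fromXml: List[Film], fromJson: List[Film]): List[Film] = fromXml ++ fromJson

  // Shape step: one subject per film, one triple per field value.
  // Values are not escaped; this is a sketch, not an RDF serialiser.
  def triples(films: List[Film]): List[String] =
    films.flatMap { film =>
      val subject = s"<${base}${film.id}>"
      def literal(predicate: String, value: String): String =
        s"""${subject} <${base}${predicate}> "${value}" ."""
      List(
        literal("name", film.name),
        literal("year", film.year),
        literal("country", film.country)
      ) ++ film.directors.map(director => literal("director", director))
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample records standing in for films.xml and films.json.
    val xmlFilms  = List(Film("1", "Film A", "2001", "UK", List("Director X")))
    val jsonFilms = List(Film("2", "Film B", "2010", "USA", List("Director Y", "Director Z")))
    triples(merge(xmlFilms, jsonFilms)).foreach(println)
  }
}
```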

Features

  • XML support (using XPath queries)
  • JSON support (using JSONPath queries)
  • CSV and TSV support
  • Relational databases, with the following drivers included:
    • MySQL
    • SQLite
    • PostgreSQL
    • MariaDB
    • SQLServer
  • Matchers
  • Joins
  • Named graphs
  • Autoincrement ids

The full specification with all the supported features and examples can be consulted here.

Usage

CLI

The jar library also provides a command line interface with the following options:

Usage: ShExML [-h] [-id] [-nu] [-V] [-d=<drivers>] [-f=<format>] -m=<file> [-o=<output>] [-p=<password>] [-u=<username>] [-pc | -r | -rp | -s | -sm | -sh | -shc]
Map and merge heterogeneous data sources with a Shape Expressions based syntax
  -m, --mapping=<file>          Path to the file with the mappings
  -h, --help                    Show this help message and exit.
  -V, --version                 Print version information and exit.
Options for the transformation to RDF
  -id, --inferenceDatatypes     Use the inference system for choosing the best suited datatype for the generated literal. Without this option, and not declaring a datatype in the mapping rules, all the literals will be outputted as strings
  -nu, --normaliseURIs          Activate the URI normalisation system which allows to avoid malformed URIs when using strings for URI creation
  -f, --format=<format>         Output format for RDF graph. Turtle, RDF/XML, N-Triples, ...
Other transformation options
  -pc, --precompile             Create a single version including all the imported files, useful for debugging purposes. Additionally it checks the input for syntactic and grammatical errors
  -r, --rml                     Generate RML output
  -rp, --rmlPrettified          Generate RML output using Blank nodes for better readability
  -s, --shex                    Generate ShEx validation
  -sm, --shapeMap               Generate Shape Map for ShEx validation
  -sh, --shacl                  Generate SHACL validation
  -shc, --shaclClosed           Generate SHACL validation with closed shapes as default
General configuration options applying to all the available transformations
  -o, --output=<output>         Path where the output file should be created
  -u, --username=<username>     Username in case of using a database
  -p, --password=<password>     Password in case of using a database
  -d, --drivers=<drivers>       Add more JDBC database drivers in the form of <startJDBCURL>%<driver> and separating them with ";". Example: jdbc:postgresql%org.postgresql.Driver;jdbc:oracle%oracle.jdbc.OracleDriver

For instance, to execute the films example:

java -jar shexml.jar -m films.shexml
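The `-d` option packs several drivers into a single string. The following Scala sketch is a hypothetical helper and not ShExML's own parsing code. It splits such a value into (JDBC URL prefix, driver class) pairs and picks the driver whose prefix starts a given JDBC URL. That lookup rule is an assumption about how `<startJDBCURL>` is meant to be used.

```scala
// Hypothetical helper, not part of ShExML: parses a --drivers value such as
// "jdbc:postgresql%org.postgresql.Driver;jdbc:oracle%oracle.jdbc.OracleDriver".
object DriverSpec {
  /** Splits the value into (JDBC URL prefix, driver class name) pairs, in input order. */
  def parse(spec: String): List[(String, String)] =
    spec
      .split(";")          // entries are separated by ";"
      .toList
      .map(_.trim)
      .filter(_.nonEmpty)  // tolerate a trailing ";"
      .map { entry =>
        // Each entry is <startJDBCURL>%<driver>; split only on the first "%".
        entry.split("%", 2) match {
          case Array(prefix, driver) if prefix.nonEmpty && driver.nonEmpty =>
            (prefix, driver)
          case _ =>
            throw new IllegalArgumentException(s"Malformed driver entry: '$entry'")
        }
      }

  /** Assumed lookup rule: the first entry whose prefix starts the JDBC URL wins. */
  def driverFor(jdbcUrl: String, drivers: List[(String, String)]): Option[String] =
    drivers.collectFirst { case (prefix, driver) if jdbcUrl.startsWith(prefix) => driver }
}
```

For example, `DriverSpec.parse("jdbc:postgresql%org.postgresql.Driver")` yields `List(("jdbc:postgresql", "org.postgresql.Driver"))`. Failing fast on a malformed entry makes a typo in the option visible immediately, instead of silently dropping a driver.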

JVM compatible API

ShExML is written in Scala and can therefore be used from JVM-compatible languages. The example below shows how to use the programmatic API:

// Read the mapping rules from a .shexml file (pathToFile is the path to that file)
val file = scala.io.Source.fromFile(pathToFile).mkString
val mappingLauncher = new MappingLauncher()
// Run the mapping, requesting Turtle as the output format
val output = mappingLauncher.launchMapping(file, "TURTLE")

Requirements

The minimum versions required for this software to work are:
  • JDK 17 or OpenJDK 17 (builds for earlier JDK versions can be generated by following the Build instructions, or provided upon request)
  • Scala 2.12.17
  • SBT 1.7.2

Webpage

A live playground is also offered online (http://shexml.herminiogarcia.com). However, due to hardware limitations it is not intended for intensive use.

Citation

This tool is part of a scientific project that has led to several publications. The main and preferred publication for citation is:

García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J. E., & Lovelle, J. M. C. (2020). ShExML: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science, 6, e318. https://doi.org/10.7717/peerj-cs.318

Other relevant publications, by topic, are:
  • Optimisation of the ShExML engine: García-González, H. (2025). Optimising the ShExML engine through code profiling: From turtle’s pace to state-of-the-art performance. Semantic Web, (Preprint), 1-30. https://doi.org/10.3233/SW-243736
  • Translation from ShExML to RML: García-González, H., & Dimou, A. (2022, September). Why to tie to a single data mapping language? Enabling a transformation from ShExML to RML. In Proceedings of Poster and Demo Track and Workshop Track of the 18th International Conference on Semantic Systems co-located with 18th International Conference on Semantic Systems (SEMANTiCS 2022) (Vol. 3235, pp. paper-11). https://ceur-ws.org/Vol-3235/paper11.pdf
  • Addressing mapping challenges with ShExML: García-González, H. (2021, June). A ShExML perspective on mapping challenges: already solved ones, language modifications and future required actions. In 2nd International Workshop on Knowledge Graph Construction co-located with 18th Extended Semantic Web Conference (ESWC 2021), Online, June 6, 2021, CEUR Workshop Proceedings (vol. 2873). http://ceur-ws.org/Vol-2873/paper2.pdf
  • Inception poster: Garcia-Gonzalez, H., Fernandez-Alvarez, D., & Gayo, J. E. L. (2018). ShExML: An Heterogeneous Data Mapping Language based on ShEx. In EKAW (Posters & Demos) (pp. 9-12). https://ceur-ws.org/Vol-2262/ekaw-poster-08.pdf

Build

The library uses sbt as its package manager and build tool. To compile the project, use the following command:

$ sbt compile

To run the project from within sbt, use the command below, where <options> can be replaced by the arguments explained in the CLI section:

$ sbt "run <options>"

To generate an executable JAR file, use the following command. If you want to test the library before generating the artifact, you need to set up the testing environment first, as explained in the Testing section. Alternatively, the "set test in assembly := {}" option skips the tests during the build process, as in the command below:

$ sbt "set test in assembly := {}" clean update assembly
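The `set test in assembly := {}` form uses sbt's older `in` scoping syntax. With sbt 1.x slash syntax and the sbt-assembly plugin that provides the `assembly` task, the same setting can be written in build.sbt, so that tests are skipped on every assembly. Whether the project's own build.sbt already defines this setting is not stated here, so treat this as an optional, local configuration fragment.

```scala
// build.sbt (sbt 1.x slash syntax): replace the test task with a no-op when running `assembly`.
assembly / test := {}
```

The equivalent one-off command line form is `sbt "set assembly / test := {}" clean update assembly`.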

Testing

The project contains a full suite of tests that checks that all the features included in the engine work as expected. These unit tests are located under the src/test/scala folder. To run them, use the command below. It is of utmost importance to check that the project passes the tests for all the cross-compiled versions used within the project (see the Cross-compilation section for more details).

$ sbt test

The test environment uses some external resources that need to be set up before running the tests. This mainly involves starting a MySQL/MariaDB and a PostgreSQL database, creating the relational schema and filling the tables with dummy data. This process is encapsulated in Docker containers and can be set up with the following command:

$ docker compose up -d

Cross-compilation

The project is cross-compiled against three Scala versions (2.12.x, 2.13.x and 3.x), so it can be used across different Scala environments. By default, all commands run against the 3.x version. It is also possible to run the same command for all the versions at once or for one specific version. The commands below show how to do so with the test command.

Testing against all the cross-compiled versions:

$ sbt "+ test"

Testing against a specific version, where <version> is one of the versions configured in the build.sbt file:

$ sbt "++<version> test"

Dependencies

The following dependencies are used by this library:

| Dependency                                 | License                                 |
|--------------------------------------------|-----------------------------------------|
| org.antlr / antlr4                         | BSD-3-Clause                            |
| net.sf.saxon / Saxon-HE                    | MPL-2.0                                 |
| org.apache.jena / jena-base                | Apache License 2.0                      |
| org.apache.jena / jena-core                | Apache License 2.0                      |
| org.apache.jena / jena-arq                 | Apache License 2.0                      |
| org.apache.jena / jena-shacl               | Apache License 2.0                      |
| info.picocli / picocli                     | Apache License 2.0                      |
| org.slf4j / slf4j-nop                      | MIT License                             |
| com.github.tototoshi / scala-csv           | Apache License 2.0                      |
| org.xerial / sqlite-jdbc                   | Apache License 2.0                      |
| mysql / mysql-connector-java               | GPL-v2 (Universal FOSS Exception v1)    |
| org.postgresql / postgresql                | BSD-2-Clause                            |
| org.mariadb.jdbc / mariadb-java-client     | LGPL-2.1                                |
| com.microsoft.sqlserver / mssql-jdbc       | MIT License                             |
| com.github.vickumar1981 / stringdistance   | Apache License 2.0                      |
| com.typesafe.scala-logging / scala-logging | Eclipse Public License v1.0 or LGPL-2.1 |
| com.jayway.jsonpath / json-path            | Apache License 2.0                      |
| org.scala-lang / scala-reflect             | Apache License 2.0                      |
| org.scala-lang / scala-compiler            | Apache License 2.0                      |
| ch.qos.logback / logback-classic           | Eclipse Public License v1.0 or LGPL-2.1 |

For a more exhaustive license check, including sub-dependencies and test dependencies, the project includes the sbt-license-report plugin, which generates a report with the command:

$ sbt dumpLicenseReport

After running this command, the results are available under the target/license-reports directory.

Owner

  • Name: Herminio García González
  • Login: herminiogg
  • Kind: user
  • Location: Brussels, Belgium
  • Company: Kazerne Dossin

Citation (CITATION.cff)

cff-version: 1.2.0
title: ShExML
type: software
authors:
  - given-names: Herminio
    family-names: García-González
    affiliation: Kazerne Dossin
    orcid: 'https://orcid.org/0000-0001-5590-4857'
identifiers:
  - type: url
    value: 'https://shexml.herminiogarcia.com/'
    description: Main webpage
  - type: doi
    value: 10.5281/zenodo.11577338
    description: Zenodo's DOI
  - type: swh
    value: 'swh:1:rev:69285f19a1dee8d4d4141a5ff021c176ef691e90'
    description: Version 0.5.3 release
repository-code: 'https://github.com/herminiogg/ShExML'
url: 'https://shexml.herminiogarcia.com/'
abstract: >-
  Shape Expressions Mapping Language (ShExML) is a DSL that
  offers a solution for mapping and merging heterogeneous
  data sources. As being based on ShEx the shape is the main
  foundation to define the transformations.
license: MIT

CodeMeta (codemeta.json)

{
  "@context": {
    "@vocab": "https://w3id.org/codemeta/3.0/",
    "schema": "http://schema.org/",
    "id": "@id",
    "type": "@type",
    "name": {
      "@id": "schema:name"
    },
    "version": {
      "@id": "schema:version"
    },
    "identifier": {
      "@id": "schema:identifier",
      "@type": "@id"
    },
    "email": {
      "@id": "schema:email"
    },
    "affiliation": {
      "@id": "schema:affiliation",
      "@type": "@id"
    },
    "familyName": {
      "@id": "schema:familyName"
    },
    "givenName": {
      "@id": "schema:givenName"
    },
    "roleName": {
      "@id": "schema:roleName"
    },
    "author": {
      "@id": "schema:author",
      "@type": "@id"
    },
    "dateCreated": {
      "@id": "schema:dateCreated",
      "@type": "http://www.w3.org/2001/XMLSchema#date"
    },
    "softwareRequirements": {
      "@id": "schema:softwareRequirements",
      "@type": "@id"
    },
    "developmentStatus": {
      "@id": "schema:developmentStatus"
    },
    "issueTracker": {
      "@id": "https://w3id.org/codemeta/3.0/issueTracker",
      "@type": "@id"
    },
    "referencePublication": {
      "@id": "https://w3id.org/codemeta/3.0/referencePublication",
      "@type": "@id"
    },
    "applicationCategory": {
      "@id": "schema:applicationCategory"
    },
    "dateModified": {
      "@id": "schema:dateModified",
      "@type": "http://www.w3.org/2001/XMLSchema#date"
    },
    "runtimePlatform": {
      "@id": "schema:runtimePlatform"
    },
    "codeRepository": {
      "@id": "schema:codeRepository",
      "@type": "@id"
    },
    "description": {
      "@id": "schema:description"
    },
    "downloadUrl": {
      "@id": "schema:downloadUrl",
      "@type": "@id"
    },
    "license": {
      "@id": "schema:license",
      "@type": "@id"
    },
    "programmingLanguage": {
      "@id": "schema:programmingLanguage"
    },
    "contributor": {
      "@id": "schema:contributor",
      "@type": "@id"
    },
    "releaseNotes": {
      "@id": "schema:releaseNotes"
    },
    "SoftwareSourceCode": {
      "@id": "schema:SoftwareSourceCode"
    },
    "Person": {
      "@id": "schema:Person"
    },
    "Organization": {
      "@id": "schema:Organization"
    },
    "Role": {
      "@id": "schema:Role"
    }
  },
  "id": "https://github.com/herminiogg/ShExML",
  "type": "SoftwareSourceCode",
  "applicationCategory": "Computer Science",
  "author": [
    {
      "type": "Role",
      "roleName": "Main author"
    },
    {
      "id": "https://herminiogarcia.com/#me",
      "type": "Person",
      "affiliation": {
        "id": "https://kazernedossin.eu/en",
        "type": "Organization",
        "name": "Kazerne Dossin"
      },
      "email": "herminio.garciagonzalez@kazernedossin.eu",
      "familyName": "García González",
      "givenName": "Herminio",
      "identifier": "https://orcid.org/0000-0001-5590-4857"
    }
  ],
  "codeRepository": "https://github.com/herminiogg/ShExML",
  "contributor": {
    "id": "https://niod.knaw.nl/en/staff/mikebryant",
    "type": "Person",
    "affiliation": {
      "id": "https://niod.knaw.nl/en",
      "type": "Organization",
      "name": "NIOD Institute for War, Holocaust and Genocide Studies"
    },
    "email": "m.bryant@niod.knaw.nl",
    "familyName": "Bryant",
    "givenName": "Mike",
    "identifier": "https://orcid.org/0000-0003-0765-7390"
  },
  "dateCreated": "2018-02-22",
  "dateModified": "2025-09-10",
  "description": "A heterogeneous data mapping language based on Shape Expressions",
  "developmentStatus": "active",
  "downloadUrl": "https://api.github.com/repos/herminiogg/ShExML/downloads",
  "identifier": "https://doi.org/10.5281/zenodo.17092549",
  "license": "https://api.github.com/licenses/mit",
  "name": "ShExML",
  "programmingLanguage": "Scala",
  "releaseNotes": "## What's Changed\r\n- Added a parellelisation option in the RDF conversion. You can decide which parts of the execution you want to run in parallel and the number of threads to be used (or let the engine decide based on you hardware specs).\r\n- Stdin can be used as input for the mapping rules or as a input source.\r\n- Some minor fixes and stability improvements.\r\n\r\n**Full Changelog**: https://github.com/herminiogg/ShExML/compare/v0.5.4...v0.6.0",
  "runtimePlatform": "JVM",
  "softwareRequirements": [
    "http://example.org/jena-shacl",
    "http://example.org/Saxon-HE",
    "http://example.org/logback-classic",
    "http://example.org/jena-core",
    "http://example.org/scala-compiler",
    "http://example.org/mysql-connector-java",
    "http://example.org/stringdistance_2.13",
    "http://example.org/rmlmapper",
    "http://example.org/jena-base",
    "http://example.org/srdf_3",
    "http://example.org/scala-parallel-collections_3",
    "http://example.org/mssql-jdbc",
    "http://example.org/json-path",
    "http://example.org/picocli",
    "http://example.org/postgresql",
    "http://example.org/scala3-library_3",
    "http://example.org/shex_3",
    "http://example.org/scala-logging_3",
    "http://example.org/jena-arq",
    "http://example.org/srdf4j_3",
    "http://example.org/slf4j-nop",
    "http://example.org/antlr4",
    "http://example.org/mariadb-java-client",
    "http://example.org/sqlite-jdbc",
    "http://example.org/scalatest_3",
    "http://example.org/scala-csv_2.13",
    "http://example.org/scala-reflect"
  ],
  "version": "0.6.0",
  "issueTracker": "https://api.github.com/repos/herminiogg/ShExML/issues",
  "referencePublication": "https://doi.org/10.7717/peerj-cs.318"
}

GitHub Events

Total
  • Issues event: 18
  • Watch event: 2
  • Issue comment event: 6
  • Push event: 22
  • Pull request event: 28
  • Fork event: 1
  • Create event: 10
Last Year
  • Issues event: 18
  • Watch event: 2
  • Issue comment event: 6
  • Push event: 22
  • Pull request event: 28
  • Fork event: 1
  • Create event: 10

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 61
  • Total pull requests: 67
  • Average time to close issues: 4 months
  • Average time to close pull requests: 1 minute
  • Total issue authors: 7
  • Total pull request authors: 1
  • Average comments per issue: 0.39
  • Average comments per pull request: 0.03
  • Merged pull requests: 63
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 12
  • Pull requests: 15
  • Average time to close issues: about 2 months
  • Average time to close pull requests: less than a minute
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.25
  • Average comments per pull request: 0.0
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • herminiogg (68)
  • luigi-asprino (2)
  • s-minoo (2)
  • anaigmo (1)
  • CMCosta (1)
  • SandraAmador (1)
  • labra (1)
  • andrawaag (1)
  • ericprud (1)
Pull Request Authors
  • herminiogg (87)
Top Labels
Issue Labels
enhancement (47) bug (20) question (1)
Pull Request Labels

Packages

  • Total packages: 3
  • Total downloads: unknown
  • Total dependent packages: 3
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 32
repo1.maven.org: com.herminiogarcia:shexml_2.12

A heterogeneous data mapping language based on Shape Expressions.

  • Versions: 12
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Stargazers count: 29.2%
Dependent repos count: 32.0%
Dependent packages count: 32.0%
Average: 32.3%
Forks count: 36.0%
Last synced: 7 months ago
repo1.maven.org: com.herminiogarcia:shexml_3

A heterogeneous data mapping language based on Shape Expressions.

  • Versions: 10
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Stargazers count: 29.5%
Dependent repos count: 32.0%
Dependent packages count: 32.0%
Average: 32.4%
Forks count: 36.0%
Last synced: 7 months ago
repo1.maven.org: com.herminiogarcia:shexml_2.13

A heterogeneous data mapping language based on Shape Expressions.

  • Versions: 10
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Stargazers count: 29.5%
Dependent repos count: 32.0%
Dependent packages count: 32.0%
Average: 32.4%
Forks count: 36.0%
Last synced: 7 months ago

Dependencies

.github/workflows/scala.yml actions
  • actions/checkout v2 composite
  • actions/setup-java v2 composite