exporter-transformer
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (9.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: gdcc
- License: apache-2.0
- Language: XSLT
- Default Branch: main
- Size: 224 KB
Statistics
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 2
- Releases: 7
Metadata Files
README.md
Transformer Exporter for Dataverse
This exporter lets you run up to 100 exporters from a single pre-built JAR file. To add an exporter, create a directory inside the exporters directory (see the Installation section below) and place a configuration file (config.json) and a transformation file (transformer.json, transformer.py or transformer.xsl; see the examples below) in it, editing them as needed.
Supported Dataverse versions: 6.0 - recent.
Installation
If you haven’t already configured it, set the dataverse-spi-exporters-directory configuration value first. Then navigate to the configured directory and download the JAR file together with the examples you want to try out:
```shell
# download the jar
wget -O exporter-transformer-1.0.10-jar-with-dependencies.jar https://repo1.maven.org/maven2/io/gdcc/export/exporter-transformer/1.0.10/exporter-transformer-1.0.10-jar-with-dependencies.jar

# download the hello-world example
mkdir hello-world
wget -O hello-world/config.json https://raw.githubusercontent.com/gdcc/exporter-transformer/main/examples/hello-world/config.json
wget -O hello-world/transformer.json https://raw.githubusercontent.com/gdcc/exporter-transformer/main/examples/hello-world/transformer.json

# download the debug example
mkdir debug
wget -O debug/config.json https://raw.githubusercontent.com/gdcc/exporter-transformer/main/examples/debug/config.json
wget -O debug/transformer.json https://raw.githubusercontent.com/gdcc/exporter-transformer/main/examples/debug/transformer.json

# etc.
```
After restarting Dataverse, you should be able to use the newly installed exporters (alongside the built-in exporters):
Each exporter will have at least these files after starting:
All of these files can be edited if needed. Typically you will only need to edit the config.json and the transformer.json files. To add more exporters, your own or from the provided examples, add a new configuration directory to your exporters directory containing at least a config.json and one of transformer.json, transformer.py or transformer.xsl. After restarting the server, the newly added exporters should be ready to use.
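The layout rule above (each exporter directory needs a config.json plus one transformer file) can be checked before restarting the server. The following is a minimal sketch and not part of the project: the function name `check_exporters_dir` is hypothetical, and it assumes nothing beyond the file names listed above.

```python
import os

# File names taken from the text above; everything else is illustrative.
CONFIG_FILE = "config.json"
TRANSFORMER_FILES = ("transformer.json", "transformer.py", "transformer.xsl")


def check_exporters_dir(exporters_dir):
    """Return {directory name: list of problems} for each subdirectory."""
    report = {}
    for name in sorted(os.listdir(exporters_dir)):
        path = os.path.join(exporters_dir, name)
        if not os.path.isdir(path):
            continue  # skip the JAR file and any other plain files
        problems = []
        if not os.path.isfile(os.path.join(path, CONFIG_FILE)):
            problems.append("missing " + CONFIG_FILE)
        has_transformer = any(
            os.path.isfile(os.path.join(path, f)) for f in TRANSFORMER_FILES
        )
        if not has_transformer:
            problems.append("missing transformer file")
        report[name] = problems
    return report
```

Calling `check_exporters_dir` with the path you configured as dataverse-spi-exporters-directory returns an empty problem list for every directory that satisfies the rule.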
Examples
The following examples are provided in the examples directory:
Hello World!
A very basic exporter that always produces the same output: {"hello":"World!"}.
Debug
This exporter applies only the identity transformation to the provided source document. It lets you see which fields are available for copying and transforming:
- datasetJson: native Dataverse JSON export
- datasetORE: ORE Dataverse export
- datasetSchemaDotOrg: Schema.org JSON-LD export
- datasetFileDetails: file details from the native Dataverse JSON export
- preTransformed: JSON-pointer friendly version of the native Dataverse JSON export
- config: the content of the config.json
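Once you have the Debug output, a quick way to survey the available fields is to flatten it into paths. The sketch below is an illustration only: `list_pointers` is a hypothetical helper, the sample document is invented (only its top-level keys and the citation path come from this README), and the paths are JSON-pointer-like without the escaping rules of the JSON Pointer specification.

```python
def list_pointers(value, prefix=""):
    """Return a slash-separated path for every leaf of a nested dict/list."""
    if isinstance(value, dict):
        children = [(str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        children = [(str(i), v) for i, v in enumerate(value)]
    else:
        return [prefix]  # scalar leaf
    paths = []
    for key, child in children:
        paths.extend(list_pointers(child, prefix + "/" + key))
    return paths or [prefix]  # an empty container counts as a leaf


# Invented stand-in for a (much larger) Debug export.
sample = {
    "datasetJson": {"id": 1},
    "preTransformed": {
        "datasetVersion": {
            "metadataBlocks": {
                "citation": {
                    "title": "Example dataset",
                    "author": [{"authorName": "Doe, Jane"}],
                }
            }
        }
    },
    "config": {"formatName": "debug"},
}

for path in list_pointers(sample):
    print(path)
```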
Short example
This exporter copies only the title, the author names and the file download URL to the output.
JavaScript transformer
The same exporter as the "Short example", but it uses JavaScript instead of copy transformations.
Croissant
This exporter is entirely based on the Croissant Exporter for Dataverse. It is simply a port of that exporter into JavaScript, bundled into a ready-to-use transformer. It is also a great example to start from when writing your own exporters.
Basic RO-Crate
This exporter transforms the output from the Schema.org exporter into an RO-Crate compatible output.
Transformer generated with Python
This exporter is based on the Customizable RO-Crate Metadata Exporter for Dataverse. You can edit the provided CSV file and rerun the Python script to overwrite the default transformer.json:
```shell
python3 csv2transformer.py
```
After copying the resulting transformer.json, together with the provided config.json, you will have a customized RO-Crate exporter (listed as "CSV RO-Crate" by default).
Debug XML
This exporter outputs the XML version of the native JSON format that can be transformed with XSLT.
Short example XML
This exporter copies only the title, the author names and the filenames of the dataset version, and outputs them in an XML document.
Debug written in Python
This exporter produces the same output as the Debug example (the only difference is that it is written in Python), and it lets you see which fields are available for copying and transforming:
- datasetJson: native Dataverse JSON export
- datasetORE: ORE Dataverse export
- datasetSchemaDotOrg: Schema.org JSON-LD export
- datasetFileDetails: file details from the native Dataverse JSON export
- preTransformed: JSON-pointer friendly version of the native Dataverse JSON export
- config: the content of the config.json
Short example written in Python
This exporter copies only the title, the author names and the filenames of the dataset version, and outputs them in a JSON document. It is written in Python.
HTML example written in Python
This exporter takes JSON input from a prerequisite exporter (short_example_py by default), and displays it as HTML. It is written in Python. In order to change the JSON input for this exporter, change the prerequisiteFormatName value in the config.json to the format name of the exporter you wish to use as input.
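The rendering step of such an exporter can be tried out locally. The sketch below is not the code of the provided example: `render_html` is a hypothetical function, the input shape (`title`, `author`, `files`) is an assumption about what a short-example-style exporter produces, and how the resulting string is handed back to the exporter is deliberately left out. It targets a standard Python 3 interpreter; the `html` module is not available in Python 2.7, where `cgi.escape` is the closest standard-library equivalent.

```python
from html import escape


def render_html(doc):
    """Render a dict with 'title', 'author' and 'files' keys as a small HTML page."""
    title = escape(str(doc.get("title", "")))
    authors = "".join(
        "<li>" + escape(str(a)) + "</li>" for a in doc.get("author", [])
    )
    files = "".join(
        "<li>" + escape(str(f)) + "</li>" for f in doc.get("files", [])
    )
    parts = [
        "<html><body>",
        "<h1>", title, "</h1>",
        "<h2>Authors</h2><ul>", authors, "</ul>",
        "<h2>Files</h2><ul>", files, "</ul>",
        "</body></html>",
    ]
    return "".join(parts)


# Invented sample input.
page = render_html(
    {"title": "A & B", "author": ["Doe, Jane"], "files": ["data.csv"]}
)
print(page)
```

Escaping every value before inserting it into the markup keeps metadata that contains characters such as `&` or `<` from breaking the page.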
DDI PDF codebook
This exporter is entirely based on the DDI PDF Exporter. It is simply a port of that exporter into Python (Jython). It illustrates how to convert XML input to PDF in an exporter.
ARP RO-Crate
This exporter is entirely based on the Dataverse PR 10086. It is simply a port of that exporter into Python (Jython).
Developer guide
The easiest way to start is to write JavaScript code. You can use the provided Croissant code as a starting point. You will need to restart the server after changing that code. Note that the exporters use caching: to see your changes, you will need to either wait until the cache expires or delete the cached exporter output manually.
The JavaScript supported by the transformer exporter is the dialect provided by Project Nashorn, so you can only use the syntax that project supports. A further limitation is that statements spanning multiple lines are not supported. You can work around this by running your code through a minifier, or simply by writing only single-line statements (empty lines, comments, etc. are fine to include in the JavaScript files). Finally, you can access these Java classes from your scripts:
- Map: java.util.LinkedHashMap
- Set: java.util.LinkedHashSet
- List: java.util.ArrayList
- Collectors: java.util.stream.Collectors
- JsonValue: jakarta.json.JsonValue
You can also try writing the transformations using the transformation language as described here. It is the preferred way of writing straightforward exporters, for example when you only need to add one or more fields to an existing exporter format. In that case, you could use the identity transformation followed by simple copy transformations. You can also start from an existing example and add new copy, remove, etc. transformations at the end of the transformer.json file.
You can also write XML transformations in a similar way, using XSLT instead of JSON transformations, as illustrated in the provided XML examples.
Finally, you can also write your transformers as Python code. You can start from the provided example, which can also be run as a test:
```shell
mvn test -Dtest="TransformerExporterTest#testPythonScript"
```
You can start by changing the code in transformer.py, shown below, and testing it until you get the desired outcome (see also py-input.json and py-result.json). When you are done, place the new transformer.py together with a config.json file in a new folder in the exporters directory (make sure that the transformer-exporter JAR file is also placed in the exporters directory). After restarting the server, your new exporter should be ready to use.
```py
res["title"] = x["preTransformed"]["datasetVersion"]["metadataBlocks"]["citation"]["title"]

res["author"] = []
for author in x["preTransformed"]["datasetVersion"]["metadataBlocks"]["citation"]["author"]:
    res["author"].append(author["authorName"])

res["files"] = []
for distribution in x["datasetSchemaDotOrg"]["distribution"]:
    res["files"].append(distribution["contentUrl"])
```
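To experiment with this logic outside Dataverse, the same statements can be wrapped in a function and fed a hand-written input. This is a sketch under assumptions: it treats `x` as a nested dict and `res` as a plain dict, the `transform` wrapper is hypothetical, and the sample values are invented stand-ins (the project ships its own py-input.json for the real test).

```python
def transform(x):
    """Same field selection as the transformer.py above, wrapped for local runs."""
    res = {}
    citation = x["preTransformed"]["datasetVersion"]["metadataBlocks"]["citation"]
    res["title"] = citation["title"]
    res["author"] = [author["authorName"] for author in citation["author"]]
    res["files"] = [
        distribution["contentUrl"]
        for distribution in x["datasetSchemaDotOrg"]["distribution"]
    ]
    return res


# Invented sample input containing only the fields the transformer reads.
sample_input = {
    "preTransformed": {
        "datasetVersion": {
            "metadataBlocks": {
                "citation": {
                    "title": "Example dataset",
                    "author": [
                        {"authorName": "Doe, Jane"},
                        {"authorName": "Roe, Richard"},
                    ],
                }
            }
        }
    },
    "datasetSchemaDotOrg": {
        "distribution": [{"contentUrl": "https://example.org/file/1"}]
    },
}

result = transform(sample_input)
print(result)
```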
Note that you can also use Java classes from your Python code, as explained on the Jython website (the library used by this exporter for the Python language interpretation), e.g.:
```py
from java.lang import System  # Java import

print('Running on Java version: ' + System.getProperty('java.version'))
print('Unix time from Java: ' + str(System.currentTimeMillis()))
```
See also the documentation from Jython and the DDI-PDF example for how it is used in practice.
Configuration
The configuration file (config.json) for the exporter can contain the following fields:
- formatName (default: transformer_json): The name of the format this exporter creates. If this format is already provided by a built-in exporter, this exporter will override the built-in one. (Note that exports are cached, so existing metadata export files are not updated immediately.)
- displayName (default: Transformer example): The display name shown in the UI.
- harvestable (default: false): Whether the exported format should be available as an option for Harvesting.
- availableToUsers (default: true): Whether the exported format should be available for download in the UI and API.
- mediaType (default: transformer_json): Defines the MIME type of the exported format. It is used when metadata is downloaded, i.e. to trigger an appropriate viewer in the user's browser.
- prerequisiteFormatName (default: null): Defines the name of the export format that will be used as input for this exporter (if left null or omitted, the default input will be used).
- includeDefaultInputWithPrerequisiteInput (default: false): Whether the default input should be included when prerequisite input is requested. When set to true, the default input is added in the defaultInputFromDataProvider field inside the prerequisite input JSON that is passed as input to this transformer.
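As an illustration of how these fields fit together, the sketch below builds a configuration from the documented defaults and a few overrides. The helper `build_config` and the override values (`my_html`, `My HTML`, `text/html`) are hypothetical; only the field names, the defaults and the `short_example_py` format name come from this README. Whether the exporter needs every field to be present is not stated here, so writing all of them is just one possible choice.

```python
import json

# Defaults as listed in the field descriptions above.
DEFAULTS = {
    "formatName": "transformer_json",
    "displayName": "Transformer example",
    "harvestable": False,
    "availableToUsers": True,
    "mediaType": "transformer_json",
    "prerequisiteFormatName": None,
    "includeDefaultInputWithPrerequisiteInput": False,
}


def build_config(**overrides):
    """Merge overrides into the defaults, rejecting misspelled field names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError("unknown config fields: " + ", ".join(sorted(unknown)))
    config = dict(DEFAULTS)
    config.update(overrides)
    return config


# Hypothetical exporter that renders another exporter's output.
config = build_config(
    formatName="my_html",
    displayName="My HTML",
    mediaType="text/html",
    prerequisiteFormatName="short_example_py",
)
print(json.dumps(config, indent=2))
```

The printed JSON could then be saved as config.json in the new exporter's directory.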
Owner
- Name: Global Dataverse Community Consortium
- Login: gdcc
- Kind: organization
- Email: Jonathan_Crabtree@unc.edu
- Location: Worldwide
- Website: http://dataversecommunity.global
- Repositories: 14
- Profile: https://github.com/gdcc
GDCC uses GitHub to coordinate community contributions to Dataverse and to manage the development of software and documentation that extend or interact with Dataverse.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Kulikowski"
    given-names: "Eryk"
    orcid: "https://orcid.org/0000-0002-9967-9732"
title: "Transformer Exporter for Dataverse"
version: 1.0.10
date-released: 2024-08-29
url: "https://github.com/gdcc/exporter-transformer"
GitHub Events
Total
- Watch event: 1
- Push event: 1
- Pull request event: 2
- Create event: 1
Last Year
- Watch event: 1
- Push event: 1
- Pull request event: 2
- Create event: 1
Dependencies
- jakarta.json:jakarta.json-api 2.1.3 provided
- jakarta.ws.rs:jakarta.ws.rs-api 4.0.0 provided
- com.google.auto.service:auto-service 1.1.1
- io.gdcc:dataverse-spi 2.0.0
- io.github.erykkul:json-transformer 1.0.4
- org.eclipse.parsson:parsson 1.1.6
- org.openjdk.nashorn:nashorn-core 15.4
- org.python:jython-standalone 2.7.3
- org.junit.jupiter:junit-jupiter 5.10.3 test
- actions/checkout v4 composite
- actions/setup-java v4 composite