troveproxy
A transforming proxy and harvester for the National Library of Australia's Trove API
Statistics
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 14
- Releases: 0
Metadata Files
README.md
TroveProxy
A transforming proxy for the National Library of Australia's Trove API
The proxy consists of a docker container containing:
- The web server Apache Tomcat
- The web servlet XProc-Z
- An XProc pipeline implementing the proxy
- XSLT transformations for converting the output of the Trove API into various other formats
API Usage
Access the proxied Trove API as you would normally access the Trove API, except replacing
https://api.trove.nla.gov.au/ with http://localhost:8080/proxy/ as the base URI of the API service.
The proxied API accepts some additional parameters:
|Parameter name|Values|
|------------------------|----------|
|proxy-include-people-australia|If set to true then additional information about people will be added from People Australia|
|proxy-format|Set to tei to return TEI XML, atom to return Atom Syndication XML, csv to return Comma Separated Values, or leave blank for Trove-style XML|
|proxy-metadata-format|Set to ro-crate to return an RO-Crate description of the data instead of the actual data|
|proxy-metadata-license-uri|Set to the URL of a licence such as https://creativecommons.org/licenses/by-nc-sa/3.0/au/|
|proxy-metadata-name|Provide a title for your dataset, e.g. Scone recipes|
|proxy-metadata-description|Provide a description for your dataset, e.g. A collection of scone recipes|
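As a rough illustration of how these parameters combine with an ordinary Trove request, the following Python sketch rewrites a Trove API URL to point at a local proxy and appends proxy parameters. The helper name `to_proxy_url` is hypothetical and not part of TroveProxy; the base URIs and parameter names are the ones given above, and the port assumes the container is published on 8080.

```python
from urllib.parse import urlencode

TROVE_BASE = "https://api.trove.nla.gov.au/"
PROXY_BASE = "http://localhost:8080/proxy/"  # assumes the container is published on port 8080


def to_proxy_url(trove_url, **proxy_params):
    """Swap the Trove base URI for the proxy's and append proxy-* parameters.

    Keyword names use underscores in place of hyphens and omit the "proxy-"
    prefix, e.g. metadata_format="ro-crate" becomes proxy-metadata-format=ro-crate.
    (Hypothetical helper, for illustration only.)
    """
    if not trove_url.startswith(TROVE_BASE):
        raise ValueError("not a Trove API URL: " + trove_url)
    url = PROXY_BASE + trove_url[len(TROVE_BASE):]
    if proxy_params:
        extra = urlencode(
            {"proxy-" + name.replace("_", "-"): value
             for name, value in proxy_params.items()}
        )
        # Append to an existing query string, or start a new one
        url += ("&" if "?" in url else "?") + extra
    return url


print(to_proxy_url(
    "https://api.trove.nla.gov.au/v3/result?category=newspaper&q=scones",
    format="csv",
    metadata_name="Scone recipes",
))
```

Keeping the original Trove query untouched and only appending `proxy-` parameters means the same request can be sent to either service by changing the base URI alone.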
Developing
To build the docker application, naming the image trove-proxy:
```bash
docker build -t trove-proxy .
```
To launch the trove-proxy image:
```bash
docker run --publish 8080:8080 trove-proxy
```
For convenience while developing the proxy application, you can use the --mount option to
bind-mount the src folder into the container, so that edits to the XProc and XSLT code take effect in the running container
immediately, without rebuilding the docker image. If you are accessing the application in a web browser, just refresh the page to
see the effect of any changes to the code.
```bash
docker run --publish 8080:8080 --mount type=bind,src=`pwd`/src,dst=/src trove-proxy
```
Running
With the container running, the proxied API can be accessed as if it were the Trove API, but substituting
http://localhost:8080/proxy/ in place of https://api.trove.nla.gov.au/.
For example: http://localhost:8080/proxy/v3/result?category=newspaper&category=book&q=water+dragon&include=all&proxy-format=tei&n=10&reclevel=full&bulkHarvest=true
In addition to Trove's own query syntax, the URI should typically contain the parameter proxy-format,
whose value determines the output format of the proxy: tei, atom, or csv. If the
proxy-format parameter is missing from the URI, or is left blank, the proxy returns Trove's XML
without transformation, beyond the correction of a few errors and infelicities.
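A request like the example above could be assembled and fetched from a script along the following lines. This is a minimal sketch, not part of TroveProxy: the function names `build_query_url` and `fetch` are hypothetical, the set of formats is simply the list given in this README, and `fetch` only works while the container is running on port 8080.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

PROXY_BASE = "http://localhost:8080/proxy/"
# Values listed for proxy-format in this README; extend the set if you
# install additional crosswalk stylesheets.
KNOWN_FORMATS = {"tei", "atom", "csv"}


def build_query_url(path, params, proxy_format=None):
    """Build a proxied request URI.

    params is a list of (name, value) pairs, so that repeated parameters
    such as category are preserved in order.
    """
    pairs = list(params)
    if proxy_format:
        if proxy_format not in KNOWN_FORMATS:
            raise ValueError("unknown proxy-format: " + proxy_format)
        pairs.append(("proxy-format", proxy_format))
    query = urlencode(pairs)
    return PROXY_BASE + path.lstrip("/") + ("?" + query if query else "")


def fetch(url):
    """Fetch a proxied response as bytes (requires the container to be running)."""
    with urlopen(url) as response:
        return response.read()


url = build_query_url(
    "v3/result",
    [("category", "newspaper"), ("category", "book"), ("q", "water dragon"), ("n", "10")],
    proxy_format="tei",
)
print(url)
```

Omitting `proxy_format` leaves the parameter out of the URI entirely, which corresponds to the untransformed Trove-style XML case described above.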
XSLT
XSLT transformations should be placed in the src/xslt/crosswalks folder and named for the format they produce, e.g. tei.xsl for TEI output. To install a new output
format, it is enough to add an appropriately named XSLT file to that folder.
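Because the format name is taken from the stylesheet's file name, a developer can see which values of proxy-format a checkout would support by listing that folder. The Python helper below is a hypothetical convenience for doing so, not something the proxy itself uses; it assumes it is run from the repository root.

```python
from pathlib import Path


def installed_formats(crosswalk_dir="src/xslt/crosswalks"):
    """Return the format names implied by the .xsl files in the crosswalks folder.

    tei.xsl yields "tei", csv.xsl yields "csv", and so on; other files are ignored.
    """
    return sorted(p.stem for p in Path(crosswalk_dir).glob("*.xsl"))
```

Calling `installed_formats()` from the repository root returns the sorted list of format names; a folder that does not exist yields an empty list.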
Before the XML retrieved from the Trove API is supplied to the crosswalk stylesheets, it will have been slightly altered by the fix-trove-response.xsl pre-processing stylesheet:
- The partially-escaped markup found in the articleText element of a newspaper article will be replaced
with embedded <p> and <span> elements (i.e. the articleText element will be "mixed content").
- Some errors which have been detected in the new (v3) API will be worked around.
If any new errors are detected in the Trove API, work-arounds should be inserted in that stylesheet, rather than in specific
crosswalk stylesheets, so that all the crosswalks can benefit from the workaround.
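To make the first of those alterations concrete, the Python sketch below shows the general idea of turning escaped markup into mixed content. It is only an illustration: the real work is done in XSLT by fix-trove-response.xsl, the function name and sample input are invented, and the sketch assumes the escaped markup is well-formed once unescaped, which real "partially-escaped" responses may not be.

```python
import xml.etree.ElementTree as ET


def unescape_article_text(xml_text):
    """Replace escaped markup inside articleText with real child elements.

    Illustration only; assumes the unescaped markup is well-formed XML.
    """
    root = ET.fromstring(xml_text)
    for el in root.iter("articleText"):
        # The XML parser has already decoded &lt; and &gt;, so el.text
        # is now a string of markup such as "<p><span>...</span></p>".
        markup = el.text or ""
        wrapper = ET.fromstring("<wrapper>" + markup + "</wrapper>")
        el.text = wrapper.text      # any text before the first child element
        el.extend(list(wrapper))    # the parsed <p> and <span> elements
    return root


# Invented sample input, not a real Trove response
sample = (
    "<article><articleText>"
    "&lt;p&gt;&lt;span&gt;Water&lt;/span&gt; dragon&lt;/p&gt;"
    "</articleText></article>"
)
fixed = unescape_article_text(sample)
print(ET.tostring(fixed, encoding="unicode"))
```

After this step the paragraph and span are ordinary elements, so a downstream transformation can match on them directly instead of parsing strings.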
A second pre-processor, rewrite-trove-uris-as-proxy-uris.xsl, replaces the URIs within the Trove XML that
identify individual works with proxied URIs, which should resolve to proxied versions of those works.
Crosswalk stylesheets should not need to change these URIs and should be able to simply copy them into the
appropriate place for their output format.
The crosswalk stylesheets will be passed a request-uri parameter, which will be the URI of the proxied resource
(i.e. it's usable as a self reference).
Owner
- Name: Conal Tuohy
- Login: Conal-Tuohy
- Kind: user
- Location: Brisbane, Queensland, Australia
- Website: http://conaltuohy.com/
- Repositories: 36
- Profile: https://github.com/Conal-Tuohy
I am a freelance software developer specialising in text and metadata processing, including various XML technologies, RDF, Linked Data, and SPARQL.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: TroveProxy
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Conal
    family-names: Tuohy
    email: conal.tuohy@gmail.com
    orcid: 'https://orcid.org/0000-0001-6831-3423'
  - given-names: Mark
    family-names: Raadgever
    email: mark@flyinggoatcreations.com
    orcid: 'https://orcid.org/0009-0006-4202-3624'
repository-code: 'https://github.com/Conal-Tuohy/TroveProxy'
url: 'https://github.com/Conal-Tuohy/TroveProxy'
abstract: >-
  A transforming proxy for the National Library of
  Australia's Trove API. The software exposes Trove's web
  API while adding certain extra features; in particular it
  exposes the data in additional data formats, facilitates
  bulk harvesting, and provides a facility to generate
  RO-Crate metadata descriptions of harvested datasets.
keywords:
  - Trove
  - Proxy
  - Text Encoding for Interchange
  - Comma Separated Values
  - Atom Syndication Format
license: Apache-2.0
Dependencies
- tomcat 9.0.76-jdk21-openjdk-slim build
- actions/checkout v3 composite
- docker/build-push-action 0565240e2d4ab88bba5387d719585280857ece09 composite
- docker/login-action 343f7c4344506bcbf9b4de18042ae17996df046d composite
- docker/metadata-action 96383f45573cb7f253c731d3b3ab81c87ef81934 composite
- docker/setup-buildx-action f95db51fddba0c2d1ec667646a06c2ce06100226 composite
- sigstore/cosign-installer 6e04d228eb30da1757ee4e1dd75a0ec73a653e06 composite