https://github.com/acdh-oeaw/urinormalizer
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.0%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 0
- Releases: 10
Metadata Files
README.md
URI Normalizer
A class for normalizing named entity URIs from services like Geonames, GND, VIAF, ORCID, etc. and retrieving RDF metadata from them.
By default the rules from the arche-assets library are used, but you can supply your own.
Any PSR-16 compatible cache can be used to speed up normalization/retrieval of recurring URIs. A combined in-memory and persistent sqlite-based cache implementation is provided as well.
Context
When working with named entity database services, it is often difficult to tell which URL is the canonical URI for a given named entity.
Consider just a few of the many Geonames URLs that describe exactly the same Geonames named entity with id 2761369:
- http://geonames.org/2761369
- https://geonames.org/2761369
- http://www.geonames.org/2761369
- https://www.geonames.org/2761369
- http://geonames.org/2761369/vienna
- https://geonames.org/2761369/vienna
- http://www.geonames.org/2761369/vienna
- https://www.geonames.org/2761369/vienna
- https://www.geonames.org/2761369/vienna/about.rdf
- https://www.geonames.org/2761369/vienna.html
Which one of them is the right one? The answer is quite simple: the one used as the subject of RDF triples in the RDF metadata returned by the service. So the first aim of this package is to provide a tool for transforming any URL coming from a given service into the canonical URI used by the service in the RDF metadata it returns.
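To make the idea concrete, here is a minimal sketch of regex-based normalization for the Geonames URLs listed above. It is not the package's implementation: the `normalizeGeonames()` function and its pattern are hypothetical, and the target form `https://sws.geonames.org/<id>/` is taken from the usage examples in this README.

```php
<?php
// Hypothetical helper for illustration only; not part of acdh-oeaw/uri-normalizer.
// Returns the canonical URI, or null when the URL is not a Geonames entity URL.
function normalizeGeonames(string $url): ?string
{
    // Optional subdomain, numeric id, then any trailing path (slug, about.rdf, ...).
    $pattern = '~^https?://(?:[^/]*[.])?geonames[.]org/([0-9]+)(?:/.*)?$~';
    if (!preg_match($pattern, $url, $m)) {
        return null;
    }
    return 'https://sws.geonames.org/' . $m[1] . '/';
}

$variants = [
    'http://geonames.org/2761369',
    'https://geonames.org/2761369',
    'http://www.geonames.org/2761369',
    'https://www.geonames.org/2761369',
    'http://geonames.org/2761369/vienna',
    'https://geonames.org/2761369/vienna',
    'http://www.geonames.org/2761369/vienna',
    'https://www.geonames.org/2761369/vienna',
    'https://www.geonames.org/2761369/vienna/about.rdf',
    'https://www.geonames.org/2761369/vienna.html',
];

// All ten variants collapse into a single canonical URI.
$canonical = array_unique(array_map('normalizeGeonames', $variants));
echo implode("\n", $canonical) . "\n"; // prints https://sws.geonames.org/2761369/
```

Returning `null` for non-matching input is a choice made for this sketch; the real class may signal unmatched URLs differently.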
But here we come to another issue - how to fetch the RDF metadata for a given named entity knowing its URI?
For some services (like ORCID or VIAF) it can be done with plain HTTP content negotiation, by requesting the response in one of the supported RDF formats. Others require a service-specific retrieval method, e.g. in Geonames you need to append /about.rdf to the canonical URI.
The second aim of this package is to allow you to retrieve RDF metadata from named entity URIs/URLs without being bothered by all those service-specific peculiarities.
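The difference between the two retrieval styles can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `buildRdfRequest()` is a hypothetical helper, `application/rdf+xml` is assumed to be an accepted format, and the VIAF identifier is a made-up placeholder. The sketch only describes the request and performs no network access.

```php
<?php
// Hypothetical helper for illustration only; not part of acdh-oeaw/uri-normalizer.
// Describes the HTTP request needed to obtain RDF for a canonical URI.
function buildRdfRequest(string $canonicalUri): array
{
    // Geonames: service-specific method, append about.rdf to the canonical URI.
    if (strpos($canonicalUri, 'https://sws.geonames.org/') === 0) {
        return [
            'url'     => rtrim($canonicalUri, '/') . '/about.rdf',
            'headers' => [],
        ];
    }
    // Default: plain content negotiation through the Accept header
    // (the RDF/XML media type is an assumption of this sketch).
    return [
        'url'     => $canonicalUri,
        'headers' => ['Accept' => 'application/rdf+xml'],
    ];
}

$geonames = buildRdfRequest('https://sws.geonames.org/2761369/');
$viaf     = buildRdfRequest('http://viaf.org/viaf/000000'); // placeholder id

echo $geonames['url'] . "\n";            // https://sws.geonames.org/2761369/about.rdf
echo $viaf['headers']['Accept'] . "\n";  // application/rdf+xml
```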
Because such retrieval takes quite some time, a caching option is also provided.
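The following sketch shows how a combined in-memory and persistent cache can avoid repeated retrievals. It is not the package's `UriNormalizerCache`: the `TwoTierCache` class is hypothetical, a JSON file stands in for the sqlite database to keep the example dependency-free, and the "slow fetch" is a stub that only counts how often it is called.

```php
<?php
// Hypothetical two-tier cache for illustration; not the package's UriNormalizerCache.
final class TwoTierCache
{
    /** @var array<string, string> in-memory tier, lives as long as the object */
    private array $memory = [];
    /** @var string path of the persistent tier (a JSON file standing in for sqlite) */
    private string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    /** Returns the cached value, calling $fetch only when both tiers miss. */
    public function get(string $key, callable $fetch): string
    {
        if (isset($this->memory[$key])) {
            return $this->memory[$key];
        }
        $stored = $this->load();
        if (!isset($stored[$key])) {
            $stored[$key] = $fetch($key);
            file_put_contents($this->path, json_encode($stored));
        }
        return $this->memory[$key] = $stored[$key];
    }

    /** Reads the persistent tier; an empty or missing file yields an empty array. */
    private function load(): array
    {
        if (!is_file($this->path)) {
            return [];
        }
        return json_decode((string) file_get_contents($this->path), true) ?: [];
    }
}

$path  = tempnam(sys_get_temp_dir(), 'uricache');
$calls = 0;
// Stub standing in for a slow HTTP retrieval.
$slowFetch = function (string $uri) use (&$calls): string {
    $calls++;
    return "<rdf for $uri>";
};

$uri   = 'https://sws.geonames.org/2761369/';
$first = new TwoTierCache($path);
$a = $first->get($uri, $slowFetch);   // miss in both tiers, calls the stub
$b = $first->get($uri, $slowFetch);   // served from memory
$second = new TwoTierCache($path);
$c = $second->get($uri, $slowFetch);  // served from the persistent file

echo "fetch calls: $calls\n"; // prints fetch calls: 1
unlink($path);
```

A separate instance pointing at the same file still skips the slow retrieval, which is the behavior the persistent tier is meant to provide.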
Automatically generated documentation
https://acdh-oeaw.github.io/arche-docs/devdocs/classes/acdhOeaw-UriNormalizer.html
Installation
composer require acdh-oeaw/uri-normalizer
Usage
```php
// Initialization
$normalizer = new \acdhOeaw\UriNormalizer();

// String URL normalization
// returns 'https://sws.geonames.org/2761369/'
echo $normalizer->normalize('http://geonames.org/2761369/vienna.html');

// EasyRdf resource property normalization
$property = 'https://some.id/property';
$graph = new EasyRdf\Graph();
$resource = $graph->resource('.');
$resource->addResource($property, 'http://aaa.geonames.org/276136/borj-ej-jaaiyat.html');
$normalizer->normalizeMeta($resource, $property);
// returns 'https://sws.geonames.org/276136/'
echo (string) $resource->getResource($property);

// Retrieve parsed/raw RDF metadata from URI/URL
// print parsed RDF metadata retrieved from Geonames
$metadata = $normalizer->fetch('http://geonames.org/2761369/vienna.html');
echo $metadata->dump('text') . "\n";
// get a PSR-7 request fetching the RDF metadata for a given Geonames URL
$request = $normalizer->resolve('http://geonames.org/2761369/vienna.html');
echo $request->getUri() . "\n";

// Use your own normalization rules and supply a custom Guzzle HTTP client
// (can be any PSR-18 one) providing authentication
$rules = [
    [
        "match"   => "^https://(?:my.)own.namespace/([0-9]+)(?:/.*)?$",
        // note the doubled backslash: "\1" in a double-quoted PHP string
        // is an octal escape, not a regex backreference
        "replace" => "https://own.namespace/\\1",
        "resolve" => "https://own.namespace/\\1",
        "format"  => "application/n-triples",
    ],
];
$client = new \GuzzleHttp\Client(['auth' => ['login', 'password']]);
$cache = false;
$normalizer = new \acdhOeaw\UriNormalizer($rules, '', $client, $cache);
// returns 'https://own.namespace/123'
echo $normalizer->normalize('https://my.own.namespace/123/foo');
// obviously won't work, but if https://own.namespace existed,
// it would be queried with the HTTP BASIC auth as set up above
$normalizer->fetch('https://my.own.namespace/123/foo');

// Use cache
$cache = new \acdhOeaw\UriNormalizerCache('db.sqlite');
$normalizer = new \acdhOeaw\UriNormalizer(cache: $cache);
// first retrieval should take 0.1-1 second depending on your connection speed
$t = microtime(true);
$metadata = $normalizer->fetch('http://geonames.org/2761369/vienna.html');
$t = (microtime(true) - $t);
echo $metadata->dump('text') . "\ntime: $t s\n";
// second retrieval should be very quick thanks to the in-memory cache
$t = microtime(true);
$metadata = $normalizer->fetch('http://geonames.org/2761369/vienna.html');
$t = (microtime(true) - $t);
echo $metadata->dump('text') . "\ntime: $t s\n";
// a completely separate UriNormalizer instance still benefits from the persistent
// sqlite cache
$cache2 = new \acdhOeaw\UriNormalizerCache('db.sqlite');
$normalizer2 = new \acdhOeaw\UriNormalizer(cache: $cache2);
$t = microtime(true);
$metadata = $normalizer2->fetch('http://geonames.org/2761369/vienna.html');
$t = (microtime(true) - $t);
echo $metadata->dump('text') . "\ntime: $t s\n";

// As a global singleton
// initialization is done with init() instead of a constructor
// init() takes the same parameters as the constructor
\acdhOeaw\UriNormalizer::init();
// all other methods (gNormalize(), gFetch() and gResolve()) also work in
// the same way and take the same parameters as their non-static counterparts
// returns 'https://sws.geonames.org/2761369/'
echo \acdhOeaw\UriNormalizer::gNormalize('http://geonames.org/2761369/vienna.html');
// fetch and cache parsed RDF metadata
echo \acdhOeaw\UriNormalizer::gFetch('http://geonames.org/2761369/vienna.html')->dump('text');
// fetch and cache raw RDF metadata
echo \acdhOeaw\UriNormalizer::gResolve('http://geonames.org/2761369/vienna.html')->getBody();
// normalize EasyRdf Resource property
$property = 'https://some.id/property';
$graph = new EasyRdf\Graph();
$resource = $graph->resource('.');
$resource->addResource($property, 'http://aaa.geonames.org/276136/borj-ej-jaaiyat.html');
\acdhOeaw\UriNormalizer::gNormalizeMeta($resource, $property);
// returns 'https://sws.geonames.org/276136/'
echo (string) $resource->getResource($property);
```
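As a rough illustration of how rules with `match`, `replace`, `resolve` and `format` keys could be evaluated, the sketch below applies a rule list with `preg_replace()`. How the package actually interprets its rules is not shown in this README, so `applyRules()`, the `~` delimiter and the `null` return for unmatched URLs are all assumptions of this sketch.

```php
<?php
// Hypothetical rule evaluator for illustration; the package's own logic may differ.
// Returns the normalized URI, the URL to resolve and the expected format,
// or null when no rule matches.
function applyRules(array $rules, string $url): ?array
{
    foreach ($rules as $rule) {
        $pattern = '~' . $rule['match'] . '~'; // delimiter choice is an assumption
        $count   = 0;
        $uri     = preg_replace($pattern, $rule['replace'], $url, 1, $count);
        if ($count > 0) {
            return [
                'uri'     => $uri,
                'resolve' => preg_replace($pattern, $rule['resolve'], $url),
                'format'  => $rule['format'],
            ];
        }
    }
    return null;
}

$rules = [
    [
        'match'   => '^https://(?:my.)own.namespace/([0-9]+)(?:/.*)?$',
        // single quotes keep \1 as a backreference; in double quotes "\1"
        // would be read as an octal escape unless written as "\\1"
        'replace' => 'https://own.namespace/\1',
        'resolve' => 'https://own.namespace/\1',
        'format'  => 'application/n-triples',
    ],
];

$result = applyRules($rules, 'https://my.own.namespace/123/foo');
echo $result['uri'] . "\n"; // prints https://own.namespace/123
```

Because the pattern is anchored at both ends, the replacement substitutes the whole input string, so the result is exactly the canonical form.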
Owner
- Name: Austrian Centre for Digital Humanities & Cultural Heritage
- Login: acdh-oeaw
- Kind: organization
- Email: acdh@oeaw.ac.at
- Location: Vienna, Austria
- Website: https://www.oeaw.ac.at/acdh
- Repositories: 476
- Profile: https://github.com/acdh-oeaw
GitHub Events
Total
- Create event: 3
- Release event: 2
- Issues event: 1
- Delete event: 2
- Issue comment event: 1
- Push event: 1
Last Year
- Create event: 3
- Release event: 2
- Issues event: 1
- Delete event: 2
- Issue comment event: 1
- Push event: 1
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Mateusz Żółtak | z****k@z****g | 46 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 3
- Total pull requests: 0
- Average time to close issues: 15 days
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 0
- Average time to close issues: 15 days
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- zozlak (3)
Packages
- Total packages: 1
- Total downloads: 6,097 (packagist)
- Total dependent packages: 5
- Total dependent repositories: 3
- Total versions: 10
- Total maintainers: 1
packagist.org: acdh-oeaw/uri-normalizer
A simple class for normalizing external entity reference sources' URIs (Geonames, GND, etc. URIs).
- Homepage: https://github.com/acdh-oeaw/uriNormalizer
- License: MIT
- Latest release: 3.2.0 (published 9 months ago)
Dependencies
- phpunit/phpunit ^9 development
- acdh-oeaw/arche-assets *
- acdh-oeaw/easyrdf *
- php >= 7.1
- actions/checkout v2 composite