Recent Releases of fuji

fuji - v3.5.0

Since the first versions of F-UJI and the FAIRsFAIR FAIR (FsF) metrics were published, a number of papers have appeared that analyse in depth both the tool’s behaviour and the FsF metrics. In addition, we received a great deal of valuable feedback and suggestions regarding the metrics and their implementation, either via F-UJI’s GitHub page or through personal communication. As a consequence, we have significantly revised the FAIRsFAIR metrics (version 0.8) to achieve better comparability with other metrics. We have also tried to focus the metrics as much as possible on the essential spirit of the FAIR principles, which includes:

  • Focus on FAIR instead of data and metadata quality or veracity
  • Emphasis on the dataset level, not the repository level
  • Completeness and balance in the coverage of the FAIR principles

The revised metrics change the behavior of the associated tests and their execution compared to previous versions. The changes include:

  • F1: Persistent identifiers are no longer tested for resolution (e.g. HTTP status 20x); instead, redirection (e.g. 30x) is tested to distinguish identifier-like strings from managed identifiers.
  • F2: Methods to expose metadata are no longer tested here.
  • F3: Content-related metadata is no longer inspected here; instead, this is tested in R1.
  • F4: Verification of whether a resource is registered in search engines has been suspended, as no comprehensive and reliable APIs (e.g. Google, Bing) are available.
  • A1: While the specification of access conditions is not part of the published FAIR principles, it is required by the RDA metrics. Therefore this metric remains, but the veracity and use of dedicated data access/rights vocabularies are no longer tested.
  • A1: Retrievability of data and metadata is explicitly tested.
  • A1.1: A dedicated metric and tests for data and metadata protocol standards have been specified.
  • A1.2: A dedicated metric and tests for data and metadata protocol authentication have been specified.
  • R1: Data-related metadata information, e.g. whether file sizes have been specified correctly, is no longer validated.
  • R1.1: Validation of given licenses (e.g. using the SPDX registry) is suspended.
  • R1.3: Information retrieved from re3data is no longer used in metric tests, since this information may not be valid for every single dataset of a given repository.
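
The F1 change can be illustrated with a minimal sketch. It assumes that the HTTP status of the first, non-followed request to the identifier URL is already known; the function name and the returned labels are hypothetical and are not taken from the F-UJI code base.

```python
def classify_identifier_response(status_code: int) -> str:
    """Classify the first (non-followed) HTTP response for an identifier URL.

    Hypothetical helper: a 30x answer suggests a resolver that manages the
    identifier, whereas a direct 20x answer only shows that the URL is reachable.
    """
    if 300 <= status_code < 400:
        return "redirects"
    if 200 <= status_code < 300:
        return "resolves-directly"
    return "unresolved"


if __name__ == "__main__":
    for code in (302, 200, 404):
        print(code, classify_identifier_response(code))
```

Under this reading, only the "redirects" case would count as evidence of a managed identifier.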

The new F-UJI version published here implements these metrics and tests and uses metric 0.8 as the default metric. However, backward compatibility for metric 0.5 and the previously presented discipline-specific metrics is guaranteed.

Another change in the testing behaviour should be mentioned: to allow better comparability with other tools, F-UJI now treats OpenGraph as an RDF encoding, although only a subset of RDFa is used. This will probably improve the score of some datasets.
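
As a rough illustration of what reading OpenGraph as RDF can mean, the following sketch turns `og:` meta tags into subject/predicate/object triples using only the Python standard library. The class and function names are hypothetical, the namespace IRI is the one commonly used for the `og:` prefix, and the approach is an assumption for illustration, not F-UJI's actual parser.

```python
from html.parser import HTMLParser

OG_NAMESPACE = "http://ogp.me/ns#"  # IRI commonly bound to the og: prefix


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags as (predicate, object) pairs."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Expand the og: prefix to a full predicate IRI
            self.statements.append((OG_NAMESPACE + prop[3:], content))


def opengraph_triples(html: str, subject: str):
    """Return (subject, predicate, object) triples for all og: meta tags in the HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    return [(subject, predicate, obj) for predicate, obj in parser.statements]
```

A landing page that only carries `og:title` and `og:description` would yield two triples here, which is why such pages may now score where they previously did not.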

More information about the new FAIRsFAIR metric version 0.8 can be found here: https://doi.org/10.5281/zenodo.15045911

- Python
Published by huberrob 11 months ago

fuji - v3.4.0

The main changes are support for DQV as an additional output format for both metrics and FAIR assessment results; a more verbose standard JSON output, which now includes information about the origin of harvested metadata (format, source, method); and some newly implemented methods to verify the presence of temporal and spatial coverage metadata, which support the assessment of community-specific metrics from the earth and environmental sciences. In detail, the following changes may affect future F-UJI test results:

  • Support for standardised output of metrics and FAIR assessment results as DQV: the F-UJI API now supports RDF output formats (ttl, jsonld etc.), which return DQV RDF. The default output is still the F-UJI custom JSON.
  • Added some ontology and metadata standard namespaces such as FHIR and geoDCAT.
  • Improved DDI mapping and parsing (e.g. improved file type and size detection for distributions; may improve FsF-R1-01MD)
  • Improved ISO GCMG mapping and parsing of file size and type (may improve FsF-R1-01MD)
  • Data objects which are offered via services (streaming) are now supported for DCA, schema.org and ISO 19xxx, see: #513. This includes a verification sub-test (FsF-R1-01MD-2-c) which checks whether a service endpoint is given and protocol information is specified in the metadata.
  • Added a browser-like user agent to mimic browsers in case web scraping detection methods hinder access (HTTP 405)
  • Replaced the simple JMESPath-based JSON-LD parsing with RDF parsing
  • Improved schema.org handling, e.g. license mapping/parsing now supports CreativeWork licenses in schema.org; may improve FsF-R1.1-01
  • The Swagger JSON output format has changed: it now also includes the harvested metadata as well as metadata sources and formats (similar to the harvest method)
  • Improved RDF handling for complicated graphs: F-UJI now tries to detect the main entity instead of picking Dataset classes from a graph which actually describes something else.
  • Added a warning in case the resource type is not indicated or differs from ‘Dataset’ so users may decide if F-UJI is appropriate for the test.
  • Improved schema.org and RO-Crate handling: FsF-R1-01MD and FsF-F3-01M now also consider MediaObjects which are indicated as hasPart of a Dataset
  • New metadata properties are parsed to support community-specific tests (geo, env): spatial coverage and temporal coverage in DCAT, schema.org, DC, DDI, EML, ISO etc.
  • New tests implemented for env/earth science metrics which verify the presence of spatial or temporal coverage info
  • New prototypic YAML file for a first potential env/earth community metric
  • Some pseudo namespaces which are included in some LOV collections are excluded from the LOV list since they are identifiers: "orcid.org", "doi.org", "ror.org", "zenodo.org", "isni.org", "github.com", "arxiv.org". This may result in lower scores in FsF-I2-01M.
  • Due to a parsing bug, empty property values or null/None values were sometimes stringified to “None” or “null” and scored as valid values. These are no longer regarded as valid values; thus, some scores might be lower in 3.4.0.
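
The last point can be sketched as a small filter that rejects empty values and stringified null markers before any scoring takes place. The marker list and both function names are assumptions made for illustration; they do not reproduce F-UJI's internal implementation.

```python
EMPTY_MARKERS = {"", "none", "null"}  # assumption: illustrative list, not F-UJI's actual one


def is_valid_value(value) -> bool:
    """Return True only for values that carry real content."""
    if value is None:
        return False
    if isinstance(value, str):
        # "None", "null" and blank strings are stringified placeholders, not content
        return value.strip().lower() not in EMPTY_MARKERS
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def clean_metadata(record: dict) -> dict:
    """Drop properties whose values are empty or stringified null markers."""
    return {key: value for key, value in record.items() if is_valid_value(value)}
```

With such a filter, a record whose license is the string "None" no longer contributes a license value, which explains why some scores can drop after the fix.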

- Python
Published by huberrob over 1 year ago

fuji - v3.2.0

Changes from 3.1.0 to 3.2.0

  • Integration of FAIR testing for software, for more details see the following pull request:
    • https://github.com/pangaea-data-publisher/fuji/pull/478
  • Improved DCAT handling, now avoids overwriting existing license and access rights info; fixed incorrect handling of distribution info (bytesize type)
  • Re3data metadata lookup is now always performed; previously it was performed only if no service endpoint was given.
  • Improved RDFa handling: image tags like The Dubai marina skyline are now ignored
  • Upgraded connexion to v3; Python 3.11
  • Improved XML handling / scheme recognition e.g. for DDI formats
  • Improved handling of non HTML “landing content” for DOIs see: https://github.com/pangaea-data-publisher/fuji/issues/492
  • Improved handling of CC licenses, previously these were not always correctly recognized as valid

- Python
Published by huberrob almost 2 years ago

fuji - v3.1.0

The main change in this release is the data_harvester behavior, which now uses threads to download data objects/files. This allows more data files to be included in the assessment. In detail, F-UJI now tries to analyse up to 5 files per mime-type (as listed in the metadata). Some other changes to note:

  • All: Incorrect handling of some landing pages, which caused the evaluator to stop, has been fixed.
  • R1.1: Licenses packed as lists are now unpacked and correctly identified.
  • I3: In some cases scores for I3 are improved due to the inclusion of schema.org/citation as a scanned relation property.
  • R1: Incorrect handling of file sizes given or interpreted as strings like 'None', which were accepted as valid content, caused incorrect (too high) scoring of R1; the score might be lower but correct now in these cases.
  • R1: Improved handling of mime types including e.g. charset info (text/plain; charset=US-ASCII) may result in a higher score for R1 (FsF-R1-01MD-3).
  • R1: Improved parsing of content length byte units may improve the scoring.
  • F2: Improved handling of RDF graphs containing DC or schema.org terms to describe the content may improve findability and other scores.
  • R1.3: F-UJI now uses threads to download more data objects (up to five files/links per claimed content type), which improves its capability to evaluate data content.
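
The threaded download with a cap of five files per claimed type could look roughly like the sketch below. It assumes each data link is a dictionary with hypothetical "url" and "type" keys and that the caller supplies the actual `fetch` function; none of these names come from the F-UJI source.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

MAX_FILES_PER_TYPE = 5  # limit stated in the release note


def normalize_mime_type(claimed_type: str) -> str:
    """Strip parameters such as '; charset=US-ASCII' and lower-case the type."""
    return claimed_type.split(";", 1)[0].strip().lower()


def select_links(data_links):
    """Keep at most MAX_FILES_PER_TYPE links per normalized mime type, in input order."""
    per_type = defaultdict(list)
    for link in data_links:
        mime_type = normalize_mime_type(link.get("type") or "")
        if len(per_type[mime_type]) < MAX_FILES_PER_TYPE:
            per_type[mime_type].append(link["url"])
    return dict(per_type)


def download_all(data_links, fetch, max_workers=4):
    """Download the selected links concurrently; `fetch` is a caller-supplied callable(url)."""
    urls = [url for urls_of_type in select_links(data_links).values() for url in urls_of_type]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with urls
        return dict(zip(urls, pool.map(fetch, urls)))
```

Normalizing the type before grouping means that "text/plain" and "text/plain; charset=US-ASCII" share one quota of five files.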

- Python
Published by huberrob over 2 years ago

fuji - v3.0.0

This new release allows configuration of metric YAML which also affects how tests are performed. More documentation about this will be published soon in the README.

Some changes of F-UJI's behaviour have to be mentioned:

  • The role of the YAML metric definition file is more important now. It also allows defining individual scores and maturity levels, which are no longer hardcoded.
  • Metrics and tests which are not listed in the YAML files are not performed/assessed; this allows metrics and tests to be switched on or off, so that community-specific metrics can be defined in dedicated YAML files.
  • F-UJI is now able to use different metrics: the REST API now has an additional parameter ‘metricversion’ by which the YAML file can be selected (default: metricsv0.5.yaml).

F-UJI > 2.3.0 has more tests implemented, which allow metrics and tests to be defined in specific YAML files that are more compatible with RDA and The Evaluator:

  • FsF-F1-01DD unique identifier of data
  • FsF-F1-02DD persistent identifier of data
  • FsF-F1-01M which will replace FsF-F1-01D unique identifier of metadata
  • FsF-F1-02M which will replace FsF-F1-02D persistent identifier of metadata
  • FsF-F3-02M (metadata include identifier of dataset)
  • FsF-F4-01M-2 which tests if OAI-PMH, SPARQL or CSW is used to offer metadata
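
The rule that only tests listed in the metric definition are performed can be sketched as follows. The example assumes the YAML file has already been parsed into a Python dictionary; the dictionary layout, the test identifiers ending in "-1" and the function name are hypothetical and may differ from the real metric files.

```python
# Hypothetical, already parsed metric definition; the real YAML layout may differ.
METRIC_DEFINITION = {
    "FsF-F1-01M": {"FsF-F1-01M-1": {"score": 1, "maturity": 1}},
    "FsF-F3-02M": {"FsF-F3-02M-1": {"score": 1, "maturity": 3}},
}

# Hypothetical list of tests the evaluator knows how to run.
IMPLEMENTED_TESTS = ["FsF-F1-01M-1", "FsF-F1-02M-1", "FsF-F3-02M-1"]


def plan_tests(metric_definition, implemented_tests):
    """Return {test_id: config} for implemented tests that the definition lists."""
    configured = {}
    for tests in metric_definition.values():
        configured.update(tests)
    # Tests missing from the definition are skipped, i.e. switched off
    return {test_id: configured[test_id] for test_id in implemented_tests if test_id in configured}


if __name__ == "__main__":
    for test_id, config in plan_tests(METRIC_DEFINITION, IMPLEMENTED_TESTS).items():
        print(test_id, config["score"], config["maturity"])
```

Because scores and maturity levels travel with each configured test, a community metric file can both disable tests and reweight the remaining ones.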

  • F-UJI no longer uses the first data object for F3, A1, R1 and R1.3, but the first data object which is accessible (HTTP 200).
  • Fixed a bug which caused wrong scores for R1 because FsF-R1-01MD-3 was sometimes ignoring matching file sizes and types.
  • F-UJI now also recognizes resource types for R1 if given as URI, e.g. schema.org/Dataset.
  • Fixed a bug due to which, in 2.2.5, signposting links to JSON-LD files were incorrectly accepted as a valid search engine support mechanism.
  • Fixed a bug which accepted stringified ‘None’ as an entry for file type and size and caused wrong scores for R1.
  • Improved license recognition.
  • Improved JSON-LD handling.
  • F-UJI truncates very large data files prior to testing, which caused R1 test FsF-R1-01MD-3 (Data content matches file type and size specified in metadata) to incorrectly compare the expected file size with the truncated size. Now F-UJI compares the expected size with the size given in the HTTP header (if given) to perform this test for truncated files.
  • Prior to version 2.3.0, F-UJI correctly detected valid domain-agnostic metadata standards in R1.3 (FsF-R1.3-01M-3) but did not assign any score for this. This bug was fixed for F-UJI >= 2.3.0.
  • Prior to version 3.0.0, F-UJI accepted content negotiation in addition to HTML embedding and microdata as a search engine friendly way to offer metadata in FsF-F4-01M (Metadata is offered in such a way that it can be retrieved programmatically). Additionally, F-UJI did not verify the metadata standard and content offered via RDFa/microdata. Now F-UJI exclusively expects schema.org, DC or DCAT as search engine friendly metadata formats offered via HTML embedding and microdata/RDFa. It no longer considers empty RDFa content as it did before.
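
The size comparison for truncated files can be sketched as below. This is a minimal illustration under stated assumptions: the function name and parameters are hypothetical, and returning None for an undeterminable case is a design choice of this sketch, not documented F-UJI behaviour.

```python
def size_matches(declared_size, downloaded_bytes, truncated, content_length=None):
    """Compare the file size declared in metadata with the observed size.

    For truncated downloads the Content-Length header value is used instead of
    the number of bytes actually read; without a usable header the result is
    None (undetermined).
    """
    if truncated:
        if content_length is None:
            return None
        try:
            observed = int(content_length)
        except (TypeError, ValueError):
            return None
    else:
        observed = downloaded_bytes
    try:
        return int(declared_size) == observed
    except (TypeError, ValueError):
        # Non-numeric declarations such as the string 'None' never match
        return False
```

Comparing against the header rather than the truncated byte count avoids the false mismatch described for large files.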

- Python
Published by huberrob over 2 years ago

fuji - v2.2.5

This release makes it possible to reproduce the behavior of the F-UJI release used on f-uji.net since February 2023.

- Python
Published by huberrob over 2 years ago

fuji - v2.0.2

Full Changelog: https://github.com/pangaea-data-publisher/fuji/compare/v1.4.9...v.2.0.2

This release is the first which is based on the completely restructured metadataharvesting class. All metadata and PID collecting methods have moved there from faircheck. This allows easier testing and also using the harvester for other purposes.

- Python
Published by huberrob over 3 years ago

fuji - v.1.4.9

Includes 1.7.9b

This will be the last version which uses metric 0.4

Improvements:

  • Improved signposting handling: better recognition in HTML as well as header; now focusses on metadata and identifier related links and ignores e.g. ORCID author links.
  • Improved JSON-LD handling, now tries to identify dataset (preferred) or creative work metadata in case several JSON-LD snippets are given (e.g. one for Webpage and another one for Dataset)
  • More mime types now recognized
  • Content negotiation now adds a preferred type, e.g. the one found in typed links
  • Namespace recognition now case insensitive
  • Improved Dublin Core parsing, now case insensitive
  • Improved XML mime type recognition

- Python
Published by huberrob almost 4 years ago

fuji - v1.4.7

includes 1.4.7b

- Python
Published by huberrob almost 4 years ago

fuji - v1.4.6

- Python
Published by huberrob about 4 years ago

fuji - v.1.4.3

- Python
Published by huberrob about 4 years ago

fuji - v.1.3.5

- Python
Published by huberrob over 4 years ago

fuji - v.1.0.6

- Python
Published by huberrob almost 5 years ago

fuji - v.1.0.1

Several improvements and bug fixes. Major changes:

  • Signposting support
  • Output of individual test results for each metric
  • Code is now reorganized in several classes per metric to allow easier maintenance

- Python
Published by huberrob about 5 years ago