fuji4software
FAIRsFAIR Research Data Object Assessment Service as extended in the FAIR-IMPACT milestone M5.6
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 10 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
F-UJI (FAIRsFAIR Research Data Object Assessment Service)
This repository contains the extension of F-UJI for FAIR4RS metrics, carried out for milestone M5.6 of the FAIR-IMPACT project. The milestone comes from Task 5.2 (FAIR metrics for research software) on "Practical tests for automated FAIR software assessment in a disciplinary context" and is part of Work Package 5 on "Metrics, Certification and Guidelines". The base repository is also available on GitHub; any contributions should be made there. Evaluation scripts and results are included in ./evaluation/. The milestone report describing this development will be published shortly under the DOI displayed below.
Original README for F-UJI
Developers: Robert Huber, Anusuriya Devaraju
Thanks to Heinz-Alexander Fuetterer for his contributions and his help in cleaning up the code.
Overview
F-UJI is a web service to programmatically assess FAIRness of research data objects based on metrics developed by the FAIRsFAIR project. The service will be applied to demonstrate the evaluation of objects in repositories selected for in-depth collaboration with the project.
The 'F' stands for FAIR (of course) and 'UJI' means 'Test' in Malay. So F-UJI is a FAIR testing tool.
Cite as
Devaraju, A. and Huber, R. (2021). An automated solution for measuring the progress toward FAIR research data. Patterns, vol 2(11), https://doi.org/10.1016/j.patter.2021.100370
Clients and User Interface
A web demo using F-UJI is available at https://www.f-uji.net.
An R client package that was generated from the F-UJI OpenAPI definition is available from https://github.com/NFDI4Chem/rfuji.
An open source web client for F-UJI is available at https://github.com/MaastrichtU-IDS/fairificator.
Assessment Scope, Constraint and Limitation
The service is in development and its assessment depends on several factors.
- In the FAIR ecosystem, FAIR assessment must go beyond the object itself. FAIR enabling services and repositories are vital to ensure that research data objects remain FAIR over time. Importantly, machine-readable services (e.g., registries) and documents (e.g., policies) are required to enable automated tests.
- In addition to repository and services requirements, automated testing depends on clear, machine-assessable criteria. Some aspects (rich, plurality, accurate, relevant) specified in the FAIR principles still require human mediation and interpretation.
- The tests must focus on generally applicable data/metadata characteristics until domain/community-driven criteria have been agreed (e.g., appropriate schemas and required elements for usage/access control). For example, for some metrics (i.e., on the I and R principles), the automated tests we proposed only inspect the ‘surface’ of the criteria to be evaluated. Therefore, tests are designed in consideration of generic cross-domain metadata standards such as Dublin Core, DCAT, DataCite, schema.org, etc.
- FAIR assessment is performed based on aggregated metadata; this includes metadata embedded in the data (landing) page, metadata retrieved from a PID provider (e.g., DataCite content negotiation) and other services (e.g., re3data).

Requirements
Python 3.11
Google Dataset Search
- Download the latest Dataset Search corpus file from: https://www.kaggle.com/googleai/dataset-search-metadata-for-datasets
- Open the file `fuji_server/helper/create_google_cache_db.py` and set the variable `googlefilelocation` to the location of the corpus file.
- Run `create_google_cache_db.py`, which creates a SQLite database in the data directory. From the root directory, run `python3 -m fuji_server.helper.create_google_cache_db`.
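As a rough illustration of what such a generated cache could look like, the sketch below builds a small SQLite table of dataset URIs. The table name, column name and helper function are hypothetical and not taken from `create_google_cache_db.py`; they only show the general shape of a URI lookup cache.

```python
import sqlite3

def build_cache(db_path, uris):
    # Hypothetical schema: one table holding one dataset URI per row.
    # The real script's table layout may differ.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS google_links (uri TEXT PRIMARY KEY)")
    con.executemany(
        "INSERT OR IGNORE INTO google_links VALUES (?)", [(u,) for u in uris]
    )
    con.commit()
    return con

# Build an in-memory cache with a single example URI and count its rows.
con = build_cache(":memory:", ["https://doi.org/10.1594/PANGAEA.example"])
count = con.execute("SELECT COUNT(*) FROM google_links").fetchone()[0]
print(count)  # 1
```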
The service was generated by the swagger-codegen project. By using the OpenAPI-Spec from a remote server, you can easily generate a server stub. The service uses the Connexion library on top of Flask.
Usage
Before running the service, please set user details in the configuration file, see config/users.py.
To install F-UJI, you may execute the following Python-based or docker-based installation commands from the root directory:
Python module-based installation
From the fuji source folder run:
```bash
python -m pip install .
```
The F-UJI server can now be started with:
```bash
python -m fuji_server -c fuji_server/config/server.ini
```
The OpenAPI user interface is then available at http://localhost:1071/fuji/api/v1/ui/.
Docker-based installation
```bash
docker run -d -p 1071:1071 ghcr.io/pangaea-data-publisher/fuji
```
To access the OpenAPI user interface, open the URL below in the browser: http://localhost:1071/fuji/api/v1/ui/
Your OpenAPI definition lives here:
http://localhost:1071/fuji/api/v1/openapi.json
You can provide a different server config file this way:
```bash
docker run -d -p 1071:1071 -v server.ini:/usr/src/app/fuji_server/config/server.ini ghcr.io/pangaea-data-publisher/fuji
```
You can also build the docker image from the source code:
```bash
docker build -t <tag_name> .
docker run -d -p 1071:1071 <tag_name>
```
Notes
To avoid the Tika startup warning message, set the environment variable TIKA_LOG_PATH. For more information, see https://github.com/chrismattmann/tika-python
If you receive the exception `urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]` on macOS, run the install command shipped with Python: `./Install\ Certificates.command`.
F-UJI is using basic authentication, so username and password have to be provided for each REST call which can be configured in fuji_server/config/users.py.
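To illustrate what a basic-authenticated REST call involves, the sketch below only constructs a request without sending it. The `/evaluate` path, the `object_identifier` field and the credentials are assumptions made for illustration; consult the OpenAPI UI mentioned above for the authoritative endpoint schema, and set real credentials in `fuji_server/config/users.py`.

```python
import base64
import json
import urllib.request

# Placeholder credentials; configure real ones in fuji_server/config/users.py.
username, password = "user", "secret"
token = base64.b64encode(f"{username}:{password}".encode()).decode()

# The /evaluate path and request body are assumptions for illustration only.
payload = json.dumps({"object_identifier": "https://doi.org/10.1594/PANGAEA.example"})
req = urllib.request.Request(
    "http://localhost:1071/fuji/api/v1/evaluate",
    data=payload.encode(),
    headers={
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.get_header("Authorization"))  # Basic dXNlcjpzZWNyZXQ=
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) requires a running F-UJI server on port 1071.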
Development
First, make sure to read the contribution guidelines.
They include instructions on how to set up your environment with pre-commit and how to run the tests.
The repository includes a simple web client suitable for interacting with the API during development. One way to run it is with a LEMP stack (Linux, Nginx, MySQL, PHP), as described below.
First, install the necessary packages:
```bash
sudo apt-get update
sudo apt-get install nginx
sudo ufw allow 'Nginx HTTP'
sudo service mysql start # expects that mysql is already installed, if not run sudo apt install mysql-server
sudo service nginx start
sudo apt install php8.1-fpm php-mysql
sudo apt install php8.1-curl
sudo phpenmod curl
```
Next, configure the service by running sudo vim /etc/nginx/sites-available/fuji-dev and paste:
```nginx
server {
    listen 9000;
    server_name fuji-dev;
    root /var/www/fuji-dev;

    index index.php;

    location / {
        try_files $uri $uri/ =404;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/var/run/php/php8.1-fpm.sock;
    }

    location ~ /\.ht {
        deny all;
    }
}
```
Link simpleclient/index.php and simpleclient/icons/ to /var/www/fuji-dev by running sudo ln <path_to_fuji>/fuji/simpleclient/* /var/www/fuji-dev/. You might need to adjust the file permissions to allow non-root writes.
Next, run:
```bash
sudo ln -s /etc/nginx/sites-available/fuji-dev /etc/nginx/sites-enabled/
sudo nginx -t
sudo service nginx reload
sudo service php8.1-fpm start
```
The web client should now be available at http://localhost:9000/. Make sure to adjust the username and password in simpleclient/index.php.
After a restart, it may be necessary to start the services again:
```bash
sudo service php8.1-fpm start
sudo service nginx start
python -m fuji_server -c fuji_server/config/server.ini
```
Component interaction (walkthrough)
This walkthrough guides you through the codebase.
A good starting point is fair_object_controller/assess_by_id.
Here, we create a FAIRCheck object called ft.
This reads the metrics YAML file during initialisation and will provide all the check methods.
Next, several harvesting methods are called: first `harvest_all_metadata`, followed by `harvest_re3_data` (DataCite) and `harvest_github`, and finally `harvest_all_data`.
The harvesters are implemented separately in harvester/, and each of them collects different kinds of data.
The harvesters always run, regardless of the defined metrics.
- The metadata harvester looks through HTML markup (following schema.org, Dublin Core, etc.) and through signposting/typed links.
Ideally, it can find things like author information or license names that way.
- The data harvester is only run if the metadata harvester finds an object_content_identifier pointing at content files.
Then, the data harvester runs over the files and checks things like the file format.
- The Github harvester connects with the GitHub API to retrieve metadata and data from software repositories.
It relies on an access token being defined in config/github.cfg.
After harvesting, all evaluators are called.
Each specific evaluator, e.g. FAIREvaluatorLicense, is associated with a specific FsF and/or FAIR4RS metric.
Before the evaluator runs any checks on the harvested data, it asserts that its associated metric is listed in the metrics YAML file.
Only if it is, the evaluator runs through and computes a local score.
In the end, all scores are aggregated into F, A, I, R scores.
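As a rough sketch of that aggregation step: the metric names, scores, helper function and the simplified pattern below are invented for illustration; the real mapping relies on the regular-expression groups described under "Adding support for new metrics".

```python
import re
from collections import defaultdict

# Made-up per-metric scores. Real FsF identifiers follow patterns such as
# FsF-F1-01D, where the letter after the first hyphen names the FAIR category.
scores = {
    "FsF-F1-01D": 1.0,
    "FsF-F2-01M": 0.5,
    "FsF-A1-01M": 1.0,
    "FsF-I1-01M": 0.0,
    "FsF-R1-01MD": 1.0,
}

# Simplified pattern: capture the F/A/I/R letter via a regex group.
METRIC_PATTERN = re.compile(r"FsF-([FAIR])[0-9]")

def aggregate(scores):
    totals = defaultdict(float)
    for metric, score in scores.items():
        match = METRIC_PATTERN.match(metric)
        if match:  # only valid metric names contribute, mirroring F-UJI's gating
            totals[match.group(1)] += score
    return dict(totals)

print(aggregate(scores))  # {'F': 1.5, 'A': 1.0, 'I': 0.0, 'R': 1.0}
```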
Adding support for new metrics
Start by adding a new metrics YAML file in yaml/.
Its name has to match the following regular expression: (metrics_v)?([0-9]+\.[0-9]+)(_[a-z]+)?(\.yaml),
and the content should be structured similarly to the existing metric files.
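Using the expression exactly as given above, a quick sanity check of candidate file names (the helper function name is ours) might look like:

```python
import re

# The file-name pattern quoted above, applied with fullmatch so the
# whole name must conform.
NAME_RE = re.compile(r"(metrics_v)?([0-9]+\.[0-9]+)(_[a-z]+)?(\.yaml)")

def is_valid_metrics_filename(name):
    return NAME_RE.fullmatch(name) is not None

print(is_valid_metrics_filename("metrics_v0.5_software.yaml"))  # True
print(is_valid_metrics_filename("metrics-0.5.yaml"))            # False
```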
Metric names are tested for validity using regular expressions throughout the code. If your metric names do not match those, not all components of the tool will execute as expected, so make sure to adjust the expressions. Regular expression groups are also used for mapping to F, A, I, R categories for scoring, and debug messages are only displayed if they are associated with a valid metric.
Evaluators are mapped to metrics in their __init__ methods, so adjust existing evaluators to associate with your metric as well or define new evaluators if needed.
The multiple test methods within an evaluator also check whether their specific test is defined.
FAIREvaluatorLicense is an example of an evaluator corresponding to metrics from different sources.
For each metric, the maturity is determined as the maximum of the maturity associated with each passed test. This means that if a test indicating maturity 3 is passed and one indicating maturity 2 is not passed, the metric will still be shown to be fulfilled with maturity 3.
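The maximum rule can be expressed in a few lines. The `(level, passed)` tuple structure below is invented for illustration and is not how F-UJI stores test results:

```python
# Each tuple: (maturity level a test indicates, whether the test passed).
tests = [(1, True), (2, False), (3, True)]

def metric_maturity(tests):
    # The metric's maturity is the maximum level among passed tests,
    # regardless of lower-level tests that failed.
    passed = [level for level, ok in tests if ok]
    return max(passed, default=0)

print(metric_maturity(tests))  # 3, even though the maturity-2 test failed
```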
Updates to the API
Making changes to the API requires re-generating parts of the code using Swagger.
First, edit fuji_server/yaml/openapi.yaml.
Then, use the Swagger Editor to generate a python-flask server.
A zip archive of the generated server stub should download automatically; unzip it.
Next:
1. Place the files in swagger_server/models into fuji_server/models, except swagger_server/models/__init__.py.
2. Rename all occurrences of swagger_server to fuji_server.
3. Add the content of swagger_server/models/__init__.py into fuji_server/__init__.py.
Unfortunately, the Swagger Editor doesn't always produce code that is compliant with PEP standards.
Run pre-commit run (or try to commit) and fix any errors that cannot be automatically fixed.
License
This project is licensed under the MIT License; for more details, see the LICENSE file.
Acknowledgements
F-UJI is a result of the FAIRsFAIR “Fostering FAIR Data Practices In Europe” project which received funding from the European Union’s Horizon 2020 project call H2020-INFRAEOSC-2018-2020 (grant agreement 831558).
The project was also supported through our contributors by the Helmholtz Metadata Collaboration (HMC), an incubator-platform of the Helmholtz Association within the framework of the Information and Data Science strategic initiative.
Owner
- Name: FAIR-IMPACT
- Login: FAIR-IMPACT
- Kind: organization
- Email: pco@fair-impact.eu
- Location: Netherlands
- Website: fair-impact.eu
- Twitter: fairimpact_eu
- Repositories: 8
- Profile: https://github.com/FAIR-IMPACT
GitHub organisation for the Horizon Europe EOSC project FAIR-IMPACT (Expanding FAIR solutions across EOSC)
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
title: F-UJI - An Automated FAIR Data Assessment Tool
message: Please cite this software using these metadata.
type: software
authors:
  - given-names: Anusuriya
    family-names: Devaraju
    email: anusuriya.devaraju@googlemail.com
    orcid: 'https://orcid.org/0000-0003-0870-3192'
  - given-names: Robert
    family-names: Huber
    email: rhuber@marum.de
    orcid: 'https://orcid.org/0000-0003-3000-0020'
identifiers:
  - type: doi
    value: 10.5281/zenodo.3934401
repository-code: 'https://github.com/pangaea-data-publisher/fuji'
url: 'https://www.fairsfair.eu/f-uji-automated-fair-data-assessment-tool'
abstract: >-
  FAIRsFAIR has developed F-UJI, a service based on REST, and is piloting a
  programmatic assessment of the FAIRness of research datasets in five
  trustworthy data repositories.
keywords:
  - PANGAEA
  - FAIRsFAIR
  - FAIR Principles
  - Data Object Assessment
  - OpenAPI
  - FAIR
  - Research Data
  - FAIR data
  - Metadata harvesting
license: MIT
```
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- JamesIves/github-pages-deploy-action 65b5dfd4f5bcd3a7403bbc2959c144256167464e composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- actions/checkout v4 composite
- EnricoMi/publish-unit-test-result-action e780361cd1fc1b1a170624547b3ffda64787d365 composite
- dawidd6/action-download-artifact e7466d1a7587ed14867642c2ca74b5bcc1e19a2d composite
- irongut/CodeCoverageSummary 51cc3a756ddcd398d447c044c02cb6aa83fdae95 composite
- marocchino/sticky-pull-request-comment 331f8f5b4215f0445d3c07b4967662a32a2d3e31 composite
- python 3.11-slim build
- ghcr.io/pangaea-data-publisher/fuji latest
- jupyter/minimal-notebook latest
- beautifulsoup4 ~=4.12
- configparser ~=6.0
- connexion [flask,uvicorn,swagger-ui]~=3.0
- extruct ~=0.16.0
- feedparser ~=6.0
- flask-cors ~=4.0
- flask-limiter ~=3.5
- hashid ~=3.1.4
- idutils ~=1.2
- jmespath ~=1.0
- levenshtein ~=0.24.0
- lxml ~=5.0
- pandas ~=2.1
- pyRdfa3 ~=3.5
- pygithub ~=2.1
- pyld ~=2.0
- pyyaml ~=6.0
- rapidfuzz ~=3.3
- rdflib ~=7.0
- requests ~=2.31
- sparqlwrapper ~=2.0
- tika ~=2.6
- tldextract ~=5.0
- urlextract ~=1.8