fuji4software
FAIRsFAIR Research Data Object Assessment Service as extended in the FAIR-IMPACT milestone M5.6
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 10 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
F-UJI (FAIRsFAIR Research Data Object Assessment Service)
This repository contains the extension of F-UJI for FAIR4RS metrics, carried out for milestone M5.6 of the FAIR-IMPACT project. The milestone comes from Task 5.2 (FAIR metrics for research software) on "Practical tests for automated FAIR software assessment in a disciplinary context" and is part of Work Package 5 on "Metrics, Certification and Guidelines". The base repository is also available on GitHub; any contributions should be made there. Evaluation scripts and results are included in ./evaluation/. The milestone report describing this development will be published shortly under the DOI displayed below.
Original README for F-UJI
Developers: Robert Huber, Anusuriya Devaraju
Thanks to Heinz-Alexander Fuetterer for his contributions and his help in cleaning up the code.
Overview
F-UJI is a web service to programmatically assess FAIRness of research data objects based on metrics developed by the FAIRsFAIR project. The service will be applied to demonstrate the evaluation of objects in repositories selected for in-depth collaboration with the project.
The 'F' stands for FAIR (of course) and 'UJI' means 'Test' in Malay. So F-UJI is a FAIR testing tool.
Cite as
Devaraju, A. and Huber, R. (2021). An automated solution for measuring the progress toward FAIR research data. Patterns, vol 2(11), https://doi.org/10.1016/j.patter.2021.100370
Clients and User Interface
A web demo using F-UJI is available at https://www.f-uji.net.
An R client package that was generated from the F-UJI OpenAPI definition is available from https://github.com/NFDI4Chem/rfuji.
An open source web client for F-UJI is available at https://github.com/MaastrichtU-IDS/fairificator.
Assessment Scope, Constraint and Limitation
The service is in development and its assessment depends on several factors.
- In the FAIR ecosystem, FAIR assessment must go beyond the object itself. FAIR enabling services and repositories are vital to ensure that research data objects remain FAIR over time. Importantly, machine-readable services (e.g., registries) and documents (e.g., policies) are required to enable automated tests.
- In addition to repository and services requirements, automated testing depends on clear, machine-assessable criteria. Some aspects (rich, plurality, accurate, relevant) specified in the FAIR principles still require human mediation and interpretation.
- The tests must focus on generally applicable data/metadata characteristics until domain/community-driven criteria have been agreed (e.g., appropriate schemas and required elements for usage/access control). For example, for some metrics (i.e., on the I and R principles), the automated tests we proposed only inspect the ‘surface’ of the criteria to be evaluated. Therefore, tests are designed in consideration of generic cross-domain metadata standards such as Dublin Core, DCAT, DataCite, schema.org, etc.
- FAIR assessment is performed based on aggregated metadata; this includes metadata embedded in the data (landing) page, metadata retrieved from a PID provider (e.g., DataCite content negotiation) and other services (e.g., re3data).

Requirements
Python 3.11
Google Dataset Search
- Download the latest Dataset Search corpus file from: https://www.kaggle.com/googleai/dataset-search-metadata-for-datasets
- Open the file `fuji_server/helper/create_google_cache_db.py` and set the variable `googlefilelocation` to the location of the corpus file.
- Run `create_google_cache_db.py`, which creates a SQLite database in the data directory. From the root directory, run `python3 -m fuji_server.helper.create_google_cache_db`.
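As a rough illustration of what such a generated cache could look like, the sketch below builds a small SQLite table of dataset URIs. The table name, column name and helper function are hypothetical and not taken from `create_google_cache_db.py`; they only show the general shape of a URI lookup cache.

```python
import sqlite3

def build_cache(db_path, uris):
    # Hypothetical schema: one table holding one dataset URI per row.
    # The real script's table layout may differ.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS google_links (uri TEXT PRIMARY KEY)")
    con.executemany(
        "INSERT OR IGNORE INTO google_links VALUES (?)", [(u,) for u in uris]
    )
    con.commit()
    return con

# Build an in-memory cache with a single example URI and count its rows.
con = build_cache(":memory:", ["https://doi.org/10.1594/PANGAEA.example"])
count = con.execute("SELECT COUNT(*) FROM google_links").fetchone()[0]
print(count)  # 1
```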
The service was generated by the swagger-codegen project. By using the OpenAPI-Spec from a remote server, you can easily generate a server stub. The service uses the Connexion library on top of Flask.
Usage
Before running the service, please set user details in the configuration file, see config/users.py.
To install F-UJI, you may execute the following Python-based or docker-based installation commands from the root directory:
Python module-based installation
From the fuji source folder run:
```bash
python -m pip install .
```
The F-UJI server can now be started with:
```bash
python -m fuji_server -c fuji_server/config/server.ini
```
The OpenAPI user interface is then available at http://localhost:1071/fuji/api/v1/ui/.
Docker-based installation
```bash
docker run -d -p 1071:1071 ghcr.io/pangaea-data-publisher/fuji
```
To access the OpenAPI user interface, open the URL below in the browser: http://localhost:1071/fuji/api/v1/ui/
Your OpenAPI definition lives here:
http://localhost:1071/fuji/api/v1/openapi.json
You can provide a different server config file this way:
```bash
docker run -d -p 1071:1071 -v server.ini:/usr/src/app/fuji_server/config/server.ini ghcr.io/pangaea-data-publisher/fuji
```
You can also build the docker image from the source code:
```bash
docker build -t <tag_name> .
docker run -d -p 1071:1071 <tag_name>
```
Notes
To avoid the Tika startup warning message, set the environment variable TIKA_LOG_PATH. For more information, see https://github.com/chrismattmann/tika-python
If you receive the exception `urllib2.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]` on macOS, run the install command shipped with Python: `./Install\ Certificates.command`.
F-UJI is using basic authentication, so username and password have to be provided for each REST call which can be configured in fuji_server/config/users.py.
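To illustrate what a basic-authenticated REST call involves, the sketch below only constructs a request without sending it. The `/evaluate` path, the `object_identifier` field and the credentials are assumptions made for illustration; consult the OpenAPI UI mentioned above for the authoritative endpoint schema, and set real credentials in `fuji_server/config/users.py`.

```python
import base64
import json
import urllib.request

# Placeholder credentials; configure real ones in fuji_server/config/users.py.
username, password = "user", "secret"
token = base64.b64encode(f"{username}:{password}".encode()).decode()

# The /evaluate path and request body are assumptions for illustration only.
payload = json.dumps({"object_identifier": "https://doi.org/10.1594/PANGAEA.example"})
req = urllib.request.Request(
    "http://localhost:1071/fuji/api/v1/evaluate",
    data=payload.encode(),
    headers={
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.get_header("Authorization"))  # Basic dXNlcjpzZWNyZXQ=
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) requires a running F-UJI server on port 1071.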
Development
First, make sure to read the contribution guidelines.
They include instructions on how to set up your environment with pre-commit and how to run the tests.
The repository includes a simple web client suitable for interacting with the API during development. One way to run it is with a LEMP stack (Linux, Nginx, MySQL, PHP), as described below.
First, install the necessary packages:
```bash
sudo apt-get update
sudo apt-get install nginx
sudo ufw allow 'Nginx HTTP'
sudo service mysql start # expects that mysql is already installed, if not run sudo apt install mysql-server
sudo service nginx start
sudo apt install php8.1-fpm php-mysql
sudo apt install php8.1-curl
sudo phpenmod curl
```
Next, configure the service by running sudo vim /etc/nginx/sites-available/fuji-dev and paste:
```nginx
server {
    listen 9000;
    server_name fuji-dev;
    root /var/www/fuji-dev;

    index index.php;

    location / {
        try_files $uri $uri/ =404;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/var/run/php/php8.1-fpm.sock;
    }

    location ~ /\.ht {
        deny all;
    }
}
```
Link simpleclient/index.php and simpleclient/icons/ to /var/www/fuji-dev by running sudo ln <path_to_fuji>/fuji/simpleclient/* /var/www/fuji-dev/. You might need to adjust the file permissions to allow non-root writes.
Next, run:
```bash
sudo ln -s /etc/nginx/sites-available/fuji-dev /etc/nginx/sites-enabled/
sudo nginx -t
sudo service nginx reload
sudo service php8.1-fpm start
```
The web client should now be available at http://localhost:9000/. Make sure to adjust the username and password in simpleclient/index.php.
After a restart, it may be necessary to start the services again:
```bash
sudo service php8.1-fpm start
sudo service nginx start
python -m fuji_server -c fuji_server/config/server.ini
```
Component interaction (walkthrough)
This walkthrough guides you through the codebase.
A good starting point is fair_object_controller/assess_by_id.
Here, we create a FAIRCheck object called ft.
This reads the metrics YAML file during initialisation and will provide all the check methods.
Next, several harvesting methods are called: first `harvest_all_metadata`, followed by `harvest_re3_data` (DataCite) and `harvest_github`, and finally `harvest_all_data`.
The harvesters are implemented separately in harvester/, and each of them collects different kinds of data.
The harvesters always run, regardless of the defined metrics.
- The metadata harvester looks through HTML markup (following schema.org, Dublin Core, etc.) and through signposting/typed links.
Ideally, it can find things like author information or license names that way.
- The data harvester is only run if the metadata harvester finds an object_content_identifier pointing at content files.
Then, the data harvester runs over the files and checks things like the file format.
- The Github harvester connects with the GitHub API to retrieve metadata and data from software repositories.
It relies on an access token being defined in config/github.cfg.
After harvesting, all evaluators are called.
Each specific evaluator, e.g. FAIREvaluatorLicense, is associated with a specific FsF and/or FAIR4RS metric.
Before the evaluator runs any checks on the harvested data, it asserts that its associated metric is listed in the metrics YAML file.
Only if it is, the evaluator runs through and computes a local score.
In the end, all scores are aggregated into F, A, I, R scores.
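As a rough sketch of that aggregation step: the metric names, scores, helper function and the simplified pattern below are invented for illustration; the real mapping relies on the regular-expression groups described under "Adding support for new metrics".

```python
import re
from collections import defaultdict

# Made-up per-metric scores. Real FsF identifiers follow patterns such as
# FsF-F1-01D, where the letter after the first hyphen names the FAIR category.
scores = {
    "FsF-F1-01D": 1.0,
    "FsF-F2-01M": 0.5,
    "FsF-A1-01M": 1.0,
    "FsF-I1-01M": 0.0,
    "FsF-R1-01MD": 1.0,
}

# Simplified pattern: capture the F/A/I/R letter via a regex group.
METRIC_PATTERN = re.compile(r"FsF-([FAIR])[0-9]")

def aggregate(scores):
    totals = defaultdict(float)
    for metric, score in scores.items():
        match = METRIC_PATTERN.match(metric)
        if match:  # only valid metric names contribute, mirroring F-UJI's gating
            totals[match.group(1)] += score
    return dict(totals)

print(aggregate(scores))  # {'F': 1.5, 'A': 1.0, 'I': 0.0, 'R': 1.0}
```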
Adding support for new metrics
Start by adding a new metrics YAML file in yaml/.
Its name has to match the following regular expression: (metrics_v)?([0-9]+\.[0-9]+)(_[a-z]+)?(\.yaml),
and the content should be structured similarly to the existing metric files.
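Using the expression exactly as given above, a quick sanity check of candidate file names (the helper function name is ours) might look like:

```python
import re

# The file-name pattern quoted above, applied with fullmatch so the
# whole name must conform.
NAME_RE = re.compile(r"(metrics_v)?([0-9]+\.[0-9]+)(_[a-z]+)?(\.yaml)")

def is_valid_metrics_filename(name):
    return NAME_RE.fullmatch(name) is not None

print(is_valid_metrics_filename("metrics_v0.5_software.yaml"))  # True
print(is_valid_metrics_filename("metrics-0.5.yaml"))            # False
```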
Metric names are tested for validity using regular expressions throughout the code. If your metric names do not match those, not all components of the tool will execute as expected, so make sure to adjust the expressions. Regular expression groups are also used for mapping to F, A, I, R categories for scoring, and debug messages are only displayed if they are associated with a valid metric.
Evaluators are mapped to metrics in their __init__ methods, so adjust existing evaluators to associate with your metric as well or define new evaluators if needed.
The multiple test methods within an evaluator also check whether their specific test is defined.
FAIREvaluatorLicense is an example of an evaluator corresponding to metrics from different sources.
For each metric, the maturity is determined as the maximum of the maturity associated with each passed test. This means that if a test indicating maturity 3 is passed and one indicating maturity 2 is not passed, the metric will still be shown to be fulfilled with maturity 3.
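The maximum rule can be expressed in a few lines. The `(level, passed)` tuple structure below is invented for illustration and is not how F-UJI stores test results:

```python
# Each tuple: (maturity level a test indicates, whether the test passed).
tests = [(1, True), (2, False), (3, True)]

def metric_maturity(tests):
    # The metric's maturity is the maximum level among passed tests,
    # regardless of lower-level tests that failed.
    passed = [level for level, ok in tests if ok]
    return max(passed, default=0)

print(metric_maturity(tests))  # 3, even though the maturity-2 test failed
```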
Updates to the API
Making changes to the API requires re-generating parts of the code using Swagger.
First, edit fuji_server/yaml/openapi.yaml.
Then, use the Swagger Editor to generate a python-flask server.
A zip archive of the generated server stub should download automatically; unzip it.
Next:
1. Place the files in swagger_server/models into fuji_server/models, except swagger_server/models/__init__.py.
2. Rename all occurrences of swagger_server to fuji_server.
3. Add the content of swagger_server/models/__init__.py into fuji_server/__init__.py.
Unfortunately, the Swagger Editor doesn't always produce code that is compliant with PEP standards.
Run pre-commit run (or try to commit) and fix any errors that cannot be automatically fixed.
License
This project is licensed under the MIT License; for more details, see the LICENSE file.
Acknowledgements
F-UJI is a result of the FAIRsFAIR “Fostering FAIR Data Practices In Europe” project which received funding from the European Union’s Horizon 2020 project call H2020-INFRAEOSC-2018-2020 (grant agreement 831558).
The project was also supported through our contributors by the Helmholtz Metadata Collaboration (HMC), an incubator-platform of the Helmholtz Association within the framework of the Information and Data Science strategic initiative.
Owner
- Name: FAIR-IMPACT
- Login: FAIR-IMPACT
- Kind: organization
- Email: pco@fair-impact.eu
- Location: Netherlands
- Website: fair-impact.eu
- Twitter: fairimpact_eu
- Repositories: 8
- Profile: https://github.com/FAIR-IMPACT
GitHub organisation for the Horizon Europe EOSC project FAIR-IMPACT (Expanding FAIR solutions across EOSC)
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
title: F-UJI - An Automated FAIR Data Assessment Tool
message: Please cite this software using these metadata.
type: software
authors:
  - given-names: Anusuriya
    family-names: Devaraju
    email: anusuriya.devaraju@googlemail.com
    orcid: 'https://orcid.org/0000-0003-0870-3192'
  - given-names: Robert
    family-names: Huber
    email: rhuber@marum.de
    orcid: 'https://orcid.org/0000-0003-3000-0020'
identifiers:
  - type: doi
    value: 10.5281/zenodo.3934401
repository-code: 'https://github.com/pangaea-data-publisher/fuji'
url: 'https://www.fairsfair.eu/f-uji-automated-fair-data-assessment-tool'
abstract: >-
  FAIRsFAIR has developed F-UJI, a service based on REST, and is piloting a
  programmatic assessment of the FAIRness of research datasets in five
  trustworthy data repositories.
keywords:
  - PANGAEA
  - FAIRsFAIR
  - FAIR Principles
  - Data Object Assessment
  - OpenAPI
  - FAIR
  - Research Data
  - FAIR data
  - Metadata harvesting
license: MIT
```
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- JamesIves/github-pages-deploy-action 65b5dfd4f5bcd3a7403bbc2959c144256167464e composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- actions/checkout v4 composite
- EnricoMi/publish-unit-test-result-action e780361cd1fc1b1a170624547b3ffda64787d365 composite
- dawidd6/action-download-artifact e7466d1a7587ed14867642c2ca74b5bcc1e19a2d composite
- irongut/CodeCoverageSummary 51cc3a756ddcd398d447c044c02cb6aa83fdae95 composite
- marocchino/sticky-pull-request-comment 331f8f5b4215f0445d3c07b4967662a32a2d3e31 composite
- python 3.11-slim build
- ghcr.io/pangaea-data-publisher/fuji latest
- jupyter/minimal-notebook latest
- beautifulsoup4 ~=4.12
- configparser ~=6.0
- connexion [flask,uvicorn,swagger-ui]~=3.0
- extruct ~=0.16.0
- feedparser ~=6.0
- flask-cors ~=4.0
- flask-limiter ~=3.5
- hashid ~=3.1.4
- idutils ~=1.2
- jmespath ~=1.0
- levenshtein ~=0.24.0
- lxml ~=5.0
- pandas ~=2.1
- pyRdfa3 ~=3.5
- pygithub ~=2.1
- pyld ~=2.0
- pyyaml ~=6.0
- rapidfuzz ~=3.3
- rdflib ~=7.0
- requests ~=2.31
- sparqlwrapper ~=2.0
- tika ~=2.6
- tldextract ~=5.0
- urlextract ~=1.8