fair_stats_aggregator

This repository hosts a prototype tool designed to analyze and aggregate FAIR (Findable, Accessible, Interoperable, and Reusable) statistics for a list of Digital Object Identifiers (DOIs).

https://github.com/saibotmagd/fair_stats_aggregator

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

fair fair-data nfdi nfdi4bioimage
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: SaibotMagd
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 10.8 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created about 1 year ago · Last pushed 11 months ago
Metadata Files
Readme Codemeta

README.md

Lin_X_NFDI4BIOIMAGE

FAIR Statistics Aggregator for DOIs

Table of Contents

  1. Introduction
  2. Features
  3. Requirements
  4. Installation
  5. Usage
  6. Output
  7. Limitations
  8. License

Introduction

*** The development of this project is being evaluated, as no relevant benefits or user interest have been identified to date. ***

This repository hosts a prototype tool designed to analyze and aggregate FAIR (Findable, Accessible, Interoperable, and Reusable) statistics for a list of Digital Object Identifiers (DOIs). The tool currently utilizes the F-UJI FAIR checker to evaluate the FAIRness of the metadata associated with each DOI. Future versions aim to incorporate additional FAIR checkers to provide a more comprehensive analysis.

The tool processes a list of DOIs, which can be sourced from a website or fetched using a metasearch API such as Crossref or DataCite (an example implementation is provided that generates the list of DOIs for an institute via an OpenAlex query). It calculates FAIR statistics for each DOI, aggregates these statistics by publication year, and identifies common metadata errors that impact FAIRness. The results are presented as a diagram of aggregated FAIR statistics per publication year and a summary of the most frequent metadata issues.
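As an illustration of the DOI-fetching step, here is a minimal sketch of how a works query for one institution could be built and how DOIs could be pulled from a response page. It is not the repository's `get_dois_from_openalex_id.py`. The function names are hypothetical, and the filter and paging parameters (`authorships.institutions.id`, `has_doi`, `select`, `per-page`, `cursor`) are written from memory of the OpenAlex API and should be checked against its documentation.

```python
from urllib.parse import urlencode

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_works_url(institution_id, cursor="*", per_page=200):
    """Build a works query for one institution, restricted to records with a DOI."""
    params = {
        # Assumed filter names; verify against the OpenAlex documentation.
        "filter": f"authorships.institutions.id:{institution_id},has_doi:true",
        "select": "doi,publication_year",
        "per-page": per_page,
        "cursor": cursor,
    }
    return f"{OPENALEX_WORKS_URL}?{urlencode(params)}"


def extract_dois(page):
    """Return bare DOIs (without the https://doi.org/ prefix) from one decoded page."""
    prefix = "https://doi.org/"
    dois = []
    for work in page.get("results", []):
        doi = work.get("doi")
        if doi:
            dois.append(doi[len(prefix):] if doi.startswith(prefix) else doi)
    return dois
```

A caller would request each URL in turn, feeding the cursor returned by one page into the next request until no cursor is returned, and write the collected DOIs to a text file with one DOI per line.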

This tool also serves as a justification for metadata providers (e.g., Springer, Nature) to ensure their metadata is hosted in a machine-readable format, as this is crucial for optimal FAIRness evaluation.

Warning: The F-UJI FAIR checker must be running in a Docker container before this tool is used. Instructions for setting up the F-UJI checker can be found in the F-UJI repository (https://github.com/FAIR-IMPACT/fuji). Please note that F-UJI and other FAIR checkers are at a very early beta stage.

Features

  • DOI List Processing: Accepts a list of DOIs from a file or fetched via APIs like Crossref or DataCite.
  • FAIR Evaluation: Uses the F-UJI FAIR checker to evaluate the FAIRness of each DOI's metadata.
  • Aggregation: Aggregates FAIR statistics by publication year.
  • Error Summary: Identifies and summarizes the most common metadata errors affecting FAIRness.
  • Visualization: Generates a diagram of aggregated FAIR statistics per publication year.
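The aggregation feature can be pictured with a small standard-library sketch. The record layout (`publication_year`, `fair_score`) and the function name are assumptions made for illustration; the repository's own data model may differ.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_year(records):
    """Return {year: (mean_score, count)} for records that carry a publication year."""
    scores = defaultdict(list)
    for rec in records:
        year = rec.get("publication_year")
        if year is None:
            # Skip records without a year rather than guessing one.
            continue
        scores[year].append(rec["fair_score"])
    return {
        year: (round(mean(values), 2), len(values))
        for year, values in sorted(scores.items())
    }
```

Returning the count next to the mean makes it easier to spot years whose average rests on only a handful of DOIs.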

Requirements

  • Python 3.x
  • Docker (for running the F-UJI FAIR checker)
  • Required Python packages (listed in requirements.txt)

Installation

  1. Clone the repository:
       git clone https://github.com/saibotmagd/fair_stats_aggregator.git
       cd fair_stats_aggregator
  2. Install the required Python packages:
       pip install -r requirements.txt
  3. Set up the F-UJI FAIR checker (https://github.com/FAIR-IMPACT/fuji) using Docker:
       docker pull fairimpact/fuji
       docker run -d -p 1071:1071 fairimpact/fuji
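Once the container is listening on port 1071, a single evaluation request could look roughly like the sketch below. The endpoint path, the JSON field names, and the use of HTTP basic authentication reflect my understanding of the F-UJI REST API and are assumptions here; the credentials are whatever your F-UJI server configuration defines. The helper names are hypothetical and not part of this repository.

```python
import base64
import json
import urllib.request

# Assumed endpoint path on the locally running container.
FUJI_URL = "http://localhost:1071/fuji/api/v1/evaluate"


def build_fuji_request(doi, username, password, url=FUJI_URL):
    """Prepare (but do not send) a POST request asking F-UJI to evaluate one DOI."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    body = json.dumps(
        {
            "object_identifier": doi,  # assumed field name
            "test_debug": True,        # assumed flag: include debug messages
            "use_datacite": True,      # assumed flag: consult DataCite metadata
        }
    ).encode()
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def evaluate(doi, username, password, timeout=120):
    """Send the request and return the decoded JSON response."""
    request = build_fuji_request(doi, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect without a running container.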

Usage

  1. Prepare a list of DOIs in a text file (one DOI per line) or use an API to fetch DOIs.
     • You can use the OpenAlex API to generate a DOI list file for your institute by running:
         python get_dois_from_openalex_id.py {openalex-institutions-id} {outputfile.txt}
  2. Run the tool:
         python fair_stats_agg.py --doi-file path/to/doi_list.txt
     We provide sample DOI list files for the NFDI4bioimage partner institutes in the '/data/' folder (uploaded 3/2025).
  3. The tool will output the aggregated FAIR statistics and a summary of metadata errors as:
     • '.png' files (diagrams of the aggregated FAIR statistics),
     • '.csv' files (aggregated sources of errors in the FAIR statistics, issues of the publishing websites).
  4. You can create a full summary as a single Markdown file covering every processed DOI list with:
     ```bash
     python createfullsummary.py ./data/ examplesummary.md
     ```
     Check out the full summary (it will not load in the browser via GitHub because of its file size) or the summaries per institute (top 10000 most frequent publications):

     PS: Yes, the lists include papers dated to the 18th and 19th centuries (obviously wrong entries).
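The one-DOI-per-line input from step 1 could be read and cleaned along the following lines. This is a sketch under assumptions: the function names are hypothetical, the regular expression is a loose plausibility check rather than a full DOI validator, and the tool itself may handle its input differently.

```python
import re

# Loose shape check: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw):
    """Strip common prefixes and lowercase; return None if the line is not DOI-like."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.lower()  # DOIs are case-insensitive, so lowercase for deduplication
    return doi if DOI_PATTERN.match(doi) else None


def load_dois(lines):
    """Return unique, normalized DOIs in first-seen order, skipping invalid lines."""
    seen = set()
    result = []
    for line in lines:
        doi = normalize_doi(line)
        if doi and doi not in seen:
            seen.add(doi)
            result.append(doi)
    return result
```

Because `load_dois` accepts any iterable of strings, an open file handle can be passed directly, for example `with open("doi_list.txt") as fh: dois = load_dois(fh)`.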

Output

  • Aggregated FAIR-statistic per Publication Year Diagram: A visual representation of FAIR statistics aggregated by publication year.
  • Metadata Error Summary: A list of the most common metadata errors affecting FAIRness.
  • Justification for Metadata Providers: A summary highlighting the importance of machine-readable metadata for optimal FAIRness evaluation.
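To make the metadata error summary concrete, the following sketch counts how many DOIs fail each check. The `failed_checks` key and the function name are hypothetical; they stand in for whatever per-DOI failure information is extracted from the checker's response.

```python
from collections import Counter


def summarize_errors(results, top_n=10):
    """Return the top_n most frequent failed checks as (check, number_of_dois) pairs."""
    counts = Counter()
    for res in results:
        # Use a set so each failed check counts once per DOI.
        counts.update(set(res.get("failed_checks", [])))
    return counts.most_common(top_n)
```

Counting per DOI rather than per occurrence keeps one badly described record from dominating the ranking.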

Limitations

  • Beta Status: The F-UJI FAIR checker and other FAIR checkers are at a very early beta stage. Results may vary and should be interpreted with caution.
  • Dependency on Docker: The F-UJI FAIR checker requires Docker to be initialized beforehand.

License

CC BY-NC License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

You are free to:

  • Share — Copy and redistribute the material in any medium or format.
  • Adapt — Remix, transform, and build upon the material.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • NonCommercial — You may not use the material for commercial purposes.

Notices:

  • You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
  • No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.

For more details, please refer to the full license text: CC BY-NC 4.0 License.


Note: If you intend to use this work for commercial purposes, please contact the author for alternative licensing options.


Owner

  • Login: SaibotMagd
  • Kind: user
  • Company: https://www.lin-magdeburg.de/

CodeMeta (codemeta.json)

{
  "@context": "https://w3id.org/codemeta/3.0",
  "type": "SoftwareSourceCode",
  "codeRepository": "https://github.com/SaibotMagd/fair_stats_aggregator",
  "dateCreated": "2025-02-01",
  "datePublished": "2025-02-01",
  "description": "This repository hosts a prototype tool designed to analyze and aggregate FAIR (Findable, Accessible, Interoperable, and Reusable) statistics for a list of Digital Object Identifiers (DOIs). The tool currently utilizes the F-UJI FAIR checker to evaluate the FAIRness of the metadata associated with each DOI. Future versions aim to incorporate additional FAIR checkers to provide a more comprehensive analysis.",
  "funder": {
    "type": "Organization",
    "name": "nfdi4bioimage"
  },
  "keywords": [
    "nfdi4bioimage",
    "nfdi",
    "fair",
    "fair_principles",
    "metadata",
    "statistic"
  ],
  "license": "https://spdx.org/licenses/CC-BY-NC-1.0",
  "name": "fair_stats_aggregator",
  "operatingSystem": [
    "Linux",
    "Windows"
  ],
  "programmingLanguage": "Python",
  "relatedLink": "https://nfdi4bioimage.de",
  "runtimePlatform": [
    "pip",
    "anaconda"
  ],
  "softwareRequirements": [
    "requests",
    "pandas",
    "matplotlib",
    "tqdm"
  ],
  "issueTracker": "https://github.com/SaibotMagd/fair_stats_aggregator/issues"
}

GitHub Events

Total
  • Release event: 1
  • Push event: 9
  • Public event: 1
  • Create event: 2
Last Year
  • Release event: 1
  • Push event: 9
  • Public event: 1
  • Create event: 2

Dependencies

environment.yml pypi
requirements.txt pypi
  • matplotlib *
  • pandas *
  • requests *
  • tqdm *