https://github.com/cyfronet-fid/transform-service

Repository

Basic Info
  • Host: GitHub
  • Owner: cyfronet-fid
  • License: gpl-3.0
  • Language: Python
  • Default Branch: development
  • Size: 499 KB
README.rst

.. contents:: Table of Contents
   :local:

Introduction
============

The EOSC Data Transform Service supplies the EOSC Search Service portal with data from various sources. It collects the data, transforms it to meet our requirements, and sends it to external services such as Solr and Amazon S3.

The data obtained from APIs includes ``services, data sources, providers, offers, bundles, trainings, interoperability guidelines``. These records are updated in real time, and all of them can also be refreshed at once.

The data obtained from dumps includes ``publications, datasets, software, other research products, organizations, and projects``. These records support batch updates only; live updates are not available.

Documentation
=============
The service uses Sphinx for generating both local and public documentation. Follow the instructions below to access the documentation.

Public Documentation
---------------------
The public documentation for the EOSC Data Transform Service is available online at Read the Docs.
This should be your first point of reference for detailed information about the service.

Local Sphinx Documentation
---------------------------
You can generate and view the Sphinx documentation locally by running the following command in the docs directory:

.. code-block:: shell

   make html

Once generated, the documentation is available at ``docs/build/html/index.html``. Open it in a browser to browse the API, schemas, and other documentation.

To remove old build files and ensure a fresh build, run the following command before ``make html``:

.. code-block:: shell

   make clean

This deletes the ``docs/build/`` directory, so Sphinx regenerates all files from scratch.

API
===

Transform Endpoints
-------------------

- ``/batch`` - handles a live update of one or more resources per request.
- ``/full`` - handles an update of the whole data collection.
- ``/dump`` - handles a dump update to create a single data iteration.

Solr Manipulation Endpoints
---------------------------

- ``/create_collections`` - creates all necessary Solr collections for a single data iteration.
- ``/create_aliases`` - creates aliases for all collections from a single data iteration.
- ``/delete_collections`` - deletes all collections from a single data iteration.
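
A client needs the full URL of an endpoint before it can call it. The sketch below only composes those URLs. The ``BASE_URL`` default is taken from the "Running Service" section and the ``endpoint_url`` helper is hypothetical. HTTP methods and request payloads are deliberately left out because this README does not specify them; the Swagger page at ``/docs`` is the place to check.

.. code-block:: python

    from urllib.parse import urljoin

    # Default address from the "Running Service" section; adjust to your deployment.
    BASE_URL = "http://0.0.0.0:8080"

    # Endpoint paths listed in this section.
    TRANSFORM_ENDPOINTS = ("batch", "full", "dump")
    SOLR_ENDPOINTS = ("create_collections", "create_aliases", "delete_collections")


    def endpoint_url(name: str, base_url: str = BASE_URL) -> str:
        """Return the absolute URL of a known endpoint (hypothetical helper)."""
        if name not in TRANSFORM_ENDPOINTS + SOLR_ENDPOINTS:
            raise ValueError(f"unknown endpoint: {name!r}")
        # Ensure exactly one slash between the base URL and the path.
        return urljoin(base_url.rstrip("/") + "/", name)

For example, ``endpoint_url("batch")`` yields ``http://0.0.0.0:8080/batch``.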

Deployment
==========

1. Provision a Solr instance and/or an Amazon S3 bucket.
2. Adjust ``docker-compose.yml`` to your requirements.
3. Set the ``.env`` variables.
4. Build and start the service:

.. code-block:: shell

    docker-compose up -d --build
    docker-compose up

Dependencies
------------

- A ``Solr`` instance **and/or** an ``Amazon S3`` bucket. Each one is optional on its own, but at least one of them is required.

ENV variables
-------------

The service reads user-specific settings from a ``.env`` file in the root of the EOSC Transform Service. The available variables are:

General
^^^^^^^
- ``ENVIRONMENT``: ``Literal["dev", "test", "production"] = "dev"`` - The environment in which the service runs.
- ``LOG_LEVEL``: ``str = "info"`` - Logging level.
- ``SENTRY_DSN`` - endpoint for Sentry logged errors. For development leave this variable unset.
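
The type annotations above read like fields of a settings class. As a rough illustration of how such values could be loaded and validated, here is a standard-library-only sketch. The ``GeneralSettings`` class and the ``load_general_settings`` function are hypothetical and do not reproduce the service's actual settings code; the sketch reads the process environment and does not parse the ``.env`` file itself.

.. code-block:: python

    import os
    from dataclasses import dataclass
    from typing import Mapping, Optional

    ALLOWED_ENVIRONMENTS = ("dev", "test", "production")


    @dataclass(frozen=True)
    class GeneralSettings:
        environment: str = "dev"
        log_level: str = "info"
        sentry_dsn: Optional[str] = None  # left unset in development


    def load_general_settings(env: Mapping[str, str] = os.environ) -> GeneralSettings:
        """Build the settings, falling back to the documented defaults."""
        environment = env.get("ENVIRONMENT", "dev")
        if environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(
                f"ENVIRONMENT must be one of {ALLOWED_ENVIRONMENTS}, got {environment!r}"
            )
        return GeneralSettings(
            environment=environment,
            log_level=env.get("LOG_LEVEL", "info"),
            # Treat an empty string the same as an unset variable.
            sentry_dsn=env.get("SENTRY_DSN") or None,
        )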

Services
^^^^^^^^
Solr
----
- ``SOLR_URL``: ``AnyUrl = "http://localhost:8983/solr/"`` - Solr address.
- ``SOLR_COLS_PREFIX``: ``str = ""`` - The prefix of the Solr collections to which data will be sent.

S3
--
- ``S3_ACCESS_KEY``: ``str = ""`` - Your S3 access key with write permissions.
- ``S3_SECRET_KEY``: ``str = ""`` - Your S3 secret key with write permissions.
- ``S3_ENDPOINT``: ``str = ""`` - S3 endpoint. Example: ``https://s3.cloud.com``.
- ``S3_BUCKET``: ``str = ""`` - S3 bucket. Example: ``ess-mock-dumps``.
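
The S3 variables map naturally onto the keyword arguments of ``boto3.client``. The following sketch only assembles those arguments from the environment. ``s3_client_kwargs`` is a hypothetical helper, the file names in the usage comment are placeholders, and how the service itself builds its client is not shown in this README.

.. code-block:: python

    import os
    from typing import Mapping

    S3_VARIABLES = ("S3_ACCESS_KEY", "S3_SECRET_KEY", "S3_ENDPOINT", "S3_BUCKET")


    def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
        """Collect boto3 client arguments from the S3_* variables."""
        missing = [name for name in S3_VARIABLES if not env.get(name)]
        if missing:
            raise ValueError("missing S3 settings: " + ", ".join(missing))
        return {
            "endpoint_url": env["S3_ENDPOINT"],
            "aws_access_key_id": env["S3_ACCESS_KEY"],
            "aws_secret_access_key": env["S3_SECRET_KEY"],
        }


    # Usage, assuming boto3 is installed:
    #   import boto3
    #   s3 = boto3.client("s3", **s3_client_kwargs())
    #   s3.upload_file("local.json", os.environ["S3_BUCKET"], "remote.json")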

STOMP (JMS)
-----------
- ``STOMP_SUBSCRIPTION``: ``bool = True`` - Whether to subscribe to JMS.

  - ``STOMP_HOST``: ``str = "127.0.0.1"`` - The hostname or IP address of the STOMP broker.
  - ``STOMP_PORT``: ``int = 61613`` - The port on which the STOMP broker is listening.
  - ``STOMP_LOGIN``: ``str = "guest"`` - The username for connecting to the STOMP broker.
  - ``STOMP_PASS``: ``str = "guest"`` - The password for connecting to the STOMP broker.
  - ``STOMP_CLIENT_NAME``: ``str = "transformer-client"`` - A name to identify this STOMP client instance.
  - ``STOMP_SSL``: ``bool = False`` - Set to ``True`` to enable SSL for the STOMP connection. Ensure SSL certificates are properly configured if this is enabled.
  - ``STOMP_TOPIC_PREFIX``: ``str = ""`` - Prefix that is added to STOMP base topics, e.g. ``adapter.update`` -> ``beta.adapter.update``.
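
To make the ``STOMP_TOPIC_PREFIX`` example concrete, here is a small sketch of how a prefix could be applied to a base topic. It assumes the prefix is joined to the topic with a single dot; the README does not say whether the dot belongs to the prefix, so the sketch accepts both spellings. ``prefixed_topic`` is a hypothetical helper.

.. code-block:: python

    def prefixed_topic(base_topic: str, prefix: str = "") -> str:
        """Prepend the topic prefix, if any, to a STOMP base topic."""
        if not prefix:
            # Default: an empty prefix leaves the topic unchanged.
            return base_topic
        # Tolerate a prefix written with or without a trailing dot.
        return f"{prefix.rstrip('.')}.{base_topic}"

With these assumptions, ``prefixed_topic("adapter.update", "beta")`` gives ``beta.adapter.update``.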

Sources of Data
^^^^^^^^^^^^^^^
Local Data Dump
---------------

- ``DATASET_PATH``: ``str`` - Path to the datasets **directory**.
- ``PUBLICATION_PATH``: ``str`` - Path to the publications **directory**.
- ``SOFTWARE_PATH``: ``str`` - Path to the software **directory**.
- ``OTHER_RP_PATH``: ``str`` - Path to the other research products **directory**.
- ``ORGANISATION_PATH``: ``str`` - Path to the organisations **directory**.
- ``PROJECT_PATH``: ``str`` - Path to the projects **directory**.

Relations
---------

- ``RES_ORG_REL_PATH``: ``str`` - Path to the resultOrganization **directory**.
- ``RES_PROJ_REL_PATH``: ``str`` - Path to the resultProject **directory**.
- ``ORG_PROJ_REL_PATH``: ``str`` - Path to the organizationProject **directory**.
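
Because every variable in the two lists above must point to a directory, it can help to verify them before starting a dump update. The sketch below is one possible pre-flight check using only the standard library. ``missing_directories`` is a hypothetical helper, not part of the service.

.. code-block:: python

    import os
    from pathlib import Path
    from typing import Mapping

    # Variable names documented above.
    PATH_VARIABLES = (
        "DATASET_PATH", "PUBLICATION_PATH", "SOFTWARE_PATH", "OTHER_RP_PATH",
        "ORGANISATION_PATH", "PROJECT_PATH",
        "RES_ORG_REL_PATH", "RES_PROJ_REL_PATH", "ORG_PROJ_REL_PATH",
    )


    def missing_directories(env: Mapping[str, str] = os.environ) -> list[str]:
        """Return the variables that are unset or do not point to an existing directory."""
        return [
            name
            for name in PATH_VARIABLES
            if not env.get(name) or not Path(env[name]).is_dir()
        ]

An empty result means every configured path exists and is a directory.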

Data from API
-------------
Marketplace
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- ``MP_API_ADDRESS``: ``AnyUrl = "https://marketplace.sandbox.eosc-beyond.eu"`` - A Marketplace API address.
- ``MP_API_TOKEN``: ``str`` - An authorization token for the Marketplace API.

Provider Component
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- ``GUIDELINE_ADDRESS``: ``AnyUrl = "https://integration.providers.sandbox.eosc-beyond.eu/api/public/interoperabilityRecord/all?catalogue_id=all&active=true&suspended=false&quantity=10000"`` - Full URL of the **endpoint** that returns all interoperability guidelines.
- ``TRAINING_ADDRESS``: ``AnyUrl = "https://integration.providers.sandbox.eosc-beyond.eu/api/public/trainingResource/all?catalogue_id=all&active=true&suspended=false&quantity=10000"`` - Full URL of the **endpoint** that returns all trainings.
- ``ADAPTER_ADDRESS``: ``AnyUrl = "https://integration.providers.sandbox.eosc-beyond.eu/api/public/adapter/all?active=true&suspended=false&quantity=10000"`` - Full URL of the **endpoint** that returns all adapters.
- ``NODE_ADDRESS``: ``AnyUrl = "https://integration.providers.sandbox.eosc-beyond.eu/api/vocabulary/byType/NODE"`` - URL of the endpoint that returns all nodes for OAG mapping.

Authentication (Optional)
"""""""""""""""""""""""""""""""""""""""""""
If the target endpoint requires authentication, the following settings can be used to enable token-based access:

- ``PC_AUTH``: ``bool = False``
  Enables or disables authentication. Set to ``True`` to retrieve a bearer token before calling the API.

- ``PC_REFRESH_TOKEN``: ``str = ""``
  A valid **refresh token** used to obtain an access token.

- ``PC_TOKEN_URL``: ``str = "https://core-proxy.sandbox.eosc-beyond.eu/auth/realms/core/protocol/openid-connect/token"``
  The URL to fetch the access token using the refresh token (following the OpenID Connect token flow).

- ``PC_CLIENT_ID``: ``str = "providers-api-token-client"``
  The client ID registered in the authentication server used to identify the requesting application.

If authentication is enabled (``PC_AUTH = True``), the application requests a bearer token using the provided credentials and attaches it to the request header as:

.. code-block:: python

    headers["Authorization"] = f"Bearer {access_token}"

If the token request fails, a ``requests.HTTPError`` is raised.
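
The flow described above corresponds to the standard OAuth 2.0 refresh-token grant. The sketch below is an assumption about how it could look, not the service's actual code: ``fetch_access_token`` and ``auth_headers`` are hypothetical names, the form fields follow the OAuth 2.0 specification, and the ``post`` parameter exists only so the function can be exercised without network access (by default it uses ``requests.post``).

.. code-block:: python

    from typing import Callable, Optional


    def fetch_access_token(
        token_url: str,
        refresh_token: str,
        client_id: str,
        post: Optional[Callable] = None,
    ) -> str:
        """Exchange a refresh token for an access token."""
        if post is None:
            import requests  # imported lazily so a stub can be injected instead

            post = requests.post
        response = post(
            token_url,
            data={
                "grant_type": "refresh_token",
                "refresh_token": refresh_token,
                "client_id": client_id,
            },
            timeout=30,
        )
        # With requests, this raises requests.HTTPError on a 4xx/5xx response.
        response.raise_for_status()
        return response.json()["access_token"]


    def auth_headers(access_token: str) -> dict:
        """Build the Authorization header shown above."""
        return {"Authorization": f"Bearer {access_token}"}

In a real call, ``token_url``, ``refresh_token`` and ``client_id`` would come from ``PC_TOKEN_URL``, ``PC_REFRESH_TOKEN`` and ``PC_CLIENT_ID``.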

Transformation General Settings
-------------------------------
- ``INPUT_FORMAT``: ``str = "json"`` - Format of the input data files.
- ``OUTPUT_FORMAT``: ``str = "json"`` - Format of the output data files.

Running Service
===============

Once the service has launched successfully, the following components are available:

- ``EOSC Transform Service``: by default at http://0.0.0.0:8080, with the Swagger UI at http://0.0.0.0:8080/docs. Use it to trigger actions.
- ``Flower Dashboard``: by default at http://0.0.0.0:5555. Use it to view and monitor current and past actions.

Owner

  • Name: Cyfronet FID
  • Login: cyfronet-fid
  • Kind: organization

Dependencies

.github/workflows/commitlint.yml actions
  • actions/checkout v3 composite
  • wagoid/commitlint-github-action v4 composite
.github/workflows/test.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/transform-styles.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
Dockerfile docker
  • python 3.10 build
docker-compose.dev.yml docker
  • itzg/rabbitmq-stomp latest
  • redis 7
docker-compose.yml docker
  • itzg/rabbitmq-stomp latest
  • redis 7
Pipfile pypi
  • black * develop
  • isort * develop
  • jupyter * develop
  • pylint * develop
  • pytest * develop
  • autodoc-pydantic *
  • boto3 *
  • celery *
  • coloredlogs *
  • fastapi *
  • flower *
  • pandas *
  • pyarrow *
  • pycountry *
  • pydantic *
  • pydantic-settings *
  • pyspark *
  • pytest-asyncio *
  • pytest-mock *
  • python-dotenv *
  • requests *
  • sentry-sdk *
  • sphinx *
  • sphinx-autodoc-typehints *
  • sphinx-rtd-theme *
  • stomp.py *
  • tqdm *
  • uvicorn *
Pipfile.lock pypi
  • 202 dependencies
docs/requirements.txt pypi
  • autodoc_pydantic *
  • sphinx ==7.4.7
  • sphinx_autodoc_typehints *
  • sphinx_rtd_theme *
pyproject.toml pypi