tool_vforwater_loader

Tool specification compliant tool to load data from the V-FOR-WaTer database

https://github.com/vforwater/tool_vforwater_loader

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to: zenodo.org
✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: VForWaTer
License: gpl-3.0
Language: Python
Default Branch: main
Size: 319 KB

Statistics

Stars: 0
Watchers: 3
Forks: 0
Open Issues: 1
Releases: 16

Created over 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

MetaCatalog Data Loader

This is a containerized Python tool that uses MetaCatalog, a metadata and data source catalog, to download data. The data is requested for a spatial and temporal extent, as specified in the metadata catalog, to be consistent across scales. One place this tool is used is the V-FOR-WaTer platform.

This tool follows the Tool Specification for reusable research software using Docker.

Description

MetaCatalog stores metadata about internal and external datasets along with information about the data sources and how to access them. Using this tool, one can request datasets (called entries in MetaCatalog) by their ID. Additionally, an area of interest, called the reference area, is supplied as a GeoJSON feature.

The database of the connected MetaCatalog instance is queried for the dataset_ids. The data files are requested for the temporal extent between start_date and end_date, if given, while the spatial extent is requested for the bounding box of reference_area. MetaCatalog entries without either of the scales defined are loaded entirely. Finally, the spatial extent is clipped by the reference_area to match exactly. Experimental parameters are not yet exposed, but include:

- `netcdf_backend`, which can be either `'CDO'` or `'xarray'` (default), switches the software used to clip NetCDF data sources, which are commonly used for spatio-temporal datasets.
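
The bounding-box step described above can be illustrated with a minimal sketch in plain Python. It is not the tool's own implementation: the function name `feature_bbox` and the example coordinates are hypothetical, and the sketch only shows how a bounding box follows from the exterior ring of a GeoJSON Polygon Feature.

```python
from typing import Tuple


def feature_bbox(feature: dict) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a GeoJSON Polygon Feature."""
    geometry = feature["geometry"]
    if geometry["type"] != "Polygon":
        raise ValueError("expected a Polygon geometry")
    # The first ring is the exterior ring; holes cannot extend the bounding box.
    exterior = geometry["coordinates"][0]
    xs = [point[0] for point in exterior]
    ys = [point[1] for point in exterior]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical reference area, used only for illustration.
reference_area = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[8.0, 48.0], [9.5, 48.2], [9.0, 49.1], [8.0, 48.0]]],
    },
}

bbox = feature_bbox(reference_area)
print(bbox)  # (8.0, 48.0, 9.5, 49.1)
```

Data requested for this rectangle generally covers more than the polygon itself, which is why a final clip against the exact geometry is still needed.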

All processed data files for each source are then saved to /out/datasets/, while multi-file sources are saved to child directories. The file (or folder) names are built like: <variable_name>_<entry_id>.
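
The naming scheme can be sketched as a small helper. This is only an illustration: the function `dataset_target`, the `.nc` default suffix and the example variable names are assumptions, since the text above specifies only the `<variable_name>_<entry_id>` stem and the /out/datasets/ location.

```python
from pathlib import PurePosixPath

# Output location named in the text above.
OUT_DIR = PurePosixPath("/out/datasets")


def dataset_target(variable_name: str, entry_id: int,
                   multi_file: bool = False, suffix: str = ".nc") -> PurePosixPath:
    """Build the output path for one entry (hypothetical helper)."""
    stem = f"{variable_name}_{entry_id}"
    if multi_file:
        # Multi-file sources get a folder named after the stem.
        return OUT_DIR / stem
    # ASSUMPTION: the file suffix is not specified in the text; ".nc" is a placeholder.
    return OUT_DIR / f"{stem}{suffix}"


single = dataset_target("precipitation", 42)
folder = dataset_target("precipitation", 42, multi_file=True)
print(single)  # /out/datasets/precipitation_42.nc
print(folder)  # /out/datasets/precipitation_42
```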

Parameters

| Parameter | Description |
| --- | --- |
| dataset_ids | An array of integers referencing the IDs of the dataset entries in MetaCatalog. |
| reference_area | A valid GeoJSON POLYGON Feature. Areal datasets will be clipped to this area. |
| start_date | The start date of the dataset, if a time dimension applies to the dataset. |
| end_date | The end date of the dataset, if a time dimension applies to the dataset. |
| cell_touches | Specifies if an areal cell is part of the reference area if it only touches the geometry. |
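
As a rough illustration, the parameters above could be assembled into an inputs.json file like this. The sketch makes several assumptions that are not confirmed here: that the parameters are nested under the tool name (`vforwater_loader`) and a `parameters` key, that dates are passed as ISO 8601 strings, and that optional parameters may simply be omitted. The helper `build_inputs`, the dataset IDs and the polygon are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def build_inputs(dataset_ids, reference_area, start_date=None, end_date=None,
                 cell_touches=None, tool_name="vforwater_loader"):
    """Assemble the parameters from the table into one dictionary."""
    parameters = {
        "dataset_ids": [int(i) for i in dataset_ids],
        "reference_area": reference_area,
    }
    # Optional parameters are only written when a value was supplied.
    optional = {"start_date": start_date, "end_date": end_date,
                "cell_touches": cell_touches}
    parameters.update({k: v for k, v in optional.items() if v is not None})
    # ASSUMPTION: parameters are nested under the tool name and a "parameters" key.
    return {tool_name: {"parameters": parameters}}


# Hypothetical reference area and dataset IDs, for illustration only.
reference_area = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[8.0, 48.0], [9.5, 48.2], [9.0, 49.1], [8.0, 48.0]]],
    },
}

inputs = build_inputs([1, 2], reference_area,
                      start_date="2020-01-01", end_date="2020-12-31")

# Written to a temporary folder here; for a real run the file would go to ./in/inputs.json.
target = Path(tempfile.mkdtemp()) / "inputs.json"
target.write_text(json.dumps(inputs, indent=2))
```

Checking the generated file against the examples shipped in ./in is advisable before relying on this layout.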

Development and local run

New database

For development or local use of this container, the repository includes a docker-compose.yml file. It starts a PostgreSQL / PostGIS database, which persists its data into a local pg_data folder. The loader service runs the tool, with the local ./in and ./out folders mounted into the tool container and the database correctly connected. That means you can adjust the examples in ./in and run the container using docker compose:

docker compose up -d

On the first run, there won't be an initialized MetaCatalog instance yet. There are examples at /examples/ that load different datasets into the same database instance, which can then be used. Alternatively, you can populate the database by hand. To create the necessary database structure, you can run the loader service but override the default container command with a Python console:

docker compose run --rm -it loader python

Then run:

from metacatalog import api
session = api.connect_database()
api.create_tables(session)
api.populate_defaults(session)
exit()

Existing database

If you want to run the tool on an existing database, change the METACATALOG_URI in the docker-compose.yml. Remember that this will still spin up the database service. For production, you should therefore either remove the database service from the docker-compose.yml or use docker without docker compose, like:

docker build -t vfw_loader .
docker run --rm -it -v /path/to/local/in:/in -v /path/to/local/out:/out -v /path/to/local/datafiles:/path/to/local/datafiles -e METACATALOG_URI="postgresql..." vfw_loader

Structure

This container implements a common file structure inside the container to load inputs and outputs of the tool. It shares this structure with the Python template, R template, NodeJS template and Octave template, but it can be mimicked in any container.

Each container needs at least the following structure:

/
|- in/
|  |- inputs.json
|- out/
|  |- ...
|- src/
|  |- tool.yml
|  |- run.py
|  |- CITATION.cff

- inputs.json contains the parameters. Whichever framework runs the container, this is how parameters are passed.
- tool.yml is the tool specification. It contains metadata about the scope of the tool, the number of endpoints (functions) and their parameters.
- run.py is the tool itself, or a Python script that handles the execution. It has to capture all outputs and either print them to the console or create files in /out.
- CITATION.cff is the citation file providing bibliographic information on how to cite this tool.

Does run.py take runtime args?

Currently, the only way to parameterize the tool is the inputs.json file. Parameterization via arguments will likely be added in the future.
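
To make the file-based parameterization concrete, here is a minimal sketch of how a script such as run.py might read its parameters. It is not the actual run.py of this tool: the function `load_parameters` is hypothetical, and the nesting under the tool name and a `parameters` key is an assumption (the sketch falls back to a flat layout if that nesting is absent).

```python
import json
import tempfile
from pathlib import Path


def load_parameters(path="/in/inputs.json", tool_name="vforwater_loader"):
    """Read the parameters for one tool from an inputs.json file."""
    with open(path) as handle:
        config = json.load(handle)
    # ASSUMPTION: parameters live under <tool_name> -> "parameters";
    # fall back to the enclosing object when a level is missing.
    section = config.get(tool_name, config)
    return section.get("parameters", section)


# Demonstration with a temporary file instead of the container's /in folder.
demo_file = Path(tempfile.mkdtemp()) / "inputs.json"
demo_file.write_text(json.dumps(
    {"vforwater_loader": {"parameters": {"dataset_ids": [1, 2]}}}
))

params = load_parameters(demo_file)
print(params["dataset_ids"])  # [1, 2]
```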

How to build the image?

You can build the image from the root of this repo with:

docker build -t vfw_loader .

The images are also built by a GitHub Action on each release.

Owner

Name: V-FOR-WaTer
Login: VForWaTer
Kind: organization

Website: https://www.vforwater.de
Repositories: 42
Profile: https://github.com/VForWaTer

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: V-FOR-WaTer Dataset Loader
message: >-
  Tool to load and harmonize heterogeneous datasets using metacatalog.
type: software
authors:
  - given-names: Mirko
    family-names: Mälicke
    email: mirko.maelicke@kit.edu
    affiliation: >-
      Institute for Water and Environment, Hydrology,
      Karlsruhe Institute for Technology (KIT)
    orcid: 'https://orcid.org/0000-0002-0424-2651'
  - given-names: Alexander
    family-names: Dolich
    email: alexander.dolich@kit.edu
    affiliation: >-
      Institute for Water and Environment, Hydrology,
      Karlsruhe Institute for Technology (KIT)
    orcid: 'https://orcid.org/0000-0003-4160-6765'
repository-code: 'https://github.com/VForWaTer/tool_vforwater_loader'
url: 'https://portal.vforwater.de'
abstract: >-
  This tool uses `metacatalog` to load datasets stored in a metacatalog instance, like V-FOR-WaTer.
  The requested datasources is made available in the output directory of the tool. Areal datasets
  (spatial scale defined) are clipped to the reference area and datasets with a temporal scale 
  defined in the metadata are clipped to the time range specified. 
keywords:
  - docker
  - tool-spec
  - V-For-WaTer
  - netCDF
  - clip
  - catchment
  - metacatalog
license: CC-BY-4.0
version: '0.13.0'
date-released: '2024-09-17'

GitHub Events

Total

Issues event: 2
Delete event: 3
Issue comment event: 2
Push event: 2
Pull request review event: 1
Pull request event: 2
Create event: 2

Last Year

Issues event: 2
Delete event: 3
Issue comment event: 2
Push event: 2
Pull request review event: 1
Pull request event: 2
Create event: 2

Committers

Last synced: about 1 year ago

All Time

Total Commits: 165
Total Committers: 4
Avg Commits per committer: 41.25
Development Distribution Score (DDS): 0.061

Past Year

Commits: 60
Committers: 2
Avg Commits per committer: 30.0
Development Distribution Score (DDS): 0.067

Top Committers

Name	Email	Commits
Mirko Mälicke	m**o@h**e	155
AlexDo1	a**h@k**u	6
jonaslenz	3****z	3
Jörg Meyer	2****t	1

Committer Domains (Top 20 + Academic)

kit.edu: 1
hydrocode.de: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 12
Total pull requests: 21
Average time to close issues: about 2 months
Average time to close pull requests: about 21 hours
Total issue authors: 4
Total pull request authors: 4
Average comments per issue: 3.08
Average comments per pull request: 0.52
Merged pull requests: 21
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 6
Pull requests: 17
Average time to close issues: 3 days
Average time to close pull requests: about 24 hours
Issue authors: 3
Pull request authors: 2
Average comments per issue: 1.0
Average comments per pull request: 0.59
Merged pull requests: 17
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

mmaelicke (3)
elnazazmi (1)
Ash-Manoj (1)
MarcusStrobl (1)

Pull Request Authors

mmaelicke (14)
jonaslenz (3)
AlexDo1 (2)
joergmeyer-kit (2)

Top Labels

Issue Labels

Pull Request Labels

enhancement (2)

Dependencies

.github/workflows/docker-image.yml actions

actions/checkout v3 composite
docker/build-push-action v3 composite
docker/login-action v2 composite
docker/metadata-action v4 composite
docker/setup-buildx-action v2 composite
docker/setup-qemu-action v2 composite
softprops/action-gh-release v1 composite

Dockerfile docker

python 3.10.13 build

docker-compose.yml docker

postgis/postgis 15-3.4

examples/hyras/docker-compose.yml docker

postgis/postgis 15-3.4
