voc4cat

A tool for creating and maintaining SKOS-vocabularies with Excel and GitHub.

https://github.com/nfdi4cat/voc4cat-tool

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    2 of 10 committers (20.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.7%) to scientific vocabulary

Keywords

nfdi nfdi4cat skos voc4cat xlsx

Keywords from Contributors

vocabulary taxonomy rdf catalysis shacl owl interactive standardization network-simulation hacking
Last synced: 6 months ago

Repository

A tool for creating and maintaining SKOS-vocabularies with Excel and GitHub.

Basic Info
  • Host: GitHub
  • Owner: nfdi4cat
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 4.44 MB
Statistics
  • Stars: 7
  • Watchers: 3
  • Forks: 1
  • Open Issues: 6
  • Releases: 32
Topics
nfdi nfdi4cat skos voc4cat xlsx
Created about 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License Citation Codeowners Zenodo

README.md

SKOS vocabulary management with GitHub & Excel

Overview

For voc4cat, a term collection for catalysis created in NFDI4Cat, we developed a toolbox for collaboratively maintaining SKOS vocabularies on GitHub using Excel (xlsx-files) as a user-friendly interface. It consists of several parts:

  • voc4cat-tool (this package)
    • A command-line tool to convert vocabularies from Excel to SKOS (turtle/rdf) and validate the vocabulary. Validation includes formal validation with SHACL-profiles but also additional checks. The voc4cat tool can be run locally but is also well suited for integration in CI-pipelines. It was inspired by RDFlib/VocExcel. Parts of the vocexcel code base were merged into this repository (see git history).
  • voc4cat-template
    • A GitHub project template for managing SKOS-vocabularies with a GitHub-based workflow, including automation by gh-actions.
  • voc4cat
    • A SKOS vocabulary for the catalysis disciplines that uses the voc4cat workflow for real work.

Command-line tool voc4cat

voc4cat was mainly developed to be used in gh-actions but it is also useful as a locally installed command line tool. It has the following features.

  • Convert between SKOS-vocabularies in Excel/xlsx format and rdf-format (turtle) in both directions.
  • Check/validate SKOS-vocabularies in rdf/turtle format with the vocpub SHACL-profile.
  • Use a vocabulary-configuration file to specify for example ID ranges for each contributor.
  • Check xlsx vocabulary files for errors or incorrect use of IDs (voc4cat uses pydantic for this validation).
  • Generate documentation from a SKOS/turtle vocabulary file using pyLODE.
  • Express concept-hierarchies in xlsx by indentation.
  • Consistently update all IRIs in the xlsx vocabulary (e.g. with a new namespace or new IDs).

voc4cat works on files or folders. If a folder is given all matching files are processed at once.

Installation requirements

To start you need:

  • Python (3.10 or newer)

voc4cat is platform independent and should work at least on Windows, Linux and macOS.

Installation steps

If you just want to use the command line interface, we strongly suggest installing with pipx. pipx makes installing and managing Python command line applications very easy.

pipx install voc4cat

Alternatively, you can pip-install voc4cat like any other Python package. To install it including all development tools, use pip install .[dev]; for just the test tools, use pip install .[tests]. For tests we use pytest.

Typical use

The available commands and options can be explored via the help system:

voc4cat --help (or simply voc4cat)

which lists all available sub-commands. These have their own help, for example:

voc4cat transform --help

To create a new vocabulary, use the voc4cat-adjusted template from the templates subfolder. To start, you can use simple temporary IRIs like ex:my_term for your concepts. With voc4cat you can later replace these with different namespaces and/or different numeric IDs.

The files used below to demonstrate some commands can be found in the example folder of the repository.

For expressing hierarchies in SKOS ("broader"/"narrower") voc4cat offers two options. One way is to enter a list of children IRIs (in sheet "Concepts"). However, filling the Children URI columns with lists of IRIs can be tedious. Therefore, voc4cat offers a second, easier way to express hierarchies between concepts: indentation. voc4cat understands Excel-indentation (the default) but can also work with other indentation formats (e.g. 3 spaces per level). To switch between the two representations, use the transform sub-command. For example, use

voc4cat transform --from-indent --outdir outbox example/photocatalysis_example_indented_prelim-IDs.xlsx

or, if you were using 3 spaces per level (this file does not yet exist),

voc4cat transform --from-indent --indent "   " --inplace outbox outbox\photocatalysis_example_prelim-IDs.xlsx

to convert to a ChildrenURI-hierarchy. To create such a file, convert for example from a ChildrenURI-hierarchy to indentation by

voc4cat transform --to-indent --indent "   " --outdir outbox example/photocatalysis_example_prelim-IDs.xlsx
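To make the relation between the two representations concrete, here is a minimal sketch of how indentation can be read as a broader/narrower hierarchy. It is not voc4cat's implementation: the function names and the concept labels are hypothetical, it assumes 3 spaces per level, and it gives each concept at most one parent (SKOS itself allows several).

```python
def parse_indented(lines, indent="   "):
    """Return {label: parent_label} for labels nested by repeated `indent` prefixes."""
    parents = {}
    stack = []  # stack[level] holds the most recent label seen at that level
    for raw in lines:
        level = 0
        rest = raw
        while rest.startswith(indent):
            rest = rest[len(indent):]
            level += 1
        label = rest.strip()
        if not label:
            continue  # skip empty rows
        if level > len(stack):
            raise ValueError(f"indentation jumps more than one level at {label!r}")
        del stack[level:]  # drop deeper levels left over from earlier rows
        parents[label] = stack[-1] if stack else None
        stack.append(label)
    return parents


def children_of(parents):
    """Invert {child: parent} into {parent: [children]} (the children-list form)."""
    children = {}
    for child, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)
    return children


# Hypothetical concept labels, one per row, indented with 3 spaces per level.
rows = [
    "catalysis",
    "   photocatalysis",
    "      photocatalyst",
    "   electrocatalysis",
]
broader = parse_indented(rows)
narrower = children_of(broader)
print(narrower)
```

The `broader` mapping corresponds to the indented view, while `narrower` corresponds to the children-list view that the Children URI columns express.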

As mentioned above, you can replace all IDs belonging to a given prefix (here temp) by numeric IDs e.g. starting from 1001:

voc4cat transform --make-ids temp 1001 --outdir outbox example/photocatalysis_example_prelim-IDs.xlsx

This will consistently update all IRIs matching the temp:-prefix in the sheets "Concepts", "Additional Concept Features" and "Collections".
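The idea of a consistent replacement can be sketched as follows. This is an illustration only, not voc4cat's code: the row structure, the function name `make_ids`, and the exact form of the generated IDs (plain `prefix:number` without padding) are assumptions. The point is that one shared mapping is used everywhere, so an IRI receives the same new ID wherever it appears.

```python
def make_ids(rows, prefix, start):
    """Replace every `prefix:<name>` IRI by `prefix:<number>`, counting up from `start`."""
    mapping = {}
    next_id = start

    def replace(iri):
        nonlocal next_id
        if not iri.startswith(prefix + ":"):
            return iri  # IRIs with other prefixes stay untouched
        if iri not in mapping:
            mapping[iri] = f"{prefix}:{next_id}"  # assumed target format
            next_id += 1
        return mapping[iri]

    new_rows = [
        {"iri": replace(row["iri"]), "children": [replace(c) for c in row["children"]]}
        for row in rows
    ]
    return new_rows, mapping


# Hypothetical rows standing in for a "Concepts" sheet.
rows = [
    {"iri": "temp:catalysis", "children": ["temp:photocatalysis", "ex:other"]},
    {"iri": "temp:photocatalysis", "children": []},
]
new_rows, mapping = make_ids(rows, "temp", 1001)
print(new_rows)
```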

Finally, the vocabulary file can be converted from xlsx to SKOS/turtle format.

voc4cat convert example/photocatalysis_example.xlsx

A turtle file photocatalysis_example.ttl is created in the same directory where the xlsx-file is located.
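The exact statements voc4cat writes are not reproduced here. As a rough illustration of what a SKOS concept looks like in Turtle, the sketch below hand-serializes one concept. The `ex:` namespace, the label, and the definition are hypothetical, and a real conversion would rely on an RDF library rather than string formatting.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/> .\n"
)


def turtle_literal(text, lang="en"):
    """Quote a single-line string as a language-tagged Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(iri, pref_label, definition, broader=None):
    """Serialize one concept as a Turtle statement block (illustration only)."""
    statements = [
        "a skos:Concept",
        f"skos:prefLabel {turtle_literal(pref_label)}",
        f"skos:definition {turtle_literal(definition)}",
    ]
    if broader:
        statements.append(f"skos:broader {broader}")
    return f"{iri} " + " ;\n    ".join(statements) + " ."


block = concept_to_turtle(
    "ex:photocatalysis",
    "photocatalysis",
    "Catalysis driven by light.",  # hypothetical definition
    broader="ex:catalysis",
)
print(PREFIXES + "\n" + block)
```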

The reverse is also possible. You can create an xlsx file from a turtle vocabulary file. Optionally a custom XLSX-template-file can be specified for this conversion:

voc4cat convert -O outbox --template templates/voc4cat_template_043.xlsx example/photocatalysis_example.ttl

In addition to transform and convert, voc4cat offers checking and validation under the sub-command check and documentation generation under docs. See the command line help for details.

For maintainers, a tool for similarity checks is provided. It is based on a sentence-transformer model and identifies similar preferred labels and definitions. It also performs other consistency checks. The tool can either check a single vocabulary

voc-assistant check voc4cat.ttl

or compare the additions made against existing concepts:

voc-assistant compare voc4cat.ttl voc4cat_new.ttl

It creates reports in markdown format.
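To give a feel for what such a similarity check does, the sketch below flags pairs of definitions that are nearly identical. It is a deliberately simplified stand-in: instead of sentence-transformer embeddings it uses bag-of-words cosine similarity from the standard library, and the IRIs, definitions, and the 0.8 threshold are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Turn a text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similar_pairs(definitions, threshold=0.8):
    """Return (iri_1, iri_2, score) for every pair scoring at or above the threshold."""
    vectors = {iri: vectorize(text) for iri, text in definitions.items()}
    pairs = []
    for (iri_1, vec_1), (iri_2, vec_2) in combinations(vectors.items(), 2):
        score = cosine(vec_1, vec_2)
        if score >= threshold:
            pairs.append((iri_1, iri_2, round(score, 2)))
    return pairs


# Hypothetical concept definitions keyed by IRI.
definitions = {
    "ex:a": "A catalyst that is activated by light.",
    "ex:b": "A catalyst which is activated by light.",
    "ex:c": "Speeding up a reaction with an electrode.",
}
pairs = similar_pairs(definitions)
print(pairs)
```

Embedding-based models can also catch paraphrases that share few words, which this word-overlap stand-in cannot.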

Feedback and code contributions

We highly appreciate your feedback. Please create an issue on GitHub.

If you plan to contribute code, we suggest creating an issue first to get early feedback on your ideas before you spend too much time.

By contributing you agree that your contributions fall under the project's BSD-3-Clause license.

Acknowledgement

This work was funded by the German Research Foundation (DFG) through the project "NFDI4Cat - NFDI for Catalysis-Related Sciences" (DFG project no. 441926934), within the National Research Data Infrastructure (NFDI) programme of the Joint Science Conference (GWK).

This project uses the vocpub SHACL profile, which is licensed under the Creative Commons Attribution 4.0 International License (CC-BY 4.0). The original work was created by Nicholas J. Car. A copy of the license can be found at: https://creativecommons.org/licenses/by/4.0/. Changes were made to the original work for this project.

Owner

  • Name: NFDI4Cat
  • Login: nfdi4cat
  • Kind: organization
  • Location: Germany

Working on a research data infrastructure for the catalysis disciplines.

Citation (CITATION.bib)

@software{david_linke_8277925,
  author       = {David Linke and
                  Peter Philips and
                  Nicholas Car and
                  Jamie Feiss},
  title        = {nfdi4cat/voc4cat-tool - A command-line tool written in Python for
                  creating and maintaining SKOS-vocabularies with Excel and GitHub.},
  abstractNote = {The voc4cat tool can be run locally but is also well suited for
                  integration in CI-pipelines. It is for example used to maintain
                  the SKOS vocabulary for catalysis "voc4cat" on GitHub
                  (nfdi4cat/voc4cat). Supported features include conversion between
                  SKOS/turtle and Excel/xlsx, validation, transformations between
                  different representations and documentation generation.},
  year         = {2022--2025},
  publisher    = {Zenodo},
  version      = {v0.8.7},
  doi          = {10.5281/zenodo.8277925},
  url          = {https://doi.org/10.5281/zenodo.8277925}
}

GitHub Events

Total
  • Create event: 51
  • Release event: 8
  • Issues event: 35
  • Watch event: 1
  • Delete event: 47
  • Issue comment event: 44
  • Push event: 83
  • Pull request event: 85
Last Year
  • Create event: 51
  • Release event: 8
  • Issues event: 35
  • Watch event: 1
  • Delete event: 47
  • Issue comment event: 44
  • Push event: 83
  • Pull request event: 85

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 412
  • Total Committers: 10
  • Avg Commits per committer: 41.2
  • Development Distribution Score (DDS): 0.279
Past Year
  • Commits: 151
  • Committers: 5
  • Avg Commits per committer: 30.2
  • Development Distribution Score (DDS): 0.093
Top Committers
Name Email Commits
David Linke d****e@c****e 297
peterphilips p****s@s****m 53
Nicholas Car n****r@s****m 32
David Linke d****o 9
jamiefeiss j****s@g****m 6
David Linke d****e@g****m 5
dependabot[bot] 4****] 5
CI.runner C****r@g****e 3
David Linke D****e@c****e 1
mark doerr m****r@u****e 1
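The all-time figures above can be reproduced from the commit counts in the table. The sketch assumes that the Development Distribution Score is one minus the top committer's share of all commits, which is consistent with the reported 0.279, and that each row counts as a separate committer even where the same person appears under several e-mail addresses.

```python
# Commit counts per committer identity, copied from the "Top Committers" table.
commits = [297, 53, 32, 9, 6, 5, 5, 3, 1, 1]

total = sum(commits)                 # total commits
average = total / len(commits)       # average commits per committer
dds = 1 - max(commits) / total       # assumed DDS definition: 1 - top committer share

print(total, round(average, 1), round(dds, 3))
```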

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 68
  • Total pull requests: 172
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 3 days
  • Total issue authors: 4
  • Total pull request authors: 3
  • Average comments per issue: 1.12
  • Average comments per pull request: 0.07
  • Merged pull requests: 159
  • Bot issues: 0
  • Bot pull requests: 47
Past Year
  • Issues: 23
  • Pull requests: 82
  • Average time to close issues: 13 days
  • Average time to close pull requests: 1 day
  • Issue authors: 3
  • Pull request authors: 2
  • Average comments per issue: 1.39
  • Average comments per pull request: 0.04
  • Merged pull requests: 76
  • Bot issues: 0
  • Bot pull requests: 31
Top Authors
Issue Authors
  • dalito (62)
  • markdoerr (4)
  • nichtich (1)
  • oggioniale (1)
Pull Request Authors
  • dalito (124)
  • dependabot[bot] (57)
  • markdoerr (1)
Top Labels
Issue Labels
enhancement (24) bug (21) housekeeping (12) breaking (4) discussion (4) documentation (3) question (1) good first issue (1) invalid/wontfix (1) dependencies (1)
Pull Request Labels
dependencies (65) housekeeping (25) enhancement (24) bug (11) github_actions (11) breaking (10) documentation (4) blocked (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 92 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 22
  • Total maintainers: 1
pypi.org: voc4cat

SKOS vocabulary management tool of NFDI4Cat.

  • Versions: 22
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 92 Last month
Rankings
Dependent packages count: 7.5%
Forks count: 30.2%
Average: 36.7%
Stargazers count: 39.2%
Dependent repos count: 69.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
.github/workflows/pypi-publish.yml actions
  • actions/checkout v3.1.0 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • pypa/gh-action-pypi-publish b7f401de30cb6434a1e19f805ff006643653240e composite
pyproject.toml pypi
  • base32-crockford >= 0.3.0
  • curies >= 0.6.6
  • networkx >= 2.8
  • ontospy >= 2.1.0
  • openpyxl >= 3.0.9
  • pillow >= 9.1.0
  • pyLODE < 3.0.0
  • pydantic < 2.0.0
  • pyshacl >= 0.18.1
  • rdflib >= 6.1.1
  • tomli >=1.1.0; python_version < '3.11'