tool-discovery
The CLARIAH tool discovery repository holds the Tool Source Registry and the configuration for the software metadata harvesting pipeline.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 17 committers (11.8%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CLARIAH
- Language: TeX
- Default Branch: master
- Homepage: https://tools.clariah.nl
- Size: 11.8 MB
Statistics
- Stars: 12
- Watchers: 6
- Forks: 8
- Open Issues: 2
- Releases: 12
Metadata Files
README.md
CLARIAH Tool Discovery
This repository contains everything related to tool discovery and software metadata in CLARIAH.
* Dockerfile: The Docker container for the CLARIAH Tool Discovery pipeline, including both the harvester and the server and API powering the CLARIAH Tool Store.
* source-registry/: The tool source registry, which contains the source repository locations and service endpoints for all CLARIAH tools. This is open for contributions.
* etc/, static/: supporting files for the deployment at
* legacy/cmdi/: Contains legacy CMDI metadata as gathered in WP3 task MD4T at Utrecht University
Service
The tool discovery service consists of a harvester that runs at regular intervals (each night) and a tool store. It is deployed at https://tools.clariah.nl (production, may not be available yet at this time!) and https://tools.dev.clariah.nl (development).
All harvested data is also available as individual files via https://tools.dev.clariah.nl/files/
Links
- Tool Discovery kanban board - Project planning
- CLARIAH Tool Discovery Presentation - Presented at CLARIAH Tech Day
Usage
For CLARIAH (local development):
docker build -t clariah-tool-discovery .
docker run -itd -p 8080:80 --env-file=local-dev.env --name=cm-srv -v codemeta_volume:/tool-store-data --restart=unless-stopped clariah-tool-discovery
We recommend also passing --env GITHUB_TOKEN=.......... to docker run; otherwise you will likely hit GitHub's API rate limit during harvesting. Similarly, you can pass a ZENODO_ACCESS_TOKEN.
More generic:
docker build -t codemeta-server-tool --build-arg nginx_pass=some_password .
docker run -itd -p 80:80 --env-file=my-env.env --name=cm-srv -v codemeta_volume:/tool-store-data --restart=unless-stopped codemeta-server-tool
To harvest from local YAML source files (rather than a remote git repo), add -v $PWD/source-registry/:/usr/src/source-registry/source-registry/ to docker run and set LOCAL_SOURCE_REGISTRY=true in my-env.env.
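When the container is started from a script rather than by hand, the options above can be assembled programmatically. The following is a minimal Python sketch and not part of this repository: the function name `docker_run_command` and its parameters are made up for illustration, and it passes `LOCAL_SOURCE_REGISTRY=true` with `--env` instead of through the env file, on the assumption that the container reads the variable the same way in both cases.

```python
import os
import shlex


def docker_run_command(image, env_file, port="80:80", tokens=None, local_registry=None):
    """Assemble the `docker run` invocation from the usage notes as an argument list.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    args = [
        "docker", "run", "-itd",
        "-p", port,
        f"--env-file={env_file}",
        "--name=cm-srv",
        "-v", "codemeta_volume:/tool-store-data",
        "--restart=unless-stopped",
    ]
    # Optional API tokens (GITHUB_TOKEN, ZENODO_ACCESS_TOKEN); unset or empty ones are skipped.
    for name, value in sorted((tokens or {}).items()):
        if value:
            args += ["--env", f"{name}={value}"]
    if local_registry:
        # Mount a local source registry instead of relying on the remote git repo.
        mount = os.path.abspath(local_registry) + "/:/usr/src/source-registry/source-registry/"
        args += ["-v", mount, "--env", "LOCAL_SOURCE_REGISTRY=true"]
    args.append(image)
    return args


if __name__ == "__main__":
    cmd = docker_run_command(
        "codemeta-server-tool",
        "my-env.env",
        tokens={
            "GITHUB_TOKEN": os.environ.get("GITHUB_TOKEN"),
            "ZENODO_ACCESS_TOKEN": os.environ.get("ZENODO_ACCESS_TOKEN"),
        },
        local_registry="source-registry",
    )
    # Print a copy-pasteable command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

The printed command line contains the token values, so it should not be written to shared logs.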
Event-based collection, i.e. allowing clients to push codemeta files, can be enabled by setting UPLOADER=true in the container environment, e.g. with --env UPLOADER=true. You can then POST your codemeta.json file with curl -u <nginx-user> -X POST -H "Content-Type: application/json" -d @codemeta.json <url>/rest/
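The same upload can be scripted without curl. The sketch below uses only Python's standard library and builds the request without sending it. `build_upload_request` is a hypothetical helper, the host and credentials in the usage comment are placeholders, and it assumes the endpoint accepts HTTP Basic authentication, as the `curl -u` option implies.

```python
import base64
import json
import urllib.request


def build_upload_request(base_url, user, password, codemeta):
    """Build (but do not send) a POST request carrying a codemeta document.

    Hypothetical helper; assumes HTTP Basic authentication on <url>/rest/.
    """
    body = json.dumps(codemeta).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/rest/",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


# Usage (placeholder host and credentials, not executed here):
#
#   with open("codemeta.json", encoding="utf-8") as f:
#       doc = json.load(f)
#   request = build_upload_request("https://tools.example.org", "nginx-user", "some_password", doc)
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

Loading the file with `json.load` before sending has the side effect of rejecting a malformed codemeta.json locally, before any network traffic.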
For a private git repository, add -e GIT_USER='youruser' -e GIT_PASSWORD='yourtoken' to docker run.
To clean up, remove the volume codemeta_volume.
Integration: API usage instructions
If you want to query the Tool Store from other software, please read this document for instructions on how to use our SPARQL endpoint.
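As a rough illustration of what such an integration could look like, the Python sketch below builds a query request following the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter). It is not taken from the API usage document: the endpoint URL is a placeholder, `build_sparql_request` is a hypothetical helper, and the query assumes that tools are stored as `schema:SoftwareSourceCode` resources in the `http://schema.org/` namespace, which should be checked against the actual instructions.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint; the real URL is given in the API usage document.
ENDPOINT = "https://tools.example.org/sparql"

# Assumed vocabulary: schema.org terms as used in codemeta.
QUERY = """PREFIX schema: <http://schema.org/>
SELECT ?tool ?name WHERE {
  ?tool a schema:SoftwareSourceCode ;
        schema:name ?name .
}
LIMIT 10"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


# Usage (not executed here):
#
#   import json
#   with urllib.request.urlopen(build_sparql_request(ENDPOINT, QUERY)) as response:
#       results = json.load(response)
#   for row in results["results"]["bindings"]:
#       print(row["name"]["value"])
```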
Owner
- Name: CLARIAH
- Login: CLARIAH
- Kind: organization
- Website: http://www.clariah.nl
- Repositories: 65
- Profile: https://github.com/CLARIAH
CLARIAH offers humanities scholars a Common Lab providing access to large collections of digital resources and innovative tools for research
CodeMeta (codemeta.json)
{
"@context": [
"https://w3id.org/codemeta/3.0",
"http://schema.org",
"https://w3id.org/software-types",
"https://w3id.org/software-iodata"
],
"author": {
"@id": "https://orcid.org/0000-0002-1046-0006",
"affiliation": {
"@id": "https://huc.knaw.nl"
},
"familyName": "van Gompel",
"givenName": "Maarten",
"url": "https://proycon.anaproy.nl",
"@type": "Person"
},
"@id": "https://github.com/CLARIAH/tool-discovery.git",
"@type": "SoftwareSourceCode",
"name": "CLARIAH Tool Discovery",
"version": "2.0.0",
"dateCreated": "2022-01-05T16:21:48Z",
"dateModified": "2025-03-07T12:45:44Z",
"description": "This is the over-arching project for CLARIAH Tool Discovery, its components harvest and aggregate codemeta from source repositories and service endpoints, automatically converting known metadata schemes in the process. This project holds the Tool Source Registry, pointing to all the tools that are to be harvested. It also holds the validation schema.",
"developmentStatus": [
"https://www.repostatus.org/#active",
"https://w3id.org/research-technology-readiness-levels#Level8Complete"
],
"codeRepository": "https://github.com/CLARIAH/tool-discovery.git",
"contIntegration": "https://github.com/CLARIAH/tool-discovery/actions/",
"license": "https://spdx.org/licenses/GPL-3.0-only",
"identifier": "tool-discovery",
"issueTracker": "https://github.com/CLARIAH/tool-discovery/issues",
"readme": "https://github.com/CLARIAH/tool-discovery/blob/master/README.md",
"softwareHelp": [
{
"@type": "HowTo",
"name": "Contributor Guidelines",
"description": "This explains how to add your own software to be harvested by us and answers various Frequently Asked Questions.",
"url": "https://github.com/CLARIAH/tool-discovery/blob/master/CONTRIBUTING.md"
},
{
"@type": "TechArticle",
"name": "Software Metadata Requirements",
"description": "This document specifies the technical and organisational requirements for your software metadata, including the precise form in which it can be supplied.",
"url": "https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md"
},
{
"@type": "TechArticle",
"name": "Querying the CLARIAH Tool Store for integration in other software",
"description": "This document is intended for other developers that want to make use of the information that has been harvested and aggregated in the tool discovery pipeline. It explains the API available in our backend.",
"url": "https://github.com/CLARIAH/tool-discovery/blob/master/API_USAGE.md"
}
],
"applicationCategory": [
"https://vocabs.dariah.eu/tadirah/exploration",
"https://vocabs.dariah.eu/tadirah/browsing",
"https://vocabs.dariah.eu/tadirah/discovering",
"https://vocabs.dariah.eu/tadirah/gathering",
"https://w3id.org/nwo-research-fields#SoftwareForHumanities",
"https://w3id.org/nwo-research-fields#DatabasesForHumanities"
],
"keywords": [
"metadata",
"codemeta",
"rdf",
"software metadata",
"schema.org",
"linked data",
"harvester"
],
"maintainer": {
"@id": "https://orcid.org/0000-0002-1046-0006",
"affiliation": {
"@id": "https://huc.knaw.nl"
},
"familyName": "van Gompel",
"givenName": "Maarten",
"url": "https://proycon.anaproy.nl",
"@type": "Person"
},
"producer": {
"@id": "https://huc.knaw.nl",
"@type": "Organization",
"name": "KNAW Humanities Cluster",
"url": "https://huc.knaw.nl",
"parentOrganization": {
"@id": "https://knaw.nl",
"@type": "Organization",
"name": "KNAW",
"url": "https://knaw.nl",
"location": {
"@type": "Place",
"name": "Amsterdam"
}
}
},
"programmingLanguage": "shell",
"screenshot": [
"https://raw.githubusercontent.com/proycon/codemeta-server/master/screenshot_index_cards.jpg",
"https://raw.githubusercontent.com/proycon/codemeta-server/master/screenshot_index_table.jpg",
"https://raw.githubusercontent.com/proycon/codemeta-server/master/screenshot_page.jpg",
"https://raw.githubusercontent.com/proycon/codemeta-server/master/screenshot_sparql.jpg"
],
"softwareRequirements": [
{
"@id": "/dependency/codemetapy",
"@type": "SoftwareApplication",
"identifier": "codemetapy",
"name": "codemetapy",
"runtimePlatform": "Python 3"
},
{
"@id": "/dependency/codemeta-harvester",
"@type": "SoftwareApplication",
"identifier": "codemeta-harvester",
"name": "codemeta-harvester"
},
{
"@id": "/dependency/codemeta-server",
"@type": "WebApplication",
"identifier": "codemeta-server",
"name": "codemeta-server"
}
],
"isSourceCodeOf": {
"@type": "WebApplication",
"name": "CLARIAH Tools",
"url": "https://tools.clariah.nl",
"description": "This is a web portal where you can find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessors and sister projects. This list is automatically harvested from the tool producers and providers themselves, and updated daily. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences and not all tools are mature and stable; this information should be clearly indicated for each tool, so you can make an informed judgement whether a tool might be suitable for you.",
"provider": "https://huc.knaw.nl"
},
"funding": {
"@type": "Grant",
"name": "CLARIAH-PLUS (NWO grant 184.034.023)",
"funder": {
"@type": "Organization",
"name": "NWO",
"url": "https://www.nwo.nl"
}
}
}
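Downstream software that consumes this metadata has to cope with the fact that a codemeta property may hold either a single value or a list (compare `author` and `developmentStatus` above). The Python sketch below shows one way to normalise this when reading the file. It works on a shortened excerpt of the document above, and `as_list` and `summarise` are illustrative helper names, not part of codemetapy or any other library.

```python
import json

# Shortened excerpt of the codemeta.json shown above; in practice, load the whole file.
EXCERPT = """
{
  "name": "CLARIAH Tool Discovery",
  "version": "2.0.0",
  "softwareHelp": [
    {"@type": "HowTo", "name": "Contributor Guidelines",
     "url": "https://github.com/CLARIAH/tool-discovery/blob/master/CONTRIBUTING.md"},
    {"@type": "TechArticle", "name": "Software Metadata Requirements",
     "url": "https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md"}
  ],
  "softwareRequirements": [
    {"@type": "SoftwareApplication", "name": "codemetapy", "runtimePlatform": "Python 3"},
    {"@type": "SoftwareApplication", "name": "codemeta-harvester"},
    {"@type": "WebApplication", "name": "codemeta-server"}
  ]
}
"""


def as_list(value):
    """Normalise a property that may be missing, a single item, or a list of items."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarise(doc):
    """Collect the title, help documents, and dependency names from a codemeta document."""
    return {
        "title": f'{doc.get("name")} {doc.get("version")}',
        "help": [(h.get("name"), h.get("url")) for h in as_list(doc.get("softwareHelp"))],
        "requires": [r.get("name") for r in as_list(doc.get("softwareRequirements"))],
    }


summary = summarise(json.loads(EXCERPT))
print(summary["title"])
for name in summary["requires"]:
    print("requires:", name)
```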
GitHub Events
Total
- Issues event: 1
- Watch event: 3
- Issue comment event: 15
- Push event: 33
- Pull request event: 22
- Fork event: 6
- Create event: 1
Last Year
- Issues event: 1
- Watch event: 3
- Issue comment event: 15
- Push event: 33
- Pull request event: 22
- Fork event: 6
- Create event: 1
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Maarten van Gompel | p****n@a****l | 251 |
| Jaap Blom | j****m@b****l | 8 |
| mlongobardo-gituname | u****n | 6 |
| Dirk Roorda | t****n@i****m | 5 |
| David de Boer | d****d@d****l | 4 |
| Peter Kleiweg | p****g@r****l | 3 |
| Bram Buitendijk | b****k@d****l | 3 |
| Kathrin Dentler | k****r@t****c | 2 |
| Odijk | j****k@u****l | 2 |
| Menzo Windhouwer | m****r@d****l | 2 |
| kerim1 | k****r@d****l | 1 |
| jessededoes | d****s@x****l | 1 |
| Xander Wilcke | w****e@v****l | 1 |
| Rik D.T. Janssen | 1****n | 1 |
| Richard Zijdeman | r****n@i****l | 1 |
| Hayco de Jong | h****g@d****l | 1 |
| mwigham | 3****m | 1 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 10
- Total pull requests: 52
- Average time to close issues: 19 days
- Average time to close pull requests: 7 days
- Total issue authors: 6
- Total pull request authors: 20
- Average comments per issue: 1.9
- Average comments per pull request: 1.1
- Merged pull requests: 51
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 23
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Issue authors: 1
- Pull request authors: 5
- Average comments per issue: 0.0
- Average comments per pull request: 0.61
- Merged pull requests: 23
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- proycon (4)
- ddeboer (2)
- pebbe (1)
- firmao (1)
- kequach (1)
- rlzijdeman (1)
Pull Request Authors
- firmao (16)
- jblom (7)
- brambg (3)
- pebbe (2)
- JanOdijk (2)
- xmichele (2)
- jan-niestadt (2)
- dirkroorda (2)
- mwigham (2)
- menzowindhouwer (2)
- BeritJanssen (2)
- ddeboer (2)
- jrvosse (1)
- JessedeDoes (1)
- rlzijdeman (1)
Dependencies
- pyshacl *
- actions/checkout v3 composite
- actions/setup-python v3 composite
- proycon/codemeta-harvester $BASETAG build