runcrate-analysis
Repository
Basic Info
- Host: GitHub
- Owner: RenskeW
- Language: Common Workflow Language
- Default Branch: main
- Size: 2.43 MB
Statistics
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 7
- Releases: 0
Metadata Files
README.md
Analysis of runcrate 0.5.0
This repository documents the analysis of Workflow Run RO-Crates (WRROC) converted from CWLProv RO Bundles using runcrate. The results of this analysis are also published on Zenodo: https://doi.org/10.5281/zenodo.12689424.
The analysis follows the same methodology as previous work, in which we conducted a qualitative evaluation of metadata coverage in CWLProv (version 0.6.0). This earlier analysis was based on concrete examples of ROs associated with a realistic bioinformatics workflow. Here, we repeated the analysis for Workflow Run RO-Crate, and compared the WRROC RDF representation (in ro-crate-metadata.json) with the CWLProv RDF provenance graph.
Methods
We used the following approach and documented it in the Issues:
- Provenance metadata was classified into 6 categories: T1-6.
- For each category, we made an inventory of metadata that is contained in CWLProv RO Bundles (in RDF, and in structured, non-RDF documents: packed.cwl and primary-job.json/primary-output.json).
- Subsequently, we assessed if and how this information is represented in Workflow Run RO-Crates converted by runcrate, based on a number of examples (see below).
- Finally, we provided suggestions on how to represent metadata that is present in CWLProv but missing in RO-Crate.
Scenario 1: Analyze representation of CWL metadata fields, human agent, file characteristics, execution details
Scenario 2: Analyze representation of SoftwareRequirement
Scenario 3: Analyze representation of DockerRequirement
Scenario 4: Analyze representation of String, File, Directory and File array input parameters AND ResourceRequirement
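The inventory step described in the approach above can be started with a simple count of entity types in ro-crate-metadata.json. The following is a minimal sketch, not the procedure used in this analysis: the function name `count_entity_types` and the trimmed `example_crate` are hypothetical, and a real crate would be read from disk with `json.load`.

```python
from collections import Counter


def count_entity_types(metadata: dict) -> Counter:
    """Count how often each @type occurs in the @graph of an RO-Crate."""
    counts = Counter()
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        # In JSON-LD, @type may be a single string or a list of strings.
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts


# Hypothetical, heavily trimmed crate used only for illustration.
example_crate = {
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork"},
        {"@id": "./", "@type": "Dataset"},
        {
            "@id": "packed.cwl",
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        },
        {"@id": "#run-1", "@type": "CreateAction"},
        {"@id": "data/input.txt", "@type": "File"},
    ]
}

type_counts = count_entity_types(example_crate)
for type_name, count in sorted(type_counts.items()):
    print(f"{type_name}: {count}")
```

Such a count only shows which kinds of entities are present; judging whether a category is fully or partially represented still requires inspecting the individual properties.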
Results
The overview below shows, for each category of the provenance taxonomy, how it is represented in CWL, in RO-Crate, and in CWLProv RDF. For a detailed explanation of each category, see https://doi.org/10.5281/zenodo.7014950.
SC1: Workflow design
Explanation of the design of the workflow and its steps can be included in the CWL metadata fields (doc, label, intent).
- CWL (non-RDF): full representation
- ro-crate-metadata.json (RDF): full representation
- CWLProv RDF: no representation
SC2: Entity annotations
Explanation of the meaning of individual input/output data entities can be represented as structured annotations in the CWL input parameter file (not propagated to ro-crate-metadata.json), but the CWL standards v1.2 give no clear guideline on how to make these annotations.
- CWL (non-RDF): partial representation (structured annotations but no CWL metadata fields)
- ro-crate-metadata.json (RDF): no representation
- CWLProv RDF: no representation
SC3: Workflow execution annotations
Workflow execution annotations (why was this combination of input parameters chosen?) can be represented as annotations in the CWL input parameter file (unstructured, not propagated to ro-crate-metadata.json).
- CWL (non-RDF): unstructured representation
- ro-crate-metadata.json (RDF): no representation
- CWLProv RDF: no representation
D1: Data identification
This information can be added in the CWL input parameter file as structured annotations, but the CWL standards v1.2 give no clear guideline on how to make these annotations.
- CWL (non-RDF): partial representation
- ro-crate-metadata.json (RDF): no representation
- CWLProv RDF: no representation
D2: File characteristics
Filename and checksum are represented for all files; creation timestamps are available for output files only. Additional structured annotations may be made in the CWL input parameter file. Filename and checksum are propagated to ro-crate-metadata.json.
- CWL (non-RDF): partial representation
- ro-crate-metadata.json (RDF): partial representation
- CWLProv RDF: partial representation
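To make the checksum part of this category concrete: the CWL v1.2 `File` object has an optional `checksum` field of the form `sha1$` followed by the hexadecimal SHA-1 digest. The sketch below recomputes such a value so that it could be compared with a recorded one. The helper names are hypothetical, and how runcrate stores the digest inside ro-crate-metadata.json is not assumed here.

```python
import hashlib


def cwl_checksum(data: bytes) -> str:
    """Return a checksum string in the CWL File format: 'sha1$<hex digest>'."""
    return "sha1$" + hashlib.sha1(data).hexdigest()


def cwl_checksum_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the same checksum for a file on disk, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha1$" + digest.hexdigest()


# Checksum of empty content, shown only to illustrate the format.
empty_checksum = cwl_checksum(b"")
print(empty_checksum)
```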
D3: Data access
The CWL standards v1.2 allow specification of a remote location for data, which would serve as access to a downloadable form of the data.
- CWL (non-RDF): partial representation
- ro-crate-metadata.json (RDF): no representation
- CWLProv RDF: no representation
D4: Parameter mapping
Mapping of input/output data to workflow parameters is represented in ro-crate-metadata.json.
- CWL (non-RDF): full representation
- ro-crate-metadata.json (RDF): full representation
- CWLProv RDF: full representation
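As an illustration of this mapping, the Workflow Run RO-Crate profiles link a data entity to its parameter definition through the `exampleOfWork` property. The following sketch collects those links from a crate's `@graph`. It assumes that linking pattern; the function name and all identifiers in `example_graph` are hypothetical, not taken from the crates analyzed here.

```python
def map_data_to_parameters(graph: list) -> dict:
    """Map each entity @id to the @ids of the parameters it exemplifies."""
    mapping = {}
    for entity in graph:
        params = entity.get("exampleOfWork")
        if params is None:
            continue
        # A single reference is a dict; multiple references form a list.
        if isinstance(params, dict):
            params = [params]
        mapping[entity["@id"]] = [param["@id"] for param in params]
    return mapping


# Hypothetical fragment of an ro-crate-metadata.json @graph.
example_graph = [
    {"@id": "packed.cwl#main/input_file", "@type": "FormalParameter"},
    {"@id": "packed.cwl#main/threshold", "@type": "FormalParameter"},
    {
        "@id": "data/input.txt",
        "@type": "File",
        "exampleOfWork": {"@id": "packed.cwl#main/input_file"},
    },
    {
        "@id": "#pv-threshold",
        "@type": "PropertyValue",
        "value": "0.5",
        "exampleOfWork": {"@id": "packed.cwl#main/threshold"},
    },
]

parameter_mapping = map_data_to_parameters(example_graph)
for data_id, param_ids in parameter_mapping.items():
    print(data_id, "->", ", ".join(param_ids))
```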
SW1: Software identification
The SoftwareRequirement field is propagated to ro-crate-metadata.json. SoftwareRequirement contains a specs field with an IRI that resolves to a landing page with metadata about the tool (see CWL standards v1.2).
- CWL (non-RDF): full representation
- ro-crate-metadata.json (RDF): full representation
- CWLProv RDF: no representation
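The specs IRIs mentioned above can be read directly from a CWL document once it is loaded as a dictionary, for example from a packed.cwl in JSON form. This is a minimal sketch: it assumes the list form of `requirements`, `hints`, and `packages` (CWL also allows a map form, which is not handled), and the package name and IRI in `example_tool` are hypothetical.

```python
def software_specs(process: dict) -> dict:
    """Return {package name: [spec IRIs]} for all SoftwareRequirement entries."""
    result = {}
    for section in ("requirements", "hints"):
        for requirement in process.get(section, []):
            if requirement.get("class") != "SoftwareRequirement":
                continue
            for package in requirement.get("packages", []):
                # 'specs' is optional; default to an empty list of IRIs.
                result[package["package"]] = list(package.get("specs", []))
    return result


# Hypothetical CommandLineTool fragment; names and IRI are placeholders.
example_tool = {
    "class": "CommandLineTool",
    "hints": [
        {
            "class": "SoftwareRequirement",
            "packages": [
                {
                    "package": "example-tool",
                    "version": ["1.0.0"],
                    "specs": ["https://example.org/tools/example-tool"],
                }
            ],
        },
        {"class": "DockerRequirement", "dockerPull": "example/image:1.0.0"},
    ],
}

specs_by_package = software_specs(example_tool)
print(specs_by_package)
```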
SW2: Software documentation
SoftwareRequirement field is propagated to ro-crate-metadata.json.
- CWL (non-RDF): full representation
- ro-crate-metadata.json (RDF): full representation
- CWLProv RDF: no representation
SW3: Software access
SoftwareRequirement field is propagated to ro-crate-metadata.json.
- CWL (non-RDF): full representation
- ro-crate-metadata.json (RDF): full representation
- CWLProv RDF: no representation
WF1: Workflow software
The workflow itself (packed.cwl) is contained in the CWLProv RO Bundle, as well as the RO-Crate produced by runcrate. Metadata/documentation about the workflow can be represented in CWL metadata fields (doc, label, intent), which are propagated to ro-crate-metadata.json. ro-crate-metadata.json also contains a description of the workflow and all its parameters and steps. The representation of the workflow in CWLProv RDF is incomplete.
- CWL (non-RDF): full representation
- ro-crate-metadata.json (RDF): full representation
- CWLProv RDF: partial representation
WF2: Workflow parameters
Information about the workflow parameters can be represented in the CWL metadata fields (doc, label, format).
- CWL (non-RDF): full representation
- ro-crate-metadata.json (RDF): full representation
- CWLProv RDF: no representation
WF3: Workflow requirements
The CWL ResourceRequirement field is partially propagated to ro-crate-metadata.json (Scenario 4).
- CWL (non-RDF): full representation
- ro-crate-metadata.json (RDF): partial representation
- CWLProv RDF: no representation
ENV1: Software environment
Absent.
ENV2: Hardware environment
Absent.
ENV3: Container image
The container image is partially represented in the CWL DockerRequirement field, which is propagated to ro-crate-metadata.json (Scenario 3).
- CWL (non-RDF): partial representation
- ro-crate-metadata.json (RDF): partial representation
- CWLProv RDF: no representation
EX1: Execution timestamps
- CWL (non-RDF): no representation
- ro-crate-metadata.json (RDF): full representation
- CWLProv RDF: full representation
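Execution timestamps in an RO-Crate are typically carried by the schema.org `startTime` and `endTime` properties of an action entity. The sketch below derives a run duration from them. The `example_action` entity and its values are hypothetical, and the timestamps use an explicit `+00:00` offset because `datetime.fromisoformat` accepts a trailing `Z` only from Python 3.11 onward.

```python
from datetime import datetime


def action_duration_seconds(action: dict) -> float:
    """Return the elapsed time between startTime and endTime in seconds."""
    start = datetime.fromisoformat(action["startTime"])
    end = datetime.fromisoformat(action["endTime"])
    return (end - start).total_seconds()


# Hypothetical CreateAction entity with placeholder timestamps.
example_action = {
    "@id": "#run-1",
    "@type": "CreateAction",
    "startTime": "2024-01-01T10:00:00+00:00",
    "endTime": "2024-01-01T10:01:30+00:00",
}

duration = action_duration_seconds(example_action)
print(f"Run took {duration} seconds")
```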
EX2: Consumed resources
Absent.
EX3: Workflow engine
- CWL (non-RDF): no representation
- ro-crate-metadata.json (RDF): partial representation
- CWLProv RDF: partial representation (only name and version, no IRI resolving to landing page with metadata)
EX4: Human agent
- CWL (non-RDF): no representation
- ro-crate-metadata.json (RDF): full representation
- CWLProv RDF: full representation
Owner
- Login: RenskeW
- Kind: user
- Repositories: 5
- Profile: https://github.com/RenskeW
Citation (CITATION.cff)
cff-version: 1.1.0
message: "Cite as"
authors:
  - family-names: de Wit
    given-names: Renske
    orcid: https://orcid.org/0000-0003-0902-0086
  - family-names: Crusoe
    given-names: Michael R.
    orcid: https://orcid.org/0000-0002-2961-9670
title: "runcrate-analysis"
doi: 10.5281/zenodo.12689424
date-released: 2024-07-08