Repository
Abstract Workflow Language Schema
Basic Info
- Host: GitHub
- Owner: OO-LD
- Default Branch: main
- Size: 117 KB
Statistics
- Stars: 4
- Watchers: 0
- Forks: 0
- Open Issues: 2
- Releases: 2
Metadata Files
README.md
Early draft of an application of OO-LD to Abstract Syntax Trees (ASTs) in order to share workflow descriptions and map them bidirectionally to RDF.
Background
Workflows play an essential part in structuring the processing of both physical objects and data. Detailed information about the processing history of a resulting asset is relevant for provenance, comparability and reproducibility, yet a detailed and machine-readable description of the workflow is usually not part of our scientific work and publications. AWL and AWL-LD address this issue by creating tools and definitions to document both experimental and computational workflows. The aim is to provide common notations for workflows in order to generate a semantic description that allows further analytics and transformations based on OO-LD. Because it builds on the AST representation of any program code (e.g. Python) or on existing declarative representations, it is framework-agnostic and can interlink generic (e.g. prefect, argo) and scientific workflow frameworks (e.g. aiida, pyiron, perqueue).
Abstract Workflow Language Schema
For illustration we use the AST generated by the Python standard library ast module, see also the implementation in awl-python. However, the concept should work with any AST (for examples see https://astexplorer.net/), although it is recommended to stick to core language patterns with broad language support.
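As a rough illustration of how such a serialization can be obtained, the following minimal sketch converts a Python AST into nested dicts and lists with a `_type` key. The helper name `ast_to_dict` is hypothetical and is not taken from awl-python; the actual implementation may differ.

```python
import ast


def ast_to_dict(node):
    """Recursively convert an ast node into plain dicts/lists (hypothetical helper)."""
    if isinstance(node, ast.AST):
        result = {"_type": type(node).__name__}
        for field, value in ast.iter_fields(node):
            # Drop the Load/Store context and unset optional fields.
            # Note: this also drops a Constant whose value is None;
            # a fuller version would special-case ast.Constant.
            if field == "ctx" or value is None:
                continue
            result[field] = ast_to_dict(value)
        return result
    if isinstance(node, list):
        return [ast_to_dict(item) for item in node]
    return node  # plain values such as int or str


tree = ast_to_dict(ast.parse("x = run(a=1)"))
call = tree["body"][0]["value"]
print(call["func"])  # the called function as a Name node
```

The resulting structure can be dumped to JSON or YAML with any standard serializer.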
Objectives
- Language-agnostic JSON serialization
- Support for any pattern that is supported by programming languages (Conditions, Loops, Function calls)
- Option to restrict supported features via JSON-SCHEMA (e.g. don't allow class declarations, allow only a whitelist of function calls, restrict function call parameters)
- Option to map to RDF via JSON-LD context in order to allow complex analytics and queries via SPARQL or SHACL (e.g. find all workflows that make use of certain functions in a specific order)
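The restriction objective can be illustrated without JSON-SCHEMA by walking the AST directly. This sketch swaps in a plain Python check for the schema-based validation described above; the whitelist `ALLOWED_CALLS` and the function `find_violations` are assumptions made for illustration only.

```python
import ast

ALLOWED_CALLS = {"run"}  # hypothetical whitelist of permitted function calls


def find_violations(source, allowed=ALLOWED_CALLS):
    """List class declarations and calls to functions outside the whitelist."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            violations.append(f"class declaration '{node.name}'")
        elif isinstance(node, ast.Call):
            func = node.func
            # Plain names are compared directly, anything else by its source text.
            name = func.id if isinstance(func, ast.Name) else ast.unparse(func)
            if name not in allowed:
                violations.append(f"call to '{name}'")
    return violations


print(find_violations("x = run(a=1)\nrun(x)"))
print(find_violations("import os\nos.system('ls')"))
```

A schema-based variant would express the same rules declaratively, which makes them shareable across languages.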
Use Cases
Conceptual Assessment
- What is the input and output of a node?
- Validate a planned workflow => Is it executable? Is it scientifically sound? see example
- Validate executed workflow => Did a state / constellation / path occur that is not valid (probably due to knowledge that was not present while planning)?
- Path finding: How can nodes be connected to get from input type A to output of type B? see example
Code Execution Provenance through code / system inspection
- What were the parameters that the workflow my_workflow was ever executed with?
- What was y in function_two when my_workflow was executed with parameters (1,2)?
- How long did the execution of the workflow my_workflow(1,2) take, and what were the most expensive execution steps?
- What was the architecture of the node this most expensive computation step was executed on (e.g. number of cores, memory)?
- Which package versions were loaded? Was a scipy version < 1.15 used?
Execution Environment Provenance (out of scope for AWL, needs workflow environment / versionized file store)
- Which workflows used a docker container "xyz" for a specific node in their execution?
- What was the version number / conda environment / git commit for the node function_two in the workflow execution of my_workflow that resulted in output=1?
- What was the content of that file when my_workflow was executed with parameters (1,2)?
Playground
A playground to work with AWL can be found here: AWL Playground (Work in progress)
Schema
Python-AST
Note: The context makes use of type-scoped contexts, see https://www.w3.org/TR/json-ld11/#scoped-contexts
Note: The schema section is empty allowing any AST. However, workflow domains should define a restricted set of e.g. function calls that represent available and safe options.
yaml
'@context':
- awl: https://oo-ld.github.io/awl-schema/
ex: https://example.org/
'@base': https://oo-ld.github.io/awl-schema/
_type: '@type'
id: '@id'
body: awl:HasPart
Name:
'@id': awl:Variable
'@context':
'@base': https://example.org/
targets: awl:HasTarget
value:
'@id': awl:HasValue
'@context':
value: '@value'
If:
'@id': awl:If
'@context':
body: awl:IfTrue
orelse: awl:IfFalse
test: awl:HasCondition
comparators: awl:HasRightHandComparator
ops: awl:HasOperator
left: awl:HasLeftHandComparator
func:
'@id': awl:HasFunctionCall
'@context':
'@base': https://example.org/
Name: awl:Function
value: awl:HasValue
args: awl:HasArgument
keywords:
'@id': awl:HasKeywordArgument
'@context':
value: awl:HasValue
arg: awl:HasKey
title: AWL
type: object
Blocks
If-Else-Block
python
if a == 1:
    b = 1
else:
    b = 'test'
Python Code
yaml
_type: Module
body:
- _type: If
  body:
  - _type: Assign
    targets:
    - _type: Name
      id: b
    value:
      _type: Constant
      value: 1
  orelse:
  - _type: Assign
    targets:
    - _type: Name
      id: b
    value:
      _type: Constant
      value: test
  test:
    _type: Compare
    comparators:
    - _type: Constant
      value: 1
    left:
      _type: Name
      id: a
    ops:
    - _type: Eq
type_ignores: []
AST
turtle
<https://example.org/run> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Function> .
<https://example.org/x> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Variable> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Module> .
_:b0 <https://oo-ld.github.io/awl-schema/HasPart> _:b1 .
_:b0 <https://oo-ld.github.io/awl-schema/HasPart> _:b3 .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Assign> .
_:b1 <https://oo-ld.github.io/awl-schema/HasTarget> <https://example.org/x> .
_:b1 <https://oo-ld.github.io/awl-schema/HasValue> _:b2 .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Call> .
_:b2 <https://oo-ld.github.io/awl-schema/HasFunctionCall> <https://example.org/run> .
_:b3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Expr> .
_:b3 <https://oo-ld.github.io/awl-schema/HasValue> _:b4 .
_:b4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Call> .
_:b4 <https://oo-ld.github.io/awl-schema/HasArgument> <https://example.org/x> .
_:b4 <https://oo-ld.github.io/awl-schema/HasFunctionCall> <https://example.org/run> .
RDF
Function call
python
x = run(a=1)
run(x)
Python Code
yaml
_type: Module
body:
- _type: Assign
  body: []
  value:
    _type: Call
    args: []
    func:
      _type: Name
      id: run
    keywords:
    - _type: keyword
      arg: a
      value:
        _type: Constant
        value: 1
  targets:
  - _type: Name
    id: x
- _type: Expr
  body: []
  value:
    _type: Call
    args:
    - _type: Name
      id: x
    func:
      _type: Name
      id: run
    keywords: []
type_ignores: []
AST
turtle
<https://example.org/run> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Function> .
<https://example.org/x> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Variable> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Module> .
_:b0 <https://oo-ld.github.io/awl-schema/HasPart> _:b1 .
_:b0 <https://oo-ld.github.io/awl-schema/HasPart> _:b3 .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Assign> .
_:b1 <https://oo-ld.github.io/awl-schema/HasTarget> <https://example.org/x> .
_:b1 <https://oo-ld.github.io/awl-schema/HasValue> _:b2 .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Call> .
_:b2 <https://oo-ld.github.io/awl-schema/HasFunctionCall> <https://example.org/run> .
_:b3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Expr> .
_:b3 <https://oo-ld.github.io/awl-schema/HasValue> _:b4 .
_:b4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/Call> .
_:b4 <https://oo-ld.github.io/awl-schema/HasArgument> <https://example.org/x> .
_:b4 <https://oo-ld.github.io/awl-schema/HasFunctionCall> <https://example.org/run> .
RDF
Code Execution
Besides the static code analysis with the AST, we can also trace the actual code execution, e.g. with the Python tracing module. This provides a detailed log of the called functions, the paths taken in control structures (If-Else, While), the state of internal variables and timing information.
Example (see also playground):
```py
def function_one(x):
    u = 'https://play.min.io/bucket/example.csv?versioinId=fe45d98a'
    return x

def function_two(y):
    return y

def function_three(c, d):
    return c + d

def my_workflow(a, b, d=0):
    if a > 0:
        c = function_one(a)
    else:
        c = function_two(b)
    while d <= 0:
        d = function_three(c, d)
    return d

my_workflow(1, 2)
```
Workflow Code
yaml
- func: my_workflow
  lineno: 11
  locals:
  - name: a
    value: 1
  - name: b
    value: 2
  - name: d
    value: 0
  os:
    cores: 8
  timestamp: 1742034345.611
  type: call
- func: function_one
  lineno: 1
  locals:
  - name: x
    value: 1
  os:
    cores: 8
  timestamp: 1742034345.613
  type: call
- func: function_one
  lineno: 3
  locals:
  - name: u
    value: https://play.min.io/bucket/example.csv?versioinId=fe45d98a
  - name: x
    value: 1
  os:
    cores: 8
  timestamp: 1742034345.614
  type: return
- ...
Tracing output
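A trace of this shape could be collected with `sys.settrace` from the standard library. The following is a minimal sketch under that assumption; the helper `trace_calls` is hypothetical, the workflow is a shortened variant of the example above, and the `os` block of the real output is omitted.

```python
import sys
import time


def trace_calls(func, *args, **kwargs):
    """Run func and record its call/return events (hypothetical helper)."""
    events = []

    def tracer(frame, event, arg):
        if event in ("call", "return"):
            events.append({
                "func": frame.f_code.co_name,
                "lineno": frame.f_lineno,
                "locals": [{"name": k, "value": v} for k, v in frame.f_locals.items()],
                "timestamp": time.time(),
                "type": event,
            })
        return tracer  # keep tracing inside the new frame

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)  # always switch tracing off again
    return result, events


def function_one(x):
    return x


def function_three(c, d):
    return c + d


def my_workflow(a, b, d=0):
    c = function_one(a)
    while d <= 0:
        d = function_three(c, d)
    return d


result, events = trace_calls(my_workflow, 1, 2)
called = [e["func"] for e in events if e["type"] == "call"]
print(called)
```

Recording every event keeps the sketch simple; a production tracer would likely filter by module to avoid logging library internals.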
Transforming the result with JSON-LD yields, for example, a list of all visited functions (see playground)
json
{
"@context": {
"awl": "https://awl.org/",
"ex": "https://example.org/",
"@base": "https://awl.org/",
"steps": {
"@id": "awl:HasStep",
"@container": "@list"
},
"func": {
"@id": "awl:calls",
"@type": "@id",
"@context": {
"@base": "ex"
}
},
"type": {
"@id": "@type"
},
"call": {
"@id": "awl:Call"
},
"locals": {
"@id": "awl:Parameters",
"@type": "@id",
"@context": {
"@base": "https://example.org/",
"name": "awl:HasName",
"value": "awl:HasValue"
}
}
},
"type": "awl:WorkflowRun",
"steps": [
{
"type": "call",
"locals": [
{
"name": "a",
"value": 1
},
{
"name": "b",
"value": 2
},
{
"name": "d",
"value": 0
}
],
"func": "ex:my_workflow"
},
{
"type": "call",
"locals": {
"name": "x",
"value": 1
},
"func": "ex:function_one"
},
{
"type": "call",
"locals": [
{
"name": "c",
"value": 1
},
{
"name": "d",
"value": 0
}
],
"func": "ex:function_three"
}
]
}
JSON-LD
turtle
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://awl.org/WorkflowRun> .
_:b0 <https://awl.org/HasStep> _:b4 .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://awl.org/Call> .
_:b1 <https://awl.org/calls> <https://ex.org/my_workflow> .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://awl.org/Call> .
_:b2 <https://awl.org/calls> <https://ex.org/function_one> .
_:b3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://awl.org/Call> .
_:b3 <https://awl.org/calls> <https://ex.org/function_three> .
_:b4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b1 .
_:b4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b5 .
_:b5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b2 .
_:b5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b6 .
_:b6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b3 .
_:b6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
RDF (only function call order)
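The call order is preserved in the triples above through an RDF list (rdf:first / rdf:rest). As a small sketch, independent of any RDF library, the list can be walked in plain Python to recover the ordered function calls; the tuple representation and the helper `walk_list` are assumptions for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
AWL = "https://awl.org/"

# The list-related triples from the example, as (subject, predicate, object).
triples = [
    ("_:b0", AWL + "HasStep", "_:b4"),
    ("_:b1", AWL + "calls", "https://ex.org/my_workflow"),
    ("_:b2", AWL + "calls", "https://ex.org/function_one"),
    ("_:b3", AWL + "calls", "https://ex.org/function_three"),
    ("_:b4", RDF + "first", "_:b1"),
    ("_:b4", RDF + "rest", "_:b5"),
    ("_:b5", RDF + "first", "_:b2"),
    ("_:b5", RDF + "rest", "_:b6"),
    ("_:b6", RDF + "first", "_:b3"),
    ("_:b6", RDF + "rest", RDF + "nil"),
]


def walk_list(head, index):
    """Follow rdf:first / rdf:rest from head until rdf:nil is reached."""
    items = []
    while head != RDF + "nil":
        items.append(index[(head, RDF + "first")])
        head = index[(head, RDF + "rest")]
    return items


index = {(s, p): o for s, p, o in triples}
steps = walk_list(index[("_:b0", AWL + "HasStep")], index)
call_order = [index[(step, AWL + "calls")] for step in steps]
print(call_order)
```

In practice the same question would typically be asked with a SPARQL property path over the list.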
The union graph of the abstract workflow definition (T-Box) and the individual workflow execution (A-Box) now allows tracking all relevant information. For visualization purposes we frame the graph with the workflow execution as root (see playground).
Rendering the resulting JSON-LD document as a graph gives us the full picture (see playground / Tab: Visualized)
Visualization of the union graph (individual workflow execution / A-Box on the left, abstract workflow definition / T-Box on the right) that shows the execution of ex:my_workflow and its subnodes ex:function_one and ex:function_three, skipping ex:function_two
Examples
While it wouldn't make much sense to generate AWL for a complex Python program, it can be the right tool to define the high-level function calling sequence on top of base libraries.
Planning of Workflows
Using classes as type annotations of functions / workflow nodes can give us an input-output perspective on the workflow.
```py
from pydantic import BaseModel, ConfigDict

class RawData(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={
            "@context": {
                "ex": "https://example.org/",
                # ...
            },
            "iri": "ex:RawData",  # the IRI of the class
        }
    )

class Data(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={
            "@context": {...},
            "iri": "ex:Data",  # the IRI of the class
        }
    )

class Plot(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={
            "@context": {...},
            "iri": "ex:Plot",  # the IRI of the class
        }
    )

def analyse(input: RawData) -> Data:
    ...
    return Data(...)

def visualize(input: Data) -> Plot:
    ...
    return Plot(...)
```
Note: Classes should be globally identifiable by their import path and/or an IRI annotation.
Converting this code to a graph will provide us with an inventory of nodes (see playground)
mermaid
flowchart TD
f1(ex:analyse) -->|awl:hasInput| c1[ex:RawData]
f1 -->|awl:hasOutput| c2[ex:Data]
f2(ex:visualize) -->|awl:hasInput| c2
f2 -->|awl:hasOutput| c3[ex:Plot]
json
{
"@context": {
"awl": "https://oo-ld.github.io/awl-schema/",
"ex": "https://example.org/",
"type": "@type",
"id": "@id",
"input": {
"@id": "awl:hasInput",
"@type": "@id"
},
"output": {
"@id": "awl:hasOutput",
"@type": "@id"
}
},
"@graph": [
{
"id": "ex:analyse",
"type": "awl:FunctionDef",
"input": "ex:RawData",
"output": "ex:Data"
},
{
"id": "ex:visualize",
"type": "awl:FunctionDef",
"input": "ex:Data",
"output": "ex:Plot"
}
]
}
turtle
<https://example.org/analyse> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/FunctionDef> .
<https://example.org/analyse> <https://oo-ld.github.io/awl-schema/hasInput> <https://example.org/RawData> .
<https://example.org/analyse> <https://oo-ld.github.io/awl-schema/hasOutput> <https://example.org/Data> .
<https://example.org/visualize> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://oo-ld.github.io/awl-schema/FunctionDef> .
<https://example.org/visualize> <https://oo-ld.github.io/awl-schema/hasInput> <https://example.org/Data> .
<https://example.org/visualize> <https://oo-ld.github.io/awl-schema/hasOutput> <https://example.org/Plot> .
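Given such an inventory, the path-finding use case (how to get from input type A to output type B) can be sketched as a breadth-first search over the input/output types. The dictionary `NODES` mirrors the two nodes above; the function `find_path` is a hypothetical helper, and a graph-based implementation would more likely query the RDF directly.

```python
from collections import deque

# Node inventory mirroring the example: node IRI -> input and output type.
NODES = {
    "ex:analyse": {"input": "ex:RawData", "output": "ex:Data"},
    "ex:visualize": {"input": "ex:Data", "output": "ex:Plot"},
}


def find_path(nodes, source_type, target_type):
    """Breadth-first search for a chain of nodes from source_type to target_type."""
    queue = deque([(source_type, [])])
    seen = {source_type}
    while queue:
        current, path = queue.popleft()
        if current == target_type:
            return path
        for name, io in nodes.items():
            if io["input"] == current and io["output"] not in seen:
                seen.add(io["output"])
                queue.append((io["output"], path + [name]))
    return None  # no chain of nodes connects the two types


print(find_path(NODES, "ex:RawData", "ex:Plot"))
```

Breadth-first search returns a shortest chain first, which is usually the most useful suggestion when planning a workflow.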
Validation of Workflows
Example
```py
from typing import Literal

class Pullover:
    color: Literal["white", "pink", "black"]

class TShirt:
    color: Literal["white", "pink", "black"]

class WhiteTShirt:
    color: Literal["white"] = "white"

class DyedTShirt:
    color: Literal["white", "pink", "black"]

def dye_pink_unchecked(shirt: TShirt):
    "I dye any shirt pink"
    shirt.color = "pink"
    return shirt

def dye_pink_type_checked(shirt: WhiteTShirt):
    "I dye white shirts pink"
    shirt.color = "pink"
    return shirt

def dye_pink_runtime_checked(shirt: TShirt):
    "I dye white shirts pink"
    if shirt.color == "white":
        shirt.color = "pink"
    else:
        raise TypeError("Can only dye white shirts")
    return shirt
```
Assuming only white t-shirts can be dyed pink in reality, the following validation cases could occur:
| case | planning-check | runtime-check | reality-check |
| ------ | ------ | ------ | ------ |
| dye_pink_unchecked(TShirt(color="white")) | valid | valid | valid |
| dye_pink_unchecked(Pullover()) | invalid | - | - |
| dye_pink_unchecked(TShirt(color="black")) | valid | valid | invalid |
| dye_pink_type_checked(TShirt(color="black")) | invalid | - | - |
| dye_pink_runtime_checked(TShirt(color="black")) | valid | invalid | - |
In addition we can define the scenario that dye_once must never be executed twice on the same object.
```
def dye_once_unchecked(shirt: TShirt) -> TShirt:
    shirt.color = "pink"
    return TShirt(color="pink")

def dye_once_type_check(shirt: TShirt) -> DyedTShirt:
    return DyedTShirt(color="pink")
```
The following code can be detected as invalid with dye_once_type_check by using static type checking at the planning phase, which is not the case if dye_once_unchecked is used
py
tshirt = dye_once_*(TShirt(color="white"))
tshirt = dye_once_*(tshirt)
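The planning-phase check for such a linear call chain can be sketched with the type annotations alone, without executing any function. The helper `check_chain` is hypothetical and handles only single-argument functions; the classes and functions are stubs named after the ones above.

```python
from typing import get_type_hints


class TShirt: ...


class DyedTShirt: ...


def dye_once_unchecked(shirt: TShirt) -> TShirt: ...


def dye_once_type_check(shirt: TShirt) -> DyedTShirt: ...


def check_chain(start_type, functions):
    """Return the index of the first call whose input annotation rejects the
    current type, or None if the whole chain is consistent (hypothetical helper)."""
    current = start_type
    for i, func in enumerate(functions):
        hints = get_type_hints(func)
        param_types = [t for name, t in hints.items() if name != "return"]
        if not issubclass(current, param_types[0]):
            return i
        current = hints["return"]  # the output becomes the next input
    return None


print(check_chain(TShirt, [dye_once_type_check, dye_once_type_check]))
print(check_chain(TShirt, [dye_once_unchecked, dye_once_unchecked]))
```

With the type-checked variant the second call is flagged, because its input is a DyedTShirt; with the unchecked variant the repeated call goes unnoticed.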
The following code needs a more complex analysis during or after runtime. dye_once_type_check will fail at runtime if both a and b are True, but dye_once_type_check would require tracing the object tshirt:
```py
tshirt = TShirt()
if a: tshirt = dye_once_*(tshirt)
if b: tshirt = dye_once_*(tshirt)
```
Experimental Workflow
Validation and comparison of battery cycling procedures consisting of a limited set of methods and control structures:
```py
from enum import Enum
from typing import Optional, Union

from pydantic import BaseModel

class VoltageUnit(str, Enum):
    """note: identifiers SHOULD be compatible with pint"""
    V = "V"

class Voltage(BaseModel):
    value: float
    unit: Optional[VoltageUnit] = VoltageUnit.V

class ChargeParam(BaseModel):
    target_voltage: Union[float, Voltage]

class Battery(BaseModel):
    def charge(self, param: ChargeParam):
        ...
```
Common lib
py
i = 0
while (
    i < 1000 and
    battery.get(Temperature) < Temperature(100) and
    battery.get(StateOfHealth) > StateOfHealth(80)
):
    battery.charge(ChargingParam(target_voltage=Voltage(4.2), c_rate=CRate(0.23)))
    battery.discharge(ChargingParam(target_voltage=Voltage(3.7), c_rate=CRate(0.23)))
    battery.rest(RestParam(duration=Duration(10)))
    # state-of-health update as encoded in the AST of this procedure
    battery.set(battery.get(StateOfHealth() - StateOfHealth(0.1)))
    i += 1
Specific procedure
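To compare procedures or to check them against a limited set of methods, the sequence of methods called on the battery can be read from the AST. This is a sketch only: `PROCEDURE` is a shortened variant of the procedure above, and `MethodCollector`, `battery_methods` and `ALLOWED_METHODS` are hypothetical names.

```python
import ast

# Shortened variant of the cycling procedure; it is only parsed, never executed.
PROCEDURE = """
i = 0
while i < 1000:
    battery.charge(ChargingParam(target_voltage=Voltage(4.2)))
    battery.discharge(ChargingParam(target_voltage=Voltage(3.7)))
    battery.rest(RestParam(duration=Duration(10)))
    i += 1
"""

ALLOWED_METHODS = {"charge", "discharge", "rest", "get", "set"}  # assumed method set


class MethodCollector(ast.NodeVisitor):
    """Collect the names of methods called on one named object, in source order."""

    def __init__(self, obj_name):
        self.obj_name = obj_name
        self.methods = []

    def visit_Call(self, node):
        func = node.func
        if (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == self.obj_name):
            self.methods.append(func.attr)
        self.generic_visit(node)  # also inspect nested calls


def battery_methods(source, obj_name="battery"):
    collector = MethodCollector(obj_name)
    collector.visit(ast.parse(source))
    return collector.methods


methods = battery_methods(PROCEDURE)
print(methods, set(methods) <= ALLOWED_METHODS)
```

Two procedures that yield the same method sequence differ only in their parameters, which narrows down what has to be compared.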
AST
```yml
_type: Module
body:
- _type: Assign
  targets:
  - _type: Name
    id: i
  value:
    _type: Constant
    value: 0
- _type: While
  body:
  - _type: Expr
    value:
      _type: Call
      args:
      - _type: Call
        args: []
        func:
          _type: Name
          id: ChargingParam
        keywords:
        - _type: keyword
          arg: target_voltage
          value:
            _type: Call
            args:
            - _type: Constant
              value: 4.2
            func:
              _type: Name
              id: Voltage
            keywords: []
        - _type: keyword
          arg: c_rate
          value:
            _type: Call
            args:
            - _type: Constant
              value: 0.23
            func:
              _type: Name
              id: CRate
            keywords: []
      func:
        _type: Attribute
        attr: charge
        value:
          _type: Name
          id: battery
      keywords: []
  - _type: Expr
    value:
      _type: Call
      args:
      - _type: Call
        args: []
        func:
          _type: Name
          id: ChargingParam
        keywords:
        - _type: keyword
          arg: target_voltage
          value:
            _type: Call
            args:
            - _type: Constant
              value: 3.7
            func:
              _type: Name
              id: Voltage
            keywords: []
        - _type: keyword
          arg: c_rate
          value:
            _type: Call
            args:
            - _type: Constant
              value: 0.23
            func:
              _type: Name
              id: CRate
            keywords: []
      func:
        _type: Attribute
        attr: discharge
        value:
          _type: Name
          id: battery
      keywords: []
  - _type: Expr
    value:
      _type: Call
      args:
      - _type: Call
        args: []
        func:
          _type: Name
          id: RestParam
        keywords:
        - _type: keyword
          arg: duration
          value:
            _type: Call
            args:
            - _type: Constant
              value: 10
            func:
              _type: Name
              id: Duration
            keywords: []
      func:
        _type: Attribute
        attr: rest
        value:
          _type: Name
          id: battery
      keywords: []
  - _type: Expr
    value:
      _type: Call
      args:
      - _type: Call
        args:
        - _type: BinOp
          left:
            _type: Call
            args: []
            func:
              _type: Name
              id: StateOfHealth
            keywords: []
          op:
            _type: Sub
          right:
            _type: Call
            args:
            - _type: Constant
              value: 0.1
            func:
              _type: Name
              id: StateOfHealth
            keywords: []
        func:
          _type: Attribute
          attr: get
          value:
            _type: Name
            id: battery
        keywords: []
      func:
        _type: Attribute
        attr: set
        value:
          _type: Name
          id: battery
      keywords: []
  - _type: AugAssign
    op:
      _type: Add
    target:
      _type: Name
      id: i
    value:
      _type: Constant
      value: 1
  orelse: []
  test:
    _type: BoolOp
    op:
      _type: And
    values:
    - _type: Compare
      comparators:
      - _type: Constant
        value: 1000
      left:
        _type: Name
        id: i
      ops:
      - _type: Lt
    - _type: Compare
      comparators:
      - _type: Call
        args:
        - _type: Constant
          value: 100
        func:
          _type: Name
          id: Temperature
        keywords: []
      left:
        _type: Call
        args:
        - _type: Name
          id: Temperature
        func:
          _type: Attribute
          attr: get
          value:
            _type: Name
            id: battery
        keywords: []
      ops:
      - _type: Lt
    - _type: Compare
      comparators:
      - _type: Call
        args:
        - _type: Constant
          value: 80
        func:
          _type: Name
          id: StateOfHealth
        keywords: []
      left:
        _type: Call
        args:
        - _type: Name
          id: StateOfHealth
        func:
          _type: Attribute
          attr: get
          value:
            _type: Name
            id: battery
        keywords: []
      ops:
      - _type: Gt
type_ignores: []
```
RDF
```turtle
```
Further Links
- Discussion in the scope of MADICES workflows: https://github.com/MADICES/MADICES-2025/discussions/16
- Discussion on a possible integration with Python Workflow Definition (PWD): https://github.com/pythonworkflow/python-workflow-definition/issues/127
- Discussion in the scope of PMD workflows: https://git.material-digital.de/workflows/pmd-workflows/-/issues/7#note_796
Owner
- Name: OO-LD
- Login: OO-LD
- Kind: organization
- Repositories: 1
- Profile: https://github.com/OO-LD
Citation (CITATION.cff)
cff-version: 1.2.0
title: The Abstract Workflow Language (AWL) and it's Linked Data extension (AWL-LD)
type: software
authors:
- family-names: Stier
given-names: Simon P.
orcid: 'https://orcid.org/0000-0003-0410-3616'
- family-names: Gold
given-names: Lukas
orcid: 'https://orcid.org/0000-0001-7444-2969'
- family-names: Wilhelm
given-names: Nicolas
- family-names: Nusko
given-names: Daniel
repository-code: 'https://github.com/OO-LD/awl-schema'
url: 'https://github.com/OO-LD/awl-schema'
abstract: >-
Workflows play an essential part in structuring the processing of both physical objects and data.
Despite the relevance of detailed information about the processing history of a resulting asset for provenance, compareability and reproduceability
a detailed and machine readable description of the workflow is usually not part of our scientific work and publication.
AWL and AWL-LD address this issue by creating tools and definitions to document both experimental and computational workflows.
The aim is to provide common notations for workflows in order to generate a semantic description that allows further analytics and transformations based on OO-LD.
keywords:
- workflow language
- workflow specification
- workflow execution
- linked-data
- rdf
- json-ld
- abstract syntax tree
- tracing
license: CC-BY-4.0
GitHub Events
Total
- Create event: 2
- Issues event: 1
- Release event: 2
- Watch event: 4
- Push event: 17
Last Year
- Create event: 2
- Issues event: 1
- Release event: 2
- Watch event: 4
- Push event: 17