dev.roanh.gmark:gmark

gMark is a domain- and query language-independent query workload generator and query language utility library.

https://github.com/roanh/gmark

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

ast cpq cq gmark graph graph-database query query-evaluation query-generator query-language rpq schema-driven
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: RoanH
  • License: gpl-3.0
  • Language: Java
  • Default Branch: master
  • Homepage:
  • Size: 9.63 MB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 6
Topics
ast cpq cq gmark graph graph-database query query-evaluation query-generator query-language rpq schema-driven
Created almost 4 years ago · Last pushed 7 months ago
Metadata Files
Readme Funding License Citation

README.md

gMark

gMark is a domain- and query language-independent query workload generator, as well as a general utility library for working with the CPQ (conjunctive path query), RPQ (regular path query), and CQ (conjunctive query) query languages. gMark also includes a complete query evaluation pipeline for the CPQ and RPQ query languages.

This project started as a rewrite of the original version of gMark, available on GitHub at gbagan/gmark, with the goal of making gMark easier to extend and better documented. However, the focus of the project has since shifted primarily towards query languages, notably CPQ. Graph generation is currently out of scope for this project, though full feature parity for query generation is still planned. At present, most of the features available for RPQs in the original version of gMark are available for CPQs in this version, with the exception of some output formats. The utilities for working with query languages in general are, however, much more extensive than those in the original version. In addition, this version of gMark has a highly optimised evaluation pipeline for CPQ and RPQ queries.

Documentation & Research

The current state of the repository is the result of several research projects. Each of these research items can be consulted for more information on a specific component in gMark.

The javadoc documentation for this repository can be found at: gmark.docs.roanh.dev

Getting started with gMark

To support a wide variety of use cases, gMark is available in a number of different formats.

Command line usage

gMark can be used from the command line to either evaluate queries on a database graph, or to generate a workload of queries.

Evaluating Queries

When using gMark on the command line to evaluate queries, the following arguments are supported:

    usage: gmark evaluate [-f] [-g <data>] [-h] [-l <query language>] [-o <file>] [-q <query>] [-s <source>] [-t <target>] [-w <file>]
     -f,--force                       Overwrite the output file if present.
     -g,--graph <data>                The database graph file.
     -h,--help                        Prints this help text.
     -l,--language <query language>   The query language for the queries to execute (cpq or rpq).
     -o,--output <file>               The file to write the query output to.
     -q,--query <query>               The query to evaluate.
     -s,--source <source>             Optionally the bound source node for the query.
     -t,--target <target>             Optionally the bound target node for the query.
     -w,--workload <file>             The query workload to run, one query per line with format 'source, query, target'.

The evaluator is intended to be used with either a single query (-s/-q/-t) or a complete workload of queries (-w). The database graph is expected in a simple text-based format: the first line gives the number of vertices, edges, and labels, and each remaining line defines a single edge in the source target label format. Queries are expected to be either CPQs or RPQs. In a workload file, each line holds a single query in the source,query,target format; if the source or target is not bound, * can be provided instead. Finally, note that vertices and labels are represented by integers. Various example graphs and query workloads can be found in the workload folder.
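To make the graph file layout concrete, the following is a minimal sketch of a reader for it. It is not gMark's own parser: the class and record names are hypothetical, and it assumes that the three numbers on each line are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these types are not part of the gMark API.
final class EdgeListSketch {
    /** A single labelled edge; vertices and labels are integers. */
    record Edge(int source, int target, int label) {}

    /** The counts from the header line plus the edges that follow it. */
    record Graph(int vertexCount, int edgeCount, int labelCount, List<Edge> edges) {}

    /** Parses a header line followed by one 'source target label' edge per line. */
    static Graph parse(String text) {
        List<String> lines = text.lines().filter(line -> !line.isBlank()).toList();
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("Missing header line");
        }
        int[] header = threeInts(lines.get(0));
        List<Edge> edges = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            int[] edge = threeInts(line);
            edges.add(new Edge(edge[0], edge[1], edge[2]));
        }
        return new Graph(header[0], header[1], header[2], List.copyOf(edges));
    }

    /** Assumption: fields are separated by whitespace. */
    private static int[] threeInts(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected three integers: " + line);
        }
        return Arrays.stream(parts).mapToInt(Integer::parseInt).toArray();
    }

    public static void main(String[] args) {
        // Three vertices, two edges, two labels.
        Graph graph = parse("3 2 2\n0 1 0\n1 2 1\n");
        System.out.println(graph.vertexCount() + " vertices, " + graph.edges().size() + " edges");
    }
}
```

The sketch does not check that the number of edge lines matches the edge count in the header; a stricter reader could compare the two and reject inconsistent files.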

For example, a single CPQ query can be evaluated using:

    gmark evaluate -l cpq -s 56 -q "a ◦ b" -t 5 -g ./graph.edge -o out.txt

Alternatively, an entire workload of queries can be evaluated using:

    gmark evaluate -l cpq -w ./queries.cpq -g ./graph.edge -o out.txt
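Each line of a workload file follows the source,query,target format, with * standing for an unbound source or target. The sketch below shows one way such a line could be split. It is not gMark's own reader: the class and record names are hypothetical, and splitting on the first and last comma is an assumption made so that the query text itself is left untouched.

```java
import java.util.OptionalInt;

// Hypothetical illustration only; these types are not part of the gMark API.
final class WorkloadLineSketch {
    /** A query with an optionally bound source and target vertex. */
    record BoundQuery(OptionalInt source, String query, OptionalInt target) {}

    /** Splits 'source,query,target' on the first and last comma. */
    static BoundQuery parse(String line) {
        int first = line.indexOf(',');
        int last = line.lastIndexOf(',');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("Expected source,query,target: " + line);
        }
        return new BoundQuery(
            node(line.substring(0, first)),
            line.substring(first + 1, last).trim(),
            node(line.substring(last + 1))
        );
    }

    /** A '*' means the vertex is not bound; anything else is an integer id. */
    private static OptionalInt node(String text) {
        String value = text.trim();
        return value.equals("*") ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(value));
    }

    public static void main(String[] args) {
        // \u25e6 is the concatenation symbol used in the query examples.
        BoundQuery bound = parse("56, a \u25e6 b, 5");
        BoundQuery free = parse("*, a, *");
        System.out.println(bound.source() + " " + bound.target());
        System.out.println(free.source().isPresent() + " " + free.target().isPresent());
    }
}
```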

Note that only limited query evaluation output is written to the console. In particular, the result paths are only written to the output file, if one is provided.

Workload Generation

When using gMark on the command line for workload generation, the following arguments are supported:

    usage: gmark workload [-c <file>] [-f] [-h] [-o <folder>] [-s <syntax>]
     -c,--config <file>     The workload and graph configuration file.
     -f,--force             Overwrite existing files if present.
     -h,--help              Prints this help text.
     -o,--output <folder>   The folder to write the generated output to.
     -s,--syntax <syntax>   The concrete syntax(es) to output (sql and/or formal).

For example, a workload of queries in SQL format can be generated using:

    gmark workload -c config.xml -o ./output -s sql

An example configuration XML file can be found both in this repository and in the graphical interface of the standalone executable. The example RPQ workload configuration files included in the original gMark repository are also compatible and can be found in the use-cases folder.

Executable download

gMark is available as a standalone portable executable that has both a graphical interface and a command line interface. The graphical interface will only be launched when no command line arguments are passed. This version of gMark requires Java 21 or higher to run.

All releases: releases
GitHub repository: RoanH/gMark

Command line usage of the standalone executable

The following commands show how to generate a workload of queries in SQL format using the standalone executable. More detailed usage instructions are given in the command line usage section above.

Windows executable

    ./gMark.exe workload -c config.xml -o ./output -s sql

Runnable Java archive

    java -jar gMark.jar workload -c config.xml -o ./output -s sql
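When the runnable JAR needs to be started from another Java program, the same invocation can be assembled as an argument list for ProcessBuilder. The following is only a sketch: the helper class and method names are hypothetical and not part of gMark, and the file names are the ones used in the example above.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of gMark.
final class JarLauncherSketch {
    /** Builds the documented 'workload' invocation for the runnable JAR. */
    static List<String> workloadCommand(String jar, String config, String outputDir, String syntax) {
        return List.of("java", "-jar", jar, "workload", "-c", config, "-o", outputDir, "-s", syntax);
    }

    public static void main(String[] args) {
        List<String> command = workloadCommand("gMark.jar", "config.xml", "./output", "sql");
        ProcessBuilder builder = new ProcessBuilder(command).inheritIO();
        // Only print the command here; calling builder.start().waitFor()
        // would run it and requires gMark.jar to be present.
        System.out.println(String.join(" ", builder.command()));
    }
}
```

Passing the arguments as a list avoids shell quoting issues, which matters once a query or path contains spaces.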

Docker image

gMark is available as a Docker image on Docker Hub. The image can be obtained using the following command:

    docker pull roanh/gmark:latest

Using the image then works much the same as the regular command line version of gMark. For example, we can generate the example workload of queries in SQL format using the following command:

    docker run --rm -v "$PWD/data:/data" roanh/gmark:latest workload -c /data/config.xml -o /data/queries -s sql

Note that we mount a local folder called data into the container to pass our configuration file and to retrieve the generated queries.

Maven artifact

gMark is available on Maven Central as an artifact, so it can be included directly in another Java project using Gradle or Maven. This makes it possible to directly use all the implemented constructs and utilities. A hosted version of the javadoc for gMark can be found at gmark.docs.roanh.dev.

Gradle

```groovy
repositories{
	mavenCentral()
}

dependencies{
	implementation 'dev.roanh.gmark:gmark:2.1'
}
```

Maven

    <dependency>
        <groupId>dev.roanh.gmark</groupId>
        <artifactId>gmark</artifactId>
        <version>2.1</version>
    </dependency>

Query Language API

Most of the query language API is accessible directly via the CPQ, RPQ, and CQ classes. For example, queries can be constructed using:

```java
Predicate a = new Predicate(0, "a");

// CPQs: parsed, built from predicates, or randomly generated
CPQ parsedCpq = CPQ.parse("a ∩ a");
CPQ builtCpq = CPQ.intersect(a, a);
CPQ randomCpq = CPQ.generateRandomCPQ(4, 1);

// RPQs: parsed, built from predicates, or randomly generated
RPQ parsedRpq = RPQ.parse("a ◦ a");
RPQ builtRpq = RPQ.disjunct(RPQ.concat(a, a), a);
RPQ randomRpq = RPQ.generateRandomRPQ(4, 1);

// CQs: parsed, converted from a CPQ, or built atom by atom
CQ parsedCq = CQ.parse("(f1, f2) ← one(b1, f2), zero(f1, b1)");
CQ convertedCq = CPQ.parse("a ∩ a").toCQ();
CQ builtCq = CQ.empty();
builtCq.addAtom(
	builtCq.addFreeVariable("f1"),
	a,
	builtCq.addBoundVariable("b1")
);
```

For CPQs, query graphs and cores can be constructed using:

```java
CPQ query = ...;

QueryGraphCPQ graph = query.toQueryGraph();

// the core, computed from the query graph or directly from the query
QueryGraphCPQ core = query.toQueryGraph().computeCore();
QueryGraphCPQ coreFromQuery = query.computeCore();
```

For CQs, query graphs can be constructed using:

```java
CQ query = ...;

QueryGraphCQ graph = query.toQueryGraph();
```

Other notable utilities for CPQ, RPQ, and CQ are:

```java
CPQ query = ...;

String sql = query.toSQL();
String formal = query.toFormalSyntax();
QueryTree ast = query.toAbstractSyntaxTree();
```

Note that CPQs, RPQs, and CQs can also be constructed from an AST, which can sometimes be used to convert between query languages:

    RPQ rpq = RPQ.parse("a ◦ a");
    CPQ cpq = CPQ.parse(rpq.toAbstractSyntaxTree());
    CQ cq = cpq.toCQ();
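The conversion above only works "sometimes" because an AST can be re-read in another language only if every operation in it exists in that language. The sketch below illustrates this with a toy AST. It does not use gMark's QueryTree: all names are hypothetical, and the operation sets follow the usual textbook definitions of CPQs (labels, identity, concatenation, intersection) and RPQs (labels, concatenation, disjunction, Kleene star), which is an assumption about what gMark accepts.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Toy model for illustration; not gMark's QueryTree or its API.
final class AstConversionSketch {
    enum Op { LABEL, IDENTITY, CONCATENATION, INTERSECTION, DISJUNCTION, KLEENE }

    /** An AST node: an operation applied to zero or more child nodes. */
    record Node(Op op, List<Node> children) {
        static Node label() {
            return new Node(Op.LABEL, List.of());
        }

        static Node of(Op op, Node... children) {
            return new Node(op, List.of(children));
        }
    }

    // Assumed operation sets, following the standard definitions.
    static final Set<Op> CPQ_OPS = EnumSet.of(Op.LABEL, Op.IDENTITY, Op.CONCATENATION, Op.INTERSECTION);
    static final Set<Op> RPQ_OPS = EnumSet.of(Op.LABEL, Op.CONCATENATION, Op.DISJUNCTION, Op.KLEENE);

    /** True if every operation in the tree is available in the given language. */
    static boolean expressibleIn(Set<Op> language, Node ast) {
        return language.contains(ast.op())
            && ast.children().stream().allMatch(child -> expressibleIn(language, child));
    }

    public static void main(String[] args) {
        // A concatenation of two labels uses only operations shared by both languages.
        Node concat = Node.of(Op.CONCATENATION, Node.label(), Node.label());
        // A disjunction exists for RPQs but not for CPQs.
        Node disjunction = Node.of(Op.DISJUNCTION, concat, Node.label());
        System.out.println(expressibleIn(CPQ_OPS, concat));
        System.out.println(expressibleIn(CPQ_OPS, disjunction));
    }
}
```

Under these assumptions, a query built only from labels and concatenation converts in both directions, while one that uses intersection, disjunction, or Kleene star is tied to the language that defines that operation.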

All more general utilities can be found under the dev.roanh.gmark.util package.

Development of gMark

This repository contains an Eclipse & Gradle project with Util and Apache Commons CLI as the only dependencies. Development work can be done using the Eclipse IDE or any other Gradle-compatible IDE. Continuous integration checks that all source files use Unix-style line endings (LF) and that all functions and fields have valid documentation. Unit testing is employed to test core functionality, and CI also checks for regressions using these tests. A hosted version of the javadoc for gMark can be found at gmark.docs.roanh.dev.

Compiling the runnable Java archive (JAR) release of gMark using Gradle can be done using the following command in the gMark directory:

    ./gradlew client:shadowJar

The generated JAR can then be found in the build/libs directory. On Windows, ./gradlew.bat should be used instead of ./gradlew.

History

Project development started: 25th of September, 2021.

Owner

  • Name: Roan Hofland
  • Login: RoanH
  • Kind: user
  • Location: Japan
  • Company: alfa1 / group9 / ASML

I'm just a random programmer ^_^. At the moment I really like osu! and writing programs for it. My favorite programming language is Java. Discord: Roan#5667

Citation (CITATION.cff)

cff-version: 1.2.0
title: gMark
version: 2.1
date-released: 2025-06-15
message: "If you use this software, please cite it using the metadata from this file."
type: software
authors:
  - given-names: Roan
    family-names: Hofland
    email: roan@roanh.dev
contact:
  - given-names: Roan
    family-names: Hofland
    email: roan@roanh.dev
    website: 'https://roanh.dev'
repository-code: 'https://github.com/RoanH/gMark'
repository-artifact: 'https://mvnrepository.com/artifact/dev.roanh.gmark/gmark'
abstract: "gMark is a domain- and query language-independent query workload generator and query language utility library."
license: GPL-3.0-or-later
year-original: 2021
references:
  - title: "Indexing Conjunctive Path Queries for Accelerated Query Evaluation"
    authors:
    - given-names: Roan
      family-names: Hofland
      affiliation: "Osaka University and Eindhoven University of Technology"
      email: roan@roanh.dev
    type: thesis
    url: 'https://research.roanh.dev/Indexing%20Conjunctive%20Path%20Queries%20for%20Accelerated%20Query%20Evaluation.pdf'
    date-published: 2023-08-02
    department: "Graduate School of Information Science and Technology and Department of Mathematics and Computer Science"
    institution:
      name: "Osaka University and Eindhoven University of Technology"
  - title: "Graph Database & Query Evaluation Terminology"
    authors:
    - given-names: Roan
      family-names: Hofland
      email: roan@roanh.dev
    type: report
    url: 'https://research.roanh.dev/Graph%20Database%20&%20Query%20Evaluation%20Terminology%20v1.3.pdf'
    date-published: 2024-09-23
  - title: "Conjunctive Path Query Generation for Benchmarking"
    authors:
    - given-names: Roan
      family-names: Hofland
      affiliation: "Eindhoven University of Technology"
      email: r.w.p.hofland@student.tue.nl
    type: report
    url: 'https://research.roanh.dev/Conjunctive%20Path%20Query%20Generation%20for%20Benchmarking%20v2.8.pdf'
    date-published: 2022-03-21
    department: "Department of Mathematics and Computer Science"
    institution:
      name: "Eindhoven University of Technology"
  - title: "CPQ Keys: a survey of graph canonization algorithms"
    authors:
    - given-names: Roan
      family-names: Hofland
      affiliation: "Eindhoven University of Technology"
      email: r.w.p.hofland@student.tue.nl
    type: report
    url: 'https://research.roanh.dev/cpqkeys/CPQ%20Keys%20v1.1.pdf'
    date-published: 2022-07-17
    department: "Department of Mathematics and Computer Science"
    institution:
      name: "Eindhoven University of Technology"
  - title: "Language-aware Indexing for Conjunctive Path Queries"
    authors:
    - family-names: Sasaki
      given-names: Yuya
      affiliation: "Osaka University"
    - family-names: H. L. Fletcher
      given-names: George
      affiliation: "Eindhoven University of Technology"
    - family-names: Onizuka
      given-names: Makoto
      affiliation: "Osaka University"
    type: article
    url: 'https://arxiv.org/abs/2003.03079'
    year: 2022
  - title: "Querying Graphs"
    authors:
    - family-names: Bonifati
      given-names: Angela
      affiliation: "University Lyon 1"
    - family-names: Fletcher
      given-names: George
      affiliation: "Eindhoven University of Technology"
    - family-names: Voigt
      given-names: Hannes
      affiliation: "Neo4j / University of Dresden"
    - family-names: Yakovets
      given-names: Nikolay
      affiliation: "Eindhoven University of Technology"
    type: book
    url: 'https://perso.liris.cnrs.fr/angela.bonifati/pubs/book-Bonifati-et-al-18.pdf'
    year: 2018
  - title: "Database System Concepts, Seventh Edition"
    authors:
    - family-names: Silberschatz
      given-names: Avi
      affiliation: "Yale University"
    - family-names: Korth
      given-names: Hank
      affiliation: "Lehigh University"
    - family-names: Sudarshan
      given-names: Shashank
      affiliation: "Indian Institute of Technology Bombay"
    type: book
    url: 'https://www.db-book.com/'
    year: 2020
  - title: "gMark: Schema-Driven Generation of Graphs and Queries"
    authors:
    - family-names: Bagan
      given-names: Guillaume
      affiliation: "University Lyon 1"
    - family-names: Bonifati
      given-names: Angela
      affiliation: "University Lyon 1"
    - family-names: Ciucanu
      given-names: Radu
      affiliation: "Blaise Pascal University"
    - family-names: H. L. Fletcher
      given-names: George
      affiliation: "Eindhoven University of Technology"
    - family-names: Lemay
      given-names: Aurélien
      affiliation: "University of Lille"
    - family-names: Advokaat
      given-names: Nicky
      affiliation: "Eindhoven University of Technology"
    type: article
    url: 'https://arxiv.org/abs/1511.08386'
    year: 2017

GitHub Events

Total
  • Release event: 2
  • Delete event: 5
  • Push event: 96
  • Create event: 9
Last Year
  • Release event: 2
  • Delete event: 5
  • Push event: 96
  • Create event: 9

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 6
repo1.maven.org: dev.roanh.gmark:gmark

A domain- and query language-independent query workload generator and query language utility library.

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 1
Rankings
Dependent repos count: 20.8%
Average: 45.1%
Dependent packages count: 50.2%
Stargazers count: 50.9%
Forks count: 58.7%
Last synced: 6 months ago

Dependencies

gMark/build.gradle maven
  • commons-cli:commons-cli 1.5.0 implementation
  • org.junit.jupiter:junit-jupiter * testImplementation
Dockerfile docker
  • openjdk 8 build