hep-iris-benchmark-scripts

HEP IRIS benchmark scripts

https://github.com/rumbledb/hep-iris-benchmark-scripts

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    2 of 2 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

HEP IRIS benchmark scripts

Basic Info
  • Host: GitHub
  • Owner: RumbleDB
  • Language: Python
  • Default Branch: master
  • Size: 4.26 MB
Statistics
  • Stars: 2
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 6
Created almost 5 years ago · Last pushed over 2 years ago
Metadata Files
Readme Citation

README.md

Benchmark Scripts for Evaluating Query Languages and Systems for High-Energy Physics Data

This repository contains benchmark scripts for running the implementations of High-Energy Physics (HEP) analysis queries from the IRIS HEP benchmark for various general-purpose query processing systems. The results have been published in the following paper:

Dan Graur, Ingo Müller, Mason Proffitt, Ghislain Fourny, Gordon T. Watts, Gustavo Alonso. Evaluating Query Languages and Systems for High-Energy Physics Data. In: PVLDB 15(2), 2022. DOI: 10.14778/3489496.3489498.

Please cite both the paper and the software when citing in academic contexts.

Overview of the repository

This repository contains the scripts for producing the datasets, the scripts for running the experiments, and the scripts for plotting the results used in the paper mentioned above.

We recommend getting started with the scripts in the following order:

  1. Get individual queries to run with the systems you are interested in, using the small sample datasets provided for each system. For that purpose, look at the general instructions in the experiments folder as well as the system-specific instructions in the subfolders of the respective systems.
  2. Generate the full datasets as described in the datasets folder and upload them to cloud storage and/or load them as per the system-specific instructions.
  3. Run the actual experiments using the system-specific scripts from the subfolders of the respective systems. Running all experiments takes several days and costs at least several hundred dollars of cloud credits, so it's probably a good idea to start with a small subset, then extend it as you gain experience and confidence.
  4. Re-generate the plots with the scripts in the plots folder. We provide the data we used for the plots in the original paper, but you can also copy over your own measurement data and plot that.
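
The advice to start with a small subset and extend it can be sketched as a staged run plan. This is only an illustration: the system and query labels below are hypothetical placeholders, and `run_plan` is not part of the repository. The real names come from the subfolders of the experiments folder.

```python
from itertools import product

# Hypothetical labels for illustration only; the real system and query
# names are defined by the subfolders of the experiments folder.
SYSTEMS = ["system-a", "system-b", "system-c"]
QUERIES = ["q1", "q2", "q3", "q4"]


def run_plan(systems, queries, stage):
    """Return the (system, query) pairs to run at a given stage.

    Stage 0 covers one system and one query; each later stage adds one
    more system and one more query until the full grid is covered.
    """
    n_systems = min(stage + 1, len(systems))
    n_queries = min(stage + 1, len(queries))
    return list(product(systems[:n_systems], queries[:n_queries]))


for stage in range(3):
    plan = run_plan(SYSTEMS, QUERIES, stage)
    print(f"stage {stage}: {len(plan)} runs")
```

Growing the plan one stage at a time keeps early cloud spending small and surfaces setup problems before the expensive runs start.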

Owner

  • Name: RumbleDB
  • Login: RumbleDB
  • Kind: organization
  • Location: Zurich, Switzerland

Query your large messy datasets, no matter where they are.

Citation (CITATION.cff)

# YAML 1.2
---
cff-version: 1.2.0
title: Benchmark Scripts for "Evaluating Query Languages and Systems for High-Energy Physics Data"
message: |
    This repository hosts the experiment scripts used for the following paper. Please cite both, the software and the paper, when citing in academic contexts.
    
    Dan Graur, Ingo Müller, Mason Proffitt, Ghislain Fourny, Gordon T. Watts, Gustavo Alonso. "Evaluating Query Languages and Systems for High-Energy Physics Data." In: PVLDB 15(2), 2022. DOI: 10.14778/3489496.3489498.
type: software
repository-code: "https://github.com/RumbleDB/hep-iris-benchmark-scripts"
authors:
  - given-names: Dan
    family-names: Graur
    email: dan.graur@inf.ethz.ch
    affiliation: "ETH Zurich"
  - given-names: Ingo
    family-names: "Müller"
    email: ingo.mueller@inf.ethz.ch
    affiliation: "ETH Zurich"
    orcid: "https://orcid.org/0000-0001-8818-8324"
  - given-names: Mason
    family-names: Proffitt
    email: masonLp@uw.edu
    affiliation: "University of Washington"
    orcid: "https://orcid.org/0000-0001-8740-8866"
  - given-names: Ghislain
    family-names: Fourny
    email: ghislain.fourny@inf.ethz.ch
    affiliation: "ETH Zurich"
    orcid: "https://orcid.org/0000-0001-8740-8866"
  - given-names: "Gordon T."
    family-names: Watts
    email: gwatts@uw.edu
    affiliation: "University of Washington"
  - given-names: Gustavo
    family-names: Alonso
    email: alonso@inf.ethz.ch
    affiliation: "ETH Zurich"
identifiers:
  - description: The scripts used for the experiments in the paper.
    type: doi
    value: "10.5281/zenodo.5569049"
  - description: The paper describing the results of the experiments.
    type: doi
    value: "10.14778/3489496.3489498"
...
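
An ORCID iD identifies a single researcher, so the same iD on two author entries usually points to a copy-and-paste slip. In the listing above, the entries for Mason Proffitt and Ghislain Fourny carry the same iD; this page gives no way to tell which assignment is correct. The following is a minimal sketch of a check that flags such cases. The helper `shared_orcids` is a hypothetical name, and the data is typed in by hand rather than parsed from the file.

```python
from collections import defaultdict

# Author names and ORCID iDs copied from the CITATION.cff listing above;
# authors without an `orcid` field are left out.
authors = [
    ("Ingo Müller", "https://orcid.org/0000-0001-8818-8324"),
    ("Mason Proffitt", "https://orcid.org/0000-0001-8740-8866"),
    ("Ghislain Fourny", "https://orcid.org/0000-0001-8740-8866"),
]


def shared_orcids(entries):
    """Map each ORCID iD used by more than one author to those authors."""
    by_orcid = defaultdict(list)
    for name, orcid in entries:
        by_orcid[orcid].append(name)
    return {orcid: names for orcid, names in by_orcid.items() if len(names) > 1}


for orcid, names in shared_orcids(authors).items():
    print(f"{orcid} is listed for: {', '.join(names)}")
```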

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 281
  • Total Committers: 2
  • Avg Commits per committer: 140.5
  • Development Distribution Score (DDS): 0.31
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Ingo Mueller (i****r@i****h): 194 commits
  • DanGraur (d****r@i****h): 87 commits

Dependencies

datasets/requirements-top-level.txt pypi
  • awscli *
  • parquet-tools *
datasets/requirements.txt pypi
  • PyYAML ==5.4.1
  • awscli ==1.23.2
  • boto3 ==1.22.1
  • botocore ==1.25.2
  • colorama ==0.4.4
  • cursor ==1.3.4
  • docutils ==0.15.2
  • halo ==0.0.29
  • jmespath ==1.0.0
  • log-symbols ==0.0.14
  • numpy ==1.22.3
  • pandas ==1.4.2
  • parquet-tools ==0.2.10
  • pyarrow ==7.0.0
  • pyasn1 ==0.4.8
  • python-dateutil ==2.8.2
  • pytz ==2022.1
  • rsa ==4.7.2
  • s3transfer ==0.5.2
  • six ==1.16.0
  • spinners ==0.0.24
  • tabulate ==0.8.9
  • termcolor ==1.1.0
  • thrift ==0.13.0
  • urllib3 ==1.26.9
experiments/athena/requirements.txt pypi
  • matplotlib *
  • pandas *
  • pyathena *
  • pytest *
experiments/presto/requirements.txt pypi
  • humanfriendly *
plots/requirements-top-level.txt pypi
  • matplotlib *
  • pandas *
plots/requirements.txt pypi
  • Pillow ==9.1.0
  • cycler ==0.11.0
  • fonttools ==4.33.3
  • kiwisolver ==1.4.2
  • matplotlib ==3.5.1
  • numpy ==1.22.3
  • packaging ==21.3
  • pandas ==1.4.2
  • pyparsing ==3.0.8
  • python-dateutil ==2.8.2
  • pytz ==2022.1
  • six ==1.16.0
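
Several packages are pinned in both datasets/requirements.txt and plots/requirements.txt. A small sketch of how one might check that the shared pins agree follows. It uses only a hand-copied subset of the pins listed above, rewritten in standard `name==version` syntax, and the helper names `parse_pins` and `conflicting_pins` are hypothetical.

```python
# A subset of the pins listed above, written in requirements.txt syntax.
DATASETS = ["numpy==1.22.3", "pandas==1.4.2", "pyarrow==7.0.0", "six==1.16.0"]
PLOTS = ["matplotlib==3.5.1", "numpy==1.22.3", "pandas==1.4.2", "six==1.16.0"]


def parse_pins(lines):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in lines:
        name, sep, version = line.partition("==")
        if sep:  # skip lines without an exact pin
            pins[name.strip().lower()] = version.strip()
    return pins


def conflicting_pins(a, b):
    """Return packages pinned in both dicts but at different versions."""
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


datasets, plots = parse_pins(DATASETS), parse_pins(PLOTS)
print(sorted(datasets.keys() & plots.keys()))  # packages pinned in both files
print(conflicting_pins(datasets, plots))       # empty dict means the pins agree
```

For this subset the shared packages (numpy, pandas, six) carry identical versions in both files, so a single virtual environment could in principle serve both folders.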