hep-iris-benchmark-scripts
HEP IRIS benchmark scripts
Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file
  Found CITATION.cff file
- ✓ codemeta.json file
  Found codemeta.json file
- ✓ .zenodo.json file
  Found .zenodo.json file
- ✓ DOI references
  Found 6 DOI reference(s) in README
- ✓ Academic publication links
  Links to: zenodo.org
- ✓ Committers with academic emails
  2 of 2 committers (100.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity
  Low similarity (10.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: RumbleDB
- Language: Python
- Default Branch: master
- Size: 4.26 MB
Statistics
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 6
Metadata Files
README.md
Benchmark Scripts for Evaluating Query Languages and Systems for High-Energy Physics Data
This repository contains benchmark scripts for running the implementations of High-Energy Physics (HEP) analysis queries from the IRIS HEP benchmark for various general-purpose query processing systems. The results have been published in the following paper:
Dan Graur, Ingo Müller, Mason Proffitt, Ghislain Fourny, Gordon T. Watts, Gustavo Alonso. Evaluating Query Languages and Systems for High-Energy Physics Data. In: PVLDB 15(2), 2022. DOI: 10.14778/3489496.3489498.
Please cite both the paper and the software when citing in academic contexts.
Overview of the repository
This repository contains the scripts for producing the datasets, the scripts for running the experiments, and the scripts for plotting the results used in the paper mentioned above.
We recommend getting started with the scripts in the following order:
1. Get individual queries to run on the systems you are interested in, using the small sample datasets provided for each system.
   For that purpose, look at the general instructions in the
   experiments folder as well as the system-specific
   instructions in the subfolders of the respective systems.
2. Generate the full datasets as described in the datasets folder
   and upload them to cloud storage and/or load them as per the system-specific
   instructions.
3. Run the actual experiments using the system-specific scripts from the
   subfolders of the respective systems.
   Running all experiments takes several days and costs at least several hundred
   dollars of cloud credits, so it is probably a good idea to start with a small
   subset and extend it as you gain experience and confidence.
4. Re-generate the plots with the scripts in the plots folder.
   We provide the data we used for the plots in the original paper, but you can also copy over your own measurement data and plot that.
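The advice to start with a small subset and extend it gradually can be sketched as a staged run plan. The following is only an illustration and not part of the repository's scripts: the function `staged_runs`, the system names, the query labels, and the scale factors are all hypothetical placeholders.

```python
from itertools import product


def staged_runs(systems, queries, scale_factors):
    """Return a list of stages, each a list of (system, query, scale_factor) runs.

    Stage 0 runs only the first query on the smallest dataset for every system.
    Each later stage adds the remaining runs for the next larger scale factor,
    so that cost grows gradually.
    """
    scales = sorted(scale_factors)
    # Smoke test: the cheapest possible run per system.
    stages = [[(system, queries[0], scales[0]) for system in systems]]
    seen = set(stages[0])
    for scale in scales:
        stage = [run for run in product(systems, queries, [scale]) if run not in seen]
        seen.update(stage)
        stages.append(stage)
    return stages


# Hypothetical names and sizes, for illustration only.
plan = staged_runs(["system-a", "system-b"], ["q1", "q2", "q3"], [1000, 1])
for number, stage in enumerate(plan):
    print(f"stage {number}: {len(stage)} runs")
```

Reviewing the cost and results of each stage before launching the next keeps an early configuration mistake from consuming cloud credits on the largest datasets.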
Owner
- Name: RumbleDB
- Login: RumbleDB
- Kind: organization
- Location: Zurich, Switzerland
- Website: http://rumbledb.org/
- Twitter: db_rumble
- Repositories: 13
- Profile: https://github.com/RumbleDB
Query your large messy datasets, no matter where they are.
Citation (CITATION.cff)
# YAML 1.2
---
cff-version: 1.2.0
title: Benchmark Scripts for "Evaluating Query Languages and Systems for High-Energy Physics Data"
message: |
  This repository hosts the experiment scripts used for the following paper. Please cite both, the software and the paper, when citing in academic contexts.
  Dan Graur, Ingo Müller, Mason Proffitt, Ghislain Fourny, Gordon T. Watts, Gustavo Alonso. "Evaluating Query Languages and Systems for High-Energy Physics Data." In: PVLDB 15(2), 2022. DOI: 10.14778/3489496.3489498.
type: software
repository-code: "https://github.com/RumbleDB/hep-iris-benchmark-scripts"
authors:
  - given-names: Dan
    family-names: Graur
    email: dan.graur@inf.ethz.ch
    affiliation: "ETH Zurich"
  - given-names: Ingo
    family-names: "Müller"
    email: ingo.mueller@inf.ethz.ch
    affiliation: "ETH Zurich"
    orcid: "https://orcid.org/0000-0001-8818-8324"
  - given-names: Mason
    family-names: Proffitt
    email: masonLp@uw.edu
    affiliation: "University of Washington"
    orcid: "https://orcid.org/0000-0001-8740-8866"
  - given-names: Ghislain
    family-names: Fourny
    email: ghislain.fourny@inf.ethz.ch
    affiliation: "ETH Zurich"
    orcid: "https://orcid.org/0000-0001-8740-8866"
  - given-names: "Gordon T."
    family-names: Watts
    email: gwatts@uw.edu
    affiliation: "University of Washington"
  - given-names: Gustavo
    family-names: Alonso
    email: alonso@inf.ethz.ch
    affiliation: "ETH Zurich"
identifiers:
  - description: The scripts used for the experiments in the paper.
    type: doi
    value: "10.5281/zenodo.5569049"
  - description: The paper describing the results of the experiments.
    type: doi
    value: "10.14778/3489496.3489498"
...
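Since both the software and the paper should be cited, it can help to turn the two DOIs from the `identifiers` section into resolvable links. The snippet below is a minimal sketch: the helper name `doi_url` and the dictionary keys are made up for illustration, and it assumes only that a bare DOI is resolved by prefixing it with `https://doi.org/`.

```python
# The two DOIs listed under `identifiers` in the CITATION.cff above.
DOIS = {
    "software": "10.5281/zenodo.5569049",
    "paper": "10.14778/3489496.3489498",
}


def doi_url(doi):
    """Return the resolver URL for a bare DOI."""
    return f"https://doi.org/{doi}"


for kind, doi in DOIS.items():
    print(f"{kind}: {doi_url(doi)}")
```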
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Ingo Mueller | i****r@i****h | 194 |
| DanGraur | d****r@i****h | 87 |
Dependencies
- awscli *
- parquet-tools *
- PyYAML ==5.4.1
- awscli ==1.23.2
- boto3 ==1.22.1
- botocore ==1.25.2
- colorama ==0.4.4
- cursor ==1.3.4
- docutils ==0.15.2
- halo ==0.0.29
- jmespath ==1.0.0
- log-symbols ==0.0.14
- numpy ==1.22.3
- pandas ==1.4.2
- parquet-tools ==0.2.10
- pyarrow ==7.0.0
- pyasn1 ==0.4.8
- python-dateutil ==2.8.2
- pytz ==2022.1
- rsa ==4.7.2
- s3transfer ==0.5.2
- six ==1.16.0
- spinners ==0.0.24
- tabulate ==0.8.9
- termcolor ==1.1.0
- thrift ==0.13.0
- urllib3 ==1.26.9
- matplotlib *
- pandas *
- pyathena *
- pytest *
- humanfriendly *
- matplotlib *
- pandas *
- Pillow ==9.1.0
- cycler ==0.11.0
- fonttools ==4.33.3
- kiwisolver ==1.4.2
- matplotlib ==3.5.1
- numpy ==1.22.3
- packaging ==21.3
- pandas ==1.4.2
- pyparsing ==3.0.8
- python-dateutil ==2.8.2
- pytz ==2022.1
- six ==1.16.0
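The list above appears to merge several dependency manifests, since some packages occur both unpinned (`*`) and pinned to an exact version. A small sketch for spotting such packages follows. The helper names `parse_dependency` and `mixed_specs` are hypothetical, and the parsing assumes the `- name spec` line format used in this listing rather than any official requirements syntax.

```python
from collections import defaultdict


def parse_dependency(line):
    """Split a line such as '- pandas ==1.4.2' into ('pandas', '==1.4.2')."""
    name, _, spec = line.lstrip("- ").strip().partition(" ")
    return name.lower(), spec.strip() or "*"


def mixed_specs(lines):
    """Return the packages that are listed with more than one version spec."""
    specs = defaultdict(set)
    for line in lines:
        if line.strip():
            name, spec = parse_dependency(line)
            specs[name].add(spec)
    return {name: sorted(found) for name, found in specs.items() if len(found) > 1}


# A few lines copied from the dependency list above.
sample = [
    "- awscli *",
    "- awscli ==1.23.2",
    "- pandas ==1.4.2",
    "- pandas *",
    "- numpy ==1.22.3",
    "- numpy ==1.22.3",
]
print(mixed_specs(sample))
```

Packages reported by `mixed_specs` are candidates for checking which manifest applies to the step being run, since an unpinned entry may install a different version than the pinned one.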