join-order-benchmark
The Join Order Benchmark (JOB) queries from "How Good Are Query Optimizers, Really?"
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Join-Order-Benchmark
This package contains the Join Order Benchmark (JOB) queries from "How Good Are Query Optimizers, Really?" by Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper and Thomas Neumann, PVLDB Volume 9, No. 3, 2015.
The csv_files/imdb-create-tables.sql and queries/*.sql files have been adapted to MySQL syntax.
Quick Start
Obtain the data:
```shell
cd csv_files/
wget http://homepages.cwi.nl/~boncz/job/imdb.tgz
tar -xvzf imdb.tgz
```
Launch the database server and connect to it (with local-infile turned on in the database server).
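If imdb-load-data.sql uses LOAD DATA LOCAL INFILE (which the local-infile requirement suggests), MySQL only accepts it when both the server's `local_infile` system variable and the client's `--local-infile` option are enabled. A minimal sketch, assuming an account (here `root`) that is allowed to run SET GLOBAL; the helper `enable_local_infile` and the `MYSQL_USER` variable are hypothetical names:

```shell
# Hypothetical helper: enable LOAD DATA LOCAL INFILE on the server side.
# Assumes the mysql client is on PATH and MYSQL_USER (default: root)
# is allowed to run SET GLOBAL. Add -p if the account has a password.
enable_local_infile() {
  user="${MYSQL_USER:-root}"
  # Server side: the setting lasts until the server is restarted.
  mysql -u "$user" -e "SET GLOBAL local_infile = 1;" || return 1
  # Show the current value; it should now read ON.
  mysql -u "$user" -N -e "SHOW GLOBAL VARIABLES LIKE 'local_infile';"
}

# Client side: start the session that runs the SOURCE commands with
# local-infile enabled as well, e.g.
#   mysql --local-infile=1 -u root
```

`SET GLOBAL` does not survive a server restart; setting `local_infile=1` under `[mysqld]` in the server's option file is the persistent alternative.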
Create the IMDb tables in MySQL (the paths below point to the author's clone of the repository; adjust them to the location of your own clone):
```sql
mysql> SOURCE /Users/olafrosendahl/Documents/GitHub/join-order-benchmark/csv_files/imdb-create-tables.sql
```
Load the data in MySQL:
```sql
mysql> SOURCE /Users/olafrosendahl/Documents/GitHub/join-order-benchmark/csv_files/imdb-load-data.sql
```
Add indexes to the IMDb database in MySQL:
```sql
mysql> SOURCE /Users/olafrosendahl/Documents/GitHub/join-order-benchmark/csv_files/imdb-index-tables.sql
```
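The three SOURCE steps above can also be run non-interactively by feeding each script to the mysql client on standard input. A minimal sketch, assuming it is called from the repository root and that the scripts select their own database (otherwise set `MYSQL_DB`); `setup_imdb`, `MYSQL_USER` and `MYSQL_DB` are hypothetical names. If imdb-load-data.sql refers to the CSV files by absolute path, those paths may need adjusting first.

```shell
# Hypothetical helper: run the setup scripts in order (tables, data,
# indexes) and stop at the first failure. Assumes the mysql client is
# on PATH and the current directory is the repository root.
setup_imdb() {
  user="${MYSQL_USER:-root}"
  for step in create-tables load-data index-tables; do
    script="csv_files/imdb-${step}.sql"
    if [ ! -f "$script" ]; then
      echo "missing $script" >&2
      return 1
    fi
    echo "running $script" >&2
    # --local-infile=1 lets the client send local files for LOAD DATA LOCAL.
    # The database argument is only passed when MYSQL_DB is set.
    mysql --local-infile=1 -u "$user" ${MYSQL_DB:+"$MYSQL_DB"} < "$script" || return 1
  done
}
```

Usage, from the repository root: `MYSQL_USER=root setup_imdb`.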
Afterwards, make a copy of the data directory so that the database can be restored later without loading the data again. The data directory to copy is:
/build/mysql-test/var/mysqld.1/data
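Copying the directory is only safe while the MySQL server is stopped; otherwise files may be copied in the middle of a write. A minimal sketch of backup and restore helpers under that assumption; `backup_datadir`, `restore_datadir` and the backup location `$HOME/imdb-data-backup` are hypothetical, and the data-directory path is the one given above, which may differ on your build.

```shell
# Hypothetical helpers for snapshotting the MySQL data directory after
# the IMDb data and indexes are loaded. Stop the server before calling
# either one so the files on disk are consistent.
DATADIR="${DATADIR:-/build/mysql-test/var/mysqld.1/data}"
# Assumed backup location; any path outside DATADIR works.
DATADIR_BACKUP="${DATADIR_BACKUP:-$HOME/imdb-data-backup}"

backup_datadir() {
  [ -d "$DATADIR" ] || { echo "no data directory at $DATADIR" >&2; return 1; }
  rm -rf "$DATADIR_BACKUP"
  # -R copies recursively, -p keeps permissions and timestamps.
  cp -Rp "$DATADIR" "$DATADIR_BACKUP"
}

restore_datadir() {
  [ -d "$DATADIR_BACKUP" ] || { echo "no backup at $DATADIR_BACKUP" >&2; return 1; }
  # Replaces the current data directory entirely with the snapshot.
  rm -rf "$DATADIR"
  cp -Rp "$DATADIR_BACKUP" "$DATADIR"
}
```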
Running the queries
We use hyperfine as the benchmarking tool for timing the queries, so it must be installed before the queries can be run. To run all queries, run the following in your terminal:
```bash
./run_queries.sh
```
This runs the queries in the queries folder one by one, first without re-optimization and then with re-optimization, using different settings for the re-optimization hint. The results are written as one JSON file per query, in separate subfolders of the results folder. Progress is shown in the terminal while the queries execute.
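The baseline half of that process can be sketched as a loop over queries/*.sql that hands each query to hyperfine and writes one JSON file per query. This is a hypothetical sketch, not the contents of run_queries.sh: the function name `run_baseline`, the `results/baseline` folder, the database name `imdb` and the warm-up and run counts are assumptions, and the re-optimization runs are left out because the hint syntax is not described here. `--warmup`, `--runs` and `--export-json` are standard hyperfine options.

```shell
# Hypothetical baseline loop (no re-optimization hints), not the actual
# run_queries.sh. Assumes: run from the repository root, queries in
# queries/*.sql, hyperfine and the mysql client on PATH.
run_baseline() {
  db="${MYSQL_DB:-imdb}"                 # assumed database name
  user="${MYSQL_USER:-root}"
  out="${RESULTS_DIR:-results/baseline}" # assumed output folder
  mkdir -p "$out" || return 1
  for q in queries/*.sql; do
    [ -f "$q" ] || continue              # the glob matched no files
    name=$(basename "$q" .sql)           # queries/1a.sql -> 1a
    echo "running $name" >&2
    # hyperfine runs the command string through a shell, so the
    # "< $q" redirection works. --warmup: unmeasured runs first;
    # --runs: number of measured runs; --export-json: result file.
    hyperfine --warmup 1 --runs 5 \
      --export-json "$out/$name.json" \
      "mysql -u $user $db < $q" || return 1
  done
}
```

Usage, from the repository root: `MYSQL_DB=imdb run_baseline`. Keeping one JSON file per query mirrors the layout described above and makes individual queries easy to re-run and compare.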
Run single query
You can also run a single query, both without and with re-optimization, by running the following in your terminal, replacing <query> with the name of the query you want to run:
```sh
./run_query.sh <query>
```
The result is written to the results folder as a JSON file and is also shown in the terminal.
Order Problem
Note that queries/17b.sql and queries/8d.sql may show ordering differences in their results because MySQL applies different ordering rules than the database system the original queries were written for. This is not a real bug.
Analyze results
The Python script visulize-info.py contains a number of methods for visualizing the results (it depends on matplotlib, pandas and seaborn). Open it and choose which results to visualize before running it.
Owner
- Name: Olaf Rosendahl
- Login: olros
- Kind: user
- Location: Trondheim, Norway
- Company: Kantega
- Website: https://olafros.com
- Repositories: 2
- Profile: https://github.com/olros
Computer Engineering student at NTNU Trondheim
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
title: Join Order Benchmark
message: >-
  If you use this software in scientific
  publications, please consider citing it using the
  metadata from this file.
type: software
authors:
  - given-names: Olaf
    family-names: Rosendahl
    email: olafrosendahl@gmail.com
repository-code: 'https://github.com/olros/join-order-benchmark'
abstract: The Join Order Benchmark in MySQL with scripts for running the benchmark.
license: MIT
```
Dependencies
- matplotlib *
- pandas *
- seaborn *