dspf-bm

https://github.com/jawadtahir/dspf-bm

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: jawadtahir
License: apache-2.0
Language: Java
Default Branch: main
Size: 41.5 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 5
Releases: 0

Created almost 5 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

PGVal: Evaluate fault tolerance of Stream Processing Systems.

PGVal is a benchmarking system for stream processing systems (SPSs). PGVal benchmarks SPSs for correctness and performance. For details, please see the published paper at: https://doi.org/10.14778/3712221.3712227

This README explains the steps to run the benchmarks. Running them requires three things:

Infrastructure
Set up
Benchmark scripts

All components of the benchmark are deployed on a cluster of VMs. We provide Terraform scripts to automate VM creation for OpenNebula and AWS (work in progress). These VMs need to be configured with the required packages and software. To this end, we provide Ansible scripts to automate the configuration. Once the VMs are set up, you can run the benchmark scripts.

Infrastructure

If you already have VMs that are connected on a network, you can skip this step and go to the set-up instructions. We provide Terraform scripts for OpenNebula and AWS in the Infra/terraform folder.

    cd infra/terraform/<provider>
    # AWS requires the AWS CLI to be installed
    terraform init
    terraform plan # optional
    terraform apply
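The provisioning steps above can be wrapped in a small script. This is a minimal sketch, not part of the repository: `provision`, `run`, and the `DRY_RUN` switch are hypothetical names, and the provider argument must match whatever directory actually exists under `infra/terraform`. With `DRY_RUN=1` (the default in this sketch) it only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the repository.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run the Terraform workflow for one provider directory.
# The subshell keeps the caller's working directory unchanged.
provision() {
  local provider="$1"
  if [ -z "$provider" ]; then
    echo "usage: provision <provider>" >&2
    return 1
  fi
  (
    run cd "infra/terraform/$provider" &&
    run terraform init &&
    run terraform plan &&
    run terraform apply
  )
}
```

Calling `provision <provider>` from the repository root previews the four steps; setting `DRY_RUN=0` before calling it executes them and stops at the first failing command.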

Set up

We provide Ansible scripts to automate the set-up process in Infra/ansible. If you ran the Terraform scripts, an inventory.cfg will be created automatically. Otherwise, create an inventory.cfg following the provided sample. After that, run:

    cd infra/ansible
    ansible-playbook -c ssh -i inventory.cfg setup-machines.yaml

The script installs and configures the required packages and software, such as HDFS, Java, Docker, and node_exporter.
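Before running the playbook, it can help to confirm that the tools are installed and that an inventory file is in place. The following pre-flight check is a sketch under assumptions: `missing_tools` and `inventory_ready` are hypothetical helper names, and the list of tools to check is up to you.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not shipped with the repository.

# Print every command from the argument list that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Succeed only if the inventory file exists and is not empty.
# Defaults to inventory.cfg in the current directory.
inventory_ready() {
  [ -s "${1:-inventory.cfg}" ]
}
```

For example, `missing_tools terraform ansible-playbook ssh` prints nothing when all three are available, and `inventory_ready infra/ansible/inventory.cfg || echo "create inventory.cfg first"` flags a missing inventory before Ansible does.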

Benchmarking

We first need to run monitoring services (Prometheus and Grafana) to visualize the results.

    cd operations-playground
    ./1_monitoring_start.sh

Setting up the dashboard

Open a browser and go to Grafana at <utilsIP>:4300 (see inventory.cfg in Infra/ansible).
Set up a new data source with URL http://prometheus:9090 and scrape interval 1s.
Import the dashboard from operations-playground/dashboard/dashboard.json.
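The data source step can also be scripted against Grafana's HTTP API instead of the UI. This is a sketch under assumptions, not something the repository provides: `datasource_payload` and `create_datasource` are hypothetical names, the data source name "Prometheus" is arbitrary, `admin:admin` assumes Grafana's default login has not been changed, and `timeInterval` is assumed to be the API field behind the "Scrape interval" setting. The dashboard import is still done through the UI as described above.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to the manual data source step.

# Build the JSON body for Grafana's "create data source" endpoint.
datasource_payload() {
  cat <<'EOF'
{
  "name": "Prometheus",
  "type": "prometheus",
  "access": "proxy",
  "url": "http://prometheus:9090",
  "jsonData": { "timeInterval": "1s" }
}
EOF
}

# POST the payload to Grafana. $1 is the base URL, e.g. http://<utilsIP>:4300.
# GRAFANA_AUTH defaults to admin:admin (assumption: untouched default login).
create_datasource() {
  datasource_payload | curl -fsS -X POST \
    -H "Content-Type: application/json" \
    -u "${GRAFANA_AUTH:-admin:admin}" \
    --data-binary @- \
    "$1/api/datasources"
}
```

Running `datasource_payload` alone prints the request body for inspection; `create_datasource "http://<utilsIP>:4300"` sends it once the monitoring services are up.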

We provide scripts to benchmark Kafka Streams, Apache Storm, and Apache Flink in the kstreams-scripts, storm-scripts, and flink-scripts folders, respectively. Each folder contains two scripts. First run the ./2_<SPS>_start.sh script, which deploys an Apache Kafka cluster, creates topics, and deploys the specified SPS on a Docker Swarm cluster. After that, run:

    ./3_experiment_start.sh

The experiment will run and can be observed in the Grafana dashboard. To configure experiments, please see the wiki.
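The two-step order (deploy, then start the experiment) can be captured in one wrapper. This is a sketch only: `scripts_dir_for` and `run_benchmark` are hypothetical names, the short SPS names are taken from the folder names above, and the wrapper assumes it is called from the directory that contains those folders. Because the exact name of the `2_<SPS>_start.sh` script is not spelled out here, the sketch matches it with a glob rather than guessing.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not shipped with the repository.

# Map a short SPS name to its script folder (folder names from this README).
scripts_dir_for() {
  case "$1" in
    kstreams|storm|flink) echo "$1-scripts" ;;
    *) echo "unknown SPS: $1" >&2; return 1 ;;
  esac
}

# Deploy the SPS, then start the experiment, stopping at the first failure.
run_benchmark() {
  local dir
  dir="$(scripts_dir_for "$1")" || return 1
  (
    cd "$dir" || exit 1
    # The glob stands in for ./2_<SPS>_start.sh; the exact name is not assumed.
    for start in ./2_*_start.sh; do
      "$start" || exit 1
    done
    ./3_experiment_start.sh
  )
}
```

For example, `run_benchmark flink` would enter flink-scripts, run the deployment script, and start the experiment only if the deployment succeeded.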


Owner

Name: Jawad Tahir
Login: jawadtahir
Kind: user
Location: Munich
Company: TUM

Website: jawadtahir.de
Repositories: 15
Profile: https://github.com/jawadtahir

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: PGVal
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jawad
    family-names: Tahir
    email: ranajawadtahir@gmail.com
    orcid: 'https://orcid.org/0000-0003-2008-7994'
identifiers:
  - type: doi
    value: 10.14778/3712221.3712227
    description: Paper
repository-code: 'https://github.com/jawadtahir/DSPF-BM'
abstract: >-
  Stream processing systems (SPSs) provide processing
  guarantees to ensure reliability under failure. However,
  no related work exists that empirically validates these
  guarantees. In this paper, we present PGVal, a tool that
  can end-to-end validate guarantees of SPSs. Additionally,
  we introduce new metrics for SPSs, such as reliability,
  reliable throughput, and failure cost, in addition to a
  refined definition of latency that results in improved
  measurements. We benchmark three popular SPSs, namely
  Kafka Streams, Apache Storm, and Apache Flink. Our results
  show that the reliability of SPSs depends on many
  characteristics, such as data rate, data partitions,
  processing topology, and parallelism factor. An SPS
  configuration may not continue to provide reliable outputs
  when any of these characteristics vary. PGVal can also
  inject faults into SPSs to observe their impact on
  reliability and performance. We provide a comprehensive
  failure model for fault-tolerance benchmarking of SPSs and
  report on the impact of faults on the reliability and
  performance of SPSs. Our experiments show that SPSs’
  reliability and performance drop varies by fault. Lastly,
  we provide suggestions to increase the reliability and
  performance of these systems.
license: CC-BY-ND-4.0

GitHub Events

Total

Watch event: 2
Push event: 8
Gollum event: 3

Last Year

Watch event: 2
Push event: 8
Gollum event: 3

Dependencies

docker/flink-topology/Dockerfile docker

apache/flink 1.17.1-java11 build
maven 3.6.3-openjdk-11 build

docker/kafka-data-gen/Dockerfile docker

maven 3.6.3-openjdk-11 build
openjdk 11-jre-slim build

docker/kafka-streams-topology/Dockerfile docker

maven 3.6.3-openjdk-11-slim build
openjdk 11-jdk build

docker/latcal/Dockerfile docker

maven 3.6.3-openjdk-11 build
openjdk 11-jre-slim build

docker/ops-playground-image/Dockerfile docker

apache/flink 1.14.4-scala_2.12-java11 build
maven 3.6-jdk-11-slim build

docker/pumbasrvr-client/Dockerfile docker

maven latest build
openjdk 11-jdk-slim build

docker/storm-topology/Dockerfile docker

maven 3.6.3-openjdk-11-slim build
storm 2.5.0 build

docker/datamodel/pom.xml maven

com.fasterxml.jackson.core:jackson-annotations 2.13.0 provided
org.apache.commons:commons-lang3 3.13.0
junit:junit 4.13.1 test

docker/flink-topology/pom.xml maven

commons-cli:commons-cli 1.5.0
de.tum.in.msrg:datamodel 1.1-SNAPSHOT
io.prometheus:simpleclient 0.16.0
io.prometheus:simpleclient_httpserver 0.16.0
org.apache.flink:flink-clients 1.17.1
org.apache.flink:flink-connector-kafka 1.17.1
org.apache.flink:flink-core 1.17.1
org.apache.flink:flink-hadoop-fs 1.17.1
org.apache.flink:flink-metrics-prometheus 1.17.1
org.apache.logging.log4j:log4j-api 2.17.1
org.apache.logging.log4j:log4j-core 2.17.1
junit:junit 4.11 test

docker/kafka-data-gen/pom.xml maven

com.fasterxml.jackson.core:jackson-databind 2.13.2.1 compile
de.tum.in.msrg:datamodel 1.1-SNAPSHOT compile
org.apache.commons:commons-lang3 3.12.0 compile
org.apache.kafka:kafka-clients 3.6.0 compile
commons-cli:commons-cli 1.5.0
io.prometheus:simpleclient 0.15.0
io.prometheus:simpleclient_httpserver 0.15.0
org.apache.logging.log4j:log4j-api 2.17.1
org.apache.logging.log4j:log4j-core 2.17.1
junit:junit 4.13.1 test

docker/kafka-streams-topology/pom.xml maven

com.fasterxml.jackson.core:jackson-databind 2.15.3 compile
commons-cli:commons-cli 1.5.0
de.tum.in.msrg:datamodel 1.1-SNAPSHOT
org.apache.commons:commons-lang3 3.13.0
org.apache.kafka:kafka-streams 3.6.0
org.apache.logging.log4j:log4j-api 2.17.1
org.apache.logging.log4j:log4j-core 2.17.1
org.apache.logging.log4j:log4j-slf4j-impl 2.17.1
junit:junit 4.13.2 test
org.apache.kafka:kafka-streams-test-utils 3.6.0 test

docker/latcal/pom.xml maven

com.fasterxml.jackson.core:jackson-databind 2.13.2.1
commons-cli:commons-cli 1.5.0
de.tum.in.msrg:datamodel 1.0-SNAPSHOT
io.prometheus:simpleclient_httpserver 0.15.0
org.apache.commons:commons-lang3 3.12.0
org.apache.kafka:kafka-clients 3.2.3
org.apache.logging.log4j:log4j-api 2.17.1
org.apache.logging.log4j:log4j-core 2.17.1
junit:junit 4.13.1 test

docker/ops-playground-image/java/flink-playground-clickcountjob/pom.xml maven

org.jetbrains:annotations RELEASE compile
com.fasterxml.jackson.core:jackson-core 2.13.0
com.fasterxml.jackson.dataformat:jackson-dataformat-yaml 2.13.0
com.fasterxml.jackson.datatype:jackson-datatype-jsr310 2.13.0
io.prometheus:simpleclient 0.8.1
io.prometheus:simpleclient_httpserver 0.8.1
log4j:log4j 1.2.17
org.apache.flink:flink-clients_2.12 1.14.4
org.apache.flink:flink-connector-kafka_2.12 1.14.4
org.apache.flink:flink-hadoop-fs 1.14.4
org.apache.flink:flink-metrics-graphite 1.14.4
org.apache.flink:flink-metrics-prometheus 1.14.4
org.apache.hadoop:hadoop-core 1.2.1
org.apache.kafka:kafka-clients 2.4.1
org.apache.logging.log4j:log4j-api 2.17.1
org.apache.logging.log4j:log4j-core 2.17.1
org.apache.storm:storm-client 2.2.0
org.apache.storm:storm-core 2.2.0
org.apache.storm:storm-kafka-client 2.2.0
org.apache.storm:storm-redis 2.2.0
org.slf4j:slf4j-log4j12 1.7.7

docker/pumbasrvr-client/pom.xml maven

com.fasterxml.jackson.core:jackson-core 2.13.0
com.fasterxml.jackson.dataformat:jackson-dataformat-yaml 2.13.0
com.fasterxml.jackson.datatype:jackson-datatype-jsr310 2.13.0
commons-cli:commons-cli 1.5.0
io.prometheus:simpleclient_httpserver 0.15.0
org.apache.logging.log4j:log4j-api 2.17.1
org.apache.logging.log4j:log4j-core 2.17.1
junit:junit 4.13.1 test

docker/storm-topology/pom.xml maven

commons-cli:commons-cli 1.4 compile
org.apache.storm:storm-client 2.5.0 compile
org.apache.storm:storm-kafka-client 2.5.0 compile
org.apache.storm:storm-redis 2.5.0 compile
org.apache.storm:storm-server 2.5.0 compile
de.tum.in.msrg:datamodel 1.1-SNAPSHOT
org.apache.kafka:kafka-clients 3.6.0
junit:junit 4.13.1 test
