spark-dynamic-executor-time-prediction

Neural Network Models for Predicting Execution Time with Dynamic Executor Allocation in Apache Spark.

https://github.com/hinzy97/spark-dynamic-executor-time-prediction

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

apache-spark big-data-analytics deep-learning distributed-computing dynamic-allocation execution-time-prediction machine-learning neural-networks performance-modeling spark
Last synced: 4 months ago

Repository

Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 5 months ago · Last pushed 5 months ago
Metadata Files
Readme License Citation

README.md

NN Execution Time Prediction

This repository contains neural network models for predicting the execution time of Spark applications, based on the paper:

Tariq, H., & Das, O. (2023). Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors.
Published in: EPEW/ASMTA 2023, Lecture Notes in Computer Science (LNCS).
DOI: https://doi.org/10.1007/978-3-031-43185-2_23


🧠 What It Does

The goal is to accurately predict the total runtime of Spark applications that are affected by dynamic executor allocation. Two types of neural network models are implemented:

  • Black-box model: uses only three selected features ('Datasize', 'IdleTimeout', 'BacklogTimeout').
  • White-box model: uses detailed stage-level features, including task metrics and executor timelines, along with 'Datasize', 'IdleTimeout', and 'BacklogTimeout'.
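The difference between the two feature sets can be illustrated with a minimal sketch. The three black-box names are taken from the description above. The white-box column names (`stage_task_time_ms`, `executor_count_peak`) and the `run` record are hypothetical placeholders, because the actual stage-level columns are not listed in this README.

```python
# Black-box feature names as listed in this README.
BLACK_BOX_FEATURES = ["Datasize", "IdleTimeout", "BacklogTimeout"]


def black_box_vector(run):
    """Return the three configuration-level inputs as floats."""
    return [float(run[name]) for name in BLACK_BOX_FEATURES]


def white_box_vector(run, stage_features):
    """Return the black-box inputs followed by stage-level inputs.

    `stage_features` is a list of column names chosen by the caller;
    the names used in the example below are hypothetical.
    """
    return black_box_vector(run) + [float(run[name]) for name in stage_features]


# Illustrative record with made-up values (not measurements from the paper).
run = {
    "Datasize": 10,
    "IdleTimeout": 60,
    "BacklogTimeout": 1,
    "stage_task_time_ms": 1500,  # hypothetical white-box column
    "executor_count_peak": 8,    # hypothetical white-box column
}

bb = black_box_vector(run)
wb = white_box_vector(run, ["stage_task_time_ms", "executor_count_peak"])
print(len(bb), len(wb))
```

In this sketch the white-box vector is a superset of the black-box vector, which mirrors the description above: the same three settings plus whatever stage-level measurements are available.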

Workloads include:

  • TPC-DS SQL queries: Q26, Q52, Q70
  • KMeans clustering

Structure

  • km_nn_blackbox.txt: NN model using black-box features for KMeans.
  • km_nn_whitebox.txt: NN model using white-box features for KMeans.
  • query26_nn_blackbox.txt: NN model using black-box features for Query-26.
  • query26_nn_whitebox.txt: NN model using white-box features for Query-26.
  • q52_NN_black box.ipynb: NN model using black-box features for Query-52.
  • q52_NN_whitebox.ipynb: NN model using white-box features for Query-52.
  • q70_NN_black box.ipynb: NN model using black-box features for Query-70.
  • q70_NN_whitebox.ipynb: NN model using white-box features for Query-70.
  • kmeansdata.csv: Input data for KMeans models.
  • query26_train_blackbox.csv: Black-box feature data for Query-26.
  • query26_train_whitebox.csv: White-box feature data for Query-26.
  • query52train.csv: Black-box feature data for Query-52.
  • query52train1.csv: White-box feature data for Query-52.
  • query70train.csv: Black-box feature data for Query-70.
  • query70train1.csv: White-box feature data for Query-70.
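As a rough sketch of how one of the CSV files might be split into inputs and a target outside the notebooks, the snippet below uses only the standard library. The target column name `ExecutionTime` is an assumption (the real header is not shown in this README), and the in-memory sample stands in for a real file with made-up values.

```python
import csv
import io

FEATURES = ["Datasize", "IdleTimeout", "BacklogTimeout"]
TARGET = "ExecutionTime"  # hypothetical column name; check the real CSV header


def load_xy(handle, features=FEATURES, target=TARGET):
    """Read a CSV file object and return (X, y) as lists of floats."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([float(row[name]) for name in features])
        y.append(float(row[target]))
    return X, y


# Stand-in for a real file; the values are illustrative, not measured data.
sample = io.StringIO(
    "Datasize,IdleTimeout,BacklogTimeout,ExecutionTime\n"
    "10,60,1,120.5\n"
    "20,30,2,210.0\n"
)
X, y = load_xy(sample)
print(X, y)
```

For a real file, the same function could be called on a handle opened with `open(path, newline="")`, after adjusting `features` and `target` to match the actual column names.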

How to Run

  1. Open one of the Jupyter notebooks (.ipynb) inside the NN folder.
  2. Run all cells in the notebook to view the predictions and plots.
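For readers who want a feel for the kind of model involved before opening a notebook, here is a minimal, dependency-free sketch of a one-hidden-layer regression network trained with stochastic gradient descent. It is not the architecture used in the notebooks, which this README does not describe. The layer size, learning rate, and toy data are all assumptions chosen for illustration, with three pre-scaled inputs standing in for the black-box features.

```python
import math
import random


def forward(x, W1, b1, W2, b2):
    """One hidden tanh layer followed by a linear output unit."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return h, sum(w * hi for w, hi in zip(W2, h)) + b2


def mse(X, y, params):
    """Mean squared error of the network over a dataset."""
    return sum((forward(x, *params)[1] - t) ** 2 for x, t in zip(X, y)) / len(y)


def train(X, y, hidden=4, lr=0.05, epochs=500, seed=0):
    """Fit the network with plain SGD on squared error; returns the weights."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x, W1, b1, W2, b2)
            err = out - t  # gradient of 0.5 * (out - t)^2 w.r.t. out
            for j in range(hidden):
                # Backpropagate through tanh before W2[j] is updated.
                grad_pre = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_pre
                for i in range(n_in):
                    W1[j][i] -= lr * grad_pre * x[i]
            b2 -= lr * err
    return W1, b1, W2, b2


# Synthetic inputs in [0, 1] with an arbitrary toy target (not Spark data).
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(20)]
y = [0.6 * a + 0.3 * b - 0.2 * c for a, b, c in X]

untrained = train(X, y, epochs=0)  # same seed, so same initial weights
trained = train(X, y)
print(mse(X, y, untrained), mse(X, y, trained))
```

Comparing the error before and after training on the same initial weights is a quick sanity check that the update rule is reducing the loss. A real workflow would also hold out test data and scale the inputs.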

🔧 Future Work

  • Integration with Spark UI for real-time feature extraction
  • Coupling Dynamic Allocation Model (DAM) with an optimization framework for executor recommendation
  • Extending DAM for multi-job workloads or streaming scenarios

📢 Citation

If you use this code or build upon it, please cite the original paper:

```bibtex
@inproceedings{tariq2023execution,
  title={Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors},
  author={Tariq, Hina and Das, Olivia},
  booktitle={Computer Performance Engineering (EPEW/ASMTA)},
  series={Lecture Notes in Computer Science},
  volume={14231},
  pages={340--352},
  year={2023},
  publisher={Springer},
  doi={10.1007/978-3-031-43185-2_23}
}
```

Owner

  • Login: hinzy97
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this code, please cite the following paper:"
authors:
  - family-names: Tariq
    given-names: Hina
  - family-names: Das
    given-names: Olivia
title: "Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors"
doi: 10.1007/978-3-031-43185-2_23
date-released: 2023-10-01
version: 1.0.0
url: https://github.com/hinzy97/spark-dynamic-executor-time-prediction

GitHub Events

Total
  • Push event: 12
Last Year
  • Push event: 12

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 19
  • Total Committers: 1
  • Avg Commits per committer: 19.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 19
  • Committers: 1
  • Avg Commits per committer: 19.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • hinzy97 (email: 8****7): 19 commits

Issues and Pull Requests

Last synced: 5 months ago