spark-dynamic-executor-time-prediction

Neural Network Models for Predicting Execution Time with Dynamic Executor Allocation in Apache Spark.

https://github.com/hinzy97/spark-dynamic-executor-time-prediction

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

apache-spark big-data-analytics deep-learning distributed-computing dynamic-allocation execution-time-prediction machine-learning neural-networks performance-modeling spark
Last synced: 9 months ago

Repository

Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

NN Execution Time Prediction

This repository contains neural network models for predicting execution time of Spark applications, based on the paper:

Tariq, H., & Das, O. (2023). Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors.
Published in: EPEW/ASMTA 2023, Lecture Notes in Computer Science (LNCS).
DOI: https://doi.org/10.1007/978-3-031-43185-2_23


🧠 What It Does

The goal is to accurately predict the total runtime of Spark applications affected by dynamic executor behavior. Two types of neural network models are implemented:

  • Black-box model: uses only the selected features 'Datasize', 'IdleTimeout', and 'BacklogTimeout'.
  • White-box model: uses detailed stage-level features, including task metrics and executor timelines, along with 'Datasize', 'IdleTimeout', and 'BacklogTimeout'.
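To make the black-box idea concrete, here is a minimal sketch of a one-hidden-layer regression network that maps the three black-box features to a runtime. It is not the architecture from the notebooks or the paper: the network size, learning rate, scaling, and the `synthetic_runs` data generator are all assumptions made for illustration, and plain Python is used only to keep the example dependency-free.

```python
import math
import random

FEATURES = ["Datasize", "IdleTimeout", "BacklogTimeout"]  # black-box inputs named in the README


def synthetic_runs(n, rng):
    """Generate made-up (features, runtime) pairs. NOT data from this repository."""
    rows = []
    for _ in range(n):
        datasize = rng.uniform(1.0, 10.0)
        idle = rng.uniform(1.0, 60.0)
        backlog = rng.uniform(1.0, 10.0)
        # Assumed toy relationship, chosen only so there is something to learn.
        runtime = 20.0 * datasize + 0.5 * backlog - 0.1 * idle + rng.gauss(0.0, 1.0)
        rows.append(([datasize, idle, backlog], runtime))
    return rows


class TinyMLP:
    """One tanh hidden layer and a linear output, trained by per-sample SGD."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return hidden, sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, y, lr):
        hidden, y_hat = self.forward(x)
        err = y_hat - y  # derivative of 0.5 * (y_hat - y)**2 with respect to y_hat
        for j, h in enumerate(hidden):
            grad_h = err * self.w2[j] * (1.0 - h * h)  # backpropagate through tanh
            self.w2[j] -= lr * err * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err


def mse(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
raw = synthetic_runs(200, rng)

# Min-max scale every feature and the target to [0, 1] so SGD stays stable.
x_bounds = [(min(col), max(col)) for col in zip(*(x for x, _ in raw))]
y_lo, y_hi = min(y for _, y in raw), max(y for _, y in raw)
data = [([(v - lo) / (hi - lo) for v, (lo, hi) in zip(x, x_bounds)],
         (y - y_lo) / (y_hi - y_lo)) for x, y in raw]
train, test = data[:160], data[160:]

model = TinyMLP(len(FEATURES), n_hidden=8, rng=rng)
mse_before = mse(model, test)
for _ in range(200):
    rng.shuffle(train)
    for x, y in train:
        model.train_step(x, y, lr=0.05)
mse_after = mse(model, test)
print(f"held-out MSE (scaled): before={mse_before:.4f}, after={mse_after:.4f}")
```

A white-box variant would keep the same training loop and simply widen the input vector with the stage-level features; only `FEATURES` and the data loading would change.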

Workloads include:

  • TPC-DS SQL queries: Q26, Q52, Q70
  • KMeans clustering

Structure

  • km_nn_blackbox.txt: NN model using blackbox features for KMeans.
  • km_nn_whitebox.txt: NN model using whitebox features for KMeans.
  • query26_nn_blackbox.txt: NN model using blackbox features for Query-26.
  • query26_nn_whitebox.txt: NN model using whitebox features for Query-26.
  • q52_NN_black box.ipynb: Blackbox NN model for Query-52.
  • q52_NN_whitebox.ipynb: Whitebox NN model for Query-52.
  • q70_NN_black box.ipynb: Blackbox NN model for Query-70.
  • q70_NN_whitebox.ipynb: Whitebox NN model for Query-70.
  • kmeansdata.csv: Input data for KMeans models.
  • query26_train_blackbox.csv: Blackbox feature data for Query-26.
  • query26_train_whitebox.csv: Whitebox feature data for Query-26.
  • query52train.csv: Blackbox feature data for Query-52.
  • query52train1.csv: Whitebox feature data for Query-52.
  • query70train.csv: Blackbox feature data for Query-70.
  • query70train1.csv: Whitebox feature data for Query-70.
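As a rough illustration of how one of the black-box CSV files listed above might be read into feature and target lists, the sketch below uses only the standard library. The three feature names come from this README; the target column name `ExecutionTime` and the inline sample rows are hypothetical, so check the header of the actual file before relying on them.

```python
import csv
import io

BLACKBOX_FEATURES = ["Datasize", "IdleTimeout", "BacklogTimeout"]  # named in the README
TARGET = "ExecutionTime"  # hypothetical column name; the real CSV header may differ


def load_blackbox(handle, target=TARGET):
    """Read rows from an open CSV file and split them into features and targets."""
    features, targets = [], []
    for row in csv.DictReader(handle):
        features.append([float(row[name]) for name in BLACKBOX_FEATURES])
        targets.append(float(row[target]))
    return features, targets


# Inline stand-in for a file such as query52train.csv; the values are made up.
sample = io.StringIO(
    "Datasize,IdleTimeout,BacklogTimeout,ExecutionTime\n"
    "10,30,1,125.4\n"
    "20,60,5,241.0\n"
)
X, y = load_blackbox(sample)
print(X, y)
```

With a real file, the same function could be called as `load_blackbox(open("query52train.csv", newline=""), target=...)`, passing whichever column actually holds the measured runtime.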

How to Run

  1. Open a Jupyter notebook inside the NN folder.
  2. Run the notebook to view predictions and plots.

🔧 Future Work

  • Integration with Spark UI for real-time feature extraction
  • Coupling Dynamic Allocation Model (DAM) with an optimization framework for executor recommendation
  • Extending DAM for multi-job workloads or streaming scenarios

📢 Citation

If you use this code or build upon it, please cite the original paper:

```
@inproceedings{tariq2023execution,
  title={Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors},
  author={Tariq, Hina and Das, Olivia},
  booktitle={Computer Performance Engineering (EPEW/ASMTA)},
  series={Lecture Notes in Computer Science},
  volume={14231},
  pages={340--352},
  year={2023},
  publisher={Springer},
  doi={10.1007/978-3-031-43185-2_23}
}
```

Owner

  • Login: hinzy97
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this code, please cite the following paper:"
authors:
  - family-names: Tariq
    given-names: Hina
  - family-names: Das
    given-names: Olivia
title: "Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors"
doi: 10.1007/978-3-031-43185-2_23
date-released: 2023-10-01
version: 1.0.0
url: https://github.com/hinzy97/spark-dynamic-executor-time-prediction

GitHub Events

Total
  • Push event: 12
Last Year
  • Push event: 12

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 19
  • Total Committers: 1
  • Avg Commits per committer: 19.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 19
  • Committers: 1
  • Avg Commits per committer: 19.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
hinzy97 8****7 19

Issues and Pull Requests

Last synced: 10 months ago