spark-dynamic-executor-time-prediction
Neural Network Models for Predicting Execution Time with Dynamic Executor Allocation in Apache Spark.
https://github.com/hinzy97/spark-dynamic-executor-time-prediction
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: hinzy97
- License: other
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://www.researchgate.net/publication/381108033_Execution_Time_Prediction_Model_that_Considers_Dynamic_Allocation_of_Spark_Executors#fullTextFileContent
- Size: 146 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
NN Execution Time Prediction
This repository contains neural network models for predicting execution time of Spark applications, based on the paper:
Tariq, H., & Das, O. (2023). Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors.
Published in: EPEW/ASMTA 2023, Lecture Notes in Computer Science (LNCS).
DOI: https://doi.org/10.1007/978-3-031-43185-2_23
🧠 What It Does
The goal is to accurately predict the total runtime of Spark applications whose execution is affected by dynamic executor allocation. Two types of neural network models are implemented:
- Black-box model: uses only the selected features 'Datasize', 'IdleTimeout', and 'BacklogTimeout'.
- White-box model: uses detailed stage-level features, including task metrics and executor timelines, along with 'Datasize', 'IdleTimeout', and 'BacklogTimeout'.
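To make the two timeout features concrete, the sketch below turns one (IdleTimeout, BacklogTimeout) pair into `spark-submit` arguments. It assumes, without confirmation from the repository, that these features correspond to Spark's `spark.dynamicAllocation.executorIdleTimeout` and `spark.dynamicAllocation.schedulerBacklogTimeout` properties and that both are given in seconds. The helper name `dynamic_allocation_conf` is hypothetical.

```python
def dynamic_allocation_conf(idle_timeout_s, backlog_timeout_s):
    """Build spark-submit --conf arguments for one dynamic-allocation setting.

    Assumption: the README's 'IdleTimeout' and 'BacklogTimeout' features map
    to the two Spark properties below and are expressed in seconds.
    """
    settings = {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.executorIdleTimeout": f"{idle_timeout_s}s",
        "spark.dynamicAllocation.schedulerBacklogTimeout": f"{backlog_timeout_s}s",
    }
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Example values chosen for illustration only.
conf_args = dynamic_allocation_conf(idle_timeout_s=60, backlog_timeout_s=1)
print(" ".join(conf_args))
```

Each training row would then pair one such setting and a data size with the runtime measured under it.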
Workloads include:
- TPC-DS SQL queries: Q26, Q52, Q70
- KMeans clustering
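As a rough illustration of the black-box idea (three inputs, one predicted runtime), here is a minimal pure-Python sketch of a one-hidden-layer network trained with stochastic gradient descent. It is not the architecture used in the notebooks: the layer size, learning rate, epoch count, and the class name `TinyMLP` are all assumptions, and the training rows are synthetic stand-ins generated from an arbitrary formula, not measurements from the repository.

```python
import math
import random


def scale_columns(rows):
    """Min-max scale each column of a list of equal-length rows to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]


class TinyMLP:
    """One tanh hidden layer and a linear output, trained by plain SGD on MSE."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
             for ws, b in zip(self.w1, self.b1)]
        return h, sum(w * hv for w, hv in zip(self.w2, h)) + self.b2

    def predict(self, x):
        return self.forward(x)[1]

    def sgd_step(self, x, y, lr):
        h, y_hat = self.forward(x)
        err = y_hat - y  # gradient of 0.5 * err**2 with respect to y_hat
        for j, hj in enumerate(h):
            grad_h = err * self.w2[j] * (1.0 - hj * hj)  # back through tanh
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err


def mse(model, xs, ys):
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


rng = random.Random(0)
# Synthetic (Datasize, IdleTimeout, BacklogTimeout) rows; NOT repository data.
raw_x = [[rng.uniform(1, 100), rng.uniform(10, 120), rng.uniform(1, 10)]
         for _ in range(200)]
# Arbitrary made-up runtime formula, used only so there is something to fit.
raw_y = [[2.0 * d + 0.3 * idle + 4.0 * b] for d, idle, b in raw_x]

xs = scale_columns(raw_x)
ys = [row[0] for row in scale_columns(raw_y)]

model = TinyMLP(n_in=3, n_hidden=8, rng=rng)
loss_before = mse(model, xs, ys)
for _ in range(100):
    for x, y in zip(xs, ys):
        model.sgd_step(x, y, lr=0.05)
loss_after = mse(model, xs, ys)
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```

A white-box variant would follow the same pattern with a wider input vector that also carries the stage-level features. The sketch reports training error only; a real evaluation would hold out a test split.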
Structure
- `km_nn_blackbox.txt`: NN model using blackbox features for KMeans.
- `km_nn_whitebox.txt`: NN model using whitebox features for KMeans.
- `query26_nn_blackbox.txt`: NN model using blackbox features for Query-26.
- `query26_nn_whitebox.txt`: NN model using whitebox features for Query-26.
- `q52_NN_black box.ipynb`: Blackbox NN model for Query-52.
- `q52_NN_whitebox.ipynb`: Whitebox NN model for Query-52.
- `q70_NN_black box.ipynb`: Blackbox NN model for Query-70.
- `q70_NN_whitebox.ipynb`: Whitebox NN model for Query-70.
- `kmeansdata.csv`: Input data for KMeans models.
- `query26_train_blackbox.csv`: Blackbox feature data for Query-26.
- `query26_train_whitebox.csv`: Whitebox feature data for Query-26.
- `query52train.csv`: Blackbox feature data for Query-52.
- `query52train1.csv`: Whitebox feature data for Query-52.
- `query70train.csv`: Blackbox feature data for Query-70.
- `query70train1.csv`: Whitebox feature data for Query-70.
How to Run
- Open a Jupyter notebook inside the `NN` folder.
- Run the notebook to view predictions and plots.
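Outside the notebooks, the training CSVs could be read with the standard library along the following lines. This is a sketch under assumptions: it presumes the files carry header columns named exactly 'Datasize', 'IdleTimeout', and 'BacklogTimeout', and the target column name `ExecutionTime` is hypothetical, so check the actual headers before using it. The helper name `load_blackbox_rows` is also made up for illustration.

```python
import csv
import io

BLACKBOX_FEATURES = ["Datasize", "IdleTimeout", "BacklogTimeout"]


def load_blackbox_rows(handle, target_column):
    """Return (features, targets) as lists of floats from an open CSV handle."""
    reader = csv.DictReader(handle)
    features, targets = [], []
    for row in reader:
        features.append([float(row[name]) for name in BLACKBOX_FEATURES])
        targets.append(float(row[target_column]))
    return features, targets


# Made-up sample rows with a hypothetical "ExecutionTime" column.
# With the real files, pass e.g. open("query52train.csv", newline="")
# and the actual target column name instead.
sample = io.StringIO(
    "Datasize,IdleTimeout,BacklogTimeout,ExecutionTime\n"
    "10,60,1,42.5\n"
    "20,30,5,80.0\n"
)
X, y = load_blackbox_rows(sample, target_column="ExecutionTime")
print(X, y)
```

Using `csv.DictReader` keeps the column selection explicit, so any extra columns in a file are simply ignored.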
🔧 Future Work
- Integration with Spark UI for real-time feature extraction
- Coupling Dynamic Allocation Model (DAM) with an optimization framework for executor recommendation
- Extending DAM for multi-job workloads or streaming scenarios
📢 Citation
If you use this code or build upon it, please cite the original paper:
```
@inproceedings{tariq2023execution,
  title={Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors},
  author={Tariq, Hina and Das, Olivia},
  booktitle={Computer Performance Engineering (EPEW/ASMTA)},
  series={Lecture Notes in Computer Science},
  volume={14231},
  pages={340--352},
  year={2023},
  publisher={Springer},
  doi={10.1007/978-3-031-43185-2_23}
}
```
Owner
- Login: hinzy97
- Kind: user
- Repositories: 1
- Profile: https://github.com/hinzy97
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this code, please cite the following paper:"
authors:
  - family-names: Tariq
    given-names: Hina
  - family-names: Das
    given-names: Olivia
title: "Execution Time Prediction Model that Considers Dynamic Allocation of Spark Executors"
doi: 10.1007/978-3-031-43185-2_23
date-released: 2023-10-01
version: 1.0.0
url: https://github.com/hinzy97/spark-dynamic-executor-time-prediction
GitHub Events
Total
- Push event: 12
Last Year
- Push event: 12
Issues and Pull Requests
Last synced: 5 months ago