mlp_hpp_analysis

This repository contains the code for the paper entitled "Exploring the Intricacies of Neural Network Optimization"

https://github.com/rgtzths/mlp_hpp_analysis

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.4%) to scientific vocabulary

Keywords

deep-neural-networks hyperparameter-importance hyperparameter-optimization hyperparameter-search hyperparameter-tuning
Last synced: 4 months ago

Repository

This repository contains the code for the paper entitled "Exploring the Intricacies of Neural Network Optimization"

Basic Info
  • Host: GitHub
  • Owner: rgtzths
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 58.6 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
deep-neural-networks hyperparameter-importance hyperparameter-optimization hyperparameter-search hyperparameter-tuning
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme · License · Citation

README.md

mlp_hpp_analysis

This repository contains the code for the paper entitled "Exploring the Intricacies of Neural Network Optimization"

Before using

Install the requirements with `pip install -r requirements.txt`

To use this module

  1. Write one or more `.json` files describing the experiments you want to perform (a hypothetical example file is sketched below).

  2. Run the experiments using the command `python code/run.py --hyper path_to_the_folder`
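As an illustration, a single experiment file might look like the following. The layout is hypothetical: the exact schema expected by `code/run.py` is defined in the repository, and only the hyperparameter names and value styles (taken from the results tables below) are grounded here.

```json
{
  "dataset": "abalone",
  "activation_functions": ["relu", "selu", "softsign"],
  "batch_size": [128, 256, 512, 1024],
  "loss": ["mean_squared_error"],
  "optimizer": ["adam"],
  "learning_rate": [0.001, 0.01],
  "hidden_layer_dim": [2, 4, 8],
  "hidden_layer_size": [32, 256, 1024]
}
```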

Execute the experiments performed in the paper

The `hyperparameters` folder contains one subfolder for each of the tested datasets.

To run every experiment at once, use the `all_runs` folder. Alternatively, the experiments can be run folder by folder; either way yields the same results as those presented in the paper.

Keep in mind that the experiments using `binary_crossentropy` and `sparse_categorical_crossentropy` are kept in a separate folder, as they require the Y array to be created differently (the sketch below illustrates the difference). You can run them separately and then join the CSV results.
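The difference is the usual label-encoding one. A minimal sketch, assuming a Keras/NumPy setup (this is illustrative, not the repository's code):

```python
# How the target array Y differs per loss function (illustrative sketch).
import numpy as np
from tensorflow import keras

labels = np.array([0, 2, 1, 2])  # raw integer class labels

# sparse_categorical_crossentropy: integer class indices, shape (n,)
y_sparse = labels

# categorical_crossentropy: one-hot matrix, shape (n, n_classes)
y_onehot = keras.utils.to_categorical(labels)

# binary_crossentropy: a single 0/1 column, for two-class problems only
y_binary = (labels > 0).astype("float32")
```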

Once the experiments have run, the results appear in the `results/raw` folder.

To preprocess them, run `python code/results_preprocess.py`, which creates the `results/final` folder with the preprocessed results.

Finally, to obtain the importance of the hyperparameters, run `python code/results_analysis.py`, which presents the importance per dataset and the average over the six datasets. The full pipeline is summarized below.
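Putting the steps together (the `hyperparameters/all_runs` path is an assumption based on the `all_runs` folder described above):

```bash
pip install -r requirements.txt                       # install dependencies
python code/run.py --hyper hyperparameters/all_runs   # run experiments -> results/raw
python code/results_preprocess.py                     # preprocess     -> results/final
python code/results_analysis.py                       # importance per dataset + average
```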

Results

Here we present the results available in the paper, together with an additional analysis of the obtained results.

If any desired analysis is missing, the complete data obtained from the runs is available in the `results` folder, or readers can rerun the experiments themselves.

Hyperparameter importance

These are the results of the fANOVA analysis.
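For readers who want to reproduce this kind of analysis, here is a minimal sketch using the `fanova` package (https://github.com/automl/fanova). The CSV path, column names, and the numeric encoding of categorical hyperparameters are assumptions; `code/results_analysis.py` is the authoritative implementation.

```python
# Minimal fANOVA sketch (assumed file and column names; see
# code/results_analysis.py for the repository's actual analysis).
import pandas as pd
from fanova import fANOVA

hyperparams = ["activation_functions", "batch_size", "loss", "optimizer",
               "learning_rate", "hidden_layer_dim", "hidden_layer_size"]

runs = pd.read_csv("results/final/abalone.csv")    # hypothetical file name
X = runs[hyperparams].copy()
for col in X.select_dtypes(include="object"):      # encode categorical columns
    X[col] = pd.factorize(X[col])[0]

f = fANOVA(X.to_numpy(dtype=float), runs["performance"].to_numpy())
for i, name in enumerate(hyperparams):
    importance = f.quantify_importance((i,))[(i,)]["individual importance"]
    print(f"{name}: {100 * importance:.2f}%")
```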

General Importance

**All Datasets**

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 18.42 | 3.2 | 6.99 |
| batch_size | 0.95 | 55.94 | 37.67 |
| loss | 12.23 | 0.33 | 2.1 |
| optimizer | 14.88 | 5.17 | 2.16 |
| learning_rate | 17.65 | 3.38 | 1.34 |
| hidden_layer_dim | 3.94 | 3.85 | 16.62 |
| hidden_layer_size | 3.94 | 3.61 | 6.29 |

Importance by dataset type

**Classification**

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 17.59 | 2.91 | 2.28 |
| batch_size | 1.31 | 57.3 | 37.43 |
| loss | 9.16 | 0.01 | 3.76 |
| optimizer | 17.11 | 3.78 | 4.53 |
| learning_rate | 21.4 | 4.69 | 0.01 |
| hidden_layer_dim | 6.13 | 0.67 | 19.37 |
| hidden_layer_size | 3.04 | 5.2 | 8.54 |

**Regression**

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 23.66 | 2.87 | 15.98 |
| batch_size | 4.49 | 64.87 | 37.22 |
| loss | 19.51 | 0.12 | 0.01 |
| optimizer | 7.4 | 8.33 | 0.12 |
| learning_rate | 18.09 | 1.38 | 3.26 |
| hidden_layer_dim | 2.1 | 2.2 | 12.22 |
| hidden_layer_size | 3.32 | 1.48 | 4.18 |

Importance per dataset

**Abalone**

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 14.77 | 1.39 | 4.39 |
| batch_size | 0.55 | 56.72 | 21.61 |
| loss | 0.0 | 1.62 | 0.0 |
| optimizer | 2.96 | 7.99 | 3.5 |
| learning_rate | 30.02 | 6.9 | 0.07 |
| hidden_layer_dim | 7.16 | 0.12 | 15.69 |
| hidden_layer_size | 11.55 | 4.35 | 11.04 |

**Bike Sharing**

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 51.26 | 0.59 | 24.54 |
| batch_size | 0.74 | 72.21 | 29.71 |
| loss | 0.06 | 0.0 | 0.0 |
| optimizer | 17.86 | 6.28 | 0.02 |
| learning_rate | 11.6 | 5.17 | 7.14 |
| hidden_layer_dim | 0.0 | 1.98 | 14.41 |
| hidden_layer_size | 2.62 | 1.16 | 0.82 |

**Compas**

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 3.4 | 0.4 | 0.08 |
| batch_size | 1.16 | 43.0 | 6.23 |
| loss | 33.98 | 0.19 | 0.0 |
| optimizer | 21.68 | 4.02 | 4.16 |
| learning_rate | 9.59 | 6.06 | 0.02 |
| hidden_layer_dim | 0.76 | 2.92 | 49.31 |
| hidden_layer_size | 3.61 | 7.49 | 20.06 |

**Covertype**

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 29.22 | 12.77 | 4.01 |
| batch_size | 0.77 | 56.92 | 41.6 |
| loss | 0.06 | 0.0 | 10.34 |
| optimizer | 8.29 | 1.65 | 4.67 |
| learning_rate | 23.64 | 0.32 | 0.17 |
| hidden_layer_dim | 13.27 | 0.2 | 3.32 |
| hidden_layer_size | 1.84 | 4.79 | 0.62 |

**Delays Zurich**

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 0.37 | 3.57 | 5.2 |
| batch_size | 0.0 | 58.2 | 57.82 |
| loss | 39.27 | 0.0 | 0.01 |
| optimizer | 14.39 | 2.42 | 0.0 |
| learning_rate | 0.18 | 0.58 | 0.5 |
| hidden_layer_dim | 2.37 | 10.18 | 12.22 |
| hidden_layer_size | 3.81 | 0.48 | 3.92 |

**Higgs**

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 11.51 | 0.49 | 3.73 |
| batch_size | 2.46 | 48.6 | 69.07 |
| loss | 0.01 | 0.14 | 2.25 |
| optimizer | 24.08 | 8.67 | 0.63 |
| learning_rate | 30.84 | 1.22 | 0.15 |
| hidden_layer_dim | 0.09 | 7.68 | 4.75 |
| hidden_layer_size | 0.18 | 3.39 | 1.25 |

Performance metrics

Best performing hyperparameter combination per dataset

| Dataset | Activation function | Batch size | Hidden layer dimension | Loss function | Optimizer | Learning rate | MSE/MCC | Training time | Inference time |
|---|---|---|---|---|---|---|---|---|---|
| **Regression** | | | | | | | | | |
| Abalone | relu | 256 | [224, 192, 608, 768, 800] | mean_squared_error | adam | 0.001 | 2.158 | 1.928 | 0.107 |
| Bike Sharing | selu | 1024 | [352, 32, 288, 32, 544, 704, 96] | mean_squared_error | adam | 0.001 | 59.748 | 3.621 | 0.128 |
| Delays Zurich | relu | 128 | [640, 416, 576, 192, 288, 32, 32] | mean_squared_error | adam | 0.001 | 3.101 | 73.694 | 0.286 |
| **Classification** | | | | | | | | | |
| Compas | relu | 512 | [512, 512, 512, 512] | categorical_crossentropy | adam | 0.001 | 0.041 | 1.567 | 0.118 |
| Covertype | relu | 512 | [1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024] | categorical_crossentropy | adam | 0.001 | 0.828 | 74.544 | 0.199 |
| Higgs | softsign | 512 | [224, 480, 64, 96, 768, 32, 928] | categorical_crossentropy | adam | 0.001 | 0.415 | 50.935 | 0.239 |
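To make the table concrete, the sketch below reconstructs the best Abalone configuration as a plain `tf.keras` Sequential MLP. This is an assumption about the architecture; the repository's model-building code may differ.

```python
# Illustrative reconstruction of the best Abalone configuration (table above),
# assuming a standard tf.keras Sequential MLP.
import tensorflow as tf

hidden_layers = [224, 192, 608, 768, 800]  # "Hidden layer dimension" column

model = tf.keras.Sequential(
    [tf.keras.layers.Dense(units, activation="relu") for units in hidden_layers]
    + [tf.keras.layers.Dense(1)]           # single output unit for regression
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="mean_squared_error",
)
# Training would use the tabled batch size:
# model.fit(X_train, y_train, batch_size=256)
```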

Baseline vs Best vs Worst comparison

The best and worst models were selected based on the performance metric (MCC for classification, MSE for regression).

| Dataset | Baseline | Best model | Worst model |
|---|---|---|---|
| **Performance (MCC/MSE)** | | | |
| *Regression* | | | |
| Abalone | 2.289 | 2.158 | 9.295 |
| Bike Sharing | 84.045 | 59.748 | 100.139 |
| Delays Zurich | 3.107 | 3.101 | 154.627 |
| *Classification* | | | |
| Compas | 0.022 | 0.041 | 0 |
| Covertype | 0.812 | 0.828 | -0.001 |
| Higgs | 0.256 | 0.415 | 0 |
| **Training Time** | | | |
| Abalone | 1.465 | 1.928 | 2.554 |
| Bike Sharing | 4.67 | 3.621 | 3.014 |
| Delays Zurich | 12.74 | 73.694 | 7.25 |
| Compas | 1.088 | 2.342 | 1.121 |
| Covertype | 37.381 | 74.544 | 4.987 |
| Higgs | 21.161 | 50.935 | 4.329 |
| **Inference Time** | | | |
| Abalone | 0.11 | 0.107 | 0.101 |
| Bike Sharing | 0.132 | 0.128 | 0.122 |
| Delays Zurich | 0.136 | 0.286 | 0.149 |
| Compas | 1.088 | 0.11 | 1.121 |
| Covertype | 0.173 | 0.199 | 0.172 |
| Higgs | 0.173 | 0.239 | 0.182 |

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this code, please cite our work: Teixeira, Rafael; Antunes, Mário; Sobral, Rúben; Martins, João; Gomes, Diogo; Aguiar, Rui L. (2023). Exploring the Intricacies of Neural Network Optimization. Discovery Science 2023. doi:10.1007/978-3-031-45275-8_2.

Owner

  • Name: Rafael Teixeira
  • Login: rgtzths
  • Kind: user
  • Location: Aveiro
  • Company: Instituto de Telecomunicações

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Rafael"
  given-names: "Teixeira"
  orcid: "https://orcid.org/0000-0000-0000-0000"
title: "mlp_hpp_analysis"
version: 1.0.0
doi: 10.1007/978-3-031-45275-8_2
date-released: 2023-12-12
url: "https://github.com/rgtzths/mlp_hpp_analysis"
preferred-citation:
  type: conference-paper
  authors:
  - family-names: "Teixeira"
    given-names: "Rafael"
    orcid: "https://orcid.org/0000-0001-7211-382X"
  - family-names: "Antunes"
    given-names: "Mário"
    orcid: "https://orcid.org/0000-0002-6504-9441"
  - family-names: "Sobral"
    given-names: "Rúben"
    orcid: "https://orcid.org/0009-0001-4357-6582"
  - family-names: "Martins"
    given-names: "João"
    orcid: "https://orcid.org/0009-0008-1193-2483"
  - family-names: "Gomes"
    given-names: "Diogo"
    orcid: "https://orcid.org/0000-0002-5848-2802"
  - family-names: "Aguiar"
    given-names: "Rui L."
    orcid: "https://orcid.org/0000-0003-0107-6253"
  title: "Exploring the Intricacies of Neural Network Optimization"
  doi: 10.1007/978-3-031-45275-8_2
  conference:
    name: "Discovery Science"
    city: "Porto"
    region: "Porto"
    country: "Portugal"
    date-start: 2023-10-09
    date-end: 2023-10-11
