mlp_hpp_analysis
This repository is the code basis for the paper entitled "Exploring the Intricacies of Neural Network Optimization".
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.4%) to scientific vocabulary
Keywords
Repository
This repository is the code basis for the paper entitled "Exploring the Intricacies of Neural Network Optimization".
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
mlp_hpp_analysis
This repository is the code basis for the paper entitled "Exploring the Intricacies of Neural Network Optimization".
Before using
Install the requirements with `pip install -r requirements.txt`.
To use this module
Write the various .json files describing the experiments you want to perform. Run the experiments using the command:
`python code/run.py --hyper path_to_the_folder`
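The authoritative schema for these experiment files is the set of examples shipped in the hyperparameters folder. As a purely illustrative sketch (the key names and values below are assumptions based on the hyperparameters analysed in this README, not the module's documented format), such a file could be generated from Python like this:

```python
import json
from pathlib import Path

# Hypothetical experiment description. The actual keys expected by
# code/run.py may differ -- use the files in the hyperparameters folder
# as the reference.
experiment = {
    "activation_functions": ["relu", "selu", "softsign"],
    "batch_size": [128, 256, 512, 1024],
    "loss": ["mean_squared_error"],
    "optimizer": ["adam"],
    "learning_rate": [0.001, 0.01],
    "hidden_layer_dim": [2, 4, 8],         # assumed meaning: network depth
    "hidden_layer_size": [32, 256, 1024],  # assumed meaning: units per layer
}

out_dir = Path("hyperparameters/my_experiments")
out_dir.mkdir(parents=True, exist_ok=True)
with open(out_dir / "experiment_1.json", "w") as f:
    json.dump(experiment, f, indent=2)
```

You would then point `--hyper` at that folder, e.g. `python code/run.py --hyper hyperparameters/my_experiments`.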
Execute the experiments performed in the paper
In the hyperparameters folder there is one folder for each of the tested datasets.
To run every experiment at once, use the all_runs folder.
Otherwise, the experiments can be run folder by folder, which yields the same results as the ones presented in the paper.
Keep in mind that the experiments with binary_crossentropy and sparse_categorical_crossentropy are kept in a separate folder, as they require the Y array to be created differently.
You can run them separately and then join the CSV results, as in the sketch below.
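The README does not prescribe a particular way to merge them; a minimal pandas sketch (assuming the runs write CSV files with identical columns under results/raw, and using a hypothetical output path) could be:

```python
from pathlib import Path

import pandas as pd

# Concatenate every per-run CSV under results/raw (assumed layout) into a
# single file that the preprocessing step can consume.
raw_dir = Path("results/raw")
frames = [pd.read_csv(path) for path in sorted(raw_dir.glob("**/*.csv"))]
merged = pd.concat(frames, ignore_index=True)
merged.to_csv("results/merged_raw.csv", index=False)
```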
Once the experiments have run, the results are written to the results/raw folder.
To preprocess them, run `python code/results_preprocess.py`, which creates the results/final folder with the preprocessed results.
After that, to obtain the importance of the hyperparameters, run `python code/results_analysis.py`, which prints the importance per dataset and the average over the six datasets.
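For readers who want to reproduce this kind of importance analysis outside the provided script, a rough sketch with the fanova package is shown below; the exact procedure inside code/results_analysis.py may differ, and the file and column names used here are assumptions.

```python
import pandas as pd
from fanova import fANOVA

# Load preprocessed results (assumed layout: one row per run, with the
# hyperparameters already numerically encoded and a performance column).
df = pd.read_csv("results/final/abalone.csv")  # hypothetical file name

hyperparameters = [
    "activation_functions", "batch_size", "loss", "optimizer",
    "learning_rate", "hidden_layer_dim", "hidden_layer_size",
]
X = df[hyperparameters].to_numpy()
y = df["performance"].to_numpy()  # assumed metric column (MCC or MSE)

# fANOVA attributes the variance of the metric to individual hyperparameters.
analyzer = fANOVA(X, y)
for i, name in enumerate(hyperparameters):
    importance = analyzer.quantify_importance((i,))[(i,)]["individual importance"]
    print(f"{name}: {100 * importance:.2f}%")
```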
Results
Here we present the results that are available in the paper, together with an additional analysis of the obtained results.
If any analysis the reader might wish to perform is missing, the complete data obtained from the runs is available in the results folder, or the reader can rerun the experiments themselves.
Hyperparameter importance
These are the results of the fANOVA analysis.
General Importance
All Datasets

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 18.42 | 3.2 | 6.99 |
| batch_size | 0.95 | 55.94 | 37.67 |
| loss | 12.23 | 0.33 | 2.1 |
| optimizer | 14.88 | 5.17 | 2.16 |
| learning_rate | 17.65 | 3.38 | 1.34 |
| hidden_layer_dim | 3.94 | 3.85 | 16.62 |
| hidden_layer_size | 3.94 | 3.61 | 6.29 |
Importance by dataset type
Classification

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 17.59 | 2.91 | 2.28 |
| batch_size | 1.31 | 57.3 | 37.43 |
| loss | 9.16 | 0.01 | 3.76 |
| optimizer | 17.11 | 3.78 | 4.53 |
| learning_rate | 21.4 | 4.69 | 0.01 |
| hidden_layer_dim | 6.13 | 0.67 | 19.37 |
| hidden_layer_size | 3.04 | 5.2 | 8.54 |
Regression

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 23.66 | 2.87 | 15.98 |
| batch_size | 4.49 | 64.87 | 37.22 |
| loss | 19.51 | 0.12 | 0.01 |
| optimizer | 7.4 | 8.33 | 0.12 |
| learning_rate | 18.09 | 1.38 | 3.26 |
| hidden_layer_dim | 2.1 | 2.2 | 12.22 |
| hidden_layer_size | 3.32 | 1.48 | 4.18 |
Importance per dataset
Abalone

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 14.77 | 1.39 | 4.39 |
| batch_size | 0.55 | 56.72 | 21.61 |
| loss | 0.0 | 1.62 | 0.0 |
| optimizer | 2.96 | 7.99 | 3.5 |
| learning_rate | 30.02 | 6.9 | 0.07 |
| hidden_layer_dim | 7.16 | 0.12 | 15.69 |
| hidden_layer_size | 11.55 | 4.35 | 11.04 |
Bike Sharing

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 51.26 | 0.59 | 24.54 |
| batch_size | 0.74 | 72.21 | 29.71 |
| loss | 0.06 | 0.0 | 0.0 |
| optimizer | 17.86 | 6.28 | 0.02 |
| learning_rate | 11.6 | 5.17 | 7.14 |
| hidden_layer_dim | 0.0 | 1.98 | 14.41 |
| hidden_layer_size | 2.62 | 1.16 | 0.82 |
Compas

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 3.4 | 0.4 | 0.08 |
| batch_size | 1.16 | 43.0 | 6.23 |
| loss | 33.98 | 0.19 | 0.0 |
| optimizer | 21.68 | 4.02 | 4.16 |
| learning_rate | 9.59 | 6.06 | 0.02 |
| hidden_layer_dim | 0.76 | 2.92 | 49.31 |
| hidden_layer_size | 3.61 | 7.49 | 20.06 |
Covertype

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 29.22 | 12.77 | 4.01 |
| batch_size | 0.77 | 56.92 | 41.6 |
| loss | 0.06 | 0.0 | 10.34 |
| optimizer | 8.29 | 1.65 | 4.67 |
| learning_rate | 23.64 | 0.32 | 0.17 |
| hidden_layer_dim | 13.27 | 0.2 | 3.32 |
| hidden_layer_size | 1.84 | 4.79 | 0.62 |
Delays Zurich

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 0.37 | 3.57 | 5.2 |
| batch_size | 0.0 | 58.2 | 57.82 |
| loss | 39.27 | 0.0 | 0.01 |
| optimizer | 14.39 | 2.42 | 0.0 |
| learning_rate | 0.18 | 0.58 | 0.5 |
| hidden_layer_dim | 2.37 | 10.18 | 12.22 |
| hidden_layer_size | 3.81 | 0.48 | 3.92 |
Higgs

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 11.51 | 0.49 | 3.73 |
| batch_size | 2.46 | 48.6 | 69.07 |
| loss | 0.01 | 0.14 | 2.25 |
| optimizer | 24.08 | 8.67 | 0.63 |
| learning_rate | 30.84 | 1.22 | 0.15 |
| hidden_layer_dim | 0.09 | 7.68 | 4.75 |
| hidden_layer_size | 0.18 | 3.39 | 1.25 |
Performance metrics
Best performing hyperparameter combination per dataset
| Dataset | Activation function | Batch size | Hidden layer dimension | Loss function | Optimizer | Learning rate | MSE/MCC | Training time | Prediction time |
|---|---|---|---|---|---|---|---|---|---|
| Abalone (regression) | relu | 256 | [224, 192, 608, 768, 800] | mean_squared_error | adam | 0.001 | 2.158 | 1.928 | 0.107 |
| Bike Sharing (regression) | selu | 1024 | [352, 32, 288, 32, 544, 704, 96] | mean_squared_error | adam | 0.001 | 59.748 | 3.621 | 0.128 |
| Delays Zurich (regression) | relu | 128 | [640, 416, 576, 192, 288, 32, 32] | mean_squared_error | adam | 0.001 | 3.101 | 73.694 | 0.286 |
| Compas (classification) | relu | 512 | [512, 512, 512, 512] | categorical_crossentropy | adam | 0.001 | 0.041 | 1.567 | 0.118 |
| Covertype (classification) | relu | 512 | [1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024] | categorical_crossentropy | adam | 0.001 | 0.828 | 74.544 | 0.199 |
| Higgs (classification) | softsign | 512 | [224, 480, 64, 96, 768, 32, 928] | categorical_crossentropy | adam | 0.001 | 0.415 | 50.935 | 0.239 |
Baseline vs Best vs Worst comparison
The best and worst models were selected based on the performance metric.
| Dataset | Baseline | Best model | Worst model |
|---|---|---|---|
| Performance (MCC/MSE) | | | |
| Regression | | | |
| Abalone | 2.289 | 2.158 | 9.295 |
| Bike Sharing | 84.045 | 59.748 | 100.139 |
| Delays Zurich | 3.107 | 3.101 | 154.627 |
| Classification | | | |
| Compas | 0.022 | 0.041 | 0 |
| Covertype | 0.812 | 0.828 | -0.001 |
| Higgs | 0.256 | 0.415 | 0 |
| Training Time | | | |
| Abalone | 1.465 | 1.928 | 2.554 |
| Bike Sharing | 4.67 | 3.621 | 3.014 |
| Delays Zurich | 12.74 | 73.694 | 7.25 |
| Compas | 1.088 | 2.342 | 1.121 |
| Covertype | 37.381 | 74.544 | 4.987 |
| Higgs | 21.161 | 50.935 | 4.329 |
| Inference Time | | | |
| Abalone | 0.11 | 0.107 | 0.101 |
| Bike Sharing | 0.132 | 0.128 | 0.122 |
| Delays Zurich | 0.136 | 0.286 | 0.149 |
| Compas | 1.088 | 0.11 | 1.121 |
| Covertype | 0.173 | 0.199 | 0.172 |
| Higgs | 0.173 | 0.239 | 0.182 |
Authors
- Rafael Teixeira - rgtzths
License
This project is licensed under the MIT License - see the LICENSE file for details
Citation
If you use this code, please cite our work: Teixeira, Rafael & Antunes, Mário & Sobral, Rúben & Martins, João & Gomes, Diogo & Aguiar, Rui. (2023). Exploring the Intricacies of Neural Network Optimization. 10.1007/978-3-031-45275-8_2.
Owner
- Name: Rafael Teixeira
- Login: rgtzths
- Kind: user
- Location: Aveiro
- Company: Instituto de Telecomunicações
- Repositories: 1
- Profile: https://github.com/rgtzths
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Rafael"
given-names: "Teixeira"
orcid: "https://orcid.org/0000-0000-0000-0000"
title: "mlp_hpp_analysis"
version: 1.0.0
doi: 10.1007/978-3-031-45275-8_2
date-released: 2023-12-12
url: "https://github.com/rgtzths/mlp_hpp_analysis"
preferred-citation:
type: conference-paper
authors:
- family-names: "Teixeira"
given-names: "Rafael"
orcid: "https://orcid.org/0000-0001-7211-382X"
- family-names: "Antunes"
given-names: "Mário"
orcid: "https://orcid.org/0000-0002-6504-9441"
- family-names: "Sobral"
given-names: "Rúben"
orcid: "https://orcid.org/0009-0001-4357-6582"
- family-names: "Martins"
given-names: "João"
orcid: "https://orcid.org/0009-0008-1193-2483"
- family-names: "Gomes"
given-names: "Diogo"
orcid: "https://orcid.org/0000-0002-5848-2802"
- family-names: "Aguiar"
given-names: "Rui L."
orcid: "https://orcid.org/0000-0003-0107-6253"
title: "Exploring the Intricacies of Neural Network Optimization"
doi: 10.1007/978-3-031-45275-8_2
conference:
name: "Discovery Science"
city: "Porto"
region: "Porto"
country: "Portugal"
date-start: 2023-10-09
date-end: 2023-10-11