mlp_hpp_analysis
This repository is the code basis for the paper entitled "Exploring the Intricacies of Neural Network Optimization".
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.4%) to scientific vocabulary
Keywords
Repository
This repository is the code basis for the paper entitled "Exploring the Intricacies of Neural Network Optimization".
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
mlp_hpp_analysis
This repository is the code basis for the paper entitled "Exploring the Intricacies of Neural Network Optimization".
Before using
Install the requirements with `pip install -r requirements.txt`.
To use this module
Write the various .json files describing the experiments you want to perform. Run the experiments using the command:
`python code/run.py --hyper path_to_the_folder`
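The authoritative schema for these experiment files is the set of examples shipped in the hyperparameters folder. As a purely illustrative sketch (the key names and values below are assumptions based on the hyperparameters analysed in this README, not the module's documented format), such a file could be generated from Python like this:

```python
import json
from pathlib import Path

# Hypothetical experiment description. The actual keys expected by
# code/run.py may differ -- use the files in the hyperparameters folder
# as the reference.
experiment = {
    "activation_functions": ["relu", "selu", "softsign"],
    "batch_size": [128, 256, 512, 1024],
    "loss": ["mean_squared_error"],
    "optimizer": ["adam"],
    "learning_rate": [0.001, 0.01],
    "hidden_layer_dim": [2, 4, 8],         # assumed meaning: network depth
    "hidden_layer_size": [32, 256, 1024],  # assumed meaning: units per layer
}

out_dir = Path("hyperparameters/my_experiments")
out_dir.mkdir(parents=True, exist_ok=True)
with open(out_dir / "experiment_1.json", "w") as f:
    json.dump(experiment, f, indent=2)
```

You would then point `--hyper` at that folder, e.g. `python code/run.py --hyper hyperparameters/my_experiments`.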
Execute the experiments performed in the paper
In the hyperparameters folder there is one folder for each of the tested datasets.
To run every experiment at once, use the all_runs folder.
Otherwise, the experiments can be run folder by folder, which yields the same results as the ones presented in the paper.
Keep in mind that the experiments with binary_crossentropy and sparse_categorical_crossentropy are kept in a separate folder, as they require the Y array to be created differently.
You can run them separately and then join the CSV results, as in the sketch below.
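The README does not prescribe a particular way to merge them; a minimal pandas sketch (assuming the runs write CSV files with identical columns under results/raw, and using a hypothetical output path) could be:

```python
from pathlib import Path

import pandas as pd

# Concatenate every per-run CSV under results/raw (assumed layout) into a
# single file that the preprocessing step can consume.
raw_dir = Path("results/raw")
frames = [pd.read_csv(path) for path in sorted(raw_dir.glob("**/*.csv"))]
merged = pd.concat(frames, ignore_index=True)
merged.to_csv("results/merged_raw.csv", index=False)
```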
Once the experiments have run, the results are written to the results/raw folder.
To preprocess them, run `python code/results_preprocess.py`, which creates the results/final folder with the preprocessed results.
After that, to obtain the importance of the hyperparameters, run `python code/results_analysis.py`, which prints the importance per dataset and the average over the six datasets.
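For readers who want to reproduce this kind of importance analysis outside the provided script, a rough sketch with the fanova package is shown below; the exact procedure inside code/results_analysis.py may differ, and the file and column names used here are assumptions.

```python
import pandas as pd
from fanova import fANOVA

# Load preprocessed results (assumed layout: one row per run, with the
# hyperparameters already numerically encoded and a performance column).
df = pd.read_csv("results/final/abalone.csv")  # hypothetical file name

hyperparameters = [
    "activation_functions", "batch_size", "loss", "optimizer",
    "learning_rate", "hidden_layer_dim", "hidden_layer_size",
]
X = df[hyperparameters].to_numpy()
y = df["performance"].to_numpy()  # assumed metric column (MCC or MSE)

# fANOVA attributes the variance of the metric to individual hyperparameters.
analyzer = fANOVA(X, y)
for i, name in enumerate(hyperparameters):
    importance = analyzer.quantify_importance((i,))[(i,)]["individual importance"]
    print(f"{name}: {100 * importance:.2f}%")
```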
Results
Here we present the results that are available in the paper, together with an additional analysis of the obtained results.
If any analysis the reader might wish to perform is missing, the complete data obtained from the runs is available in the results folder, or the reader can rerun the experiments themselves.
Hyperparameter importance
These are the results of the fANOVA analysis.
General Importance
All Datasets

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 18.42 | 3.2 | 6.99 |
| batch_size | 0.95 | 55.94 | 37.67 |
| loss | 12.23 | 0.33 | 2.1 |
| optimizer | 14.88 | 5.17 | 2.16 |
| learning_rate | 17.65 | 3.38 | 1.34 |
| hidden_layer_dim | 3.94 | 3.85 | 16.62 |
| hidden_layer_size | 3.94 | 3.61 | 6.29 |
Importance by dataset type
Classification

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 17.59 | 2.91 | 2.28 |
| batch_size | 1.31 | 57.3 | 37.43 |
| loss | 9.16 | 0.01 | 3.76 |
| optimizer | 17.11 | 3.78 | 4.53 |
| learning_rate | 21.4 | 4.69 | 0.01 |
| hidden_layer_dim | 6.13 | 0.67 | 19.37 |
| hidden_layer_size | 3.04 | 5.2 | 8.54 |
Regression

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 23.66 | 2.87 | 15.98 |
| batch_size | 4.49 | 64.87 | 37.22 |
| loss | 19.51 | 0.12 | 0.01 |
| optimizer | 7.4 | 8.33 | 0.12 |
| learning_rate | 18.09 | 1.38 | 3.26 |
| hidden_layer_dim | 2.1 | 2.2 | 12.22 |
| hidden_layer_size | 3.32 | 1.48 | 4.18 |
Importance per dataset
Abalone

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 14.77 | 1.39 | 4.39 |
| batch_size | 0.55 | 56.72 | 21.61 |
| loss | 0.0 | 1.62 | 0.0 |
| optimizer | 2.96 | 7.99 | 3.5 |
| learning_rate | 30.02 | 6.9 | 0.07 |
| hidden_layer_dim | 7.16 | 0.12 | 15.69 |
| hidden_layer_size | 11.55 | 4.35 | 11.04 |
Bike Sharing

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 51.26 | 0.59 | 24.54 |
| batch_size | 0.74 | 72.21 | 29.71 |
| loss | 0.06 | 0.0 | 0.0 |
| optimizer | 17.86 | 6.28 | 0.02 |
| learning_rate | 11.6 | 5.17 | 7.14 |
| hidden_layer_dim | 0.0 | 1.98 | 14.41 |
| hidden_layer_size | 2.62 | 1.16 | 0.82 |
Compas

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 3.4 | 0.4 | 0.08 |
| batch_size | 1.16 | 43.0 | 6.23 |
| loss | 33.98 | 0.19 | 0.0 |
| optimizer | 21.68 | 4.02 | 4.16 |
| learning_rate | 9.59 | 6.06 | 0.02 |
| hidden_layer_dim | 0.76 | 2.92 | 49.31 |
| hidden_layer_size | 3.61 | 7.49 | 20.06 |
Covertype

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 29.22 | 12.77 | 4.01 |
| batch_size | 0.77 | 56.92 | 41.6 |
| loss | 0.06 | 0.0 | 10.34 |
| optimizer | 8.29 | 1.65 | 4.67 |
| learning_rate | 23.64 | 0.32 | 0.17 |
| hidden_layer_dim | 13.27 | 0.2 | 3.32 |
| hidden_layer_size | 1.84 | 4.79 | 0.62 |
Delays Zurich

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 0.37 | 3.57 | 5.2 |
| batch_size | 0.0 | 58.2 | 57.82 |
| loss | 39.27 | 0.0 | 0.01 |
| optimizer | 14.39 | 2.42 | 0.0 |
| learning_rate | 0.18 | 0.58 | 0.5 |
| hidden_layer_dim | 2.37 | 10.18 | 12.22 |
| hidden_layer_size | 3.81 | 0.48 | 3.92 |
Higgs

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 11.51 | 0.49 | 3.73 |
| batch_size | 2.46 | 48.6 | 69.07 |
| loss | 0.01 | 0.14 | 2.25 |
| optimizer | 24.08 | 8.67 | 0.63 |
| learning_rate | 30.84 | 1.22 | 0.15 |
| hidden_layer_dim | 0.09 | 7.68 | 4.75 |
| hidden_layer_size | 0.18 | 3.39 | 1.25 |
Performance metrics
Best performing hyperparameter combination per dataset
| Dataset | Activation function | Batch size | Hidden layer dimension | Loss function | Optimizer | Learning rate | MSE/MCC | Training time | Prediction time |
|---|---|---|---|---|---|---|---|---|---|
| Abalone (regression) | relu | 256 | [224, 192, 608, 768, 800] | mean_squared_error | adam | 0.001 | 2.158 | 1.928 | 0.107 |
| Bike Sharing (regression) | selu | 1024 | [352, 32, 288, 32, 544, 704, 96] | mean_squared_error | adam | 0.001 | 59.748 | 3.621 | 0.128 |
| Delays Zurich (regression) | relu | 128 | [640, 416, 576, 192, 288, 32, 32] | mean_squared_error | adam | 0.001 | 3.101 | 73.694 | 0.286 |
| Compas (classification) | relu | 512 | [512, 512, 512, 512] | categorical_crossentropy | adam | 0.001 | 0.041 | 1.567 | 0.118 |
| Covertype (classification) | relu | 512 | [1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024] | categorical_crossentropy | adam | 0.001 | 0.828 | 74.544 | 0.199 |
| Higgs (classification) | softsign | 512 | [224, 480, 64, 96, 768, 32, 928] | categorical_crossentropy | adam | 0.001 | 0.415 | 50.935 | 0.239 |
Baseline vs Best vs Worst comparison
The best and worst models were selected based on the performance metric.
| Dataset | Baseline | Best model | Worst model |
|---|---|---|---|
| Performance (MCC/MSE) | | | |
| Regression | | | |
| Abalone | 2.289 | 2.158 | 9.295 |
| Bike Sharing | 84.045 | 59.748 | 100.139 |
| Delays Zurich | 3.107 | 3.101 | 154.627 |
| Classification | | | |
| Compas | 0.022 | 0.041 | 0 |
| Covertype | 0.812 | 0.828 | -0.001 |
| Higgs | 0.256 | 0.415 | 0 |
| Training Time | | | |
| Abalone | 1.465 | 1.928 | 2.554 |
| Bike Sharing | 4.67 | 3.621 | 3.014 |
| Delays Zurich | 12.74 | 73.694 | 7.25 |
| Compas | 1.088 | 2.342 | 1.121 |
| Covertype | 37.381 | 74.544 | 4.987 |
| Higgs | 21.161 | 50.935 | 4.329 |
| Inference Time | | | |
| Abalone | 0.11 | 0.107 | 0.101 |
| Bike Sharing | 0.132 | 0.128 | 0.122 |
| Delays Zurich | 0.136 | 0.286 | 0.149 |
| Compas | 1.088 | 0.11 | 1.121 |
| Covertype | 0.173 | 0.199 | 0.172 |
| Higgs | 0.173 | 0.239 | 0.182 |
Authors
- Rafael Teixeira - rgtzths
License
This project is licensed under the MIT License - see the LICENSE file for details
Citation
If you use this code, please cite our work: Teixeira, Rafael & Antunes, Mário & Sobral, Rúben & Martins, João & Gomes, Diogo & Aguiar, Rui. (2023). Exploring the Intricacies of Neural Network Optimization. 10.1007/978-3-031-45275-8_2.
Owner
- Name: Rafael Teixeira
- Login: rgtzths
- Kind: user
- Location: Aveiro
- Company: Instituto de Telecomunicações
- Repositories: 1
- Profile: https://github.com/rgtzths
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Rafael"
given-names: "Teixeira"
orcid: "https://orcid.org/0000-0000-0000-0000"
title: "mlp_hpp_analysis"
version: 1.0.0
doi: 10.1007/978-3-031-45275-8_2
date-released: 2023-12-12
url: "https://github.com/rgtzths/mlp_hpp_analysis"
preferred-citation:
type: conference-paper
authors:
- family-names: "Teixeira"
given-names: "Rafael"
orcid: "https://orcid.org/0000-0001-7211-382X"
- family-names: "Antunes"
given-names: "Mário"
orcid: "https://orcid.org/0000-0002-6504-9441"
- family-names: "Sobral"
given-names: "Rúben"
orcid: "https://orcid.org/0009-0001-4357-6582"
- family-names: "Martins"
given-names: "João"
orcid: "https://orcid.org/0009-0008-1193-2483"
- family-names: "Gomes"
given-names: "Diogo"
orcid: "https://orcid.org/0000-0002-5848-2802"
- family-names: "Aguiar"
given-names: "Rui L."
orcid: "https://orcid.org/0000-0003-0107-6253"
title: "Exploring the Intricacies of Neural Network Optimization"
doi: 10.1007/978-3-031-45275-8_2
conference:
name: "Discovery Science"
city: "Porto"
region: "Porto"
country: "Portugal"
date-start: 2023-10-09
date-end: 2023-10-11