RevisitNNWeightInit
The repository associated with the paper "Revisiting Weight Initialization of Deep Neural Networks".
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.1%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
RevisitNNWeightInit
The repository associated with the paper "Revisiting Weight Initialization of Deep Neural Networks", which presents a new weight-initialization scheme obtained by applying our Hessian chain rule across the (hidden) layers $k=0,\ldots,n-1$ of a neural network. In general, NNs are trained with variants of gradient descent, which iteratively update the model parameters $\mathsf{W}$ in the direction opposite to the gradient $\mathbf{g} = D_{\mathsf{W}} L$ of the loss function. To quantify the resulting decrease of the loss, we first consider the second-order Taylor approximation of a function $f(\mathbf{x})$ around the current point $\mathbf{x}^{(0)}$:
$$f(\mathbf{x}) \approx f(\mathbf{x}^{(0)}) + (\mathbf{x} - \mathbf{x}^{(0)})^T\mathbf{g} + \frac{1}{2}(\mathbf{x} - \mathbf{x}^{(0)})^T \boldsymbol{\mathsf{H}}\,(\mathbf{x} - \mathbf{x}^{(0)}).$$
Substituting the gradient-descent update $\mathbf{x} = \mathsf{W} - \gamma\,\mathbf{g}$ into this approximation, we obtain
$$L(\mathsf{W} - \gamma\,\mathbf{g}) \approx L(\mathsf{W}) - \gamma\,\mathbf{g}^{T}\mathbf{g} + \frac{\gamma^2}{2}\,\mathbf{g}^T \boldsymbol{\mathsf{H}}\,\mathbf{g}.$$
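The expansion above can be checked numerically. A minimal sketch, using a hypothetical quadratic loss $L(\mathbf{w}) = \frac{1}{2}\mathbf{w}^T A \mathbf{w}$ (not from the paper) for which the gradient and Hessian are known in closed form and the second-order expansion is exact:

```python
import numpy as np

# Hypothetical quadratic loss L(w) = 0.5 * w^T A w with A symmetric positive
# definite; its gradient is A w and its Hessian is the constant matrix A, so
# the second-order Taylor expansion holds exactly.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)   # symmetric positive definite Hessian
w = rng.standard_normal(5)

def loss(w):
    return 0.5 * w @ A @ w

g = A @ w        # gradient of the quadratic loss at w
H = A            # Hessian (constant for a quadratic)
gamma = 0.01     # step size

# Left-hand side: loss after one gradient-descent step.
lhs = loss(w - gamma * g)
# Right-hand side: the second-order approximation from the text.
rhs = loss(w) - gamma * (g @ g) + 0.5 * gamma**2 * (g @ H @ g)
print(abs(lhs - rhs))  # agrees up to floating-point rounding
```

For a non-quadratic loss the two sides would differ by higher-order terms in $\gamma$.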
We conducted the following experiments to support our theoretical findings. To cover a broad and diverse range of experiments, we trained our models on the MNIST, FashionMNIST, CIFAR10, Google SVHN, and Flowers image datasets.
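As a small illustration of why the choice of initialization matters in such experiments, the following sketch (a standard variance-scaling comparison, not the paper's Hessian-based scheme) shows how the activation variance of a deep ReLU network collapses under a naive fixed-scale initialization but stays stable under He-style scaling:

```python
import numpy as np

# Sketch (not the paper's method): propagate random inputs through a deep
# ReLU network and measure the variance of the final activations under two
# weight-initialization scales.
rng = np.random.default_rng(0)

def forward_variance(scale_fn, depth=20, width=256, batch=1000):
    x = rng.standard_normal((width, batch))
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * scale_fn(width)
        x = np.maximum(W @ x, 0.0)  # fully connected layer + ReLU
    return x.var()

naive = forward_variance(lambda n: 0.01)           # fixed small std: signal vanishes
he = forward_variance(lambda n: np.sqrt(2.0 / n))  # He scaling: variance stays stable
print(naive, he)
```

With the naive scale, the variance shrinks geometrically with depth and is numerically zero after 20 layers, while the He-scaled network keeps its activations at a usable magnitude.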
Implementation
See the notebook for details.
Owner
- Login: sandrons
- Kind: user
- Repositories: 1
- Profile: https://github.com/sandrons
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you find this repository helpful, please cite it as below."
authors:
  - family-names: "Skorski"
    given-names: "Maciej"
  - family-names: "Temperoni"
    given-names: "Alessandro"
  - family-names: "Theobald"
    given-names: "Martin"
title: "Revisiting Weight Initialization of Deep Neural Networks"
version: 1.0.0
date-released: 2021
url: "https://proceedings.mlr.press/v157/skorski21a/skorski21a.pdf"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: "Skorski"
      given-names: "Maciej"
    - family-names: "Temperoni"
      given-names: "Alessandro"
    - family-names: "Theobald"
      given-names: "Martin"
  title: "Revisiting Weight Initialization of Deep Neural Networks"
  collection-title: "The 13th Asian Conference on Machine Learning (Conference Track)"
  year: 2021
  url: "https://proceedings.mlr.press/v157/skorski21a/skorski21a.pdf"
```
GitHub Events
Total
- Push event: 11
- Create event: 2
Last Year
- Push event: 11
- Create event: 2