revisitnnweightinit

The repository associated with the paper "Revisiting Weight Initialization of Deep Neural Networks".

https://github.com/sandrons/revisitnnweightinit

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.1%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: sandrons
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 21.5 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

RevisitNNWeightInit

The repository associated with the paper "Revisiting Weight Initialization of Deep Neural Networks", which presents a new weight-initialization scheme obtained by applying our Hessian chain rule across the (hidden) layers $k=0\ldots n-1$ of a NN. In general, NNs are trained with variants of gradient descent, which iteratively update the model parameters $\mathsf{W}$ in the direction opposite to the gradient $\textbf{{g}} = D_{\mathsf{W}} L$ of the loss function. To quantify the resulting decrease of the loss, we first consider the second-order Taylor series approximation of a function $f(\textbf{{x}})$ around the current point $\textbf{{x}}^{(0)}$:

$$f(\textbf{{x}}) \approx f(\textbf{{x}}^{(0)}) + (\textbf{{x}} - \textbf{{x}}^{(0)})^T\textbf{{g}} + \frac{1}{2}(\textbf{{x}} - \textbf{{x}}^{(0)})^T \boldsymbol{\mathsf{H}}(\textbf{{x}} - \textbf{{x}}^{(0)}). $$
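As a quick sanity check (ours, not part of the repository), the second-order expansion can be verified numerically on a small hand-picked quadratic function, for which the gradient and Hessian are known in closed form and the Taylor model is exact:

```python
import numpy as np

# Hypothetical test function: f(x) = 0.5 x^T A x + b^T x. Because f is
# exactly quadratic, the second-order Taylor model reproduces it up to
# floating-point rounding error.
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x + b @ x

x0 = np.array([0.5, -0.25])   # expansion point x^(0)
g = A @ x0 + b                # gradient at x0
H = A                         # Hessian (constant for a quadratic)

x = x0 + np.array([0.1, -0.2])
d = x - x0
taylor = f(x0) + d @ g + 0.5 * d @ H @ d
assert np.isclose(f(x), taylor)  # exact for a quadratic f
```

For a genuinely non-quadratic loss the two sides would instead agree only up to a third-order remainder term.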

Substituting $f = L$, $\textbf{{x}}^{(0)} = \mathsf{W}$, and the gradient-descent update $\textbf{{x}} = \mathsf{W} - \gamma\,\textbf{{g}}$ into this approximation, we obtain

$$ L(\mathsf{W}-\gamma \textbf{{g}}) \approx L(\mathsf{W}) -\gamma\, \textbf{{g}}^{T}\cdot \textbf{{g}} + \frac{\gamma^2}{2}\, \textbf{{g}}^T\cdot \boldsymbol{\mathsf{H}} \cdot \textbf{{g}} $$
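A minimal sketch (ours, not from the repository) of this predicted decrease: take one gradient step $\mathsf{W} - \gamma\,\textbf{{g}}$ on a quadratic loss with a hand-chosen Hessian, and compare the actual new loss against the right-hand side of the formula above:

```python
import numpy as np

# Hypothetical quadratic loss L(W) = 0.5 W^T H W + c^T W, chosen so the
# second-order model is exact and the prediction matches the actual loss.
H = np.array([[2.0, 0.5], [0.5, 1.0]])  # Hessian, positive definite
c = np.array([0.3, -0.7])

def L(W):
    return 0.5 * W @ H @ W + c @ W

W = np.array([1.0, 2.0])
g = H @ W + c                 # gradient g = D_W L
gamma = 0.1                   # learning rate

actual = L(W - gamma * g)
predicted = L(W) - gamma * (g @ g) + 0.5 * gamma**2 * (g @ H @ g)
assert np.isclose(actual, predicted)  # exact for a quadratic loss
assert actual < L(W)                  # small step: loss decreases
```

The sign structure makes the trade-off explicit: the first-order term $-\gamma\,\textbf{{g}}^{T}\textbf{{g}}$ always decreases the loss, while the curvature term $\frac{\gamma^2}{2}\,\textbf{{g}}^T \boldsymbol{\mathsf{H}}\,\textbf{{g}}$ can cancel that gain when $\gamma$ is too large relative to the Hessian's largest eigenvalue.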

We conducted the following experiments to support our theoretical findings. To cover a broad and diverse range of settings, we trained our models on the MNIST, FashionMNIST, CIFAR10, SVHN (Google Street View House Numbers), and Flowers image datasets.

Implementation

See the notebook for details.

Owner

  • Login: sandrons
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you find this repository helpful, please cite it as below."
authors:
  - family-names: "Skorski"
    given-names: "Maciej"
  - family-names: "Temperoni"
    given-names: "Alessandro"
  - family-names: "Theobald"
    given-names: "Martin"
title: "Revisiting Weight Initialization of Deep Neural Networks"
version: 1.0.0
date-released: 2021
url: "https://proceedings.mlr.press/v157/skorski21a/skorski21a.pdf"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: "Skorski"
      given-names: "Maciej"
    - family-names: "Temperoni"
      given-names: "Alessandro"
    - family-names: "Theobald"
      given-names: "Martin"  
  title: "Revisiting Weight Initialization of Deep Neural Networks"
  collection-title: "The 13th Asian Conference on Machine Learning (Conference Track)"
  year: 2021
  url: "https://proceedings.mlr.press/v157/skorski21a/skorski21a.pdf"

GitHub Events

Total
  • Push event: 11
  • Create event: 2
Last Year
  • Push event: 11
  • Create event: 2