revisitnnweightinit

The repository associated with the paper "Revisiting Weight Initialization of Deep Neural Networks".

https://github.com/sandrons/revisitnnweightinit

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.1%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: sandrons
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 21.5 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

RevisitNNWeightInit

The repository associated with the paper "Revisiting Weight Initialization of Deep Neural Networks", which presents a new weight-initialization scheme obtained by applying our Hessian chain rule across the (hidden) layers $k=0\ldots n-1$ of a NN. In general, NNs are trained with variants of gradient descent, which iteratively update the model parameters $\mathsf{W}$ in the direction opposite to the gradient $\textbf{{g}} = D_{\mathsf{W}} L$ of the loss function. To quantify the resulting decrease of the loss, we first consider the second-order Taylor series approximation of a function $f(\textbf{{x}})$ around the current point $\textbf{{x}}^{(0)}$:

$$f(\textbf{{x}}) \approx f(\textbf{{x}}^{(0)}) + (\textbf{{x}} - \textbf{{x}}^{(0)})^T\textbf{{g}} + \frac{1}{2}(\textbf{{x}} - \textbf{{x}}^{(0)})^T \boldsymbol{\mathsf{H}}(\textbf{{x}} - \textbf{{x}}^{(0)}). $$
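As a quick sanity check (ours, not part of the repository), the second-order expansion can be verified numerically on a small hand-picked quadratic function, for which the gradient and Hessian are known in closed form and the Taylor model is exact:

```python
import numpy as np

# Hypothetical test function: f(x) = 0.5 x^T A x + b^T x. Because f is
# exactly quadratic, the second-order Taylor model reproduces it up to
# floating-point rounding error.
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x + b @ x

x0 = np.array([0.5, -0.25])   # expansion point x^(0)
g = A @ x0 + b                # gradient at x0
H = A                         # Hessian (constant for a quadratic)

x = x0 + np.array([0.1, -0.2])
d = x - x0
taylor = f(x0) + d @ g + 0.5 * d @ H @ d
assert np.isclose(f(x), taylor)  # exact for a quadratic f
```

For a genuinely non-quadratic loss the two sides would instead agree only up to a third-order remainder term.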

Substituting $f = L$, $\textbf{{x}}^{(0)} = \mathsf{W}$, and the gradient-descent update $\textbf{{x}} = \mathsf{W} - \gamma\,\textbf{{g}}$ into this approximation, we obtain

$$ L(\mathsf{W}-\gamma \textbf{{g}}) \approx L(\mathsf{W}) -\gamma\, \textbf{{g}}^{T}\cdot \textbf{{g}} + \frac{\gamma^2}{2}\, \textbf{{g}}^T\cdot \boldsymbol{\mathsf{H}} \cdot \textbf{{g}} $$
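A minimal sketch (ours, not from the repository) of this predicted decrease: take one gradient step $\mathsf{W} - \gamma\,\textbf{{g}}$ on a quadratic loss with a hand-chosen Hessian, and compare the actual new loss against the right-hand side of the formula above:

```python
import numpy as np

# Hypothetical quadratic loss L(W) = 0.5 W^T H W + c^T W, chosen so the
# second-order model is exact and the prediction matches the actual loss.
H = np.array([[2.0, 0.5], [0.5, 1.0]])  # Hessian, positive definite
c = np.array([0.3, -0.7])

def L(W):
    return 0.5 * W @ H @ W + c @ W

W = np.array([1.0, 2.0])
g = H @ W + c                 # gradient g = D_W L
gamma = 0.1                   # learning rate

actual = L(W - gamma * g)
predicted = L(W) - gamma * (g @ g) + 0.5 * gamma**2 * (g @ H @ g)
assert np.isclose(actual, predicted)  # exact for a quadratic loss
assert actual < L(W)                  # small step: loss decreases
```

The sign structure makes the trade-off explicit: the first-order term $-\gamma\,\textbf{{g}}^{T}\textbf{{g}}$ always decreases the loss, while the curvature term $\frac{\gamma^2}{2}\,\textbf{{g}}^T \boldsymbol{\mathsf{H}}\,\textbf{{g}}$ can cancel that gain when $\gamma$ is too large relative to the Hessian's largest eigenvalue.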

We conducted the following experiments to support our theoretical findings. To cover a broad and diverse range of settings, we trained our models on the MNIST, FashionMNIST, CIFAR10, SVHN (Google Street View House Numbers), and Flowers image datasets.

Implementation

See the notebook for details.

Owner

  • Login: sandrons
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you find this repository helpful, please cite it as below."
authors:
  - family-names: "Skorski"
    given-names: "Maciej"
  - family-names: "Temperoni"
    given-names: "Alessandro"
  - family-names: "Theobald"
    given-names: "Martin"
title: "Revisiting Weight Initialization of Deep Neural Networks"
version: 1.0.0
date-released: 2021
url: "https://proceedings.mlr.press/v157/skorski21a/skorski21a.pdf"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: "Skorski"
      given-names: "Maciej"
    - family-names: "Temperoni"
      given-names: "Alessandro"
    - family-names: "Theobald"
      given-names: "Martin"  
  title: "Revisiting Weight Initialization of Deep Neural Networks"
  collection-title: "The 13th Asian Conference on Machine Learning (Conference Track)"
  year: 2021
  url: "https://proceedings.mlr.press/v157/skorski21a/skorski21a.pdf"

GitHub Events

Total
  • Push event: 11
  • Create event: 2
Last Year
  • Push event: 11
  • Create event: 2