mlpgradientflow.jl

Going with the flow of multilayer perceptrons (and finding minima fast and accurately)

https://github.com/jbrea/mlpgradientflow.jl

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords from Contributors

numerics
Last synced: 7 months ago

Repository

Going with the flow of multilayer perceptrons (and finding minima fast and accurately)

Basic Info
  • Host: GitHub
  • Owner: jbrea
  • License: other
  • Language: Julia
  • Default Branch: main
  • Size: 1.47 MB
Statistics
  • Stars: 3
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Created about 3 years ago · Last pushed 8 months ago
Metadata Files
Readme · License · Citation

README.md

# MLPGradientFlow [![Documentation](https://img.shields.io/badge/docs-main-blue.svg)](https://jbrea.github.io/MLPGradientFlow.jl/dev) [![build](https://github.com/jbrea/MLPGradientFlow.jl/workflows/CI/badge.svg)](https://github.com/jbrea/MLPGradientFlow.jl/actions?query=workflow%3ACI)

This package allows one to investigate the loss landscape and training dynamics of multi-layer perceptrons.

Features

  • Train multi-layer perceptrons on the CPU to convergence, using first- and second-order optimization methods.
  • Fast implementations of gradients and hessians.
  • Follow gradient flow (using differential equation solvers) or popular (stochastic) gradient descent dynamics (Adam etc.).
  • Accurate approximations of the loss function and its derivatives for infinite normally distributed input data, using Gauss-Hermite quadrature or symbolic integrals.
  • Utility functions to investigate teacher-student setups and to visualize the loss landscape.

For more details, see the documentation and the paper MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately).
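To illustrate the gradient-flow idea behind the package, here is a toy sketch in plain numpy (not using MLPGradientFlow, and all names are made up for the sketch): integrating dθ/dt = -∇L(θ) for a simple quadratic loss and comparing against the closed-form solution of the flow.

```python
import numpy as np

# Toy sketch of gradient flow d(theta)/dt = -grad L(theta) for the quadratic
# loss L(theta) = 0.5 * theta' H theta with diagonal H.
# (Illustrative only -- MLPGradientFlow applies real ODE solvers to MLP losses.)
h = np.array([1.0, 10.0])      # eigenvalues of the Hessian H
theta0 = np.array([1.0, 1.0])  # initial parameters

def flow_euler(theta0, h, t_end=1.0, dt=1e-3):
    """Integrate gradient flow with explicit Euler steps."""
    theta = theta0.copy()
    for _ in range(int(t_end / dt)):
        theta -= dt * h * theta  # gradient of the quadratic loss is H @ theta
    return theta

theta_num = flow_euler(theta0, h)
theta_exact = theta0 * np.exp(-h * 1.0)  # closed-form solution of the flow
print(np.max(np.abs(theta_num - theta_exact)))  # small discretization error
```

The package replaces the crude Euler steps with adaptive ODE solvers, which is what makes following the flow both fast and accurate.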

Installation

From Julia

```julia
using Pkg; Pkg.add(url = "https://github.com/jbrea/MLPGradientFlow.jl.git")
```

From Python

Install juliacall, e.g. `pip install juliacall`. Then:

```python
from juliacall import Main as jl
jl.seval('using Pkg; Pkg.add(url = "https://github.com/jbrea/MLPGradientFlow.jl.git")')
```

Usage

From Julia

```julia
using MLPGradientFlow

inp = randn(2, 10000)
w_teach = randn(4, 2)
a_teach = randn(1, 4)
target = a_teach * sigmoid.(w_teach * inp)

n = Net(layers = ((5, sigmoid, false), (1, identity, false)),
        input = inp, target = target)

p = random_params(n)

n(p)                # run network on parameters p
n(p, randn(2, 100)) # run on parameters p and a random dataset

res = train(n, p,
            maxtime_ode = 2, maxtime_optim = 2, # maxtime in seconds
            maxiterations_optim = 10^3,
            verbosity = 1)

res["x"]     # final solution
res["ode_x"] # solution after ODE solver
res["loss"]  # final loss

x = params(res["x"])
loss(n, x)             # recompute final loss
gradient(n, x)         # compute gradient at solution
hessian(n, x)          # compute hessian at solution
hessian_spectrum(n, x) # compute spectrum of the hessian at solution

?MLPGradientFlow.train # look at the doc string of train
```

To run multiple initial seeds in parallel:

```julia
ps = ntuple(_ -> random_params(n), 4)
res = MLPGradientFlow.train(n, ps, maxtime_ode = 2, maxtime_optim = 2,
                            maxiterations_optim = 10^3, verbosity = 1)
```

Neural networks without biases and with only a single hidden layer can also be trained under the assumption of normally distributed input. For relu and sigmoid2 the implementation uses analytical values for the Gaussian integrals (use `f = Val(relu)` for the analytical integration and `f = relu` for the numerical integration). For other activation functions, numerical integration of approximations thereof has to be used.

```julia
using LinearAlgebra, ComponentArrays

d = 9
inp = randn(d, 10^5)
w_teach = I(d)[1:d-1, :]
a_teach = ones(d-1)'
target = a_teach * relu.(w_teach * inp)

n = Net(layers = ((2, relu, false), (1, identity, false)),
        input = inp, target = target)
p = random_params(n)
```

Train under the assumption of infinite, normally distributed input data:

```julia
xt = ComponentVector(w1 = w_teach, w2 = a_teach)
ni = NetI(p, xt, Val(relu))
res = train(ni, p, maxiterations_ode = 10^3, maxiterations_optim = 0)
```

Recommendations for different activation functions:

Using analytical integrals:

- `NetI(p, xt, Val(relu))`
- `NetI(p, xt, Val(sigmoid2))`

Using approximations:

- `NetI(p, xt, load_potential_approximator(softplus))`
- `NetI(p, xt, load_potential_approximator(gelu))`
- `NetI(p, xt, load_potential_approximator(g))`
- `NetI(p, xt, load_potential_approximator(tanh))`
- `NetI(p, xt, load_potential_approximator(sigmoid))`

Compare the loss computed with finite data to the loss computed with infinite data:

```julia
loss(n, params(res["init"]))  # finite data
loss(ni, params(res["init"])) # infinite data
loss(n, params(res["x"]))     # finite data
loss(ni, params(res["x"]))    # infinite data
```

For softplus with approximations:

```julia
approx = load_potential_approximator(softplus)
ni = NetI(p, xt, approx)
res = train(ni, p, maxiterations_ode = 10^3, maxiterations_optim = 0)
```

This result can be fine-tuned with numerical integration (WARNING: this is slow!):

```julia
ni2 = NetI(p, xt, softplus)
res2 = train(ni2, params(res["x"]), maxtime_ode = 60, maxtime_optim = 60)
```
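The Gauss-Hermite quadrature idea behind training on infinite normally distributed input can be illustrated standalone in plain numpy (this is a sketch, not MLPGradientFlow code): an expectation over a standard normal input becomes a weighted sum over quadrature nodes.

```python
import numpy as np

# Sketch of Gauss-Hermite quadrature for expectations under N(0, 1).
# E[f(X)] for X ~ N(0, 1) equals (1/sqrt(pi)) * sum_i w_i f(sqrt(2) x_i),
# where (x_i, w_i) are Gauss-Hermite nodes and weights for the weight e^{-x^2}.
nodes, weights = np.polynomial.hermite.hermgauss(100)

def gauss_expect(f):
    """Approximate E[f(X)] for X ~ N(0, 1) via Gauss-Hermite quadrature."""
    return weights @ f(np.sqrt(2.0) * nodes) / np.sqrt(np.pi)

relu = lambda x: np.maximum(x, 0.0)
print(gauss_expect(lambda x: x**2))  # exact for polynomials: 1.0
print(gauss_expect(relu))            # close to 1/sqrt(2*pi) ~ 0.3989
```

For smooth activations the quadrature converges very quickly; for kinked ones like relu the package can instead use the analytical integrals (`Val(relu)`), as recommended above.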

From Python

```python
import numpy as np
import juliacall as jc
from juliacall import Main as jl

jl.seval('using MLPGradientFlow')

mg = jl.MLPGradientFlow

w = np.random.normal(size = (5, 2))/10
b1 = np.zeros(5)
a = np.random.normal(size = (1, 5))/5
b2 = np.zeros(1)
inp = np.random.normal(size = (2, 10_000))

w_teach = np.random.normal(size = (4, 2))
a_teach = np.random.normal(size = (1, 4))
target = np.matmul(a_teach, jl.map(mg.sigmoid, np.matmul(w_teach, inp)))

mg.train._jl_help() # look at the docstring
```

Continue as in Julia (see above), e.g.:

```python
p = mg.params((w, b1), (a, b2))

n = mg.Net(layers = ((5, mg.sigmoid, True), (1, jl.identity, True)),
           input = inp, target = target)

res = mg.train(n, p, maxtime_ode = 2, maxtime_optim = 2,
               maxiterations_optim = 10**3, verbosity = 1)
```

Convert the result to a python dictionary with numpy arrays:

```python
def convert2py(jldict):
    d = dict(jldict)
    for k, v in jldict.items():
        if isinstance(v, jc.DictValue):
            d[k] = convert2py(v)
        if isinstance(v, jc.ArrayValue):
            d[k] = v.to_numpy()
    return d

py_res = convert2py(res)
```

Convert parameters in python format back to julia parameters:

```python
p = mg.params(jc.convert(jl.Dict, py_res['x']))
```

Save results in torch.pickle format:

```python
mg.pickle("myfilename.pt", res)
```

Look at the hessian spectrum:

```python
mg.hessian_spectrum(n, p)
```

An MLP with 2 hidden layers, with biases only in the second hidden layer:

```python
n2 = mg.Net(layers = ((5, mg.sigmoid, False), (4, mg.g, True), (2, mg.identity, False)),
            input = inp, target = np.random.normal(size = (2, 10_000)))

p2 = mg.params((w, None),
               (np.random.normal(size = (4, 5)), np.zeros(4)),
               (np.random.normal(size = (2, 4)), None))

mg.loss(n2, p2)
mg.gradient(n2, p2)
```

For more examples, have a look at the Julia code above.
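The teacher-student setup used in the examples above can be sketched in plain numpy (independent of the package; `forward` and `mse` are names invented for this sketch): a student with the teacher's parameters reaches exactly zero mean-squared loss, while a random student does not.

```python
import numpy as np

# Plain-numpy sketch of the teacher-student setup
# (illustrative only; these names are not MLPGradientFlow API).
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(w, a, inp):
    """Single-hidden-layer MLP without biases: a @ sigmoid(w @ inp)."""
    return a @ sigmoid(w @ inp)

def mse(pred, target):
    return np.mean((pred - target) ** 2)

inp = rng.normal(size=(2, 1000))
w_teach = rng.normal(size=(4, 2))
a_teach = rng.normal(size=(1, 4))
target = forward(w_teach, a_teach, inp)

# the teacher itself reaches zero loss; a random student does not
print(mse(forward(w_teach, a_teach, inp), target))  # 0.0
print(mse(forward(rng.normal(size=(4, 2)), rng.normal(size=(1, 4)), inp), target))
```

In this setting the teacher's parameters are a global minimum of the loss, which is what makes teacher-student setups convenient for studying the loss landscape.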

Owner

  • Login: jbrea
  • Kind: user

Citation (CITATION.bib)

@misc{https://doi.org/10.48550/arxiv.2301.10638,
  doi = {10.48550/ARXIV.2301.10638},
  url = {https://arxiv.org/abs/2301.10638},
  author = {Brea, Johanni and Martinelli, Flavio and Şimşek, Berfin and Gerstner, Wulfram},
  title = {MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)},
  publisher = {arXiv},
  year = {2023},
}

GitHub Events

Total
  • Push event: 35
  • Pull request event: 1
  • Create event: 2
Last Year
  • Push event: 35
  • Pull request event: 1
  • Create event: 2

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 112
  • Total Committers: 3
  • Avg Commits per committer: 37.333
  • Development Distribution Score (DDS): 0.116
Past Year
  • Commits: 48
  • Committers: 3
  • Avg Commits per committer: 16.0
  • Development Distribution Score (DDS): 0.125
Top Committers
  • Johanni Brea (j****a): 99 commits
  • CompatHelper Julia (c****y@j****g): 7 commits
  • flavio-martinelli (f****5@g****m): 6 commits
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 3
  • Total pull requests: 12
  • Average time to close issues: 8 days
  • Average time to close pull requests: 11 days
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 0.33
  • Average comments per pull request: 0.08
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 10
Past Year
  • Issues: 3
  • Pull requests: 2
  • Average time to close issues: 8 days
  • Average time to close pull requests: about 11 hours
  • Issue authors: 2
  • Pull request authors: 2
  • Average comments per issue: 0.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • flavio-martinelli (2)
  • alexl4123 (1)
Pull Request Authors
  • github-actions[bot] (11)
  • flavio-martinelli (3)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/CI.yml actions
  • actions/cache v1 composite
  • actions/checkout v2 composite
  • codecov/codecov-action v1 composite
  • julia-actions/julia-buildpkg v1 composite
  • julia-actions/julia-processcoverage v1 composite
  • julia-actions/julia-runtest v1 composite
  • julia-actions/setup-julia v1 composite
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite
scripts/python_requirements.txt pypi
  • Pygments ==2.14.0
  • asttokens ==2.2.1
  • backcall ==0.2.0
  • decorator ==5.1.1
  • executing ==1.2.0
  • ipython ==8.8.0
  • jax ==0.4.1
  • jaxlib ==0.4.1
  • jedi ==0.18.2
  • juliacall ==0.9.10
  • juliapkg ==0.1.9
  • matplotlib-inline ==0.1.6
  • numpy ==1.24.1
  • opt-einsum ==3.3.0
  • pandas ==1.5.2
  • parso ==0.8.3
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • prompt-toolkit ==3.0.36
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • python-dateutil ==2.8.2
  • pytz ==2022.7
  • scipy ==1.10.0
  • semantic-version ==2.10.0
  • six ==1.16.0
  • stack-data ==0.6.2
  • torch ==1.13.1
  • traitlets ==5.8.1
  • typing_extensions ==4.4.0
  • wcwidth ==0.2.5
.github/workflows/CompatHelper.yml actions