MLPGradientFlow.jl
Going with the flow of multilayer perceptrons (and finding minima fast and accurately)
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Keywords from Contributors
Repository
Going with the flow of multilayer perceptrons (and finding minima fast and accurately)
Basic Info
Statistics
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
This package allows one to investigate the loss landscape and training dynamics of multi-layer perceptrons.
Features
- Train multi-layer perceptrons on the CPU to convergence, using first and second order optimization methods.
- Fast implementations of gradients and hessians.
- Follow gradient flow (using differential equation solvers) or popular (stochastic) gradient descent dynamics (Adam etc.).
- Accurate approximations of the loss function and its derivatives for infinite, normally distributed input data, using Gauss-Hermite quadrature or symbolic integrals.
- Utility functions to investigate teacher-student setups and to visualize loss landscapes.
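For infinite, normally distributed input data, the loss is an expectation over a Gaussian, and Gauss-Hermite quadrature approximates such expectations with a weighted sum over a few nodes. A minimal sketch of the underlying idea in Python with NumPy's `hermgauss` (illustrative only, not the package's implementation):

```python
import numpy as np

def gauss_hermite_expectation(f, n=40):
    """Approximate E[f(x)] for x ~ N(0, 1) with n-point Gauss-Hermite quadrature.

    hermgauss returns nodes/weights for the weight function exp(-x^2);
    substituting x -> sqrt(2) * x and dividing by sqrt(pi) adapts it to
    the standard normal density.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n)
    return np.sum(weights * f(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

# E[x^2] = 1 for a standard normal
print(gauss_hermite_expectation(lambda x: x**2))
# E[relu(x)] = 1 / sqrt(2 * pi) ~ 0.3989
print(gauss_hermite_expectation(lambda x: np.maximum(x, 0.0)))
```

For polynomial integrands the quadrature is exact with enough nodes; for kinked functions like relu it converges more slowly, which is one reason analytical integrals are preferable where they exist.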
For more details see the documentation and the paper MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately).
Installation
From Julia
```julia
using Pkg; Pkg.add(url = "https://github.com/jbrea/MLPGradientFlow.jl.git")
```
From Python
Install juliacall, e.g. pip install juliacall.
```python
from juliacall import Main as jl
jl.seval('using Pkg; Pkg.add(url = "https://github.com/jbrea/MLPGradientFlow.jl.git")')
```
Usage
From Julia
```julia
using MLPGradientFlow

inp = randn(2, 10_000)
w_teach = randn(4, 2)
a_teach = randn(1, 4)
target = a_teach * sigmoid.(w_teach * inp)

n = Net(layers = ((5, sigmoid, false), (1, identity, false)),
        input = inp, target = target)
p = random_params(n)

n(p)                # run network on parameters p
n(p, randn(2, 100)) # run on parameters p and a random dataset

res = train(n, p,
            maxtime_ode = 2, maxtime_optim = 2, # maxtime in seconds
            maxiterations_optim = 10^3, verbosity = 1)

res["x"]     # final solution
res["ode_x"] # solution after ODE solver
res["loss"]  # final loss

x = params(res["x"])
loss(n, x)             # recompute final loss
gradient(n, x)         # compute gradient at solution
hessian(n, x)          # compute hessian at solution
hessian_spectrum(n, x) # compute spectrum of the hessian at solution

?MLPGradientFlow.train # look at the doc string of train
```

To run multiple initial seeds in parallel:

```julia
ps = ntuple(_ -> random_params(n), 4)
res = MLPGradientFlow.train(n, ps, maxtime_ode = 2, maxtime_optim = 2,
                            maxiterations_optim = 10^3, verbosity = 1)
```

Neural networks without biases and with a single hidden layer can also be trained under the assumption of normally distributed input. For relu and sigmoid2 the implementation uses analytical values for the Gaussian integrals (use f = Val(relu) for analytical integration and f = relu for numerical integration). For other activation functions, numerical integration of approximations thereof has to be used.

```julia
using LinearAlgebra, ComponentArrays
d = 9
inp = randn(d, 10^5)
w_teach = I(d)[1:d-1, :]
a_teach = ones(d-1)'
target = a_teach * relu.(w_teach * inp)

n = Net(layers = ((2, relu, false), (1, identity, false)),
        input = inp, target = target)
p = random_params(n)
```

Train under the assumption of infinite, normally distributed input data:

```julia
xt = ComponentVector(w1 = w_teach, w2 = a_teach)
ni = NetI(p, xt, Val(relu))
res = train(ni, p, maxiterations_ode = 10^3, maxiterations_optim = 0)
```

Recommendations for different activation functions:

Using analytical integrals:
- NetI(p, xt, Val(relu))
- NetI(p, xt, Val(sigmoid2))

Using approximations:
- NetI(p, xt, load_potential_approximator(softplus))
- NetI(p, xt, load_potential_approximator(gelu))
- NetI(p, xt, load_potential_approximator(g))
- NetI(p, xt, load_potential_approximator(tanh))
- NetI(p, xt, load_potential_approximator(sigmoid))

Compare the loss computed with finite data to the loss computed with infinite data:

```julia
loss(n, params(res["init"]))  # finite data
loss(ni, params(res["init"])) # infinite data
loss(n, params(res["x"]))     # finite data
loss(ni, params(res["x"]))    # infinite data
```

For softplus with approximations:

```julia
approx = load_potential_approximator(softplus)
ni = NetI(p, xt, approx)
res = train(ni, p, maxiterations_ode = 10^3, maxiterations_optim = 0)
```

This result can be fine-tuned with numerical integration (WARNING: this is slow!):

```julia
ni2 = NetI(p, xt, softplus)
res2 = train(ni2, params(res["x"]), maxtime_ode = 60, maxtime_optim = 60)
```
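Following gradient flow, as the ODE phase of `train` does, means solving dθ/dt = -∇L(θ) as an initial value problem rather than taking discrete gradient-descent steps. A toy sketch of this idea in plain Python (the quadratic loss and the fixed-step RK4 integrator are illustrative assumptions; the package uses accurate derivatives and adaptive ODE solvers):

```python
import numpy as np

# Hypothetical toy loss L(theta) = 0.5 * theta' A theta, with gradient A @ theta.
# A is symmetric positive definite, so the flow converges to the origin.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad(theta):
    return A @ theta

def gradient_flow(theta0, dt=0.01, steps=2000):
    """Integrate d(theta)/dt = -grad L(theta) with classical RK4."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        k1 = -grad(theta)
        k2 = -grad(theta + 0.5 * dt * k1)
        k3 = -grad(theta + 0.5 * dt * k2)
        k4 = -grad(theta + dt * k3)
        theta = theta + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return theta

theta_end = gradient_flow([1.0, -2.0])
print(theta_end)  # approaches the minimizer at the origin
```

Gradient descent with learning rate η is the forward-Euler discretization of this ODE; higher-order adaptive solvers can follow the same trajectory with far fewer gradient evaluations, which is the motivation for the ODE phase.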
From Python
```python
import numpy as np
import juliacall as jc
from juliacall import Main as jl

jl.seval('using MLPGradientFlow')
mg = jl.MLPGradientFlow

w = np.random.normal(size = (5, 2)) / 10
b1 = np.zeros(5)
a = np.random.normal(size = (1, 5)) / 5
b2 = np.zeros(1)
inp = np.random.normal(size = (2, 10_000))
w_teach = np.random.normal(size = (4, 2))
a_teach = np.random.normal(size = (1, 4))
target = np.matmul(a_teach, jl.map(mg.sigmoid, np.matmul(w_teach, inp)))

mg.train._jl_help()  # look at the docstring

# continue as in julia (see above), e.g.
p = mg.params((w, b1), (a, b2))
n = mg.Net(layers = ((5, mg.sigmoid, True), (1, jl.identity, True)),
           input = inp, target = target)
res = mg.train(n, p, maxtime_ode = 2, maxtime_optim = 2,
               maxiterations_optim = 10**3, verbosity = 1)

# convert the result to a python dictionary with numpy arrays
def convert2py(jldict):
    d = dict(jldict)
    for k, v in jldict.items():
        if isinstance(v, jc.DictValue):
            d[k] = convert2py(v)
        if isinstance(v, jc.ArrayValue):
            d[k] = v.to_numpy()
    return d

py_res = convert2py(res)

# convert parameters in python format back to julia parameters
p = mg.params(jc.convert(jl.Dict, py_res['x']))

# save results in torch.pickle format
mg.pickle("myfilename.pt", res)

mg.hessian_spectrum(n, p)  # look at the hessian spectrum

# an MLP with 2 hidden layers, with biases only in the second hidden layer
n2 = mg.Net(layers = ((5, mg.sigmoid, False), (4, mg.g, True), (2, mg.identity, False)),
            input = inp, target = np.random.normal(size = (2, 10_000)))
p2 = mg.params((w, None),
               (np.random.normal(size = (4, 5)), np.zeros(4)),
               (np.random.normal(size = (2, 4)), None))

mg.loss(n2, p2)
mg.gradient(n2, p2)

# for more examples have a look at the julia code above
```
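The `hessian_spectrum` calls above inspect curvature at a solution: positive eigenvalues indicate a local minimum, near-zero eigenvalues flat directions. The same idea can be sketched with a finite-difference Hessian on a toy loss (illustrative only; the package computes exact second derivatives):

```python
import numpy as np

def finite_difference_hessian(f, x, eps=1e-5):
    """Central-difference approximation of the Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i = np.zeros(d); e_i[i] = eps
            e_j = np.zeros(d); e_j[j] = eps
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps**2)
    return H

# Toy loss with known curvature: the Hessian eigenvalues are 2 and 6.
loss = lambda x: x[0]**2 + 3 * x[1]**2
H = finite_difference_hessian(loss, np.zeros(2))
print(np.linalg.eigvalsh(H))  # ~[2, 6]
```

Finite differences suffice for small toy problems, but exact analytical Hessians, as implemented in the package, are both faster and numerically more reliable for second-order optimization.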
Owner
- Login: jbrea
- Kind: user
- Website: https://people.epfl.ch/johanni.brea
- Repositories: 12
- Profile: https://github.com/jbrea
Citation (CITATION.bib)
@misc{https://doi.org/10.48550/arxiv.2301.10638,
doi = {10.48550/ARXIV.2301.10638},
url = {https://arxiv.org/abs/2301.10638},
author = {Brea, Johanni and Martinelli, Flavio and Şimşek, Berfin and Gerstner, Wulfram},
title = {MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)},
publisher = {arXiv},
year = {2023},
}
GitHub Events
Total
- Push event: 35
- Pull request event: 1
- Create event: 2
Last Year
- Push event: 35
- Pull request event: 1
- Create event: 2
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Johanni Brea | j****a | 99 |
| CompatHelper Julia | c****y@j****g | 7 |
| flavio-martinelli | f****5@g****m | 6 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 3
- Total pull requests: 12
- Average time to close issues: 8 days
- Average time to close pull requests: 11 days
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 0.33
- Average comments per pull request: 0.08
- Merged pull requests: 9
- Bot issues: 0
- Bot pull requests: 10
Past Year
- Issues: 3
- Pull requests: 2
- Average time to close issues: 8 days
- Average time to close pull requests: about 11 hours
- Issue authors: 2
- Pull request authors: 2
- Average comments per issue: 0.33
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- flavio-martinelli (2)
- alexl4123 (1)
Pull Request Authors
- github-actions[bot] (11)
- flavio-martinelli (3)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/cache v1 composite
- actions/checkout v2 composite
- codecov/codecov-action v1 composite
- julia-actions/julia-buildpkg v1 composite
- julia-actions/julia-processcoverage v1 composite
- julia-actions/julia-runtest v1 composite
- julia-actions/setup-julia v1 composite
- JuliaRegistries/TagBot v1 composite
- Pygments ==2.14.0
- asttokens ==2.2.1
- backcall ==0.2.0
- decorator ==5.1.1
- executing ==1.2.0
- ipython ==8.8.0
- jax ==0.4.1
- jaxlib ==0.4.1
- jedi ==0.18.2
- juliacall ==0.9.10
- juliapkg ==0.1.9
- matplotlib-inline ==0.1.6
- numpy ==1.24.1
- opt-einsum ==3.3.0
- pandas ==1.5.2
- parso ==0.8.3
- pexpect ==4.8.0
- pickleshare ==0.7.5
- prompt-toolkit ==3.0.36
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- python-dateutil ==2.8.2
- pytz ==2022.7
- scipy ==1.10.0
- semantic-version ==2.10.0
- six ==1.16.0
- stack-data ==0.6.2
- torch ==1.13.1
- traitlets ==5.8.1
- typing_extensions ==4.4.0
- wcwidth ==0.2.5