MLPGradientFlow.jl
Going with the flow of multilayer perceptrons (and finding minima fast and accurately)
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Keywords from Contributors
Repository
Going with the flow of multilayer perceptrons (and finding minima fast and accurately)
Basic Info
Statistics
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
This package allows one to investigate the loss landscape and training dynamics of multi-layer perceptrons.
Features
- Train multi-layer perceptrons on the CPU to convergence, using first and second order optimization methods.
- Fast implementations of gradients and hessians.
- Follow gradient flow (using differential equation solvers) or popular (stochastic) gradient descent dynamics (Adam etc.).
- Accurate approximations of the loss function and its derivatives for infinite, normally distributed input data, using Gauss-Hermite quadrature or symbolic integrals.
- Utility functions to investigate teacher-student setups and to visualize loss landscapes.
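For infinite, normally distributed input data, the loss is an expectation over a Gaussian, and Gauss-Hermite quadrature approximates such expectations with a weighted sum over a few nodes. A minimal sketch of the underlying idea in Python with NumPy's `hermgauss` (illustrative only, not the package's implementation):

```python
import numpy as np

def gauss_hermite_expectation(f, n=40):
    """Approximate E[f(x)] for x ~ N(0, 1) with n-point Gauss-Hermite quadrature.

    hermgauss returns nodes/weights for the weight function exp(-x^2);
    substituting x -> sqrt(2) * x and dividing by sqrt(pi) adapts it to
    the standard normal density.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n)
    return np.sum(weights * f(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

# E[x^2] = 1 for a standard normal
print(gauss_hermite_expectation(lambda x: x**2))
# E[relu(x)] = 1 / sqrt(2 * pi) ~ 0.3989
print(gauss_hermite_expectation(lambda x: np.maximum(x, 0.0)))
```

For polynomial integrands the quadrature is exact with enough nodes; for kinked functions like relu it converges more slowly, which is one reason analytical integrals are preferable where they exist.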
For more details see the documentation and the paper MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately).
Installation
From Julia
```julia
using Pkg; Pkg.add(url = "https://github.com/jbrea/MLPGradientFlow.jl.git")
```
From Python
Install juliacall, e.g. pip install juliacall.
```python
from juliacall import Main as jl
jl.seval('using Pkg; Pkg.add(url = "https://github.com/jbrea/MLPGradientFlow.jl.git")')
```
Usage
From Julia
```julia
using MLPGradientFlow

inp = randn(2, 10_000)
w_teach = randn(4, 2)
a_teach = randn(1, 4)
target = a_teach * sigmoid.(w_teach * inp)

n = Net(layers = ((5, sigmoid, false), (1, identity, false)),
        input = inp, target = target)
p = random_params(n)

n(p)                # run network on parameters p
n(p, randn(2, 100)) # run on parameters p and a random dataset

res = train(n, p,
            maxtime_ode = 2, maxtime_optim = 2, # maxtime in seconds
            maxiterations_optim = 10^3, verbosity = 1)

res["x"]     # final solution
res["ode_x"] # solution after ODE solver
res["loss"]  # final loss

x = params(res["x"])
loss(n, x)             # recompute final loss
gradient(n, x)         # compute gradient at solution
hessian(n, x)          # compute hessian at solution
hessian_spectrum(n, x) # compute spectrum of the hessian at solution

?MLPGradientFlow.train # look at the doc string of train
```

To run multiple initial seeds in parallel:

```julia
ps = ntuple(_ -> random_params(n), 4)
res = MLPGradientFlow.train(n, ps, maxtime_ode = 2, maxtime_optim = 2,
                            maxiterations_optim = 10^3, verbosity = 1)
```

Neural networks without biases and with a single hidden layer can also be trained under the assumption of normally distributed input. For relu and sigmoid2 the implementation uses analytical values for the Gaussian integrals (use f = Val(relu) for analytical integration and f = relu for numerical integration). For other activation functions, numerical integration of approximations thereof has to be used.

```julia
using LinearAlgebra, ComponentArrays
d = 9
inp = randn(d, 10^5)
w_teach = I(d)[1:d-1, :]
a_teach = ones(d-1)'
target = a_teach * relu.(w_teach * inp)

n = Net(layers = ((2, relu, false), (1, identity, false)),
        input = inp, target = target)
p = random_params(n)
```

Train under the assumption of infinite, normally distributed input data:

```julia
xt = ComponentVector(w1 = w_teach, w2 = a_teach)
ni = NetI(p, xt, Val(relu))
res = train(ni, p, maxiterations_ode = 10^3, maxiterations_optim = 0)
```

Recommendations for different activation functions:

Using analytical integrals:
- NetI(p, xt, Val(relu))
- NetI(p, xt, Val(sigmoid2))

Using approximations:
- NetI(p, xt, load_potential_approximator(softplus))
- NetI(p, xt, load_potential_approximator(gelu))
- NetI(p, xt, load_potential_approximator(g))
- NetI(p, xt, load_potential_approximator(tanh))
- NetI(p, xt, load_potential_approximator(sigmoid))

Compare the loss computed with finite data to the loss computed with infinite data:

```julia
loss(n, params(res["init"]))  # finite data
loss(ni, params(res["init"])) # infinite data
loss(n, params(res["x"]))     # finite data
loss(ni, params(res["x"]))    # infinite data
```

For softplus with approximations:

```julia
approx = load_potential_approximator(softplus)
ni = NetI(p, xt, approx)
res = train(ni, p, maxiterations_ode = 10^3, maxiterations_optim = 0)
```

This result can be fine-tuned with numerical integration (WARNING: this is slow!):

```julia
ni2 = NetI(p, xt, softplus)
res2 = train(ni2, params(res["x"]), maxtime_ode = 60, maxtime_optim = 60)
```
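Following gradient flow, as the ODE phase of `train` does, means solving dθ/dt = -∇L(θ) as an initial value problem rather than taking discrete gradient-descent steps. A toy sketch of this idea in plain Python (the quadratic loss and the fixed-step RK4 integrator are illustrative assumptions; the package uses accurate derivatives and adaptive ODE solvers):

```python
import numpy as np

# Hypothetical toy loss L(theta) = 0.5 * theta' A theta, with gradient A @ theta.
# A is symmetric positive definite, so the flow converges to the origin.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad(theta):
    return A @ theta

def gradient_flow(theta0, dt=0.01, steps=2000):
    """Integrate d(theta)/dt = -grad L(theta) with classical RK4."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        k1 = -grad(theta)
        k2 = -grad(theta + 0.5 * dt * k1)
        k3 = -grad(theta + 0.5 * dt * k2)
        k4 = -grad(theta + dt * k3)
        theta = theta + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return theta

theta_end = gradient_flow([1.0, -2.0])
print(theta_end)  # approaches the minimizer at the origin
```

Gradient descent with learning rate η is the forward-Euler discretization of this ODE; higher-order adaptive solvers can follow the same trajectory with far fewer gradient evaluations, which is the motivation for the ODE phase.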
From Python
```python
import numpy as np
import juliacall as jc
from juliacall import Main as jl

jl.seval('using MLPGradientFlow')
mg = jl.MLPGradientFlow

w = np.random.normal(size = (5, 2)) / 10
b1 = np.zeros(5)
a = np.random.normal(size = (1, 5)) / 5
b2 = np.zeros(1)
inp = np.random.normal(size = (2, 10_000))
w_teach = np.random.normal(size = (4, 2))
a_teach = np.random.normal(size = (1, 4))
target = np.matmul(a_teach, jl.map(mg.sigmoid, np.matmul(w_teach, inp)))

mg.train._jl_help()  # look at the docstring

# continue as in julia (see above), e.g.
p = mg.params((w, b1), (a, b2))
n = mg.Net(layers = ((5, mg.sigmoid, True), (1, jl.identity, True)),
           input = inp, target = target)
res = mg.train(n, p, maxtime_ode = 2, maxtime_optim = 2,
               maxiterations_optim = 10**3, verbosity = 1)

# convert the result to a python dictionary with numpy arrays
def convert2py(jldict):
    d = dict(jldict)
    for k, v in jldict.items():
        if isinstance(v, jc.DictValue):
            d[k] = convert2py(v)
        if isinstance(v, jc.ArrayValue):
            d[k] = v.to_numpy()
    return d

py_res = convert2py(res)

# convert parameters in python format back to julia parameters
p = mg.params(jc.convert(jl.Dict, py_res['x']))

# save results in torch.pickle format
mg.pickle("myfilename.pt", res)

mg.hessian_spectrum(n, p)  # look at the hessian spectrum

# an MLP with 2 hidden layers, with biases only in the second hidden layer
n2 = mg.Net(layers = ((5, mg.sigmoid, False), (4, mg.g, True), (2, mg.identity, False)),
            input = inp, target = np.random.normal(size = (2, 10_000)))
p2 = mg.params((w, None),
               (np.random.normal(size = (4, 5)), np.zeros(4)),
               (np.random.normal(size = (2, 4)), None))

mg.loss(n2, p2)
mg.gradient(n2, p2)

# for more examples have a look at the julia code above
```
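The `hessian_spectrum` calls above inspect curvature at a solution: positive eigenvalues indicate a local minimum, near-zero eigenvalues flat directions. The same idea can be sketched with a finite-difference Hessian on a toy loss (illustrative only; the package computes exact second derivatives):

```python
import numpy as np

def finite_difference_hessian(f, x, eps=1e-5):
    """Central-difference approximation of the Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i = np.zeros(d); e_i[i] = eps
            e_j = np.zeros(d); e_j[j] = eps
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps**2)
    return H

# Toy loss with known curvature: the Hessian eigenvalues are 2 and 6.
loss = lambda x: x[0]**2 + 3 * x[1]**2
H = finite_difference_hessian(loss, np.zeros(2))
print(np.linalg.eigvalsh(H))  # ~[2, 6]
```

Finite differences suffice for small toy problems, but exact analytical Hessians, as implemented in the package, are both faster and numerically more reliable for second-order optimization.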
Owner
- Login: jbrea
- Kind: user
- Website: https://people.epfl.ch/johanni.brea
- Repositories: 12
- Profile: https://github.com/jbrea
Citation (CITATION.bib)
@misc{https://doi.org/10.48550/arxiv.2301.10638,
doi = {10.48550/ARXIV.2301.10638},
url = {https://arxiv.org/abs/2301.10638},
author = {Brea, Johanni and Martinelli, Flavio and Şimşek, Berfin and Gerstner, Wulfram},
title = {MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)},
publisher = {arXiv},
year = {2023},
}
GitHub Events
Total
- Push event: 35
- Pull request event: 1
- Create event: 2
Last Year
- Push event: 35
- Pull request event: 1
- Create event: 2
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Johanni Brea | j****a | 99 |
| CompatHelper Julia | c****y@j****g | 7 |
| flavio-martinelli | f****5@g****m | 6 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 3
- Total pull requests: 12
- Average time to close issues: 8 days
- Average time to close pull requests: 11 days
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 0.33
- Average comments per pull request: 0.08
- Merged pull requests: 9
- Bot issues: 0
- Bot pull requests: 10
Past Year
- Issues: 3
- Pull requests: 2
- Average time to close issues: 8 days
- Average time to close pull requests: about 11 hours
- Issue authors: 2
- Pull request authors: 2
- Average comments per issue: 0.33
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- flavio-martinelli (2)
- alexl4123 (1)
Pull Request Authors
- github-actions[bot] (11)
- flavio-martinelli (3)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/cache v1 composite
- actions/checkout v2 composite
- codecov/codecov-action v1 composite
- julia-actions/julia-buildpkg v1 composite
- julia-actions/julia-processcoverage v1 composite
- julia-actions/julia-runtest v1 composite
- julia-actions/setup-julia v1 composite
- JuliaRegistries/TagBot v1 composite
- Pygments ==2.14.0
- asttokens ==2.2.1
- backcall ==0.2.0
- decorator ==5.1.1
- executing ==1.2.0
- ipython ==8.8.0
- jax ==0.4.1
- jaxlib ==0.4.1
- jedi ==0.18.2
- juliacall ==0.9.10
- juliapkg ==0.1.9
- matplotlib-inline ==0.1.6
- numpy ==1.24.1
- opt-einsum ==3.3.0
- pandas ==1.5.2
- parso ==0.8.3
- pexpect ==4.8.0
- pickleshare ==0.7.5
- prompt-toolkit ==3.0.36
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- python-dateutil ==2.8.2
- pytz ==2022.7
- scipy ==1.10.0
- semantic-version ==2.10.0
- six ==1.16.0
- stack-data ==0.6.2
- torch ==1.13.1
- traitlets ==5.8.1
- typing_extensions ==4.4.0
- wcwidth ==0.2.5