Hyperopt

Hyperparameter optimization in Julia.

https://github.com/baggepinnen/hyperopt.jl

Science Score: 28.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Keywords

bayesian-optimization bohb global-optimization hyperband hyperparameter-optimization optimization parameter-tuning random-sampling

Keywords from Contributors

julialang mixed-model particle-filter error-analysis notes quantizer nonlinear-programming abstract-algebra computer-algebra julia-package
Last synced: 4 months ago

Repository

Hyperparameter optimization in Julia.

Basic Info
  • Host: GitHub
  • Owner: baggepinnen
  • License: other
  • Language: Julia
  • Default Branch: master
  • Homepage:
  • Size: 213 KB
Statistics
  • Stars: 202
  • Watchers: 6
  • Forks: 19
  • Open Issues: 17
  • Releases: 27
Topics
bayesian-optimization bohb global-optimization hyperband hyperparameter-optimization optimization parameter-tuning random-sampling
Created over 7 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

Hyperopt

Maintenance mode

⚠️ This package has outgrown its original design and has in the process become hard to maintain and extend. The package is therefore in maintenance mode and is unlikely to accept suggestions for significant further development. Smaller bug-fix PRs are still welcome.


A package to perform hyperparameter optimization. Currently supports random search, Latin hypercube sampling and Bayesian optimization.

Usage

This package was designed to facilitate the addition of optimization logic to already existing code. I usually write some code and try a few hyper parameters by hand before I realize I have to take a more structured approach to finding good hyper parameters. I therefore designed this package such that the optimization logic is wrapped around existing code, and the user only has to specify which variables to optimize and candidate values (ranges) for these variables.

High-level example

In order to add hyperparameter optimization to the existing pseudocode

```julia
a = manually_selected_value
b = other_value
cost = train_model(a,b)
```

we wrap it in @hyperopt like this:

```julia
ho = @hyperopt for i = number_of_samples,
            a = candidate_values,
            b = other_candidate_values
    cost = train_model(a,b)
end
```

Details

  1. The macro @hyperopt takes a for-loop whose first argument determines the number of samples to draw (i below).
  2. The sampling strategy can be selected with the special keyword sampler = Sampler(opts...). Available options are RandomSampler(), LHSampler(), CLHSampler(dims=[Continuous(), Categorical(2), Continuous(), ...]) and Hyperband(R=50, η=3, inner=RandomSampler()).
  3. The subsequent arguments to the for-loop specify names and candidate values for the different hyperparameters (a = LinRange(1,2,1000), b = [true, false], c = exp10.(LinRange(-1,3,1000)) below).
  4. A useful strategy to achieve log-uniform sampling is to use a logarithmically spaced vector, e.g. c = exp10.(LinRange(-1,3,1000)).
  5. In the example below, the parameters i,a,b,c can be used within the expression sent to the macro, and they will hold a new value sampled from the corresponding candidate vector each iteration.

The resulting object ho::Hyperoptimizer holds all the sampled parameters and function values, and has minimum/minimizer and maximum/maximizer properties (e.g., ho.minimizer). It can also be plotted using plot(ho) (uses Plots.jl). The exact syntax to use for the various samplers is shown in the test file, which should be fairly readable.
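As a minimal sketch of this result interface (assuming an optimizer ho has already been created, as in the full example below, and that Plots.jl is installed):

```julia
using Plots
@show ho.minimum    # smallest objective value recorded
@show ho.minimizer  # the corresponding parameter values, in declaration order
@show ho.maximum    # largest recorded value (and ho.maximizer analogously)
plot(ho)            # plots the sampled parameters against the recorded objective values
```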

Full example

```julia
using Hyperopt

f(x,a,b=true;c=10) = sum(@. x + (a-3)^2 + (b ? 10 : 20) + (c-100)^2) # Function to minimize

# Main macro. The first argument to the for loop is always interpreted as
# the number of iterations (except for the Hyperband sampler).
ho = @hyperopt for i=50,
            sampler = RandomSampler(), # This is default if none provided
            a = LinRange(1,5,1000),
            b = [true, false],
            c = exp10.(LinRange(-1,3,1000))
    print(i, "\t", a, "\t", b, "\t", c, " \t")
    x = 100
    @show f(x,a,b,c=c)
end
1   3.910910910910911   false   0.15282140360258697     f(x, a, b, c=c) = 10090.288832348499
2   3.930930930930931   true    6.1629662551329405      f(x, a, b, c=c) = 8916.255534433481
3   2.7617617617617616  true    146.94918006248173      f(x, a, b, c=c) = 2314.282265997491
4   3.6666666666666665  false   0.3165924111983522      f(x, a, b, c=c) = 10057.226192959602
5   4.783783783783784   true    34.55719936762139       f(x, a, b, c=c) = 4395.942039196544
6   2.5895895895895897  true    4.985373463873895       f(x, a, b, c=c) = 9137.947692504491
7   1.6206206206206206  false   301.6334347259197       f(x, a, b, c=c) = 40777.94468684398
8   1.012012012012012   true    33.00034791125285       f(x, a, b, c=c) = 4602.905476253546
9   3.3583583583583585  true    193.7703337477989       f(x, a, b, c=c) = 8903.003911886599
10  4.903903903903904   true    144.26439512181574      f(x, a, b, c=c) = 2072.9615255755252
11  2.2332332332332334  false   119.97177354358843      f(x, a, b, c=c) = 519.4596697509966
12  2.369369369369369   false   117.77987011971193      f(x, a, b, c=c) = 436.52147646611473
13  3.2182182182182184  false   105.44427935261685      f(x, a, b, c=c) = 149.68779686009242
⋮

Hyperopt.Hyperoptimizer
  iterations: Int64 50
  params: Tuple{Symbol,Symbol,Symbol}
  candidates: Array{AbstractArray{T,1} where T}((3,))
  history: Array{Any}((50,))
  results: Array{Any}((50,))
  sampler: Hyperopt.RandomSampler

julia> bestparams, minf = ho.minimizer, ho.minimum
(Real[1.62062, true, 100.694], 112.38413353985818)

julia> printmin(ho)
a = 1.62062
b = true
c = 100.694
```

We can also visualize the result by plotting the hyperoptimizer with plot(ho) (the README shows the resulting plot window here).

This may help us determine which parameters are most important for performance.

The type Hyperoptimizer is iterable: it iterates for the specified number of iterations, each iteration providing a sample of the parameter vector, e.g.

```julia
ho = Hyperoptimizer(10, a = LinRange(1,2,50), b = [true, false], c = randn(100))
for (i,a,b,c) in ho
    println(i, "\t", a, "\t", b, "\t", c)
end

1   1.2244897959183674  false   0.8179751164732062
2   1.7142857142857142  true    0.6536272580487854
3   1.4285714285714286  true    -0.2737451706680355
4   1.6734693877551021  false   -0.12313108128547606
5   1.9795918367346939  false   -0.4350837079334295
6   1.0612244897959184  true    -0.2025613848798039
7   1.469387755102041   false   0.7464858339748051
8   1.8571428571428572  true    -0.9269021128132274
9   1.163265306122449   true    2.6554272337516966
10  1.4081632653061225  true    1.112896676939024
```

If used in this way, the hyperoptimizer cannot keep track of the function values like it does when @hyperopt is used. To manually store the same data, consider a pattern like

```julia
ho = Hyperoptimizer(10, a = LinRange(1,2,50), b = [true, false], c = randn(100))
for (i,a,b,c) in ho
    res = computations(a,b,c)
    push!(ho.history, [a,b,c])
    push!(ho.results, res)  # store the function value as well
end
```
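With history and results populated this way, the best point can be recovered with plain Julia, for example:

```julia
minval, idx = findmin(ho.results)  # smallest stored objective value and its index
best_params = ho.history[idx]      # the corresponding parameter vector [a, b, c]
```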

Categorical variables

RandomSampler and CLHSampler support categorical variables which do not have a natural floating-point representation, such as functions:

```julia
@hyperopt for i=20, fun = [tanh, σ, relu]
    train_network(fun)
end
```

or

```julia
@hyperopt for i=20,
            sampler=CLHSampler(dims=[Categorical(3), Continuous()]),
            fun = [tanh, σ, relu],
            param = LinRange(0,1,20)
    train_network(fun, param)
end
```

Which sampler to use?

RandomSampler is a good baseline and the default if none is chosen. Hyperband(R=50, η=3, inner=RandomSampler()) runs the expression with varying amounts of resources, allocating more resources to promising hyperparameters. See below for more info on Hyperband.

If the number of iterations is small, LHSampler works better than random search. Caveat: LHSampler needs all candidate vectors to be of equal length, i.e.,

```julia
hob = @hyperopt for i=100,
            sampler = LHSampler(),
            a = LinRange(1,5,100),
            b = repeat([true, false],50),
            c = exp10.(LinRange(-1,3,100))
    f(a,b,c=c)
end
```

where all candidate vectors are of length 100. The candidates for b thus had to be repeated 50 times.

The categorical CLHSampler circumvents this:

```julia
hob = @hyperopt for i=100,
            sampler=CLHSampler(dims=[Continuous(), Categorical(2), Continuous()]),
            a = LinRange(1,5,100),
            b = [true, false],
            c = exp10.(LinRange(-1,3,100))
    f(a,b,c=c)
end
```

Hyperband

Hyperband(R=50, η=3, inner=RandomSampler()) implements Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. The maximum amount of resources is given by R, and the parameter η roughly determines the proportion of trials discarded between each round of successive halving. When using Hyperband, the expression inside the @hyperopt macro takes the form of the following pseudocode:

```julia
ho = @hyperopt for resources=50,
            sampler=Hyperband(R=50, η=3, inner=RandomSampler()),
            a = LinRange(1,5,1800),
            c = exp10.(LinRange(-1,3,1800))
    if state === nothing # Check whether state is initialized
        res = optimize(resources, a, c) # If not, start a new optimization using the sampled hyperparameters
    else
        res = optimize(resources, state=state) # Otherwise, continue the optimization from the stored state
    end
    minimum(res), get_state(res) # Return the minimum value and a state from which to continue the optimization
end
```

The resources are made available through the loop variable resources, which grows according to the Hyperband algorithm. How to interpret resources is entirely up to the user: it can be a time limit, a maximum number of iterations, or anything else.

A (simple) working example using Hyperband and Optim, where the resources are used to control the maximum number of calls to the objective function:

```julia
using Optim
f(a;c=10) = sum(@. 100 + (a-3)^2 + (c-100)^2)
hohb = @hyperopt for resources=50,
            sampler=Hyperband(R=50, η=3, inner=RandomSampler()),
            a = LinRange(1,5,1800),
            c = exp10.(LinRange(-1,3,1800))
    if !(state === nothing)
        a,c = state
    end
    res = Optim.optimize(x->f(x[1],c=x[2]), [a,c], SimulatedAnnealing(), Optim.Options(f_calls_limit=round(Int, resources)))
    Optim.minimum(res), Optim.minimizer(res)
end
plot(hohb)
```

A more complicated example that also explores different Optim optimizers as the inner optimizer:

```julia
hohb = @hyperopt for resources=50,
            sampler=Hyperband(R=50, η=3, inner=RandomSampler()),
            algorithm = [SimulatedAnnealing(), ParticleSwarm(), NelderMead(), BFGS(), NewtonTrustRegion()],
            a = LinRange(1,5,1800),
            c = exp10.(LinRange(-1,3,1800))
    if state !== nothing
        algorithm, x0 = state
    else
        x0 = [a,c]
    end
    println(resources, " algorithm: ", typeof(algorithm).name.name)
    res = Optim.optimize(x->f(x[1],c=x[2]), x0, algorithm, Optim.Options(time_limit=resources+1, show_trace=false))
    Optim.minimum(res), (algorithm, Optim.minimizer(res))
end
```

Function / vector interface

Hyperband can also be called by itself with a more standard optimizer interface. In this case, the objective function takes a scalar resources and a vector of parameters, and returns the objective value and a vector of parameters (the state from which the optimization can continue).

Example:

```julia
using Hyperopt
using Optim: optimize, Options, minimum, minimizer, SimulatedAnnealing
f(a;c=10) = sum(@. 100 + (a-3)^2 + (c-100)^2)

objective = function (resources::Real, pars::AbstractVector)
    res = optimize(x->f(x[1],c=x[2]), pars, SimulatedAnnealing(), Options(time_limit=resources/100))
    minimum(res), minimizer(res)
end

candidates = (a=LinRange(1,5,300), c=exp10.(LinRange(-1,3,300))) # A vector of vectors also works, but parameters will not get nice names in plots
hohb = hyperband(objective, candidates; R=50, η=3, threads=true)
```

BOHB

BOHB: Robust and Efficient Hyperparameter Optimization at Scale refines Hyperband by replacing the random sampler with a Bayesian-optimization-based sampler. You can use it by simply replacing the inner sampler in Hyperband with BOHB(dims=[<dims>...]).

Examples

Below is an example without BOHB, which should be familiar from previous examples:

```julia
using Optim
hb = @hyperopt for i=18,
            sampler=Hyperband(R=50, η=3, inner=RandomSampler()),
            a = LinRange(1,5,800),
            c = exp10.(LinRange(-1,3,1800))
    if state !== nothing
        a,c = state
    end
    res = Optim.optimize(x->f(x[1],c=x[2]), [a,c], NelderMead(), Optim.Options(f_calls_limit=round(Int, i)))
    Optim.minimum(res), Optim.minimizer(res)
end
```

To use BOHB, simply replace the inner sampler; here we change from RandomSampler to BOHB. Remember to specify dimension types for BOHB!

```julia
bohb = @hyperopt for i=18,
            sampler=Hyperband(R=50, η=3, inner=BOHB(dims=[Hyperopt.Continuous(), Hyperopt.Continuous()])),
            a = LinRange(1,5,800),
            c = exp10.(LinRange(-1,3,1800))
    if state !== nothing
        a,c = state
    end
    res = Optim.optimize(x->f(x[1],c=x[2]), [a,c], NelderMead(), Optim.Options(f_calls_limit=round(Int, i)))
    Optim.minimum(res), Optim.minimizer(res)
end
```

When using BOHB, a kernel density estimator proposes hyperparameters that balance exploration and exploitation based on previous observations. It does this by setting the state variable to a tuple of the proposed hyperparameters at certain points within the loop. As a consequence, the state returned (Optim.minimizer(res) in the previous example) needs to hold the values for each hyperparameter in order ((a, c) in the previous example).

Note: BOHB currently only handles Continuous variables; see issue #80 for a discussion on adding support for categorical variables.

Parallel execution

  • The macro @phyperopt works in the same way as @hyperopt but distributes all computation over the available workers. The usual caveats apply: code must be loaded on all workers, etc. A minimal sketch follows this list.
    • @phyperopt accepts an optional second argument which is a pmap-like function. E.g. (args...,) -> pmap(args...; on_error=...).
  • The macro @thyperopt uses ThreadPools.tmap to evaluate the objective on all available threads. Beware of high memory consumption if your objective allocates a lot of memory.
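A minimal sketch of distributed usage, assuming four local workers and a hypothetical objective f that is made available on all workers with @everywhere:

```julia
using Distributed
addprocs(4)                 # spawn four local worker processes
@everywhere using Hyperopt  # the package must be loaded on every worker
@everywhere f(a; c=10) = sum(@. 100 + (a-3)^2 + (c-100)^2)  # hypothetical objective

ho = @phyperopt for i=50,
            a = LinRange(1,5,1000),
            c = exp10.(LinRange(-1,3,1000))
    f(a, c=c)
end
ho.minimum, ho.minimizer
```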

Owner

  • Name: Fredrik Bagge Carlson
  • Login: baggepinnen
  • Kind: user
  • Location: Lund, Sweden

Control systems, system identification, signal processing and machine learning

Citation (CITATION.bib)

@article{bagge2018hyperopt,
  title     = {Hyperopt.jl: Hyperparameter optimization in Julia.},
  author    = {Bagge Carlson, Fredrik},
  year      = {2018},
  publisher = {github},
  url       = {https://lup.lub.lu.se/search/publication/6ec19989-9b30-448c-be5e-bae4c4257c7b}
}

GitHub Events

Total
  • Watch event: 4
  • Issue comment event: 1
  • Pull request event: 2
  • Fork event: 1
Last Year
  • Watch event: 4
  • Issue comment event: 1
  • Pull request event: 2
  • Fork event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 161
  • Total Committers: 14
  • Avg Commits per committer: 11.5
  • Development Distribution Score (DDS): 0.484
Top Committers
Name Email Commits
Fredrik Bagge Carlson b****n@g****m 83
Fredrik Bagge Carlson c****b@u****g 47
github-actions[bot] 4****]@u****m 9
Eric Hanson 5****n@u****m 6
Albin Heimerson a****m@h****m 3
norci n****i@u****m 2
noil-reed p****2@g****m 2
KronosTheLate 6****e@u****m 2
Albin Heimerson a****n@c****e 2
Simeon Schaub s****9@g****m 1
femtocleaner[bot] f****]@u****m 1
Julia TagBot 5****t@u****m 1
samuela s****h@g****m 1
Will Pazner w****p@g****m 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 50
  • Total pull requests: 49
  • Average time to close issues: 4 months
  • Average time to close pull requests: 18 days
  • Total issue authors: 28
  • Total pull request authors: 14
  • Average comments per issue: 3.92
  • Average comments per pull request: 1.1
  • Merged pull requests: 31
  • Bot issues: 0
  • Bot pull requests: 26
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • KronosTheLate (10)
  • luboshanus (5)
  • mchitre (3)
  • samuela (3)
  • oxinabox (3)
  • baggepinnen (2)
  • 00sapo (2)
  • SobhanMP (2)
  • rachellb (1)
  • JuliaTagBot (1)
  • ParadaCarleton (1)
  • patwa67 (1)
  • PGimenez (1)
  • ericphanson (1)
  • torfjelde (1)
Pull Request Authors
  • github-actions[bot] (25)
  • albheim (5)
  • baggepinnen (4)
  • ericphanson (3)
  • KronosTheLate (2)
  • ddavid (2)
  • norci (2)
  • pazner (1)
  • femtocleaner[bot] (1)
  • simeonschaub (1)
  • noilreed (1)
  • samuela (1)
  • ubornheimer (1)
  • JuliaTagBot (1)

Packages

  • Total packages: 1
  • Total downloads:
    • julia 20 total
  • Total dependent packages: 2
  • Total dependent repositories: 0
  • Total versions: 27
juliahub.com: Hyperopt

Hyperparameter optimization in Julia.

  • Versions: 27
  • Dependent Packages: 2
  • Dependent Repositories: 0
  • Downloads: 20 Total
Rankings
Stargazers count: 4.2%
Forks count: 7.5%
Average: 9.6%
Dependent repos count: 9.9%
Dependent packages count: 16.6%
Last synced: 4 months ago

Dependencies

.github/workflows/CompatHelper.yml actions
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
  • codecov/codecov-action v1 composite
  • julia-actions/julia-processcoverage v1 composite
  • julia-actions/julia-runtest latest composite
  • julia-actions/setup-julia latest composite