EnergySamplers

A small Julia package for energy-based sampling.

https://github.com/juliatrustworthyai/energysamplers.jl

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary
Last synced: 7 months ago

Repository

A small Julia package for energy-based sampling.

Basic Info
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 4
  • Releases: 4
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License Citation

README.md

EnergySamplers


EnergySamplers.jl is a small and lightweight package for sampling from probability distributions using methods from energy-based modelling (EBM). Its functionality is used in other Taija packages, including JointEnergyModels.jl and CounterfactualExplanations.jl.

Extensions to Optimisers.jl

The package adds two new optimisers that are compatible with the Optimisers.jl interface:

  1. Stochastic Gradient Langevin Dynamics (SGLD) (Welling and Teh 2011) — SGLD.
  2. Improper SGLD (see, for example, Grathwohl et al. (2020)) — ImproperSGLD.

SGLD is an efficient gradient-based Markov Chain Monte Carlo (MCMC) method that can be used in the context of EBM to draw samples from the model posterior (Murphy 2023). Formally, we can draw from $p_{\theta}(x)$ as follows

``` math
\begin{aligned}
    x_{j+1} &\leftarrow x_j - \frac{\epsilon_j^2}{2} \nabla_x \mathcal{E}_{\theta}(x_j) + \epsilon_j r_j, && j=1,...,J
\end{aligned}
```

where $r_j \sim \mathcal{N}(0,I)$ is a stochastic term and the step-size $\epsilon_j$ is typically polynomially decayed (Welling and Teh 2011). To allow for faster sampling, it is common practice to choose the step-size $\epsilon_j$ and the standard deviation of $r_j$ separately. While $x_J$ is only guaranteed to be distributed as $p_{\theta}(x)$ if $\epsilon \rightarrow 0$ and $J \rightarrow \infty$, the bias introduced for a small finite $\epsilon$ is negligible in practice (Murphy 2023). We denote this form of sampling as Improper SGLD.
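To make the update rule above concrete, here is a minimal, self-contained sketch of Improper SGLD on a toy quadratic energy. The energy function, fixed step size, and noise scale are illustrative choices for this sketch, not values or defaults taken from the package:

``` julia
using Random

# Toy energy E(x) = ||x||² / 2, whose Gibbs density p(x) ∝ exp(-E(x))
# is a standard Gaussian; its gradient is simply x.
grad_energy(x) = x

# Improper SGLD: the step size ε and the noise scale σ are fixed and
# chosen separately, as described above. Proper SGLD would instead
# decay the step size polynomially, e.g. ε_j = a(b + j)^(-γ)
# (Welling and Teh 2011).
function improper_sgld_draw(x0; ε=0.05, σ=0.1, J=1_000)
    x = copy(x0)
    for _ in 1:J
        x = x .- (ε^2 / 2) .* grad_energy(x) .+ σ .* randn(length(x))
    end
    return x
end

Random.seed!(42)
samples = [improper_sgld_draw(randn(2)) for _ in 1:100]  # each draw ≈ N(0, I)
```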

Example: Bayesian Inference with SGLD

To illustrate how the custom optimisers can be used, we will go through an example adapted from this (great!) blog post by Sebastian Callh. First, let’s load some dependencies:

``` julia
# External dependencies:
using Flux
using Flux: gpu
using MLDataUtils: shuffleobs, stratifiedobs, rescale!
using Plots
using Random
using RDatasets
using Statistics

# Custom optimisers:
using EnergySamplers: ImproperSGLD, SGLD
```

Next, we load some data and prepare it for training a logistic regression model in Flux.jl:

``` julia
Random.seed!(2024)

data = dataset("ISLR", "Default")
todigit(x) = x == "Yes" ? 1.0 : 0.0
data[!, :Default] = map(todigit, data[:, :Default])
data[!, :Student] = map(todigit, data[:, :Student])

target = :Default
numerics = [:Balance, :Income]
features = [:Student, :Balance, :Income]
train, test = (d -> stratifiedobs(first, d; p=0.7))(shuffleobs(data))

for feature in numerics
    μ, σ = rescale!(train[!, feature]; obsdim=1)
    rescale!(test[!, feature], μ, σ; obsdim=1)
end

prep_X(x) = gpu(Matrix(x)')
prep_y(y) = gpu(reshape(y, 1, :))
train_X, test_X = prep_X.((train[:, features], test[:, features]))
train_y, test_y = prep_y.((train[:, target], test[:, target]))
train_set = Flux.DataLoader((train_X, train_y); batchsize=100, shuffle=false)
```

┌ Info: The CUDA functionality is being called but
│ `CUDA.jl` must be loaded to access it.
└ Add `using CUDA` or `import CUDA` to your code.  Alternatively, configure a different GPU backend by calling `Flux.gpu_backend!`.

70-element DataLoader(::Tuple{LinearAlgebra.Adjoint{Float64, Matrix{Float64}}, Matrix{Float64}}, batchsize=100)
  with first element:
  (3×100 Matrix{Float64}, 1×100 Matrix{Float64},)

Finally, we create a small helper function that runs the training loop for a given optimiser opt and number of steps:

``` julia
function train_logreg(; steps::Int=1000, opt=Flux.Descent(2))
    Random.seed!(1)

    paramvec(θ) = reduce(hcat, cpu(θ))
    model = gpu(Dense(length(features), 1, sigmoid))
    θ = Flux.params(model)
    θ₀ = paramvec(θ)

    predict(x; thres=0.5) = model(x) .> thres
    accuracy(x, y) = mean(cpu(predict(x)) .== cpu(y))

    loss(yhat, y) = Flux.binarycrossentropy(yhat, y)
    avg_loss(yhat, y) = mean(loss(yhat, y))
    trainloss() = avg_loss(model(train_X), train_y)
    testloss() = avg_loss(model(test_X), test_y)

    trainlosses = [cpu(trainloss()); zeros(steps)]
    testlosses = [cpu(testloss()); zeros(steps)]
    weights = [cpu(θ₀); zeros(steps, length(θ₀))]

    opt_state = Flux.setup(opt, model)

    for t in 1:steps
        for data in train_set
            input, label = data

            # Calculate the gradient of the objective
            # with respect to the parameters within the model:
            grads = Flux.gradient(model) do m
                result = m(input)
                loss(result, label)
            end

            Flux.update!(opt_state, model, grads[1])
        end

        # Bookkeeping
        weights[t + 1, :] = cpu(paramvec(θ))
        trainlosses[t + 1] = cpu(trainloss())
        testlosses[t + 1] = cpu(testloss())
    end

    println("Final parameters are $(paramvec(θ))")
    println("Test accuracy is $(accuracy(test_X, test_y))")

    return model, weights, trainlosses, testlosses
end
```

train_logreg (generic function with 1 method)

Now we use this function to train the model, first using SGLD and then using Improper SGLD:

``` julia
results = train_logreg(; steps=100, opt=SGLD(10.0, 10.0, 0.9))
model, weights, trainlosses, testlosses = results
p1 = plot(weights; label=["Student" "Balance" "Income" "Intercept"], plot_title="SGLD")

results = train_logreg(; steps=100, opt=ImproperSGLD(2.0, 0.01))
model, weights, trainlosses, testlosses = results
p2 = plot(weights; label=["Student" "Balance" "Income" "Intercept"], plot_title="Improper SGLD")

plot(p1, p2, size=(800, 400))
```

┌ Info: The CUDA functionality is being called but
│ `CUDA.jl` must be loaded to access it.
└ Add `using CUDA` or `import CUDA` to your code.  Alternatively, configure a different GPU backend by calling `Flux.gpu_backend!`.
┌ Warning: Layer with Float32 parameters got Float64 input.
│   The input will be converted, but any earlier layers may be very slow.
│   layer = Dense(3 => 1, σ)    # 4 parameters
│   summary(x) = "3×7000 adjoint(::Matrix{Float64}) with eltype Float64"
└ @ Flux ~/.julia/packages/Flux/HBF2N/src/layers/stateless.jl:60
Final parameters are Float32[-2.3311744 1.1305944 -1.5102222 -4.0762844]
Test accuracy is 0.9666666666666667
Final parameters are Float32[-0.6106307 2.760134 -0.031244753 -5.8856964]
Test accuracy is 0.9763333333333334

Energy-Based Samplers

In the context of EBM, the optimisers can be used to sample from a model posterior. To this end, the package provides the following samplers (a usage sketch follows the list):

  1. UnconditionalSampler — samples from the unconditional distribution $p_{\theta}(x)$ as in Grathwohl et al. (2020).
  2. ConditionalSampler — samples from the conditional distribution $p_{\theta}(x|y)$ as in Grathwohl et al. (2020).
  3. JointSampler — samples from the joint distribution $p_{\theta}(x,y)$ as in Kelly, Zemel, and Grathwohl (2021).
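
For a rough sense of the shared interface, the sketch below sets up an UnconditionalSampler by analogy with the ConditionalSampler call in the worked example further down. The constructor keywords and the omission of the `y` argument from PMC are assumptions drawn from that example rather than documented API:

``` julia
using Distributions
using Flux
using EnergySamplers: UnconditionalSampler, PMC, ImproperSGLD

# Priors over inputs and labels, mirroring the worked example below:
𝒟x = Normal()
𝒟y = Categorical(ones(2) ./ 2)

# Any Flux classifier can serve as the energy model; a linear one for brevity:
nn = Chain(Dense(2, 2))

# Assumed constructor keywords, by analogy with ConditionalSampler:
smpler = UnconditionalSampler(𝒟x, 𝒟y; input_size=(2,), batch_size=10)

# Draw samples from p(x); no `y` keyword, since sampling is unconditional:
x = PMC(smpler, nn, ImproperSGLD(); ntransitions=10, niter=100)
```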

Example: Joint Energy-Based Model

The conditional sampler is used to draw class-conditional samples from a joint energy-based model (JEM) trained using Taija’s JointEnergyModels.jl. JEMs are explicitly trained to not only discriminate between output classes but also generate inputs. Hence, in the image below we can see that the model’s posterior conditional distributions (both over outputs and inputs) seem to approximate the true underlying distributions reasonably well: the model has learned to discriminate between the two classes (as indicated by the contours) and to generate samples from each class (as indicated by the stars).

Worked Example

Next, we will present a simple worked example involving linearly separable Gaussian blobs:

``` julia
using Distributions
using MLJBase

# Data:
nobs = 2000
X, y = make_blobs(nobs; centers=2, center_box=(-2. => 2.), cluster_std=0.1)
X_mat = Float32.(permutedims(matrix(X)))
X = table(permutedims(X_mat))
batch_size = Int(round(nobs / 10))

# Distributions:
𝒟x = Normal()
𝒟y = Categorical(ones(2) ./ 2)
```

Distributions.Categorical{Float64, Vector{Float64}}(support=Base.OneTo(2), p=[0.5, 0.5])

We train a simple linear classifier to discriminate between output classes:

``` julia
# Train a simple neural network on the data (classification):
X_train = permutedims(MLJBase.matrix(X))
y_train = Flux.onehotbatch(y, levels(y))
train_set = zip(eachcol(X_train), eachcol(y_train))
input_dim = size(first(train_set)[1], 1)
output_dim = size(first(train_set)[2], 1)
nn = Chain(Dense(input_dim, output_dim))
loss(yhat, y) = Flux.logitcrossentropy(yhat, y)
opt_state = Flux.setup(Flux.Adam(), nn)
epochs = 5
for epoch in 1:epochs
    Flux.train!(nn, train_set, opt_state) do m, x, y
        loss(m(x), y)
    end
    @info "Epoch $epoch"
    println("Accuracy: ", mean(Flux.onecold(nn(X_train)) .== Flux.onecold(y_train)))
end
```

[ Info: Epoch 1
Accuracy: 0.919
[ Info: Epoch 2
Accuracy: 0.997
[ Info: Epoch 3
Accuracy: 0.9995
[ Info: Epoch 4
Accuracy: 0.9995
[ Info: Epoch 5
Accuracy: 0.9995

Finally, we draw conditional samples from the model. Since we used a purely discriminative model for the task, the estimated posterior conditional distributions over inputs are not very good: the conditionally drawn samples (Xhat) largely lie on the correct side of the decision boundary, but they are clearly not generated by the same data-generating process as the training data.

``` julia
using EnergySamplers: ConditionalSampler, PMC

# PMC:
bs = 10
ntrans = 10
niter = 100

# Conditionally sample from first class:
smpler = ConditionalSampler(
    𝒟x, 𝒟y; input_size=size(X_mat)[1:(end - 1)], batch_size=bs
)
x1 = PMC(smpler, nn, ImproperSGLD(); ntransitions=ntrans, niter=niter, y=1)

# Conditionally sample from second class:
smpler = ConditionalSampler(
    𝒟x, 𝒟y; input_size=size(X_mat)[1:(end - 1)], batch_size=bs
)
x2 = PMC(smpler, nn, ImproperSGLD(); ntransitions=ntrans, niter=niter, y=2)

# Contour plot for predictions:
xlims = extrema(hcat(x1, x2)[1, :]) .* 1.1
ylims = extrema(hcat(x1, x2)[2, :]) .* 1.1
xrange = range(xlims[1], xlims[2], 100)
yrange = range(ylims[1], ylims[2], 100)
z = [softmax(nn([x, y])) for x in xrange, y in yrange] |> z -> reduce(hcat, z)
plt = contourf(xrange, yrange, z[1, :], lw=0.1, xlims=xlims, ylims=ylims)

# Plot samples:
scatter!(X_train[1, :], X_train[2, :], color=Int.(y.refs), group=Int.(y.refs), label=["X|y=0" "X|y=1"], ms=2, markerstrokecolor=Int.(y.refs))
scatter!(x1[1, :], x1[2, :], color=1, label="Xhat|y=0", ms=4, alpha=0.5)
scatter!(x2[1, :], x2[2, :], color=2, label="Xhat|y=1", ms=4, alpha=0.5)
plot(plt)
```

┌ Warning: Layer with Float32 parameters got Float64 input.
│   The input will be converted, but any earlier layers may be very slow.
│   layer = Dense(2 => 2)       # 6 parameters
│   summary(x) = "2-element Vector{Float64}"
└ @ Flux ~/.julia/packages/Flux/HBF2N/src/layers/stateless.jl:60

References

Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2020. “Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One.” In International Conference on Learning Representations.

Kelly, Jacob, Richard Zemel, and Will Grathwohl. 2021. “Directly Training Joint Energy-Based Models for Conditional Synthesis and Calibrated Prediction of Multi-Attribute Data.” https://arxiv.org/abs/2108.04227.

Murphy, Kevin P. 2023. Probabilistic Machine Learning: Advanced Topics. MIT Press.

Welling, Max, and Yee W Teh. 2011. “Bayesian Learning via Stochastic Gradient Langevin Dynamics.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 681–88. Citeseer.

Owner

  • Name: Taija
  • Login: JuliaTrustworthyAI
  • Kind: organization
  • Location: Netherlands

Home for repositories of the Taija (Trustworthy Artificial Intelligence in Julia) project.

Citation (CITATION.bib)

@misc{EnergySamplers.jl,
	author  = {Patrick Altmeyer and contributors},
	title   = {EnergySamplers.jl},
	url     = {https://github.com/JuliaTrustworthyAI/EnergySamplers.jl},
	version = {v1.0.0-DEV},
	year    = {2024},
	month   = {7}
}

GitHub Events

Total
  • Create event: 9
  • Commit comment event: 6
  • Release event: 2
  • Delete event: 1
  • Issue comment event: 11
  • Push event: 19
  • Pull request event: 11
Last Year
  • Create event: 9
  • Commit comment event: 6
  • Release event: 2
  • Delete event: 1
  • Issue comment event: 11
  • Push event: 19
  • Pull request event: 11

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 37
  • Total Committers: 1
  • Avg Commits per committer: 37.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 37
  • Committers: 1
  • Avg Commits per committer: 37.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
pat-alt a****t@g****m 37

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 1
  • Total pull requests: 11
  • Average time to close issues: less than a minute
  • Average time to close pull requests: 13 days
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 4.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 6
Past Year
  • Issues: 1
  • Pull requests: 11
  • Average time to close issues: less than a minute
  • Average time to close pull requests: 13 days
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 4.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 6
Top Authors
Issue Authors
  • JuliaTagBot (1)
Pull Request Authors
  • pat-alt (10)
  • github-actions[bot] (6)
  • dependabot[bot] (5)
Top Labels
Issue Labels
Pull Request Labels
dependencies (5)

Packages

  • Total packages: 1
  • Total downloads:
    • julia 2 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
juliahub.com: EnergySamplers

A small Julia package for energy-based sampling.

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 2 Total
Rankings
Dependent repos count: 3.2%
Downloads: 3.8%
Average: 7.8%
Dependent packages count: 16.3%
Last synced: 8 months ago

Dependencies

.github/workflows/CI.yml actions
  • actions/cache v3 composite
  • actions/checkout v4 composite
  • codecov/codecov-action v4 composite
  • julia-actions/cache v2 composite
  • julia-actions/julia-buildpkg v1 composite
  • julia-actions/julia-processcoverage v1 composite
  • julia-actions/julia-runtest v1 composite
  • julia-actions/setup-julia latest composite
  • julia-actions/setup-julia v2 composite
  • quarto-dev/quarto-actions/setup v2 composite
.github/workflows/CompatHelper.yml actions
.github/workflows/Register.yml actions
  • julia-actions/RegisterAction latest composite
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite