approxposterior

approxposterior: Approximate Posterior Distributions in Python - Published in JOSS (2018)

https://github.com/dflemin3/approxposterior

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: sciencedirect.com, joss.theoj.org
  • Committers with academic emails
    3 of 5 committers (60.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

approximate-inference bayesian-inference bayesian-optimization gaussian-processes inference python

Scientific Fields

Engineering: Computer Science - 60% confidence
Computer Science: Computer Science - 40% confidence
Last synced: 6 months ago

Repository

A Python package for approximate Bayesian inference and optimization using Gaussian processes

Basic Info
Statistics
  • Stars: 42
  • Watchers: 7
  • Forks: 10
  • Open Issues: 5
  • Releases: 0
Topics
approximate-inference bayesian-inference bayesian-optimization gaussian-processes inference python
Created over 8 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog License Codemeta

README.md

approxposterior

A Python package for approximate Bayesian inference with computationally-expensive models

Overview

approxposterior is a Python package for efficient approximate Bayesian inference and Bayesian optimization of computationally-expensive models. approxposterior trains a Gaussian process (GP) surrogate for the computationally-expensive model and employs an active learning approach to iteratively improve the GP's predictive performance while minimizing the number of calls to the expensive model required to generate the GP's training set.

approxposterior implements both the Bayesian Active Learning for Posterior Estimation (BAPE, Kandasamy et al. (2017)) and Adaptive Gaussian process approximation for Bayesian inference with expensive likelihood functions (AGP, Wang & Li (2018)) algorithms for estimating posterior probability distributions in inference problems with computationally-expensive models. In such situations, the goal is to infer posterior probability distributions for model parameters, given some data, under the additional constraint of minimizing the number of forward model evaluations, since each evaluation is assumed to be computationally costly.

approxposterior trains a Gaussian process (GP) surrogate model for the likelihood evaluation by modeling the covariances in log-probability (log-prior + log-likelihood) space, then uses this GP within an MCMC sampler for each likelihood evaluation to perform the inference. approxposterior iteratively improves the GP's predictive performance by leveraging the inherent uncertainty in the GP's predictions to identify high-likelihood regions of parameter space where the GP is uncertain. approxposterior then evaluates the forward model at these points to expand the training set in relevant regions of parameter space, re-training the GP to maximize its predictive ability while minimizing the size of the training set. Check out the BAPE paper by Kandasamy et al. (2017) and the AGP paper by Wang & Li (2018) for in-depth descriptions of the respective algorithms.
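The train-sample-select-retrain loop described above can be sketched schematically with nothing but numpy. Everything here is a toy stand-in: the hand-rolled zero-mean GP, the Gaussian log-probability target, and the candidate-grid maximization of the (log of the) BAPE exponentiated-variance utility are illustrative assumptions, not approxposterior's actual implementation (which uses george for the GP and emcee for the MCMC):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between point sets A (n, d) and B (m, d)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(Xtrain, ytrain, Xtest, noise=1e-6):
    # Standard zero-mean GP posterior mean and variance
    K = rbf_kernel(Xtrain, Xtrain) + noise * np.eye(len(Xtrain))
    Ks = rbf_kernel(Xtest, Xtrain)
    mu = Ks @ np.linalg.solve(K, ytrain)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.einsum("ij,ji->i", Ks, v)
    return mu, np.maximum(var, 1e-12)

def lnprob(theta):
    # Toy "expensive" model: isotropic Gaussian log-probability
    return -0.5 * (theta ** 2).sum(-1)

rng = np.random.default_rng(57)
X = rng.uniform(-3, 3, size=(10, 2))  # initial design points
y = lnprob(X)                         # expensive evaluations at the design

for _ in range(5):
    cand = rng.uniform(-3, 3, size=(500, 2))       # cheap candidate points
    mu, var = gp_predict(X, y, cand)
    # log of the BAPE exponentiated-variance utility:
    # log[exp(2*mu + var) * (exp(var) - 1)]
    util = 2.0 * mu + var + np.log(np.expm1(var))
    xnew = cand[np.argmax(util)]                   # most informative candidate
    X = np.vstack([X, xnew])                       # one new expensive call...
    y = np.append(y, lnprob(xnew[None, :]))        # ...added to the training set

print(X.shape)  # (15, 2): 10 initial points + 5 actively selected points
```

The key idea survives the simplification: each iteration spends exactly one expensive model evaluation, placed where the surrogate is both uncertain and predicts high probability.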

Documentation

Check out the documentation at https://dflemin3.github.io/approxposterior/ for a more in-depth explanation of the code, detailed API notes, and numerous examples with figures.

Installation

Using conda:

```bash
conda install -c conda-forge approxposterior
```

Using pip:

```bash
pip install approxposterior
```

This step can fail if george (the Python Gaussian Process package) is not properly installed and compiled. To install george, run

```bash
conda install -c conda-forge george
```

From source:

```bash
git clone https://github.com/dflemin3/approxposterior.git
cd approxposterior
python setup.py install
```

A simple example

Below is a simple application of approxposterior based on the Wang & Li (2018) example.

```python
from approxposterior import approx, gpUtils, likelihood as lh, utility as ut
import numpy as np

# Define algorithm parameters
m0 = 50                      # Initial size of training set
m = 20                       # Number of new points to find each iteration
nmax = 2                     # Maximum number of iterations
bounds = [(-5, 5), (-5, 5)]  # Prior bounds
algorithm = "bape"           # Use the Kandasamy et al. (2017) formalism
seed = 57                    # RNG seed
np.random.seed(seed)

# emcee MCMC parameters
samplerKwargs = {"nwalkers": 20}         # emcee.EnsembleSampler parameters
mcmcKwargs = {"iterations": int(2.0e4)}  # emcee.EnsembleSampler.run_mcmc parameters

# Sample design points from the prior
theta = lh.rosenbrockSample(m0)

# Evaluate forward model log likelihood + lnprior for each theta
y = np.zeros(len(theta))
for ii in range(len(theta)):
    y[ii] = lh.rosenbrockLnlike(theta[ii]) + lh.rosenbrockLnprior(theta[ii])

# Default GP with an ExpSquaredKernel
gp = gpUtils.defaultGP(theta, y, white_noise=-12)

# Initialize object using the Wang & Li (2018) Rosenbrock function example
ap = approx.ApproxPosterior(theta=theta,
                            y=y,
                            gp=gp,
                            lnprior=lh.rosenbrockLnprior,
                            lnlike=lh.rosenbrockLnlike,
                            priorSample=lh.rosenbrockSample,
                            bounds=bounds,
                            algorithm=algorithm)

# Run!
ap.run(m=m, nmax=nmax, estBurnin=True, nGPRestarts=3, mcmcKwargs=mcmcKwargs,
       cache=False, samplerKwargs=samplerKwargs, verbose=True,
       thinChains=False, onlyLastMCMC=True)

# Check out the final posterior distribution!
import corner

# Load in chain from last iteration
samples = ap.sampler.get_chain(discard=ap.iburns[-1], flat=True,
                               thin=ap.ithins[-1])

# Corner plot!
fig = corner.corner(samples, quantiles=[0.16, 0.5, 0.84], show_titles=True,
                    scale_hist=True, plot_contours=True)

# Plot where forward model was evaluated - uncomment to plot!
fig.axes[2].scatter(ap.theta[m0:, 0], ap.theta[m0:, 1], s=10, color="red",
                    zorder=20)

# Save figure
fig.savefig("finalPosterior.png", bbox_inches="tight")
```

The final distribution will look something like this:

[Figure: finalPosterior.png, corner plot of the final posterior distribution]

The red points were selected by approxposterior by maximizing the BAPE utility function. At each red point, approxposterior ran the forward model to evaluate the true likelihood and added this input-likelihood pair to the GP's training set. We retrain the GP each time to improve its predictive ability. Note how the points are selected in regions of high posterior density, precisely where we would want to maximize the GP's predictive ability! By using the BAPE point selection scheme, approxposterior does not waste computational resources by evaluating the forward model in low likelihood regions.
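For reference, the BAPE utility mentioned above is the exponentiated-variance acquisition from Kandasamy et al. (2017): the variance of exp(g(x)) under the GP posterior for g, which in closed form is exp(2*mu(x) + var(x)) * (exp(var(x)) - 1). The specific mu and var values below are made up for illustration, and approxposterior's own implementation (in its utility module) may differ in detail:

```python
import numpy as np

def bape_utility(mu, var):
    # Exponentiated-variance utility: Var[exp(g(x))] for a GP g(x)
    # with predictive mean mu and predictive variance var
    return np.exp(2.0 * mu + var) * np.expm1(var)

# A point with high predicted log-probability AND high GP uncertainty wins...
high = bape_utility(mu=0.0, var=1.0)
# ...over a low-probability region (forward model calls would be wasted there)
low_prob = bape_utility(mu=-5.0, var=1.0)
# ...and over a region the GP already predicts confidently
certain = bape_utility(mu=0.0, var=1e-4)
print(high > low_prob and high > certain)  # True
```

This is why the selected points cluster in high posterior density regions where the GP is still uncertain, rather than spreading uniformly over the prior bounds.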

Check out the examples directory for Jupyter Notebook examples and explanations. Check out the full documentation for a more in-depth explanation of classes, methods, variables, and how to use the code.

Contribution

If you would like to contribute to this code, please feel free to fork the repository, make some edits, and open a pull request. If you find a bug, have a suggestion, etc, please open up an issue!

Citation

If you use this code, please cite the following:

Fleming and VanderPlas (2018):

```bibtex
@ARTICLE{Fleming2018,
   author = {{Fleming}, D.~P. and {VanderPlas}, J.},
    title = "{approxposterior: Approximate Posterior Distributions in Python}",
  journal = {The Journal of Open Source Software},
     year = 2018,
    month = sep,
   volume = 3,
    pages = {781},
      doi = {10.21105/joss.00781},
   adsurl = {http://adsabs.harvard.edu/abs/2018JOSS....3..781P},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
```

Kandasamy et al. (2017):

```bibtex
@article{Kandasamy2017,
     title = "Query efficient posterior estimation in scientific experiments via Bayesian active learning",
   journal = "Artificial Intelligence",
    volume = "243",
     pages = "45 - 56",
      year = "2017",
      issn = "0004-3702",
       doi = "https://doi.org/10.1016/j.artint.2016.11.002",
       url = "http://www.sciencedirect.com/science/article/pii/S0004370216301394",
    author = "Kirthevasan Kandasamy and Jeff Schneider and Barnabás Póczos",
  keywords = "Posterior estimation, Active learning, Gaussian processes"
}
```

Wang & Li (2018):

```bibtex
@article{Wang2018,
   author = {Wang, Hongqiao and Li, Jinglai},
    title = {Adaptive Gaussian Process Approximation for Bayesian Inference with Expensive Likelihood Functions},
  journal = {Neural Computation},
   volume = {30},
   number = {11},
    pages = {3072-3094},
     year = {2018},
      doi = {10.1162/neco\_a\_01127},
      url = {https://doi.org/10.1162/neco_a_01127},
   eprint = {https://doi.org/10.1162/neco_a_01127}
}
```

Owner

  • Name: David Fleming
  • Login: dflemin3
  • Kind: user
  • Location: St. Louis, MO
  • Company: Bayer Crop Science

Data Science Technical Lead with a PhD in Astronomy from the University of Washington. I predict things slightly better than chance

JOSS Publication

approxposterior: Approximate Posterior Distributions in Python
Published
September 04, 2018
Volume 3, Issue 29, Page 781
Authors
David P. Fleming ORCID
University of Washington
Jake VanderPlas ORCID
University of Washington
Editor
Jed Brown ORCID

CodeMeta (codemeta.json)

{
  "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
  "@type": "Code",
  "author": [
    {
      "@id": "0000-0001-9293-4043",
      "@type": "Person",
      "email": "dflemin3@uw.edu",
      "name": "David Fleming",
      "affiliation": "University of Washington"
    }
  ],
  "identifier": "",
  "codeRepository": "github.com/dflemin3/approxposterior",
  "datePublished": "2018-04-24",
  "dateModified": "2019-11-27",
  "dateCreated": "2018-04-24",
  "description": "A python implementation of Kandasamy et al. (2015) for deriving approximate posterior distributions.",
  "keywords": "python, Bayes, posterior, distribution, gaussian process, mcmc",
  "license": "MIT",
  "title": "approxposterior",
  "version": "v0.3"
}

GitHub Events

Total
  • Watch event: 1
  • Fork event: 1
Last Year
  • Watch event: 1
  • Fork event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 409
  • Total Committers: 5
  • Avg Commits per committer: 81.8
  • Development Distribution Score (DDS): 0.017
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
David Fleming d****3@u****u 402
Jessica Birky j****y@u****u 3
Jacob Lustig-Yaeger j****y@u****u 2
Syrtis Major s****y@g****m 1
Jed Brown j****d@j****g 1
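Assuming the Development Distribution Score follows the usual ecosyste.ms-style definition (one minus the top committer's share of commits, an assumption since the table does not state it), the 0.017 figure above follows directly from the committer counts:

```python
# DDS = 1 - (commits by top committer / total commits); assumed definition,
# not stated in the table itself
top_committer_commits = 402  # David Fleming
total_commits = 409
dds = 1 - top_committer_commits / total_commits
print(round(dds, 3))  # 0.017
```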
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 37
  • Total pull requests: 36
  • Average time to close issues: 4 months
  • Average time to close pull requests: 26 days
  • Total issue authors: 8
  • Total pull request authors: 5
  • Average comments per issue: 1.43
  • Average comments per pull request: 0.03
  • Merged pull requests: 36
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dflemin3 (28)
  • jlustigy (3)
  • RoryBarnes (1)
  • mgeier (1)
  • syrte (1)
  • mrhheffernan (1)
  • dmdu (1)
  • tomr-stargazer (1)
Pull Request Authors
  • dflemin3 (32)
  • jedbrown (1)
  • syrte (1)
  • jbirky (1)
  • jlustigy (1)
Top Labels
Issue Labels
enhancement (21) bug (4) question (2)
Pull Request Labels
enhancement (9) bugfix (2)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 77 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 10
  • Total maintainers: 1
pypi.org: approxposterior

Gaussian Process Approximation to Posterior Distributions

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 77 Last month
Rankings
Dependent packages count: 10.1%
Stargazers count: 10.5%
Forks count: 11.4%
Average: 16.0%
Dependent repos count: 21.6%
Downloads: 26.6%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: approxposterior

This package is a Python implementation of Bayesian Active Learning for Posterior Estimation by Kandasamy et al. (2015) and Adaptive Gaussian process approximation for Bayesian inference with expensive likelihood functions by Wang & Li (2017). These algorithms allow the user to compute approximate posterior probability distributions using computationally expensive forward models by training a Gaussian Process (GP) surrogate for the likelihood evaluation. The algorithms leverage the inherent uncertainty in the GP's predictions to identify high-likelihood regions in parameter space where the GP is uncertain. The algorithms then run the forward model at these points to compute their likelihood and re-train the GP to maximize its predictive ability while minimizing the number of forward model evaluations. Check out Bayesian Active Learning for Posterior Estimation by Kandasamy et al. (2015) and Adaptive Gaussian process approximation for Bayesian inference with expensive likelihood functions by Wang & Li (2017) for in-depth descriptions of the respective algorithms.

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Stargazers count: 40.1%
Average: 41.9%
Forks count: 42.2%
Dependent packages count: 51.2%
Last synced: 6 months ago

Dependencies

setup.py pypi
  • corner *
  • emcee *
  • george *
  • h5py *
  • matplotlib *
  • numpy *
  • pybind11 *
  • pytest *
  • scipy *
  • sklearn *