https://github.com/congyewang/stein_thinning

Python package for Stein Thinning

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
○
DOI references
✓
Academic publication links
Links to: arxiv.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (13.0%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

Python package for Stein Thinning

Basic Info

Host: GitHub
Owner: congyewang
Language: Python
Default Branch: master
Homepage:
Size: 43 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Fork of wilson-ye-chen/stein_thinning

Created over 3 years ago · Last pushed about 3 years ago

https://github.com/congyewang/stein_thinning/blob/master/

# Stein Thinning
This Python package implements an algorithm for optimally compressing
sampling algorithm outputs by minimising a kernel Stein discrepancy.
Please see the accompanying paper "Optimal Thinning of MCMC Output"
([arXiv](https://arxiv.org/pdf/2005.03952.pdf)) for details of the
algorithm.

# Installing via Git
One can pip install the package directly from this repository:
```
pip install git+https://github.com/wilson-ye-chen/stein_thinning
```

# Getting Started
For example, correlated samples from a posterior distribution are
obtained using a MCMC algorithm and stored in the NumPy array `smpl`,
and the corresponding gradients of the log-posterior are stored in
another NumPy array `grad`. One can then perform Stein Thinning to
obtain a subset of 40 sample points by running the following code:
```python
from stein_thinning.thinning import thin
idx = thin(smpl, grad, 40)
```
The `thin` function returns a NumPy array containing the row indices
in `smpl` (and `grad`) of the selected points. Please refer to `demo.py`
as a starting example.

The default usage requires no additional user input and is based on
the identity (`id`) preconditioning matrix and standardised sample.
Alternatively, the user can choose to specify which heuristic to use
for computing the preconditioning matrix by setting the option string
to either `id`, `med`,  `sclmed`, or `smpcov`. Standardisation can be
disabled by setting `stnd=False`. For example, the default setting
corresponds to:
```python
idx = thin(smpl, grad, 40, stnd=True, pre='id')
```
The details for each of the heuristics are documented in Section 2.3 of
the accompanying paper.

# PyStan Example
As an illustration of how Stein Thinning can be used to post-process
output from [Stan](https://mc-stan.org/users/interfaces/pystan), consider
the following simple Stan script that produces correlated samples from a
bivariate Gaussian model:
```python
from pystan import StanModel
mc = """
parameters {vector[2] x;}
model {x ~ multi_normal([0, 0], [[1, 0.8], [0.8, 1]]);}
"""
sm = StanModel(model_code=mc)
fit = sm.sampling(iter=1000)
```
The bivariate Gaussian model is used for illustration, but regardless of
the complexity of the model being sampled the output of Stan will always
be a `fit` object (StanFit instance). The sampled points and the
log-posterior gradients can be extracted from the returned `fit` object:
```python
import numpy as np
smpl = fit['x']
grad = np.apply_along_axis(fit.grad_log_prob, 1, smpl)
idx = thin(smpl, grad, 40)
```
The above example can be found in `pystan/demo.py`.

Owner

Name: Congye
Login: congyewang
Kind: user
Location: Newcastle upon Tyne
Company: Newcastle University

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

https://github.com/congyewang/stein_thinning

Science Score: 10.0%

Repository

Basic Info

Statistics

https://github.com/congyewang/stein_thinning/blob/master/

Owner

GitHub Events

Total

Last Year