Soss.jl-8ce77f84-9b61-11e8-39ff-d17a774bf41c

Last snapshot taken from https://gitlab.com/UnofficialJuliaMirror/Soss.jl-8ce77f84-9b61-11e8-39ff-d17a774bf41c on 2019-11-20T20:07:23.741-05:00 by @UnofficialJuliaMirrorBot via Travis job 153.69, triggered by Travis cron job on branch "master"

https://gitlab.com/UnofficialJuliaMirrorSnapshots/Soss.jl-8ce77f84-9b61-11e8-39ff-d17a774bf41c


README.jmd

# Soss

[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://cscherrer.github.io/Soss.jl/stable)
[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://cscherrer.github.io/Soss.jl/dev)
[![Build Status](https://travis-ci.com/cscherrer/Soss.jl.svg?branch=master)](https://travis-ci.com/cscherrer/Soss.jl)
[![Build Status](https://ci.appveyor.com/api/projects/status/github/cscherrer/Soss.jl?svg=true)](https://ci.appveyor.com/project/cscherrer/Soss-jl)
[![Codecov](https://codecov.io/gh/cscherrer/Soss.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/cscherrer/Soss.jl)
[![Coveralls](https://coveralls.io/repos/github/cscherrer/Soss.jl/badge.svg?branch=master)](https://coveralls.io/github/cscherrer/Soss.jl?branch=master)

Soss is a library for _probabilistic programming_.

Let's jump right in with a simple linear model:

```julia
using Soss 

m = @model X begin
    β ~ Normal() |> iid(size(X,2))
    y ~ For(eachrow(X)) do x
        Normal(x' * β, 1)
    end
end;
```

In Soss, models are _first-class_ and _function-like_, and "applying" a model to its arguments gives a _joint distribution_.

Just a few of the things we can do in Soss:

- Sample from the (forward) model
- Condition a joint distribution on a subset of parameters
- Have arbitrary Julia values (yes, even other models) as inputs or outputs of a model
- Build a new model for the _predictive_ distribution, in which chosen parameters are assigned particular values

Let's use our model to build some fake data:
```julia; term=true
import Random; Random.seed!(3)
X = randn(6,2)
```

```julia; term=true
truth = rand(m(X=X));
pairs(truth)
```

```julia; term=true
truth.β
```

```julia; term=true
truth.y 
```

Now pretend we don't know `β`, and have the model figure it out.
The resulting posterior samples are often easier to work with in terms of `particles` (built using [MonteCarloMeasurements.jl](https://github.com/baggepinnen/MonteCarloMeasurements.jl)):

```julia; term=true
post = dynamicHMC(m(X=truth.X), (y=truth.y,));
particles(post)
```

For model diagnostics and prediction, we need the _predictive distribution_:
```julia; term=true
pred = predictive(m,:β)
```

This requires `X` and `β` as inputs, so we can do something like this for a _posterior predictive check_:

```julia
ppc = [rand(pred(;X=truth.X, p...)).y for p in post];

truth.y - particles(ppc)
```

These play a role similar to that of residuals in a non-Bayesian approach (there's plenty more detail to go into, but that's for another time).
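
To see what such a check computes without relying on Soss, here is a minimal sketch in plain Julia. It assumes the same linear model (standard normal prior on `β`, unit noise), for which the posterior happens to be available in closed form, so no HMC is needed. The names `posterior_draws`, `ppc_residuals`, `X_demo`, and `y_demo` are made up for this illustration and are not part of Soss.

```julia
using LinearAlgebra, Random, Statistics

# Closed-form posterior for β ~ Normal(0, I), y ~ Normal(X*β, I):
# covariance Σ = inv(X'X + I), mean μ = Σ * X'y.
function posterior_draws(rng, X, y, n)
    Σ = Symmetric(inv(X' * X + I))
    μ = Σ * (X' * y)
    L = cholesky(Σ).L
    return [μ + L * randn(rng, length(μ)) for _ in 1:n]
end

# For each posterior draw of β, simulate a replicated data set
# and subtract it from the observed y.
function ppc_residuals(rng, X, y, draws)
    return [y - (X * β + randn(rng, length(y))) for β in draws]
end

demo_rng = MersenneTwister(3)
X_demo = randn(demo_rng, 6, 2)
y_demo = X_demo * randn(demo_rng, 2) + randn(demo_rng, 6)

draws = posterior_draws(demo_rng, X_demo, y_demo, 1000)
resid = ppc_residuals(demo_rng, X_demo, y_demo, draws)
mean(resid)   # average residual for each of the 6 observations
```

Each element of `resid` plays the role of one entry of `truth.y - particles(ppc)`, only stored as an explicit vector of draws rather than as particles.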

With some minor modifications, we can put this into a form that allows symbolic simplification:
```julia; term=true; line_width=500
m2 = @model X begin
    N = size(X,1)
    k = size(X,2)
    β ~ Normal() |> iid(k)
    yhat = X * β
    y ~ For(N) do j
        Normal(yhat[j], 1)
    end
end;

symlogpdf(m2) 
```

There's clearly some redundant computation within the sums, so it helps to expand:

```julia; term=true; line_width=500
symlogpdf(m2) |> expandSums |> foldConstants
```
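
For reference, the log-density of `m2` can also be expanded by hand. This is a sketch derived from the model definition, writing $\hat{y}_j$ for `yhat[j]`; the exact form and ordering of terms that Soss prints may differ.

```latex
\begin{aligned}
\ell(\beta, y)
  &= \sum_{i=1}^{k} \log \mathcal{N}(\beta_i \mid 0, 1)
   + \sum_{j=1}^{N} \log \mathcal{N}(y_j \mid \hat{y}_j, 1) \\
  &= -\frac{N + k}{2} \log(2\pi)
     - \frac{1}{2} \sum_{i=1}^{k} \beta_i^2
     - \frac{1}{2} \sum_{j=1}^{N} (y_j - \hat{y}_j)^2 \\
  &= -\frac{N + k}{2} \log(2\pi)
     - \frac{1}{2} \sum_{i=1}^{k} \beta_i^2
     - \frac{1}{2} \sum_{j=1}^{N} y_j^2
     + \sum_{j=1}^{N} y_j \hat{y}_j
     - \frac{1}{2} \sum_{j=1}^{N} \hat{y}_j^2
\end{aligned}
```

The last line is the expanded form: the constant has been folded into a single term, and the squared difference has been split into three separate sums.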

We can use the symbolic simplification to speed up computations:

```julia; term=true
using BenchmarkTools

@btime logpdf($m2(X=X), $truth)
@btime logpdf($m2(X=X), $truth, $codegen)
```

## What's Really Happening Here?

Under the hood, `rand` and `logpdf` specify different ways of "running" the model.

`rand` turns each `v ~ dist` into `v = rand(dist)`, finally outputting the `NamedTuple` of all values it has seen.

`logpdf` steps through the same program, but instead accumulates a log-density. It begins by initializing `_ℓ = 0.0`. Then at each step, it turns `v ~ dist` into `_ℓ += logpdf(dist, v)`, before finally returning `_ℓ`.

Note that I said "turns into" instead of "interprets". Soss uses [`GG.jl`](https://github.com/thautwarm/GG.jl) to generate specialized code for a given model, inference primitive (like `rand` and `logpdf`), and type of data. 
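
To make the rewriting concrete, here is roughly what hand-written equivalents for the linear model `m` might look like. This is only a sketch of the idea in plain Julia, not the code Soss actually generates; `normal_logpdf`, `rand_linear`, and `logpdf_linear` are hypothetical names introduced for illustration.

```julia
using Random

# Log-density of Normal(μ, σ) at x, written out so no packages are needed.
normal_logpdf(x, μ, σ) = -0.5 * log(2π) - log(σ) - 0.5 * ((x - μ) / σ)^2

# "rand" view of the model: every `v ~ dist` becomes `v = rand(dist)`.
function rand_linear(rng, X)
    β = randn(rng, size(X, 2))                      # β ~ Normal() |> iid(size(X,2))
    y = [x' * β + randn(rng) for x in eachrow(X)]   # y ~ Normal(x' * β, 1)
    return (X = X, β = β, y = y)
end

# "logpdf" view: every `v ~ dist` becomes `_ℓ += logpdf(dist, v)`.
function logpdf_linear(X, β, y)
    _ℓ = 0.0
    for b in β
        _ℓ += normal_logpdf(b, 0.0, 1.0)
    end
    for (x, yj) in zip(eachrow(X), y)
        _ℓ += normal_logpdf(yj, x' * β, 1.0)
    end
    return _ℓ
end

demo_rng = MersenneTwister(3)
draw = rand_linear(demo_rng, randn(demo_rng, 6, 2))
logpdf_linear(draw.X, draw.β, draw.y)
```

Both functions walk through the same two statements in the same order; only the treatment of each `~` differs.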

This idea can be used in much more complex ways. `weightedSample` is a sort of hybrid between `rand` and `logpdf`. For data that are provided, it increments `_ℓ` using `logpdf`. Unknown values are sampled using `rand`.

```julia; term=true
ℓ, proposal = weightedSample(m(X=X), (y=truth.y,));
ℓ
proposal.β
```

Again, no runtime check is needed to decide which values are observed. Each of these is compiled the first time it is called, so future calls are very fast. Functions like this are great to use in tight loops.
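
A hand-written analogue for the linear model shows the hybrid behaviour. This is a sketch under stated assumptions, not Soss's implementation: `weighted_sample_linear` and `normal_logpdf` are hypothetical names, and the observed values are passed as a `NamedTuple`, as in the call above.

```julia
using Random

normal_logpdf(x, μ, σ) = -0.5 * log(2π) - log(σ) - 0.5 * ((x - μ) / σ)^2

# Observed variables add to the weight `_ℓ`; unobserved ones are sampled.
function weighted_sample_linear(rng, X, obs::NamedTuple)
    _ℓ = 0.0

    if haskey(obs, :β)
        β = obs.β
        _ℓ += sum(b -> normal_logpdf(b, 0.0, 1.0), β)
    else
        β = randn(rng, size(X, 2))
    end

    ŷ = X * β
    if haskey(obs, :y)
        y = obs.y
        _ℓ += sum(normal_logpdf.(y, ŷ, 1.0))
    else
        y = ŷ + randn(rng, length(ŷ))
    end

    return _ℓ, (X = X, β = β, y = y)
end

demo_rng = MersenneTwister(3)
X_demo = randn(demo_rng, 6, 2)
ℓ_demo, proposal_demo = weighted_sample_linear(demo_rng, X_demo, (y = zeros(6),))
```

The field names of a `NamedTuple` are part of its type, so Julia can usually resolve branches like `haskey(obs, :y)` when it specializes the method. Generating the code per model and data type, as described above, removes the branches altogether.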



## To Do

We need a way to "lift" a `Distribution` (without parameters, so really a family) to a `Model`, or one with parameters to a `JointDistribution`.

Models are "function-like", so a `JointDistribution` should sometimes be usable as a value. `m1(m2(args))` should work.

This also means `m1 ∘ m2` should be fine.

Since inference primitives are specialized for the type of data, we can include methods for `Union{Missing, T}` data. PyMC3 has something like this, but for us it will be better since we know at compile time whether any data are missing.
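
As a rough illustration of the idea (plain Julia dispatch, not an existing Soss feature), the element type of the data vector alone can select a missing-aware method. `loglik` and `normal_logpdf` are hypothetical names used only here.

```julia
normal_logpdf(x, μ, σ) = -0.5 * log(2π) - log(σ) - 0.5 * ((x - μ) / σ)^2

# Fully observed data: the element type is a plain Real, so no missing check is needed.
loglik(y::AbstractVector{<:Real}, μ) = sum(normal_logpdf.(y, μ, 1.0))

# Possibly-missing data: selected by the element type; missing entries are skipped.
function loglik(y::AbstractVector{Union{Missing,T}}, μ) where {T<:Real}
    ℓ = 0.0
    for (yj, μj) in zip(y, μ)
        ismissing(yj) || (ℓ += normal_logpdf(yj, μj, 1.0))
    end
    return ℓ
end

loglik([0.1, -0.3], [0.0, 0.0])       # dispatches to the fully observed method
loglik([0.1, missing], [0.0, 0.0])    # dispatches to the missing-aware method
```

The choice between the two methods depends only on the type of `y`, which is the sense in which the presence of missing data is known at compile time.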

There's a `return` available in case you want a result other than a `NamedTuple`, but it's a little fiddly still. I think whether the `return` is respected or ignored should depend on the inference primitive. And some will also modify it, similar to how a state monad works. Likelihood weighting is an example of this.

Rather than having lots of functions for inference, anything that's not a primitive should (I think for now at least) be a method of... let's call it `sample`. This should always return an iterator, so we can combine results after the fact using tools like `IterTools`, `ResumableFunctions`, and `Transducers`.
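
One possible shape for this, sketched in plain Julia: an endless iterator of draws that callers truncate or combine afterwards. `SampleStream` and `sample_stream` are hypothetical names standing in for the proposed `sample`; nothing here exists in Soss.

```julia
using Random

# An endless stream of draws produced by calling `draw(rng)` repeatedly.
struct SampleStream{F,R<:AbstractRNG}
    draw::F
    rng::R
end

# Stateless iteration: every step produces a fresh draw and never terminates.
Base.iterate(s::SampleStream, state = nothing) = (s.draw(s.rng), nothing)
Base.IteratorSize(::Type{<:SampleStream}) = Base.IsInfinite()
Base.IteratorEltype(::Type{<:SampleStream}) = Base.EltypeUnknown()

sample_stream(draw, rng) = SampleStream(draw, rng)

stream = sample_stream(rng -> randn(rng, 2), MersenneTwister(3))
first_five = collect(Iterators.take(stream, 5))
```

Because the stream is lazy, the caller decides how many samples to draw, and the same object composes with `Iterators.take`, `Iterators.filter`, and similar tools.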

The situation just described is for generating a sequence of samples from a single distribution. But we may also have models with a sequence of distributions, either observed or sampled, or a mix. This can be something like Haskell's `iterateM`, though we need to think carefully about the specifics.
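
A very small sketch of the sampled case, assuming each distribution depends only on the previous value. `iterate_chain` is a hypothetical helper, and the design for Soss is still open.

```julia
using Random

# Repeatedly apply a stochastic step `step(rng, x)`, collecting the first
# `n` states (including the initial one). Assumes n >= 1.
function iterate_chain(step, rng, x0, n)
    xs = Vector{typeof(x0)}(undef, n)
    xs[1] = x0
    for t in 2:n
        xs[t] = step(rng, xs[t - 1])
    end
    return xs
end

# Example: a Gaussian random walk, x[t] ~ Normal(x[t-1], 1).
walk = iterate_chain((rng, x) -> x + randn(rng), MersenneTwister(3), 0.0, 10)
```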

We already have a way to `merge` models; we should look into intersection as well.

We need ways to interact with Turing and Gen. Some ideas:

- Turn a Soss model into an "outside" (Turing or Gen) model
- Embed outside models as a black box in a Soss model, using their methods for inference

## Stargazers over time

[![Stargazers over time](https://starchart.cc/cscherrer/Soss.jl.svg)](https://starchart.cc/cscherrer/Soss.jl)


Citation (CITATION.bib)

@misc{Soss.jl,
	author  = {Chad Scherrer},
	title   = {{Soss.jl}},
	url     = {https://github.com/cscherrer/Soss.jl},
	version = {v0.5.0},
	year    = {2019},
	month   = {9}
}