posteriordb

Database with posteriors of interest for Bayesian inference

https://github.com/stan-dev/posteriordb

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    3 of 21 committers (14.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary

Keywords

bayes bayesian posterior

Keywords from Contributors

stan closember transformation
Last synced: 6 months ago

Repository

Database with posteriors of interest for Bayesian inference

Basic Info
  • Host: GitHub
  • Owner: stan-dev
  • License: bsd-3-clause
  • Language: Stan
  • Default Branch: master
  • Homepage:
  • Size: 72.4 MB
Statistics
  • Stars: 205
  • Watchers: 9
  • Forks: 42
  • Open Issues: 66
  • Releases: 4
Topics
bayes bayesian posterior
Created almost 7 years ago · Last pushed 6 months ago
Metadata Files
Readme Citation

README.Rmd

---
output:
  md_document:
    variant: markdown_github
---



[![posteriordb Content](https://github.com/stan-dev/posteriordb/actions/workflows/posteriordb_content.yml/badge.svg)](https://github.com/stan-dev/posteriordb/actions/workflows/posteriordb_content.yml) [![R-CMD-check](https://github.com/stan-dev/posteriordb-r/actions/workflows/check-release.yaml/badge.svg)](https://github.com/stan-dev/posteriordb-r/actions/workflows/check-release.yaml) [![Codecov test coverage](https://codecov.io/gh/stan-dev/posteriordb-r/branch/main/graph/badge.svg)](https://codecov.io/gh/stan-dev/posteriordb-r?branch=main) [![Python](https://github.com/stan-dev/posteriordb-python/actions/workflows/push.yml/badge.svg)](https://github.com/stan-dev/posteriordb-python/actions/workflows/push.yml)


# `posteriordb`: a database of Bayesian posterior inference

## What is `posteriordb`?

`posteriordb` is a set of posteriors, i.e., Bayesian statistical models paired with data sets, together with reference implementations in probabilistic programming languages and reference posterior inferences in the form of posterior samples.

## Why use `posteriordb`?

`posteriordb` is designed to test inference algorithms across a wide range of models and data sets. Applications include testing for accuracy, speed, and scalability. `posteriordb` can be used to test new algorithms under development, or deployed as part of continuous integration for ongoing regression testing of algorithms in probabilistic programming frameworks.

`posteriordb` also makes it easy for students and instructors to access various pedagogical and real-world examples with precise model definitions, well-curated data sets, and reference posteriors.

`posteriordb` is framework agnostic and easily accessible from R and Python.

For more details regarding the use cases of `posteriordb`, see [doc/use_cases.md](https://github.com/stan-dev/posteriordb/blob/master/doc/use_cases.md).


## Content

See [DATABASE_CONTENT.md](https://github.com/stan-dev/posteriordb/blob/master/doc/DATABASE_CONTENT.md) for the detailed content of the posterior database.

## Contributing

We welcome any help in adding posteriors, data, and models to the database! See [CONTRIBUTING.md](https://github.com/stan-dev/posteriordb/blob/master/doc/CONTRIBUTING.md) for details on how to contribute.

## Licensing
The posteriordb is licensed under the [new BSD
license](https://github.com/stan-dev/posteriordb/LICENCE.md).

Most model code uses the same BSD-3 licence, although some models and some data might have other open licences such as MIT. Each model has a `licence` element in its info JSON file that specifies the actual licence of the model. Some data sets might also have their own licences, which are specified in a similar way.
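
As a minimal sketch of how such a licence field might be read, the snippet below parses an info JSON document with Python's standard library. The JSON content and the model name are illustrative assumptions, not taken from the database; only the `licence` element name comes from the description above.

```python
import json

# Hypothetical info JSON content; real files in the database may contain
# other fields. Only the "licence" element name comes from the README.
EXAMPLE_MODEL_INFO = """
{
  "name": "example_model",
  "licence": "BSD-3-Clause"
}
"""


def read_licence(info_json_text):
    """Return the value of the "licence" element, or None if it is absent."""
    info = json.loads(info_json_text)
    return info.get("licence")


print(read_licence(EXAMPLE_MODEL_INFO))
```

Returning `None` for a missing element, rather than assuming a default licence, leaves the decision about unlicensed entries to the caller.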

## Using `posteriordb`

To simplify the use of `posteriordb`, there are convenience functions both in Python and in R. To use R, see the [posteriordb-r](https://github.com/stan-dev/posteriordb-r) repository, and to use Python, see the [posteriordb-python](https://github.com/stan-dev/posteriordb-python) repository.

## Citing `posteriordb`
Developing and maintaining open-source software is an important yet often underappreciated contribution to scientific progress. Thus, please make sure to cite it appropriately so that developers get credit for their work. Information on how to cite `posteriordb` can be found in the [CITATION.cff](https://github.com/stan-dev/posteriordb/blob/master/CITATION.cff) file. Use the "cite this repository" button under "About" to get a simple BibTeX or APA snippet.

As `posteriordb` relies heavily on Stan, please also consider citing Stan:

Carpenter B., Gelman A., Hoffman M. D., Lee D., Goodrich B., Betancourt M., Brubaker M., Guo J., Li P., and Riddell A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1). doi:10.18637/jss.v076.i01


## Design choices (so far)
The main focus of the database is simplicity, both in understanding and in use.

The current design choices for the posterior database are as follows.

1. Priors are hardcoded in model files, since changing the prior changes the posterior.
   Create a new model to test different priors.
1. Data transformations are stored as different datasets.
   Create new data to test different data transformations, subsets, and variable settings. This design choice makes the database larger/less memory efficient but simplifies the analysis of individual posteriors.
1. Models and data have (model/data).info.json files with model- and data-specific information.
1. Templates for the different JSON files can be found in content/templates, and schemas in schemas. (Note: these don't exist right now and will be added later.)
1. The prefix 'syn_' denotes synthetic data, for which the generative process is known and can be found in content/data-raw.
1. All data preprocessing is included in content/data-raw.
1. Specific information for different PPL representations of models is included in the PPL syntax files as comments, not in the model.info.json files.
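
The naming conventions in the list above can be sketched in a few lines of Python. The data names are made up for illustration, and reading "(model/data).info.json" as `<name>.info.json` is an assumption about the file layout rather than something confirmed here.

```python
def is_synthetic(data_name):
    """Return True if the data name carries the 'syn_' prefix."""
    return data_name.startswith("syn_")


def info_file_name(name):
    """Build an info file name, assuming the pattern '<name>.info.json'."""
    return f"{name}.info.json"


# Hypothetical data names, used only to illustrate the conventions.
for data_name in ["syn_example_data", "example_data"]:
    kind = "synthetic" if is_synthetic(data_name) else "non-synthetic"
    print(f"{data_name}: {kind}, info file {info_file_name(data_name)}")
```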

## Versioning of models
We might update models included in posteriordb over time. However, the models will only have the same name in posteriordb if the log density is the same (up to a normalizing constant). Otherwise, we will include a new model in the database.
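
To illustrate what "the same up to a normalizing constant" means, the sketch below checks whether two log density functions differ by a constant across a handful of evaluation points. The three density functions, the evaluation points, and the tolerance are all illustrative assumptions; this is not a description of how `posteriordb` itself verifies models, and agreement on finitely many points is a necessary check, not a proof.

```python
import math


def log_density_a(x):
    """Standard normal log density without its normalizing constant."""
    return -0.5 * x * x


def log_density_b(x):
    """Standard normal log density including its normalizing constant."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def log_density_c(x):
    """Unnormalized normal log density with scale 2 (a different density)."""
    return -0.5 * (x / 2.0) ** 2


def same_up_to_constant(f, g, points, tol=1e-9):
    """Check that f(x) - g(x) takes the same value at every given point."""
    diffs = [f(x) - g(x) for x in points]
    return max(diffs) - min(diffs) <= tol


points = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(same_up_to_constant(log_density_a, log_density_b, points))
print(same_up_to_constant(log_density_a, log_density_c, points))
```

Under the versioning rule above, the first pair would count as the same model and the second pair as two different models.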

Owner

  • Name: Stan
  • Login: stan-dev
  • Kind: organization
  • Email: mc.stanislaw@gmail.com

GitHub Events

Total
  • Issues event: 20
  • Watch event: 20
  • Member event: 1
  • Issue comment event: 40
  • Push event: 15
  • Pull request review comment event: 3
  • Pull request review event: 4
  • Pull request event: 24
  • Fork event: 6
  • Create event: 4
Last Year
  • Issues event: 20
  • Watch event: 20
  • Member event: 1
  • Issue comment event: 40
  • Push event: 15
  • Pull request review comment event: 3
  • Pull request review event: 4
  • Pull request event: 24
  • Fork event: 6
  • Create event: 4

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 1,030
  • Total Committers: 21
  • Avg Commits per committer: 49.048
  • Development Distribution Score (DDS): 0.328
Past Year
  • Commits: 23
  • Committers: 2
  • Avg Commits per committer: 11.5
  • Development Distribution Score (DDS): 0.174
Top Committers
Name Email Commits
Måns Magnusson m****n@g****m 692
jarnefeltoliver o****t@a****i 121
Eero Linna e****a@g****m 46
Ari Hartikainen a****n 41
Eero Linna e****a@a****i 25
JTorgander 5****r 19
Ari Hartikainen 15
Aki Vehtari a****i@a****i 10
Paul Buerkner p****r@g****m 10
Keane Nguyen k****4@g****m 10
ahartikainen a****n@g****m 9
Kane Lindsay k****0@g****m 8
Guillaume Baudart g****t@i****m 7
Ari Hartikainen a****n@r****i 6
Eero Linna e****a@h****m 3
Järnefelt Oliver j****1@l****i 3
Ben b****2@g****m 1
Bob Carpenter b****r@f****g 1
Kane Lindsay s****s@l****k 1
PhilClemson p****n@l****k 1
Seth Axen s****h@s****m 1

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 88
  • Total pull requests: 59
  • Average time to close issues: 8 months
  • Average time to close pull requests: 23 days
  • Total issue authors: 25
  • Total pull request authors: 21
  • Average comments per issue: 3.08
  • Average comments per pull request: 1.9
  • Merged pull requests: 46
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 18
  • Pull requests: 16
  • Average time to close issues: 24 days
  • Average time to close pull requests: 8 days
  • Issue authors: 7
  • Pull request authors: 9
  • Average comments per issue: 1.0
  • Average comments per pull request: 1.88
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • MansMeg (45)
  • eerolinna (6)
  • bbbales2 (5)
  • JasonPekos (3)
  • avehtari (3)
  • bob-carpenter (2)
  • ahartikainen (2)
  • JohannesBuchner (2)
  • pipme (2)
  • rok-cesnovar (2)
  • JTorgander (2)
  • LuZhangstat (1)
  • mitzimorris (1)
  • trappmartin (1)
  • fonnesbeck (1)
Pull Request Authors
  • MansMeg (22)
  • JTorgander (20)
  • ahartikainen (7)
  • fonnesbeck (4)
  • PhilClemson (3)
  • gbdrt (2)
  • jessegrabowski (2)
  • aseyboldt (2)
  • bob-carpenter (2)
  • avehtari (2)
  • JasonPekos (1)
  • bbbales2 (1)
  • Ch0ronomato (1)
  • KaneLindsay (1)
  • sethaxen (1)
Top Labels
Issue Labels
bug (1) good first issue (1)
Pull Request Labels

Dependencies

.github/workflows/posteriordb_content.yml actions
  • actions/cache v1 composite
  • actions/checkout v3 composite
  • jitterbit/get-changed-files v1 composite
  • r-lib/actions/setup-r v2 composite