Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.5%) to scientific vocabulary
Repository
A backend for storing MCMC draws.
Basic Info
Statistics
- Stars: 20
- Watchers: 5
- Forks: 6
- Open Issues: 6
- Releases: 18
Metadata Files
README.md
Where do you want to store your MCMC draws? In memory? On disk? Or in a database running in a datacenter?
No matter where you want to put them, or which PPL generates them: McBackend takes care of your MCMC samples.
Quickstart
The mcbackend package consists of three parts:
Part 1: A schema for MCMC run & chain metadata
No matter which programming language your favorite PPL is written in, the ProtocolBuffers from McBackend can be used to generate code in languages like C++, C#, Python and many more to represent commonly used metadata about MCMC runs, chains and model variables.
The definitions in protobufs/meta.proto are designed to maximize compatibility with ArviZ objects, making it easy to transform MCMC draws stored according to the McBackend schema to InferenceData objects for plotting & analysis.
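To make the idea of a language-independent metadata schema more concrete, here is a minimal Python sketch of the kind of information such a schema might carry about runs, chains and model variables. The class and field names (`VariableSketch`, `RunMetaSketch`, `ChainMetaSketch`, `rid`, `chain_number`) are hypothetical and chosen for illustration only; the actual definitions live in protobufs/meta.proto and may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VariableSketch:
    """Hypothetical description of one model variable."""
    name: str
    dtype: str  # e.g. "float64"
    shape: List[int] = field(default_factory=list)  # empty list means scalar
    dims: List[str] = field(default_factory=list)   # named dimensions


@dataclass
class RunMetaSketch:
    """Hypothetical metadata shared by all chains of one MCMC run."""
    rid: str
    variables: List[VariableSketch] = field(default_factory=list)


@dataclass
class ChainMetaSketch:
    """Hypothetical metadata identifying one chain within a run."""
    rid: str
    chain_number: int


run = RunMetaSketch(
    rid="run-001",
    variables=[
        VariableSketch("intercept", "float64"),
        VariableSketch("slope", "float64", shape=[3], dims=["feature"]),
    ],
)
chains = [ChainMetaSketch(rid=run.rid, chain_number=i) for i in range(2)]
print([v.name for v in run.variables])
```

Keeping names, dtypes, shapes and named dimensions next to each variable is what would let a consumer rebuild labeled arrays later without asking the model that produced the draws.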
Part 2: A storage backend interface
The draws and stats created by MCMC sampling algorithms at runtime need to be stored somewhere.
This "somewhere" is called the storage backend in PPLs/MCMC frameworks like PyMC or emcee.
Most storage backends must be initialized with metadata about the model variables so they can, for example, pre-allocate memory for the draws and stats they're about to receive.
After receiving thousands of draws and stats, they must provide methods by which the draws and stats can be retrieved.
The mcbackend.core module has classes such as Backend, Run, and Chain to define these interfaces for any storage backend, no matter if it's an in-memory, filesystem or database storage.
Although this implementation is currently Python-only, the interface signature should be portable to e.g. C++.
Via mcbackend.backends the McBackend package then provides backend implementations.
Currently you may choose from:
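```python
import clickhouse_driver
import mcbackend

# In-memory storage
backend = mcbackend.NumPyBackend()
# Storage in a ClickHouse database
backend = mcbackend.ClickHouseBackend(
    client=clickhouse_driver.Client("localhost")
)

# All that matters:
isinstance(backend, mcbackend.Backend)
# >>> True
```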
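The storage-backend idea described in Part 2 (initialize with variable metadata, pre-allocate, receive draws, serve them back) can be illustrated with a toy in-memory chain. This is a sketch under stated assumptions, not the mcbackend implementation: `ToyChain`, its constructor arguments and its `append` method are hypothetical, and only the method name `get_draws` is borrowed from the retrieval example in this README.

```python
from typing import Dict, List, Sequence


class ToyChain:
    """Hypothetical in-memory chain that preallocates one slot per expected draw."""

    def __init__(self, var_names: Sequence[str], expected_draws: int):
        # Preallocating needs the variable names up front,
        # which is why backends are initialized with metadata.
        self._draws: Dict[str, List[float]] = {
            name: [float("nan")] * expected_draws for name in var_names
        }
        self._length = 0

    def append(self, draw: Dict[str, float]) -> None:
        """Store one draw, given as a mapping from variable name to value."""
        for name, value in draw.items():
            column = self._draws[name]
            if self._length < len(column):
                column[self._length] = value
            else:
                column.append(value)  # grow if more draws arrive than expected
        self._length += 1

    def get_draws(self, var_name: str) -> List[float]:
        """Return only the draws received so far, without the unused slots."""
        return self._draws[var_name][: self._length]

    def __len__(self) -> int:
        return self._length


chain = ToyChain(var_names=["mu", "sigma"], expected_draws=2)
for i in range(3):
    chain.append({"mu": 0.1 * i, "sigma": 1.0 + i})

print(len(chain))
print(chain.get_draws("sigma"))
```

Because retrieval only returns the slots filled so far, a reader could query the chain while the sampler is still appending to it.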
Part 3: PPL adapters
Anything that is a Backend can be wrapped by an adapter that makes it compatible with your favorite PPL.
In the example below, a ClickHouseBackend is initialized to store MCMC draws from a PyMC model in a ClickHouse database.
See below for how to run it in Docker.
```python
import clickhouse_driver
import mcbackend
import pymc as pm

# 1. Create any kind of backend
ch_client = clickhouse_driver.Client("localhost")
backend = mcbackend.ClickHouseBackend(ch_client)

with pm.Model():
    # 2. Create your model
    ...
    # 3. Hit the inference button ™ while passing the backend!
    pm.sample(trace=backend)
```
For PyMC, the adapter has lived in the PyMC codebase since version 5.1.1,
so all you need to do is pass any mcbackend.Backend via the pm.sample(trace=...) parameter!
Instead of using PyMC's built-in NumPy backend, the MCMC draws now end up in ClickHouse.
Retrieving the draws & stats
Continuing the example from above we can now retrieve draws from the backend.
Note that since this example wrote the draws to ClickHouse, we could run the code below on another machine, and even while the above model is still sampling!
```python
# Reuses `ch_client` from the example above
backend = mcbackend.ClickHouseBackend(ch_client)

# Fetch the run from the database (downloads just metadata)
run = backend.get_run(trace.run_id)

# Get all draws from a chain
chain = run.get_chains()[0]
chain.get_draws("my favorite variable")
# >>> array([ ... ])

# Convert everything to InferenceData
idata = run.to_inferencedata()
print(idata)
# >>> Inference data with groups:
# >>>     > posterior
# >>>     > sample_stats
# >>>     > observed_data
# >>>     > constant_data
# >>>
# >>> Warmup iterations saved (warmup_*).
```
Contributing what's next
McBackend just started and is looking for contributions.
For example:
* Schema discussion: Which metadata is needed? (related: PyMC #5160)
* Interface discussion: How should Backend/Run/Chain evolve?
* Python Backends for disk storage (HDF5, *.proto, ...)
* C++ Backend/Run/Chain interfaces
* C++ ClickHouse backend (via clickhouse-cpp)
As the schema and API stabilize, a mid-term goal might be to replace PyMC's BaseTrace/MultiTrace entirely and rely on mcbackend.
Getting rid of MultiTrace was a long-term goal behind making pm.sample(return_inferencedata=True) the default.
Development
First clone the repository and set up a development environment containing the protobuf compiler.
```bash
mamba create -n mcb python=3.11 grpcio-tools protobuf -y
activate mcb
pip install -r requirements-dev.txt
pip install --pre "betterproto[compiler]"
pip install -e .
```
To compile the *.proto files for languages other than Python, check the ProtocolBuffers documentation.
The following script compiles them for Python using the betterproto compiler plugin to get nice-looking dataclasses.
It also copies the generated files to the right place in mcbackend.
```bash
python protobufs/generate.py
pre-commit run --all
```
To run the tests:
```bash
pytest -v
```
Some tests need a ClickHouse database server running locally. To start one in Docker:
```bash
docker run --detach --rm --name mcbackend-db -p 9000:9000 --ulimit nofile=262144:262144 clickhouse/clickhouse-server
```
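Before running the database-dependent tests, it can help to confirm that the container is actually accepting connections on the mapped port. The following is a small standard-library sketch; the helper name `port_is_open` is hypothetical, and the default port 9000 is taken from the `-p 9000:9000` mapping in the Docker command above. It only checks that a TCP connection can be opened, not that ClickHouse is fully ready to serve queries.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 9000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    print("ClickHouse port reachable:", port_is_open())
```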
Owner
- Name: PyMC
- Login: pymc-devs
- Kind: organization
- Website: https://www.pymc.io
- Twitter: pymc_devs
- Repositories: 34
- Profile: https://github.com/pymc-devs
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: McBackend
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Michael
    family-names: Osthege
    email: michael.osthege@outlook.com
    orcid: 'https://orcid.org/0000-0002-2734-7624'
    affiliation: Forschungszentrum Jülich GmbH
  - name: PyMC Developers
    website: 'https://www.pymc.io/'
repository-code: 'https://github.com/pymc-devs/mcbackend'
abstract: >-
  McBackend is an abstraction of backends for storing MCMC
  draws. It encodes MCMC run metadata using protocol
  bufferes, and comes with various backend implementation
  such as an in-memory backend, or one for streaming MCMC
  draws to a ClickHouse database. The backends store not
  only MCMC draws but also sampler statistics, and are
  compatible with sparse data, or varying dimensionality.
  MCMC chains stored with McBackend can be queried directly,
  or convert to the popular ArviZ InferenceData objects.
keywords:
  - mcmc
  - arviz
  - pymc
license: AGPL-3.0
commit: 5dca137855e650848920cbfc8cd095d15a5378a9
version: 0.5.0
date-released: '2023-03-30'
GitHub Events
Total
- Release event: 1
- Watch event: 5
- Delete event: 6
- Push event: 10
- Pull request review event: 4
- Pull request event: 13
- Fork event: 1
- Create event: 6
Last Year
- Release event: 1
- Watch event: 5
- Delete event: 6
- Push event: 10
- Pull request review event: 4
- Pull request event: 13
- Fork event: 1
- Create event: 6
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Michael Osthege | m****e@o****m | 124 |
| dependabot[bot] | 4****] | 37 |
| pre-commit-ci[bot] | 6****] | 2 |
| Alexandre René | a****e@g****m | 2 |
| Thomas Aarholt | t****t@g****m | 1 |
| Ben Mares | s****1@t****m | 1 |
| Alexis Shakas | a****s@g****m | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 2
- Total pull requests: 30
- Average time to close issues: about 1 hour
- Average time to close pull requests: 16 days
- Total issue authors: 2
- Total pull request authors: 4
- Average comments per issue: 0.5
- Average comments per pull request: 0.2
- Merged pull requests: 29
- Bot issues: 0
- Bot pull requests: 24
Past Year
- Issues: 0
- Pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 10
Top Authors
Issue Authors
- michaelosthege (1)
Pull Request Authors
- dependabot[bot] (28)
- michaelosthege (6)
- pre-commit-ci[bot] (4)
- alcrene (4)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4.5.0 composite
- codecov/codecov-action v3 composite
- yandex/clickhouse-server * docker
- actions/checkout v3 composite
- actions/setup-python v4.5.0 composite
- pre-commit/action v3.0.0 composite
- actions/checkout v3 composite
- actions/setup-python v4.5.0 composite
- yandex/clickhouse-server * docker
- arviz * development
- clickhouse-driver * development
- flake8 * development
- pymc ==5.0.2 development
- pytest * development
- pytest-cov * development
- twine * development
- wheel * development
- betterproto ==2.0.0b5
- hagelkorn *
- numpy *
- pandas *
- open *