Generative DAGs as an Interface Into Probabilistic Programming with the R Package causact
Generative DAGs as an Interface Into Probabilistic Programming with the R Package causact - Published in JOSS (2022)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 7 DOI reference(s) in README and JOSS metadata -
✓Academic publication links
Links to: joss.theoj.org, zenodo.org -
✓Committers with academic emails
4 of 6 committers (66.7%) from academic institutions -
○Institutional organization owner
-
✓JOSS paper metadata
Published in Journal of Open Source Software
Keywords
bayesian-inference
dags
posterior-probability
probabilistic-graphical-models
probabilistic-programming
rstats-package
Scientific Fields
Sociology
Social Sciences -
87% confidence
Last synced: 4 months ago
·
JSON representation
Repository
causact: R package to accelerate computational Bayesian inference workflows in R through interactive visualization of models and their output.
Basic Info
- Host: GitHub
- Owner: flyaflya
- License: other
- Language: R
- Default Branch: master
- Homepage: http://causact.com
- Size: 12.8 MB
Statistics
- Stars: 48
- Watchers: 3
- Forks: 12
- Open Issues: 9
- Releases: 12
Topics
bayesian-inference
dags
posterior-probability
probabilistic-graphical-models
probabilistic-programming
rstats-package
Created over 7 years ago
· Last pushed 4 months ago
Metadata Files
Readme
Changelog
Contributing
License
Code of conduct
README.Rmd
---
output: github_document
editor_options:
markdown:
wrap: 72
---
```{r, echo=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/",
dpi = 600,
fig.asp = 0.618,
fig.width = 6
)
```
[](https://github.com/flyaflya/causact/actions)
[](https://cran.r-project.org/package=causact)
[](https://app.codecov.io/gh/flyaflya/causact?branch=master)
[](https://img.shields.io/badge/Lifecycle-Maturing-007EC6)
[](https://doi.org/10.5281/zenodo.6949489)
[](https://doi.org/10.21105/joss.04415)
# causact
*Accelerate computational Bayesian inference workflows* in R through
interactive modelling, visualization, and inference. The package
leverages directed acyclic graphs (DAGs) to create a unified language
language for business stakeholders, statisticians, and programmers. Due
to its visual nature and simple model construction, `causact` serves as
a great entry-point for newcomers to computational Bayesian inference.
> The `causact` package offers robust support for both foundational and
> advanced Bayesian models. While introductory models are well-covered,
> the utilization of multi-variate distributions such as multi-variate
> normal, multi-nomial, or dirichlet distributions, may not work as
> expected. There are ongoing enhancements in the pipeline to facilitate
> construction of these more intricate models.
```{r demoGif, out.width = "40%", echo = FALSE, fig.align = "center"}
knitr::include_graphics("man/figures/introScreenshot.png")
```
While proficiency in R is the only requirement for users of this
package, it also functions as a introductory probabilistic programming
language, seamlessly incorporating the `numpyro` Python package to
facilitate Bayesian inference without the need to learn any syntax
outside of the package or R. Furthermore, the package streamlines the
process of indexing categorical variables, which often presents a
complex syntax hurdle for those new to computational Bayesian methods.
For enhanced learning, the `causact` package for Bayesian inference is
featured in `A Business Analyst's Introduction to Business Analytics`
available at .
> Feedback and encouragement is appreciated via github issues.
## Installation Guide
To install the `causact` package, follow the steps outlined below:
### CRAN Release Version (Recommended)
For the current stable release, which is tailored to integrate with
Python's `numpyro` package, employ the following command:
``` r
install.packages("causact")
```
Then, see [Essential Dependencies] if you want to be able to automate
sampling using the `numpyro` package.
### Development Release
If you want the most recent development version (not recommended),
execute the following:
``` r
install.packages("remotes")
remotes::install_github("flyaflya/causact")
```
### Essential Dependencies
To harness the full potential of `causact` for DAG visualization and
Bayesian posterior analysis, it's vital to ensure proper integration
with the `numpyro` package. Given the Python-based nature of `numpyro`,
a few essential dependencies must be in place. Execute the following
commands after installing `causact`:
``` r
library(causact)
install_causact_deps()
```
If prompted, respond with `Y` to any inquiries related to installing miniconda.
> **Note**: If opting for installation on Posit Cloud, temporarily
> adjust your project's RAM to 4GB during the installation process
> (remember to APPLY CHANGES). This preemptive measure helps avoid
> encountering an
> `Error: Error creating conda environment [exit code 137]`. After
> installation, feel free to revert the settings to 1GB of RAM.
> **Note**: The September 11, 2023 release of `reticulate` (`v1.32`) has
> caused an issue which gives a
> `TypeError: the first argument must be callable` error when using
> `dag_numpyro()` on windows. If you experience this, install the dev
> version of `reticulate` by following the below steps:
>
> 1) Install RTOOLS by using installer at:
>
>
> 2) Run this to get the dev version of `reticulate`:
```
# install DEV version of reticulate
# install.packages("pak") #uncomment as needed
pak::pak("rstudio/reticulate")
```
### Retrograde Compatibility (Not Advised)
In cases where legacy compatibility is paramount and you still rely on
the operationality of the `dag_greta()` function, consider installing
`v0.4.2` of the `causact` package. However, it's essential to emphasize
that this approach is **not recommended** for general usage:
``` r
### EXERCISE CAUTION BEFORE EXECUTING THESE LINES
### Only proceed if dag_greta() is integral to your existing codebase
install.packages("remotes")
remotes::install_github("flyaflya/causact@v0.4.2")
```
Your judicious choice of installation method will ensure a seamless and
effective integration of the `causact` package into your computational
toolkit.
## Usage
Example taken from
with the packages `dag_foo()` functions further described here:
### Create beautiful model visualizations.
```{r defineGraph, results = "hide", echo = FALSE}
library(causact)
graph = dag_create() %>%
dag_node(descr = "Get Card",label = "y",
rhs = bernoulli(theta),
data = carModelDF$getCard) %>%
dag_node(descr = "Card Probability",label = "theta",
rhs = beta(2,2),
child = "y") %>%
dag_plate(descr = "Car Model", label = "x",
data = carModelDF$carModel,
nodeLabels = "theta",
addDataNode = TRUE)
```
```{r cardP, eval = FALSE}
library(causact)
graph = dag_create() %>%
dag_node(descr = "Get Card", label = "y",
rhs = bernoulli(theta),
data = carModelDF$getCard) %>%
dag_node(descr = "Card Probability", label = "theta",
rhs = beta(2,2),
child = "y") %>%
dag_plate(descr = "Car Model", label = "x",
data = carModelDF$carModel,
nodeLabels = "theta",
addDataNode = TRUE)
graph %>% dag_render()
```
```{r cardPlatePlot, out.width = "60%", echo = FALSE}
knitr::include_graphics("man/figures/cardPlot.png")
```
### Hide model complexity, as appropriate, from domain experts and other less statistically minded stakeholders.
```{r cardPSL, eval = FALSE}
graph %>% dag_render(shortLabel = TRUE)
```
```{r cardPlateSLPlot, out.width = "50%", echo = FALSE}
knitr::include_graphics("man/figures/cardPlotShortLabel.png")
```
### Get posterior while automatically running the underlying `numpyro` code
```{r greta, warning = FALSE, message = FALSE}
drawsDF = graph %>% dag_numpyro()
drawsDF ### see top of data frame
```
> **Note**: if you have used older versions of causact, please know that
> dag_numpyro() is a drop-in replacement for dag_greta().
### Get quick view of posterior distribution
```{r gretaPost, fig.cap = "Credible interval plots.", fig.width = 6.5, fig.height = 4, out.width = "70%"}
drawsDF %>% dagp_plot()
```
### OPTIONAL: See `numpyro` code without executing it (for debugging or learning)
```{r gretaCode, warning = FALSE, message = TRUE}
numpyroCode = graph %>% dag_numpyro(mcmc = FALSE)
```
## Getting Help and Suggesting Improvements
Whether you encounter a clear bug, have a suggestion for improvement, or
just have a question, we are thrilled to help you out. In all cases,
please file a [GitHub
issue](https://github.com/flyaflya/causact/issues). If reporting a bug,
please include a minimal reproducible example.
## Contributing
We welcome help turning `causact` into the most intuitive and fastest
method of converting stakeholder narratives about data-generating
processes into actionable insight from posterior distributions. If you
want to help us achieve this vision, we welcome your contributions after
reading the [new contributor
guide](https://github.com/flyaflya/causact/blob/master/.github/contributing.md).
Please note that this project is released with a [Contributor Code of
Conduct](https://github.com/flyaflya/causact/blob/master/CODE_OF_CONDUCT.md).
By participating in this project you agree to abide by its terms.
## Further Usage
For more info, see
`A Business Analyst's Introduction to Business Analytics` available at
. You can also check out the package's
vignette: `vignette("narrative-to-insight-with-causact")`. Two
additional examples are shown below.
## Prosocial Chimpanzees Example from Statistical Rethinking
> McElreath, Richard. Statistical rethinking: A Bayesian course with
> examples in R and Stan. Chapman and Hall/CRC, 2018.
```{r chimpsGraph, results ="hide", warning = FALSE, message = FALSE}
library(tidyverse)
library(causact)
# data object used below, chimpanzeesDF, is built-in to causact package
graph = dag_create() %>%
dag_node("Pull Left Handle","L",
rhs = bernoulli(p),
data = causact::chimpanzeesDF$pulled_left) %>%
dag_node("Probability of Pull", "p",
rhs = 1 / (1 + exp(-((alpha + gamma + beta)))),
child = "L") %>%
dag_node("Actor Intercept","alpha",
rhs = normal(alphaBar, sigma_alpha),
child = "p") %>%
dag_node("Block Intercept","gamma",
rhs = normal(0,sigma_gamma),
child = "p") %>%
dag_node("Treatment Intercept","beta",
rhs = normal(0,0.5),
child = "p") %>%
dag_node("Actor Population Intercept","alphaBar",
rhs = normal(0,1.5),
child = "alpha") %>%
dag_node("Actor Variation","sigma_alpha",
rhs = exponential(1),
child = "alpha") %>%
dag_node("Block Variation","sigma_gamma",
rhs = exponential(1),
child = "gamma") %>%
dag_plate("Observation","i",
nodeLabels = c("L","p")) %>%
dag_plate("Actor","act",
nodeLabels = c("alpha"),
data = chimpanzeesDF$actor,
addDataNode = TRUE) %>%
dag_plate("Block","blk",
nodeLabels = c("gamma"),
data = chimpanzeesDF$block,
addDataNode = TRUE) %>%
dag_plate("Treatment","trtmt",
nodeLabels = c("beta"),
data = chimpanzeesDF$treatment,
addDataNode = TRUE)
```
### See graph
```{r chimpsGraphRenderCode, eval = FALSE, warning = FALSE, message = FALSE}
graph %>% dag_render(width = 2000, height = 800)
```
```{r chimpsGraphRenderPlot, out.width = "120%", echo = FALSE}
knitr::include_graphics("man/figures/chimpStat.png")
```
### Communicate with stakeholders for whom the statistics might be distracting
```{r chimpsGraphRenderSL, eval=FALSE, message=FALSE, warning=FALSE}
graph %>% dag_render(shortLabel = TRUE)
```
```{r chimpsGraphRenderPlotSL, out.width = "100%", echo = FALSE}
knitr::include_graphics("man/figures/chimpStatSL.png")
```
### Compute posterior
```{r chimpsGraphGreta, warning = FALSE, message = FALSE}
drawsDF = graph %>% dag_numpyro()
```
### Visualize posterior
```{r chimpsGraphPost, out.width = "100%", fig.width = 9, fig.height = 6, warning = FALSE, message = FALSE}
drawsDF %>% dagp_plot()
```
## Eight Schools Example from Bayesian Data Analysis
> Gelman, Andrew, Hal S. Stern, John B. Carlin, David B. Dunson, Aki
> Vehtari, and Donald B. Rubin. Bayesian data analysis. Chapman and
> Hall/CRC, 2013.
```{r eightschoolsGraph, results ="hide", warning = FALSE, message = FALSE}
library(tidyverse)
library(causact)
# data object used below, schoolDF, is built-in to causact package
graph = dag_create() %>%
dag_node("Treatment Effect","y",
rhs = normal(theta, sigma),
data = causact::schoolsDF$y) %>%
dag_node("Std Error of Effect Estimates","sigma",
data = causact::schoolsDF$sigma,
child = "y") %>%
dag_node("Exp. Treatment Effect","theta",
child = "y",
rhs = avgEffect + schoolEffect) %>%
dag_node("Pop Treatment Effect","avgEffect",
child = "theta",
rhs = normal(0,30)) %>%
dag_node("School Level Effects","schoolEffect",
rhs = normal(0,30),
child = "theta") %>%
dag_plate("Observation","i",nodeLabels = c("sigma","y","theta")) %>%
dag_plate("School Name","school",
nodeLabels = "schoolEffect",
data = causact::schoolsDF$schoolName,
addDataNode = TRUE)
```
### See graph
```{r eightschoolsGraphRenderCode, eval = FALSE, warning = FALSE, message = FALSE}
graph %>% dag_render()
```
```{r eightschoolsGraphRenderPlot, out.width = "100%", echo = FALSE}
knitr::include_graphics("man/figures/eightSchoolStat.png")
```
### Compute posterior
```{r eightschoolsGraphGreta, warning = FALSE, message = FALSE}
drawsDF = graph %>% dag_numpyro()
```
### Visualize posterior
```{r eightschoolsGraphPost, out.width = "100%", eval = FALSE, fig.width = 9, fig.height = 6, warning = FALSE, message = FALSE}
drawsDF %>% dagp_plot()
```
```{r eightschoolsGraphRenderPost, out.width = "100%", echo = FALSE}
knitr::include_graphics("man/figures/eightschoolsGraphPost-1.png")
```
Owner
- Name: Adam Fleischhacker
- Login: flyaflya
- Kind: user
- Company: Unviersity of Delaware
- Website: http://sites.udel.edu/ajf
- Repositories: 1
- Profile: https://github.com/flyaflya
JOSS Publication
Generative DAGs as an Interface Into Probabilistic Programming with the R Package causact
Published
August 12, 2022
Volume 7, Issue 76, Page 4415
Authors
Adam J. Fleischhacker
Adam Fleischhacker, JP Morgan Chase Faculty Fellow, University of Delaware, Newark, DE 19716
Adam Fleischhacker, JP Morgan Chase Faculty Fellow, University of Delaware, Newark, DE 19716
Thi Hong Nhung Nguyen
Institute for Financial Services Analytics, University of Delaware, Newark, DE 19716
Institute for Financial Services Analytics, University of Delaware, Newark, DE 19716
Tags
Bayesian inference probabilistic programming graphical models directed acyclic graphsGitHub Events
Total
- Create event: 1
- Release event: 1
- Issues event: 7
- Watch event: 5
- Issue comment event: 7
- Push event: 9
Last Year
- Create event: 1
- Release event: 1
- Issues event: 7
- Watch event: 5
- Issue comment event: 7
- Push event: 9
Committers
Last synced: 5 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| flyaflya | a****f@u****u | 296 |
| Adam Fleischhacker | a****f@B****u | 60 |
| JaredWS | j****s@u****u | 8 |
| Joe Thorley | j****e@p****a | 1 |
| DavisVaughan | d****s@r****m | 1 |
| DaniDapena[A | m****a@u****u | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 50
- Total pull requests: 17
- Average time to close issues: 4 months
- Average time to close pull requests: 2 months
- Total issue authors: 10
- Total pull request authors: 10
- Average comments per issue: 1.36
- Average comments per pull request: 1.18
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 0
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 0
- Average comments per issue: 1.67
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- flyaflya (31)
- joethorley (8)
- prabinov42 (3)
- teunbrand (1)
- akgandhi (1)
- marcora (1)
- zhiiiyang (1)
- ravivaidazad (1)
- jaredws (1)
Pull Request Authors
- DaniDapena (5)
- jaredws (4)
- idiotyr (1)
- nlp-submission (1)
- hhhhhzhang (1)
- joethorley (1)
- Farlein (1)
- mingchuan123 (1)
- Rosenguyen0411 (1)
- DavisVaughan (1)
Top Labels
Issue Labels
enhancement (6)
help wanted (3)
good first issue (2)
bug (1)
Pull Request Labels
Packages
- Total packages: 1
-
Total downloads:
- cran 515 last-month
- Total docker downloads: 41,971
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 12
- Total maintainers: 1
cran.r-project.org: causact
Fast, Easy, and Visual Bayesian Inference
- Homepage: https://github.com/flyaflya/causact
- Documentation: http://cran.r-project.org/web/packages/causact/causact.pdf
- License: MIT + file LICENSE
-
Latest release: 0.5.8
published 4 months ago
Rankings
Forks count: 6.5%
Stargazers count: 10.3%
Average: 22.0%
Downloads: 28.0%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Maintainers (1)
Last synced:
4 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.2.0 depends
- DiagrammeR >= 1.0.7 imports
- coda >= 0.19.3 imports
- cowplot >= 1.0.0 imports
- dplyr >= 0.8.5 imports
- forcats >= 0.5.0 imports
- ggplot2 >= 3.3.0 imports
- greta >= 0.3.1 imports
- htmlwidgets >= 1.5.1 imports
- igraph >= 1.2.5 imports
- lifecycle * imports
- magrittr >= 1.5 imports
- purrr >= 0.3.4 imports
- rlang >= 0.4.6 imports
- rstudioapi >= 0.11 imports
- stringr >= 1.4.0 imports
- tidyr >= 1.0.3 imports
- covr * suggests
- knitr * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests