BGGM

BGGM: Bayesian Gaussian Graphical Models in R - Published in JOSS (2020)

https://github.com/donaldrwilliams/bggm

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

bayes-factors bayesian-hypothesis-testing gaussian-graphical-models

Scientific Fields

Sociology Social Sciences - 40% confidence
Last synced: 4 months ago · JSON representation

Repository

Bayesian Gaussian Graphical Models

Basic Info
Statistics
  • Stars: 57
  • Watchers: 7
  • Forks: 17
  • Open Issues: 15
  • Releases: 3
Topics
bayes-factors bayesian-hypothesis-testing gaussian-graphical-models
Created over 7 years ago · Last pushed 5 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
bibliography: inst/REFERENCES.bib
---

```{r, echo = FALSE, message=F}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  dev = "png",
  dpi = 200,
  fig.align = "center",
  knitr::opts_chunk$set(comment = NA)
  )
library(ggplot2)
library(BGGM)
```




# BGGM: Bayesian Gaussian Graphical Models

[![CRAN Version](https://www.r-pkg.org/badges/version/BGGM)](https://cran.r-project.org/package=BGGM)
[![Downloads](https://cranlogs.r-pkg.org/badges/BGGM)](https://cran.r-project.org/package=BGGM)
[![build](https://github.com/donaldRwilliams/BGGM/actions/workflows/build.yml/badge.svg)](https://github.com/donaldRwilliams/BGGM/actions/workflows/build.yml)
[![status](https://joss.theoj.org/papers/10.21105/joss.02111/status.svg)](https://joss.theoj.org/papers/10.21105/joss.02111)

The `R` package **BGGM** provides tools for making Bayesian inference in 
Gaussian graphical models (GGM). The methods are organized around two general 
approaches for Bayesian inference: (1) estimation and (2) hypothesis 
testing. The key distinction is that the former focuses on either 
the posterior or posterior predictive distribution [@Gelman1996a; see section 5 in @rubin1984bayesianly], whereas the latter focuses on model comparison with the Bayes factor [@Jeffreys1961; @Kass1995].

## What is a Gaussian Graphical Model ?
A Gaussian graphical model captures conditional (in)dependencies among a set 
of variables. These are pairwise relations (partial correlations) controlling for 
the effects of all other variables in the model.

### Applications
The Gaussian graphical model is used across the sciences, including 
(but not limited to) economics [@millington2020partial], climate science 
[@zerenner2014gaussian], genetics [@chu2009graphical], and psychology [@rodriguez2020formalizing]. 


## Installation

To install the latest release version (`2.1.4`) from CRAN use
```{r gh-installation, eval = FALSE}	
install.packages("BGGM")	
```

The current developmental version can be installed with	

```{r, eval = FALSE}	
if (!requireNamespace("remotes")) {	
  install.packages("remotes")	
}	
remotes::install_github("donaldRwilliams/BGGM")
```

## Overview
The methods in **BGGM** build upon existing algorithms that are well-known in the literature.
The central contribution of **BGGM** is to extend those approaches:

1.  Bayesian estimation with the novel matrix-F prior distribution [@Mulder2018]
  
    + [Estimation](#bayesian-estimation) [@Williams2019]

2. Bayesian hypothesis testing with the matrix-F prior distribution [@Williams2019_bf]

    + [Exploratory hypothesis testing](#Exploratory)
  
    + [Confirmatory hypothesis testing](#Confirmatory)
    
3. Comparing Gaussian graphical models [@Williams2019; @williams2020comparing]
    
    + [Partial correlation differences](#partial-correlation-differences) 
    
    + [Posterior predictive check](#posterior-predictive-check)
    
    + [Exploratory hypothesis testing](#exploratory-groups) 
    
    + [Confirmatory hypothesis testing](#confirmatory-groups)

4. Extending inference beyond the conditional (in)dependence structure [@Williams2019]

    +  [Predictability](#Predictability) 
    
    +  [Posterior uncertainty intervals](#partial-correlation-differences) for the 
       partial correlations
       
    +  [Custom Network Statistics](#custom-network-statistics)
    
    
The computationally intensive tasks are written in `c++` via the `R` package **Rcpp** [@eddelbuettel2011rcpp] and the `c++` library **Armadillo** [@sanderson2016armadillo]. The Bayes factors are computed with the `R` package **BFpack** [@mulder2019bfpack]. Furthermore, there are [plotting](#example-network-plot) functions
for each method, control variables can be included in the model (e.g., `~ gender`), 
and there is support for missing values (see `bggm_missing`).

## Supported Data Types

* **Continuous**: The continuous method was described in  @Williams2019. Note that 
                  this is based on the customary [Wishart distribution](https://en.wikipedia.org/wiki/Wishart_distribution).

* **Binary**: The binary method builds directly upon @talhouk2012efficient
  that, in turn, built upon the approaches of @lawrence2008bayesian and
  @webb2008bayesian (to name a few).
  
* **Ordinal**: The ordinal methods require sampling thresholds. There are two approach 
   included in **BGGM**. The customary approach described in @albert1993bayesian 
   (the default) and the 'Cowles' algorithm described in @cowles1996accelerating.
   
* **Mixed**: The mixed data (a combination of discrete and continuous) method was introduced
 in @hoff2007extending. This is a semi-parametric copula model
 (i.e., a copula GGM) based on the ranked likelihood. Note that this can be used for 
 *only* ordinal data (not restricted to "mixed" data).

## Illustrative Examples
The following includes brief examples for *some* of the methods in **BGGM**.

### Bayesian Estimation

#### Posterior Sampling
An ordinal GGM is estimated with

```{r, eval = FALSE}
library(BGGM)
library(ggplot2)

# data
Y <- ptsd[,1:5] + 1

# ordinal
fit <- estimate(Y, type = "ordinal", 
                analytic = FALSE)
```

Notice the `+ 1`. This is required, because the first category must be `1` when `type = "ordinal"`. The partial correlations can the be summarized with

```{r, eval = FALSE}
summary(fit)

#> BGGM: Bayesian Gaussian Graphical Models 
#> --- 
#> Type: ordinal 
#> Analytic: FALSE 
#> Formula:  
#> Posterior Samples: 250 
#> Observations (n):
#> Nodes (p): 5 
#> Relations: 10 
#> --- 
#> Call: 
#> estimate(Y = Y, type = "ordinal", analytic = FALSE, iter = 250)
#> --- 
#> Estimates:
#>  Relation Post.mean Post.sd Cred.lb Cred.ub
#>    B1--B2     0.258   0.079   0.105   0.418
#>    B1--B3     0.028   0.086  -0.127   0.189
#>    B2--B3     0.517   0.058   0.406   0.616
#>    B1--B4     0.356   0.070   0.210   0.486
#>    B2--B4    -0.076   0.078  -0.219   0.063
#>    B3--B4     0.246   0.077   0.107   0.385
#>    B1--B5     0.131   0.080  -0.020   0.279
#>    B2--B5     0.127   0.083  -0.040   0.284
#>    B3--B5     0.202   0.079   0.063   0.366
#>    B4--B5     0.349   0.070   0.209   0.474
#> --- 
```

The returned object can also be plotted, which allows for visualizing the posterior uncertainty interval for each relation. An example is provided below in [Posterior uncertainty intervals](#posterior-uncertatiny). 
The partial correlation matrix is accessed with 

```{r, eval=FALSE}
pcor_mat(fit)
```

```{r, echo = FALSE, results='asis'}
load(file = "readme_models/pcor_ex.rda")
knitr::kable(pcor_ex, row.names = TRUE)
```

The graph is selected with
```{r, eval=FALSE}
E <- select(fit)
```

and then plotted
```{r, eval = FALSE}
# "communities"
comm <- substring(colnames(Y), 1, 1)

plot(select(fit), 
     groups = comm,
     edge_magnify = 5, 
     palette = "Pastel1", 
     node_size = 12)
```

![](readme_models/plt_est_net.png)

This basic "workflow" can be used with all methods and data types. A more involved network plot
is provided below.

#### Analytic
There is also an analytic solution that is based on the Wishart distribution. This 
simple solution provides competitive performance with "state-of-the-art" methods, 
assuming that *n* (observations) > *p* (variables). The one 
caveat is that it works only for `type = "continuous"` (the default).

```{r, eval=FALSE}
# analytic
fit <- estimate(Y, analytic = TRUE)

# network plot
plot(select(fit))
```

This is quite handy when (1) only the conditional dependence structure is of interest and (2) an immediate solution is desirable. An example of (2) is provided  in [Posterior Predictive Check](#posterior-predictive-check).


### Bayesian Hypothesis Testing The Bayes factor based methods allow for determining the conditional **in**dependence structure (evidence for the null hypothesis). #### Exploratory ```{r, eval=FALSE} # now 10 nodes Y <- ptsd[,1:10] # exploratory hypothesis testing fit<- explore(Y) # select E <- select(fit, alternative = "exhaustive") ``` The option `alternative = "exhaustive"` compares three hypotheses: (1) a null relation; (2) a positive relation; and (3) a negative relation. ```{r, eval = FALSE} summary(E) #> BGGM: Bayesian Gaussian Graphical Models #> --- #> Type: ordinal #> Alternative: exhaustive #> --- #> Call: #> select.explore(object = fit, alternative = "exhaustive") #> --- #> Hypotheses: #> H0: rho = 0 #> H1: rho > 0 #> H2: rho < 0 #> --- #> #> Relation Post.mean Post.sd Pr.H0 Pr.H1 Pr.H2 #> B1--B2 0.263 0.080 0.000 0.999 0.001 #> B1--B3 0.020 0.081 0.710 0.173 0.116 #> B2--B3 0.523 0.073 0.000 1.000 0.000 #> B1--B4 0.362 0.070 0.000 1.000 0.000 #> B2--B4 -0.082 0.068 0.459 0.061 0.480 #> B3--B4 0.252 0.073 0.000 1.000 0.000 #> B1--B5 0.129 0.072 0.120 0.847 0.033 #> B2--B5 0.118 0.078 0.223 0.726 0.051 #> B3--B5 0.213 0.077 0.001 0.996 0.003 #> B4--B5 0.348 0.072 0.000 1.000 0.000 ``` The posterior hypothesis probabilities are provided in the last three columns. When using `plot(E)`, there is a network plot for each hypothesis. #### Confirmatory A central contribution of **BGGM** is confirmatory hypothesis testing of (in)equality constraints [@Hoijtink2011]. By this we are referring to testing expectations, as opposed to feeding the data to, say, `estimate`, and seeing what happens to emerge. In this example, the focus is on suicidal thoughts (`PHQ9`) in a comorbidity network. Here is an example set of hypotheses ```{r} # data (+ 1) Y <- depression_anxiety_t1 + 1 # example hypotheses hyp <- c("PHQ2--PHQ9 > PHQ1--PHQ9 > 0; PHQ2--PHQ9 = PHQ1--PHQ9 = 0") ``` There are two hypotheses separated by (`;`). The first expresses that the relation `PHQ2--PHQ9` ("feeling down, depressed, or hopeless" and "suicidal thoughts") is larger than `PHQ1--PHQ9` ("little interest or pleasure in doing things" and "suicidal thoughts"). In other words, that the partial correlation is larger for `PHQ2--PHQ9`. There is an additional constraint to positive values (`> 0`) for both relations. The second hypothesis is then a "null" model. ```{r, echo=FALSE, warning = FALSE} load(file = "readme_models/fit_hyp1.rda") fit <- fit_hyp1 ``` ```{r, eval=FALSE} # (try to) confirm fit <- confirm(Y = Y, hypothesis = hyp, type = "ordinal") ``` The object `fit` is then printed ```{r, eval=FALSE} fit #> BGGM: Bayesian Gaussian Graphical Models #> Type: ordinal #> --- #> Posterior Samples: 250 #> Observations (n): 403 #> Variables (p): 16 #> Delta: 15 #> --- #> Call: #> confirm(Y = Y + 1, hypothesis = hyp, type = "ordinal", #> iter = 250) #> --- #> Hypotheses: #> #> H1: PHQ2--PHQ9>PHQ1--PHQ9>0 #> H2: PHQ2--PHQ9=PHQ1--PHQ9=0 #> H3: complement #> --- #> Posterior prob: #> #> p(H1|data) = 0.895 #> p(H2|data) = 0.002 #> p(H3|data) = 0.103 #> --- #> Bayes factor matrix: #> H1 H2 H3 #> H1 1.000 529.910 8.666 #> H2 0.002 1.000 0.016 #> H3 0.115 61.147 1.000 #> --- #> note: equal hypothesis prior probabilities ``` The posterior hypothesis probability is `0.895` which provides some evidence for the order constraint. The Bayes factor matrix then divides the posterior probabilities. This provide a measure of *relative* support for which hypothesis the data were more likely under. Finally, the results can be plotted ```{r, eval=FALSE} plot(fit) + scale_fill_brewer(palette = "Set2", name = "Posterior Prob") + ggtitle("Confirmatory: Comorbidity Network") ``` ![](readme_models/confirm_hyp.png) This demonstrates that all the `plot()` functions in **BGGM** return `ggplot` objects that can be further customized. Note that **BGGM** is not focused on making publication ready plots. Typically the bare minimum is provided that can then be honed in.
### Comparing Gaussian Graphical Models #### Partial Correlation Differences This method compares groups by computing the difference for each relation in the model. In other words, there are pairwise contrasts for each partial correlation, resulting in a posterior distribution for each difference. In all examples in this section, personality networks are compared for males and females. ```{r} # data Y <- bfi # males Ymales <- subset(Y, gender == 1, select = -c(gender, education)) # females Yfemales <- subset(Y, gender == 2, select = -c(gender, education)) ``` Fit the model ```{r, eval=FALSE} fit <- ggm_compare_estimate(Ymales, Yfemales) ``` Then plot the results, in this case the posterior distribution for each difference ```{r, eval=FALSE} # plot summary plot(summary(fit)) ``` ![](readme_models/ggm_compare_estimate.png) Note that it is also possible to use `select` for the object `fit` and then plot the results. This produces a network plot including the selected differences. Furthermore, it is also possible to plot the partial correlations (not the differences). This is accomplished by using `plot` with the summary computed from an `estimate` object ([see above](#bayesian-estimation)). #### Posterior Predictive Check The predictive check method uses Jensen-Shannon divergence (i.e., symmetric Kullback-Leibler divergence [Wikipedia](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)) and the sum of squared error (for the partial correlation matrices) to compare groups [@williams2020comparing]. The following compares the groups ```{r, eval=FALSE} fit <- ggm_compare_ppc(Ymales, Yfemales) ``` Then print the summary output with ```{r, eval=FALSE} fit #> BGGM: Bayesian Gaussian Graphical Models #> --- #> Test: Global Predictive Check #> Posterior Samples: 500 #> Group 1: 805 #> Group 2: 1631 #> Nodes: 25 #> Relations: 300 #> --- #> Call: #> ggm_compare_ppc(Ymales, Yfemales, iter = 500) #> --- #> Symmetric KL divergence (JSD): #> #> contrast JSD.obs p_value #> Yg1 vs Yg2 0.442 0 #> --- #> #> Sum of Squared Error: #> #> contrast SSE.obs p.value #> Yg1 vs Yg2 0.759 0 #> --- #> note: #> JSD is Jensen-Shannon divergence ``` In this case, there seems to be decisive evidence that the networks are different (as indicated by the posterior predictive *p*-value). The predictive distribution can also be plotted ```{r, eval=FALSE, out.width = '65%', warning = FALSE, message=FALSE} plot(fit, critical = 0.05)$plot_jsd ``` ![](readme_models/ppc_1.png) where the red region is the "critical" area and the black point is the observed KL divergence for the networks. This again shows that the "distance" between the networks is much more than expected, assuming that the groups were actually the same. This next example is a new feature in **BGGM** (`2.0.0`), that allows for comparing GGMs any way the user wants. All that is required is to (1) decide on a test-statistic and (2) write a custom function. Here is an example using Hamming distance ([Wikipedia](https://en.wikipedia.org/wiki/Hamming_distance)), which is essentially the squared error between adjacency matrices (a test for different structures). First define the custom function ```{r} f <- function(Yg1, Yg2){ # remove NA x <- na.omit(Yg1) y <- na.omit(Yg2) # nodes p <- ncol(x) # identity matrix I_p <- diag(p) # estimate graphs fit1 <- estimate(x, analytic = TRUE) fit2 <- estimate(y, analytic = TRUE) # select graphs sel1 <- select(fit1) sel2 <- select(fit2) # Hamming distance sum((sel1$adj[upper.tri(I_p)] - sel2$adj[upper.tri(I_p)])^2) } ``` Note that (1) `analytic = TRUE` is being used, which is needed in this case because two graphs are estimated for each iteration (or draw from the posterior predictive distribution) and (2) `f` requires two datasets as the input and returns a single number (the chosen test-statistic). The next step is to compute the observed Hamming distance ```{r, echo=FALSE, warning = FALSE} load(file = "readme_models/obs.rda") ``` ```{r, eval=FALSE} # observed difference obs <- f(Ymales, Yfemales) ``` then compare the groups ```{r, eval = FALSE} fit <- ggm_compare_ppc(Ymales, Yfemales, FUN = f, custom_obs = obs) # print fit #> BGGM: Bayesian Gaussian Graphical Models #> --- #> Test: Global Predictive Check #> Posterior Samples: 250 #> Group 1: 805 #> Group 2: 1631 #> Nodes: 25 #> Relations: 300 #> --- #> Call: #> ggm_compare_ppc(Ymales, Yfemales, iter = 250, FUN = f, custom_obs = obs) #> --- #> Custom: #> #> contrast custom.obs p.value #> Yg1 vs Yg2 75 0.576 #> --- ``` In this case, the *p*-value does not indicate that the groups are different for this test-statistic. This may seem contradictory to the previous results, but it is important to note that Hamming distance asks a much different question related to the adjacency matrices (no other information, such as edge weights, is considered). #### Exploratory (groups) The Bayes factor based methods allow for determining the conditional **in**dependence structure (evidence for the null hypothesis), in this case for group equality. Fit the model ```{r, eval=FALSE} fit <- ggm_compare_explore(Ymales, Yfemales) ``` Then plot the results ```{r, eval=FALSE} plot(summary(fit)) + theme_bw() + theme(panel.grid = element_blank(), axis.text.y = element_blank()) ``` ![](readme_models/plt_ggm_compare_explore.png) Here the posterior probability for a difference is visualized for each relation in the GGM. Note that it is also possible to use `select` for the object `fit` and then plot the results. This produces a network plot including the selected differences, in addition to a plot depicting the relations for which there was evidence for the null hypothesis. #### Confirmatory (groups) A central contribution of **BGGM** is confirmatory hypothesis testing of (in)equality constraints [@Hoijtink2011], in this case for comparing groups. By this we are referring to testing expectations, as opposed to feeding the data to, say, `estimate`, and seeing what happens to emerge. In this example, the focus is on agreeableness in a personality network. Here is a set of hypotheses ```{r} hyp <- c("g1_A2--A4 > g2_A2--A4 > 0 & g1_A4--A5 > g2_A4--A5 > 0; g1_A4--A5 = g2_A4--A5 = 0 & g1_A2--A4 = g2_A2--A4 = 0") ``` where the variables are `A2` ("inquire about others' well being"), `A4` ("love children"), and `A5` ("make people feel at ease"). The first hypothesis states that the conditionally dependent effects are larger for female than males (note the `&`), with the additional constraint to positive values, whereas the second hypothesis is a "null" model. The hypothesis is tested with the following ```{r, eval=FALSE} fit <- ggm_compare_confirm(Yfemales, Ymales, hypothesis = hyp) # print fit #> BGGM: Bayesian Gaussian Graphical Models #> Type: continuous #> --- #> Posterior Samples: 500 #> Group 1: 1631 #> Group 2: 805 #> Variables (p): 25 #> Relations: 300 #> Delta: 15 #> --- #> Call: #> ggm_compare_confirm(Yfemales, Ymales, hypothesis = hyp, iter = 500) #> --- #> Hypotheses: #> #> H1: g1_A2--A4>g2_A2--A4>0&g1_A4--A5>g2_A4--A5>0 #> H2: g1_A4--A5=g2_A4--A5=0&g1_A2--A4=g2_A2--A4=0 #> H3: complement #> --- #> Posterior prob: #> #> p(H1|data) = 0.989 #> p(H2|data) = 0 #> p(H3|data) = 0.011 #> --- #> Bayes factor matrix: #> H1 H2 H3 #> H1 1.000 1.180798e+14 92.115 #> H2 0.000 1.000000e+00 0.000 #> H3 0.011 1.281873e+12 1.000 #> --- #> note: equal hypothesis prior probabilities ``` The posterior hypothesis probability is 0.989 which provides strong evidence for the hypothesis that predicted these "agreeableness" relations would be larger in females than in males. This can also be plotted, as in [Confirmatory (one group)](#confirmatory). See @rodriguez2020formalizing for a full treatment of confirmatory testing in substantive applications. ### Beyond the Conditional (In)dependence Structure #### Predictability In this example, predictability is computed for each node in the network [see here for rationale @haslbeck2018well]. Currently **BGGM** computes Bayesian variance explained for all data types [@gelman_r2_2019]. The following computes predictability for binary data ```{r, eval=FALSE} # binary Y <- women_math # fit model fit <- estimate(Y, type = "binary") # compute r2 r2 <- predictability(fit, iter = 500) # plot plot(r2, type = "ridgeline") ``` ![](readme_models/predictability.png) #### Posterior Uncertainty See [Partial Correlation Differences](#partial-correlation-differences) #### Custom Network Statistics A new feature to **BGGM** allows for computing user defined network statistics, given a partial correlation or weighted adjacency matrix. Here is an example for bridge centrality [@jones2019bridge]. The first step is to define the function ```{r, eval=FALSE} # need this package library(networktools) # custom function f <- function(x, ...){ bridge(x, ...)$`Bridge Strength` } ``` Note that `x` takes the matrix and `f` can return either a single number or a number for each node. The next step is to fit the model and compute the network statistic ```{r, eval=FALSE} # data Y <- ptsd # clusters communities <- substring(colnames(Y), 1, 1) # estimate the model fit <- estimate(Y) # bridge strength net_stat <- roll_your_own(fit, FUN = f, select = TRUE, communities = communities) ``` The function `f` is provided to `FUN` and `communities` is passed to `brigde` (inside of `f`) via `...`. The results can be printed ```{r, eval=FALSE} # print net_stat #> BGGM: Bayesian Gaussian Graphical Models #> --- #> Network Stats: Roll Your Own #> Posterior Samples: 100 #> --- #> Estimates: #> #> Node Post.mean Post.sd Cred.lb Cred.ub #> 1 0.340 0.097 0.166 0.546 #> 2 0.319 0.100 0.176 0.513 #> 3 0.000 0.000 0.000 0.000 #> 4 0.337 0.086 0.189 0.489 #> 5 0.559 0.133 0.332 0.791 #> 6 0.188 0.073 0.029 0.320 #> 7 0.505 0.138 0.241 0.781 #> 8 0.153 0.070 0.022 0.286 #> 9 0.175 0.063 0.041 0.281 #> 10 0.000 0.000 0.000 0.000 #> 11 0.365 0.107 0.178 0.627 #> 12 0.479 0.093 0.280 0.637 #> 13 0.155 0.074 0.022 0.301 #> 14 0.000 0.000 0.000 0.000 #> 15 0.374 0.097 0.175 0.550 #> 16 0.174 0.065 0.034 0.295 #> 17 0.000 0.000 0.000 0.000 #> 18 0.491 0.132 0.238 0.745 #> 19 0.613 0.113 0.408 0.825 #> 20 0.144 0.066 0.038 0.289 #> --- ``` And then plotted ```{r, eval = FALSE} plot(net_stat) ``` ![](readme_models/bridge.png) There are additional examples in the documentation. ### Example Network Plot Here is an example of a more involved network plot. In this case, the graph is estimated with a semi-parametric copula (`type = "mixed"`), where two control variables are included in the model. ```{r, echo=FALSE} load(file = "readme_models/fit_plt_ex.rda") fit <- fit_plt_ex # select graph E <- select(fit) ``` ```{r, eval=FALSE} # personality (includes gender and education) Y <- bfi # fit copula GGM fit <- estimate(Y, type = "mixed") # select graph E <- select(fit) ``` The graph is then plotted ```{r, eval=FALSE } # extract communities comm <- substring(colnames(Y), 1, 1) # plot plot(E, # enlarge edges edge_magnify = 5, # cluster nodes groups = comm, # change layout layout = "circle")$plt + # plot title ggtitle("Semi-Parametric Copula") + # add custom labels scale_color_brewer(breaks = c("A", "C", "E", "N", "O", "e", "g"), labels = c("A", "C", "E", "N", "O", "Education", "Gender"), palette = "Set2") ``` ![](readme_models/plt_net_example.png) Note that `layout` can be changed to any option provided in the `R` package **sna** [@sna]. ## Additional Features The primary focus of **BGGM** is Gaussian graphical modeling (the inverse covariance matrix). The residue is a suite of useful methods not explicitly for GGMs. For example, ### Bivariate Correlations Bivariate correlations for `binary` (tetrachoric), `ordinal` (polychoric), `mixed` (rank based), and `continuous` (Pearson's) data. Here is an example for computing tetrachoric correlations: ```{r, echo=FALSE} load(file = "readme_models/binary_cors.rda") ``` ```{r, eval=FALSE} # binary data Y <- women_math[1:500,] cors <- zero_order_cors(Y, type = "binary", iter = 250) cors$R ``` ```{r, echo = FALSE, results='asis'} row.names(cors$R_mean) <- 1:6 knitr::kable(round(cors$R_mean, 3), col.names = c("1", "2", "3","4","5", "6"), row.names = TRUE) ``` The object `cors` also includes the sampled correlation matrices (in this case 250) in an array. ### Multivariate Regression Multivariate regression for binary (probit), ordinal (probit), mixed (rank likelihood), and continuous data. Here is an example for a multivariate probit model with an ordinal outcome, where `E5` ("take charge") and `N5` ("panic easily") are predicted by `gender` and `education`: ```{r,echo=FALSE} load("readme_models/mv_probit.rda") ``` ```{r, eval = F} # personality data Y <- bfi # variables Y <- subset(Y, select = c("E5", "N5", "gender", "education")) mv_probit <- estimate(Y, formula = ~ gender + as.factor(education), type = "ordinal") ``` Note that **BGGM** does not use the customary `model.matrix` formulation. This is for good reason, as each variable in the GGM does not need to be written out. Here we effectively "tricked" **BGGM** to fit a multivariate probit model (each variable included in `formula` is removed from `Y`). ```{r} regression_summary(mv_probit) ``` This basic idea can also be used to fit regression models with a single outcome. ## Note on Conditional (In)dependence Models for Latent Data All of the data types (besides continuous) model latent data. That is, unobserved data that is assumed to be Gaussian distributed. For example, a tetrachoric correlation (binary data) is a special case of a polychoric correlation (ordinal data). Both relations are between "theorized normally distributed continuous *latent* variables [Wikepedia](https://en.wikipedia.org/wiki/Polychoric_correlation). In both instances, the corresponding partial correlation between observed variables is conditioned on the remaining variables in the *latent* space. This implies that interpretation is similar to continuous data, but with respect to latent variables. We refer interested users to [see page 2364, section 2.2, in @webb2008bayesian]. ## High Dimensional Data? **BGGM** was built specifically for social-behavioral scientists. Of course, the methods can be used by all researchers. However, there is currently *not* support for high-dimensional data (i.e., more variables than observations) that are common place in, say, the genetics literature. These data are rare in the social-behavioral sciences. In the future, support for high-dimensional data may be added to **BGGM**. ## Bug Reports, Feature Requests, and Contributing Bug reports and feature requests can be made by opening an issue on [Github](https://github.com/donaldRwilliams/BGGM/issues). To contribute towards the development of **BGGM**, you can start a branch with a pull request and we can discuss the proposed changes there. ## Comparison to Other Software **BGGM** is the only `R` package to implement all of these algorithms and methods. The `mixed` data approach is also implemented in the package **sbgcop** [base `R`, @hoff2007extending]. The `R` package **BDgraph** implements a Gaussian copula graphical model in `c++` [@mohammadi2015bdgraph], but not the binary or ordinal approaches. Furthermore, **BGGM** is the only package for confirmatory testing and comparing graphical models with the methods described in @williams2020comparing. ## References

Owner

  • Name: Donald R. Williams
  • Login: donaldRwilliams
  • Kind: user

JOSS Publication

BGGM: Bayesian Gaussian Graphical Models in R
Published
July 21, 2020
Volume 5, Issue 51, Page 2111
Authors
Donald R. Williams
Department of Psychology, University of California, Davis
Joris Mulder
Department of Methodology and Statistics, Tilburg University
Editor
Anisha Keshavan ORCID
Tags
Gaussian graphical models Bayesian Bayes factor partial correlation

Papers & Mentions

Total mentions: 2

Psychopathological symptom network structure in transgender and gender queer youth reporting parental psychological abuse: a network analysis
Last synced: 3 months ago
A network perspective on suicidal behavior: Understanding suicidality as a complex system
Last synced: 3 months ago

GitHub Events

Total
  • Create event: 1
  • Issues event: 2
  • Release event: 1
  • Watch event: 3
  • Issue comment event: 2
  • Push event: 13
  • Pull request event: 3
  • Fork event: 3
Last Year
  • Create event: 1
  • Issues event: 2
  • Release event: 1
  • Watch event: 3
  • Issue comment event: 2
  • Push event: 13
  • Pull request event: 3
  • Fork event: 3

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 1,062
  • Total Committers: 5
  • Avg Commits per committer: 212.4
  • Development Distribution Score (DDS): 0.108
Past Year
  • Commits: 26
  • Committers: 2
  • Avg Commits per committer: 13.0
  • Development Distribution Score (DDS): 0.077
Top Committers
Name Email Commits
Donald R. Williams b****9@g****m 947
Philippe Rast r****h@g****m 81
donaldRwilliams y****u@e****m 16
Stephen Martin s****n@g****m 16
mihaiconstantin m****i@m****m 2
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 58
  • Total pull requests: 39
  • Average time to close issues: 3 months
  • Average time to close pull requests: 2 months
  • Total issue authors: 30
  • Total pull request authors: 3
  • Average comments per issue: 3.81
  • Average comments per pull request: 0.18
  • Merged pull requests: 34
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 1
  • Average time to close issues: 10 minutes
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • paulgovan (9)
  • donaldRwilliams (6)
  • mjbsp (4)
  • matticervin (4)
  • tsbaguley (3)
  • fdabl (3)
  • jayrobwilliams (2)
  • xinkaidupsy (2)
  • FMTurner (2)
  • michellehunsche (2)
  • mikaelr88 (2)
  • isabelslurink (1)
  • andrewpython (1)
  • QMmmmLiu (1)
  • paradeisios (1)
Pull Request Authors
  • donaldRwilliams (36)
  • mihaiconstantin (2)
  • stephensrmmartin (2)
  • vira-dvoriak (1)
Top Labels
Issue Labels
JOSS_review (11) enhancement (8) good first issue (2) bug (1) help wanted (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 653 last-month
  • Total docker downloads: 20,358
  • Total dependent packages: 5
  • Total dependent repositories: 3
  • Total versions: 11
  • Total maintainers: 1
cran.r-project.org: BGGM

Bayesian Gaussian Graphical Models

  • Versions: 11
  • Dependent Packages: 5
  • Dependent Repositories: 3
  • Downloads: 653 Last month
  • Docker Downloads: 20,358
Rankings
Docker downloads count: 1.2%
Forks count: 5.2%
Stargazers count: 6.5%
Average: 8.3%
Downloads: 9.7%
Dependent packages count: 10.6%
Dependent repos count: 16.8%
Maintainers (1)
Last synced: 4 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • BFpack >= 0.2.1 imports
  • GGally >= 1.4.0 imports
  • MASS >= 7.3 imports
  • Rcpp >= 1.0.4.6 imports
  • Rdpack >= 0.11 imports
  • ggplot2 >= 3.2.1 imports
  • ggridges >= 0.5.1 imports
  • grDevices * imports
  • methods * imports
  • mvnfast >= 0.2.5 imports
  • network >= 1.15 imports
  • reshape >= 0.8.8 imports
  • sna >= 2.5 imports
  • stats * imports
  • utils * imports
  • abind >= 1.4 suggests
  • assortnet >= 0.12 suggests
  • knitr * suggests
  • mice >= 3.8.0 suggests
  • networktools >= 1.3.0 suggests
  • psych * suggests
  • rmarkdown * suggests
.github/workflows/build.yml actions
  • actions/cache v1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite