GGMnonreg
GGMnonreg: Non-Regularized Gaussian Graphical Models in R - Published in JOSS (2021)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Scientific Fields
- Sociology (Social Sciences): 40% confidence
Repository
Basic Info
- Host: GitHub
- Owner: donaldRwilliams
- License: gpl-2.0
- Language: R
- Default Branch: master
- Size: 1.1 MB
Statistics
- Stars: 7
- Watchers: 1
- Forks: 6
- Open Issues: 3
- Releases: 0
Created over 7 years ago · Last pushed almost 2 years ago
Metadata Files
Readme
License
README.Rmd
---
output: github_document
bibliography: inst/REFERENCES.bib
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "75%"
)
```
# GGMnonreg: Non-regularized Gaussian Graphical Models
[CRAN](https://cran.r-project.org/package=GGMnonreg)
[CircleCI](https://circleci.com/gh/donaldRwilliams/GGMnonreg)
[DOI: 10.5281/zenodo.5668161](https://doi.org/10.5281/zenodo.5668161)
The goal of **GGMnonreg** is to estimate non-regularized graphical models. Note that
the title is a bit of a misnomer, in that Ising and mixed graphical models are also supported.
Graphical modeling is quite common in fields with *wide* data, that is, when there are more
variables than observations. Accordingly, many regularization-based approaches have been developed for those kinds of data. There are key drawbacks of regularization when the goal is inference,
including, but not limited to, the fact that obtaining a valid measure of parameter uncertainty is very (very) difficult.
More recently, graphical modeling has emerged in psychology [@Epskamp2018ggm], where the data
are typically long or low-dimensional [*p* < *n*; @williams2019nonregularized; @williams_rethinking]. The primary purpose of **GGMnonreg** is to provide methods specifically for low-dimensional data
(e.g., those common to psychopathology networks).
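
To make the low-dimensional setting concrete, here is a minimal base R sketch of non-regularized edge selection: partial correlations are computed from the inverse of the sample correlation matrix, and each one is tested with a Fisher z-test. The simulated data, the variable names, and the 0.05 threshold are assumptions for illustration; this is not claimed to be the package's internal code.

```r
# Sketch under stated assumptions (base R only, simulated data)
set.seed(1)
n <- 500
p <- 5

# simulate data with a single true conditional dependence (V1 - V2)
Y <- matrix(rnorm(n * p), n, p)
Y[, 2] <- 0.5 * Y[, 1] + Y[, 2]

# partial correlations from the inverse of the sample correlation matrix
pcors <- -cov2cor(solve(cor(Y)))
diag(pcors) <- 0

# Fisher z-test: each partial correlation conditions on p - 2 variables,
# so the standard error is 1 / sqrt(n - (p - 2) - 3)
z <- atanh(pcors) * sqrt(n - (p - 2) - 3)
pvals <- 2 * pnorm(-abs(z))

# adjacency matrix: keep edges that are significant at alpha = 0.05
adj <- (pvals < 0.05) * 1
diag(adj) <- 0
adj
```

Because *p* < *n*, the sample correlation matrix is invertible and no penalty is needed to obtain the estimates.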
## Supported Models
* Gaussian graphical model. The following data types are supported.
+ Gaussian
+ Ordinal
+ Binary
* Ising model [@marsman_2018]
* Mixed graphical model
## Additional methods
The following are also included
* Expected network replicability [@williams2020learning]
* Compare Gaussian graphical models
* Measure of parameter uncertainty [@williams2019nonregularized]
* Edge inclusion "probabilities"
* Network visualization
* Constrained precision matrix (the network, given an assumed graph)
* Predictability (variance explained)
## Installation
To install the latest release version (1.1.0) from CRAN use
```r
install.packages("GGMnonreg")
```
You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("donaldRwilliams/GGMnonreg")
```
## Ising
An Ising model is fitted with the following
```{r}
library(GGMnonreg)
# make binary
Y <- ifelse(ptsd[,1:5] == 0, 0, 1)
# fit model
fit <- ising_search(Y, IC = "BIC",
                    progress = FALSE)
fit
```
Note the same code, more or less, is also used for GGMs and mixed graphical models.
## Predictability
It is common to compute predictability, or variance explained, for each node in the network.
An advantage of **GGMnonreg** is that a measure of uncertainty is also provided.
```{r}
# data
Y <- na.omit(bfi[,1:5])
# fit model
fit <- ggm_inference(Y, boot = FALSE)
# predictability
predictability(fit)
```
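
As a rough illustration of what "variance explained" means for a node, the sketch below computes R² for every node from the inverse of the correlation matrix and cross-checks one node against an ordinary regression. It uses simulated data and base R only, and it does not reproduce the uncertainty estimates that `predictability()` reports.

```r
# Sketch under stated assumptions (base R only, simulated data)
set.seed(1)
n <- 500
p <- 4
Y <- matrix(rnorm(n * p), n, p)
Y[, 1] <- Y[, 1] + 0.6 * Y[, 2] + 0.3 * Y[, 3]
colnames(Y) <- paste0("V", 1:p)

# precision matrix of the standardized variables
theta <- solve(cor(Y))

# variance explained in each node by all remaining nodes
r2 <- 1 - 1 / diag(theta)
r2

# cross-check node 1 against a linear regression on the other nodes
r2_lm <- summary(lm(V1 ~ ., data = as.data.frame(Y)))$r.squared
all.equal(unname(r2[1]), r2_lm)
```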
## Parameter Uncertainty
Confidence intervals for each relation are obtained with
```{r}
# data
Y <- na.omit(bfi[,1:5])
# fit model
fit <- ggm_inference(Y, boot = TRUE,
                     method = "spearman",
                     B = 100, progress = FALSE)
confint(fit)
```
These can then be plotted with, say, **ggplot2** (left to the user).
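
For intuition, bootstrap intervals of this kind could be obtained along the following lines. This is a sketch assuming a nonparametric bootstrap with percentile intervals; the helper `pcor_upper()` and the simulated data are hypothetical, and the package may compute its intervals differently.

```r
# Sketch under stated assumptions (base R only, simulated data)
set.seed(1)
n <- 300
p <- 4
Y <- matrix(rnorm(n * p), n, p)
Y[, 2] <- 0.5 * Y[, 1] + Y[, 2]

# hypothetical helper: upper-triangular partial correlations
pcor_upper <- function(x, method = "spearman") {
  pc <- -cov2cor(solve(cor(x, method = method)))
  pc[upper.tri(pc)]
}

# resample rows with replacement and re-estimate B times
B <- 100
boot <- replicate(B, pcor_upper(Y[sample(n, replace = TRUE), ]))

# percentile intervals, one row per relation (upper-triangular order)
ci <- data.frame(
  estimate = pcor_upper(Y),
  lower = apply(boot, 1, quantile, probs = 0.025),
  upper = apply(boot, 1, quantile, probs = 0.975)
)
ci
```

A data frame in this shape (one row per relation, with `lower` and `upper` columns) is convenient for interval plots.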
## Edge Inclusion
When mining data, or performing an automatic search, it is difficult to make inference on the
network parameters (e.g., confidence intervals are not easily computed). To summarize data mining,
**GGMnonreg** provides edge inclusion "probabilities" (the proportion of bootstrap samples in
which each relation was detected).
```{r}
# data
Y <- na.omit(bfi[,1:5])
# fit model
fit <- eip(Y, method = "spearman",
           B = 100, progress = FALSE)
fit
```
Note that, in all cases, the provided estimates correspond to the upper-triangular elements
of the network.
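
The idea behind edge inclusion "probabilities" can be sketched in base R as follows: detect edges in each bootstrap sample and record how often each relation is selected. The helper `detect_edges()`, the Fisher z-test it uses (with Pearson correlations, for simplicity), and the simulated data are assumptions for illustration rather than the package's implementation.

```r
# Sketch under stated assumptions (base R only, simulated data)
set.seed(1)
n <- 300
p <- 4
Y <- matrix(rnorm(n * p), n, p)
Y[, 2] <- 0.5 * Y[, 1] + Y[, 2]

# hypothetical helper: TRUE for each upper-triangular relation
# whose partial correlation is significant at level alpha
detect_edges <- function(x, alpha = 0.05) {
  n <- nrow(x)
  p <- ncol(x)
  pc <- -cov2cor(solve(cor(x)))
  z <- atanh(pc[upper.tri(pc)]) * sqrt(n - (p - 2) - 3)
  2 * pnorm(-abs(z)) < alpha
}

# detection indicators across B bootstrap samples
B <- 100
detected <- replicate(B, detect_edges(Y[sample(n, replace = TRUE), ]))

# proportion of bootstrap samples in which each relation was detected
eip_hat <- rowMeans(detected)
eip_hat
```

The values follow the same upper-triangular ordering as above, so the first element refers to the relation between the first and second variables.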
## Expected Network Replicability
**GGMnonreg** allows for computing expected network replicability (ENR), i.e., the number of
effects that will be detected in any number of replications. This is an analytic solution.
The first step is defining a true network
```{r}
# first make the true network
cors <- cor(GGMnonreg::ptsd)
# inverse
inv <- solve(cors)
# partials
pcors <- -cov2cor(inv)
# set values to zero
pcors <- ifelse(abs(pcors) < 0.05, 0, pcors)
```
Then obtain ENR
```{r}
fit_enr <- enr(net = pcors, n = 500, replications = 2)
fit_enr
```
Note that this is inherently frequentist. As such, over the long run, 45% of the edges will be replicated on average. We can further infer that, in hypothetical replication attempts, more than half of the edges
will be replicated only 5% of the time.
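
One way to see why an analytic solution is possible: if each true edge is detected with some power in a single sample, and replications are independent, then the number of edges detected in every replication follows a Poisson-binomial distribution. The sketch below works through that reasoning in base R with hypothetical partial correlations and a Fisher z-test for power. It is an assumption-laden illustration, not necessarily the computation `enr()` performs.

```r
# Sketch under stated assumptions (base R only)
# hypothetical partial correlations for the true edges
rho <- c(0.05, 0.10, 0.15, 0.20, 0.30)
n <- 500
p <- 20
alpha <- 0.05
replications <- 2

# power of a two-sided Fisher z-test for each edge
se <- 1 / sqrt(n - (p - 2) - 3)
crit <- qnorm(1 - alpha / 2)
power <- pnorm(atanh(rho) / se - crit) + pnorm(-atanh(rho) / se - crit)

# probability that an edge is detected in every replication
p_rep <- power^replications

# Poisson-binomial pmf of the number of replicated edges (by convolution)
pmf <- 1
for (q in p_rep) pmf <- c(pmf * (1 - q), 0) + c(0, pmf * q)

# expected proportion of replicated edges
prop_rep <- mean(p_rep)
prop_rep

# probability that more than half of the edges are replicated
k <- 0:length(p_rep)
p_half <- sum(pmf[k > length(p_rep) / 2])
p_half
```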
ENR can also be plotted
```{r}
plot_enr(fit_enr)
```
### Intuition
Here is the basic idea of ENR
```{r}
# location of edges
index <- which(pcors[upper.tri(diag(20))] != 0)
# convert network into correlation matrix
diag(pcors) <- 1
cors_new <- corpcor::pcor2cor(pcors)
# replicated edges
R <- NA
# increase 1000 to, say, 5,000
for (i in 1:1000) {
  # two replications
  Y1 <- MASS::mvrnorm(500, rep(0, 20), cors_new)
  Y2 <- MASS::mvrnorm(500, rep(0, 20), cors_new)
  # estimate network 1
  fit1 <- ggm_inference(Y1, boot = FALSE)
  # estimate network 2
  fit2 <- ggm_inference(Y2, boot = FALSE)
  # number of replicated edges (detected in both networks)
  R[i] <- sum(
    rowSums(
      cbind(fit1$adj[upper.tri(diag(20))][index],
            fit2$adj[upper.tri(diag(20))][index])
    ) == 2)
}
```
Notice that replication of two networks is being assessed over the long run. In other words,
if we draw two random samples, what is the expected replicability?
Compare analytic to simulation
```{r}
# combine simulation and analytic
cbind.data.frame(
  data.frame(simulation = sapply(seq(0, 0.9, 0.1), function(x) {
    mean(R > round(length(index) * x))
  })),
  data.frame(analytic = round(fit_enr$cdf, 3))
)
# average replicability (simulation)
mean(R / length(index))
# average replicability (analytic)
fit_enr$ave_pwr
```
ENR works with any correlation, assuming there is an estimate of the standard error.
## Network plot
```{r, message=FALSE}
# data
Y <- ptsd
# estimate graph
fit <- ggm_inference(Y, boot = FALSE)
# plot the network
plot(fit, edge_magnify = 5)
```
## Bug Reports, Feature Requests, and Contributing
Bug reports and feature requests can be made by opening an issue on [GitHub](https://github.com/donaldRwilliams/GGMnonreg/issues). To contribute to
the development of **GGMnonreg**, you can start a branch with a pull request and we can
discuss the proposed changes there.
## References
Owner
- Name: Donald R. Williams
- Login: donaldRwilliams
- Kind: user
- Repositories: 10
- Profile: https://github.com/donaldRwilliams
JOSS Publication
GGMnonreg: Non-Regularized Gaussian Graphical Models in R
Published
November 11, 2021
Volume 6, Issue 67, Page 3308
Authors
Donald R. Williams
Department of Psychology, University of California, Davis, NWEA, Portland, USA
Tags
Graphical models, partial correlations, Mixed graphical model, Ising model
GitHub Events
Total
- Issues event: 3
- Watch event: 1
Last Year
- Issues event: 3
- Watch event: 1
Committers
Top Committers
| Name | Email | Commits |
|---|---|---|
| Donald R. Williams | b****9@g****m | 137 |
| donaldRwilliams | y****u@e****m | 19 |
Issues and Pull Requests
All Time
- Total issues: 3
- Total pull requests: 7
- Average time to close issues: over 2 years
- Average time to close pull requests: 6 days
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 0.33
- Average comments per pull request: 0.14
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- KarolineHuth (1)
- lauraeong (1)
- guhjy (1)
Pull Request Authors
- donaldRwilliams (5)
- AlexChristensen (2)
- SachaEpskamp (1)
Dependencies
DESCRIPTION
cran
- R >= 4.0.0 depends
- GGMncv * imports
- GGally * imports
- MASS * imports
- Matrix * imports
- Rdpack * imports
- bestglm * imports
- corpcor * imports
- doParallel * imports
- foreach * imports
- ggplot2 * imports
- methods * imports
- network * imports
- parallel * imports
- poibin * imports
- psych * imports
- sna * imports
- stats * imports
- qgraph * suggests
