gips

gips - Gaussian model Invariant by Permutation Symmetry

https://github.com/przechoj/gips

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (18.3%) to scientific vocabulary

Keywords

covariance-estimation machine-learning normal-distribution r r-package

Last synced: 10 months ago

Repository

gips - Gaussian model Invariant by Permutation Symmetry

Basic Info

Host: GitHub
Owner: PrzeChoj
License: gpl-3.0
Language: R
Default Branch: main
Homepage: https://przechoj.github.io/gips/
Size: 26 MB

Statistics

Stars: 9
Watchers: 2
Forks: 2
Open Issues: 7
Releases: 0

Topics

covariance-estimation machine-learning normal-distribution r r-package

Created over 4 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  cache = FALSE
)

old_options <- options(scipen = 999) # turn off scientific notation
```

# `gips` 


[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![CRAN status](https://www.r-pkg.org/badges/version/gips)](https://CRAN.R-project.org/package=gips)
[![R-CMD-check](https://github.com/PrzeChoj/gips/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/PrzeChoj/gips/actions/workflows/R-CMD-check.yaml)
[![test-coverage](https://codecov.io/gh/PrzeChoj/gips/branch/main/graph/badge.svg)](https://app.codecov.io/gh/PrzeChoj/gips?branch=main)
![RStudio CRAN mirror downloads](http://cranlogs.r-pkg.org/badges/last-month/gips)



gips - Gaussian model Invariant by Permutation Symmetry

`gips` is an R package that looks for permutation symmetries in a multivariate Gaussian sample. Such symmetries reduce the number of free parameters in the unknown covariance matrix. This is especially useful when the number of variables is substantially larger than the number of observations.


## `gips` will help you with two things:
1. Finding hidden symmetries between the variables. `gips` can be used as an exploratory tool for searching the space of permutation symmetries of the Gaussian vector. This is useful in Exploratory Data Analysis (EDA).
2. Covariance estimation. The Maximum Likelihood Estimator (MLE) for the covariance matrix is known to exist if and only if the number of variables is less than or equal to the number of observations. Additional knowledge of symmetries significantly weakens this requirement. Moreover, reducing the model dimension improves the precision of the covariance estimate.
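
To make the reduction in free parameters concrete, here is a minimal base-R sketch that counts the distinct entries of a covariance matrix invariant under a given permutation. The helper `count_free_parameters()` is hypothetical and not part of `gips`; it assumes the permutation is passed as a vector `perm`, where `perm[i]` is the image of `i`.

```r
# Hypothetical helper (not part of gips): count the free parameters of a
# p x p covariance matrix S that satisfies S[i, j] == S[perm[i], perm[j]].
# Each orbit of the permutation acting on unordered index pairs {i, j}
# corresponds to one free parameter.
count_free_parameters <- function(perm) {
  p <- length(perm)
  visited <- matrix(FALSE, p, p)
  n_orbits <- 0
  for (i in seq_len(p)) {
    for (j in i:p) {
      if (visited[i, j]) next
      n_orbits <- n_orbits + 1
      a <- i
      b <- j
      # Walk the orbit of {i, j} until we come back to a marked pair
      while (!visited[min(a, b), max(a, b)]) {
        visited[min(a, b), max(a, b)] <- TRUE
        a <- perm[a]
        b <- perm[b]
      }
    }
  }
  n_orbits
}

count_free_parameters(1:6)                 # no symmetry: 6 * 7 / 2 = 21
count_free_parameters(c(2, 3, 4, 5, 6, 1)) # cyclic shift (1,2,3,4,5,6): 4
count_free_parameters(c(2, 1, 4, 3))       # (1,2)(3,4) on 4 variables: 6
```

With 6 variables, the cyclic symmetry leaves 4 parameters to estimate instead of 21, which is why far fewer observations can suffice.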

## Installation

From [CRAN](https://CRAN.R-project.org/package=gips):
``` r
# Install the released version from CRAN:
install.packages("gips")
```

From [GitHub](https://github.com/PrzeChoj/gips):
``` r
# Install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("PrzeChoj/gips")
```

## Examples

### Example 1 - EDA

Assume we have the data, and we want to understand its structure:

```{r example_mean_unknown0}
library(gips)

Z <- HSAUR2::aspirin

# Renumber the columns for better readability:
Z[, c(2, 3)] <- Z[, c(3, 2)]
```

```{r example_mean_unknown1, include=FALSE}
names(Z) <- NULL
plot_cosmetic_modifications <- function(gg_plot_object) {
  my_col_names <- c(
    "deaths after placebo", "deaths after Aspirin",
    "treated with placebo", "treated with Aspirin"
  )

  suppressMessages( # message from ggplot2
    out <- gg_plot_object +
      ggplot2::scale_x_continuous(
        labels = my_col_names,
        breaks = 1:4
      ) +
      ggplot2::scale_y_reverse(
        labels = my_col_names,
        breaks = 1:4
      ) +
      ggplot2::theme(
        title = ggplot2::element_text(face = "bold", size = 16),
        axis.text.y = ggplot2::element_text(face = "bold", size = 10),
        axis.text.x = ggplot2::element_text(face = "bold", size = 10, angle = 6)
      )
  )

  out +
    ggplot2::geom_text(
      ggplot2::aes(
        label = round(covariance, -3)
      ),
      fontface = "bold", size = 6
    ) +
    ggplot2::theme(legend.position = "none")
}
```

Assume the data `Z` is normally distributed.

```{r example_mean_unknown2}
dim(Z)
number_of_observations <- nrow(Z) # 7
p <- ncol(Z) # 4

S <- cov(Z)
round(S, -3)

g <- gips(S, number_of_observations)
plot_cosmetic_modifications(plot(g, type = "heatmap"))
```

Remember, we analyze the covariance matrix. We can see some nearly identical colors in the estimated covariance matrix. E.g., the variances of columns 1 and 2 are very similar (`S[1,1]` $\approx$ `S[2,2]`), and the variances of columns 3 and 4 are very similar (`S[3,3]` $\approx$ `S[4,4]`). What is more, the covariances are also similar (`S[1,3]` $\approx$ `S[1,4]` $\approx$ `S[2,3]` $\approx$ `S[2,4]`). Are those approximate equalities coincidental? Or do they reflect some underlying data properties? It is hard to decide purely by looking at the matrix.

`find_MAP()` will use the Bayesian model to quantify whether the approximate equalities are coincidental. Let's see if it finds this relationship:

```{r example_mean_unknown3}
g_MAP <- find_MAP(g, optimizer = "brute_force")

g_MAP
```

`find_MAP()` found the permutation (1,2)(3,4). The variances `S[1,1]` and `S[2,2]`, as well as `S[3,3]` and `S[4,4]`, are so close to each other that it is reasonable to consider them equal. Similarly, the covariances `S[1,3]` and `S[2,4]`, as well as `S[2,3]` and `S[1,4]`, will be considered equal:

```{r example_mean_unknown4}
S_projected <- project_matrix(S, g_MAP)
round(S_projected)

plot_cosmetic_modifications(plot(g_MAP, type = "heatmap"))
```

This `S_projected` matrix can now be interpreted as a more stable covariance matrix estimator.

We can also interpret the output as suggesting that there is no change in covariance for treatment with Aspirin.
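
The equalities imposed by the permutation (1,2)(3,4) can be reproduced by hand. The sketch below averages a matrix over all powers of a permutation, which is one standard way to obtain a permutation-invariant matrix. It illustrates the idea only and is not necessarily how `project_matrix()` is implemented; the helper `project_by_averaging()` and the matrix `S_toy` are made up for this example.

```r
# Hypothetical helper (not part of gips): average S over all powers of a
# permutation, which forces S[i, j] == S[perm[i], perm[j]].
project_by_averaging <- function(S, perm) {
  p <- nrow(S)
  idx <- seq_len(p)
  acc <- matrix(0, p, p)
  n_terms <- 0
  repeat {
    acc <- acc + S[idx, idx]
    n_terms <- n_terms + 1
    idx <- perm[idx] # move on to the next power of the permutation
    if (all(idx == seq_len(p))) break
  }
  acc / n_terms
}

# Made-up symmetric matrix, used only for illustration
S_toy <- matrix(
  c(
    10, 3, 5, 6,
    3, 12, 7, 4,
    5, 7, 20, 1,
    6, 4, 1, 22
  ),
  nrow = 4, byrow = TRUE
)

# perm = c(2, 1, 4, 3) encodes the permutation (1,2)(3,4)
S_toy_projected <- project_by_averaging(S_toy, perm = c(2, 1, 4, 3))
S_toy_projected
```

In the result, entries `[1,1]` and `[2,2]` are both 11 (the mean of 10 and 12), and entries `[1,3]` and `[2,4]` are both 4.5 (the mean of 5 and 4).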

### Example 2 - modeling

First, construct data for the example:
```{r example_mean_known1}
# Prepare model, multivariate normal distribution
p <- 6
n <- 4
mu <- numeric(p)
sigma_matrix <- matrix(
  data = c(
    1.1, 0.8, 0.6, 0.4, 0.6, 0.8,
    0.8, 1.1, 0.8, 0.6, 0.4, 0.6,
    0.6, 0.8, 1.1, 0.8, 0.6, 0.4,
    0.4, 0.6, 0.8, 1.1, 0.8, 0.6,
    0.6, 0.4, 0.6, 0.8, 1.1, 0.8,
    0.8, 0.6, 0.4, 0.6, 0.8, 1.1
  ),
  nrow = p, byrow = TRUE
) # sigma_matrix is a matrix invariant under permutation (1,2,3,4,5,6)


# Generate example data from a model:
Z <- withr::with_seed(2022,
  code = MASS::mvrnorm(n, mu = mu, Sigma = sigma_matrix)
)
# End of prepare model
```

```{r example_mean_known1_1, echo=FALSE}
plot(gips(sigma_matrix, 1), type = "heatmap") +
  ggplot2::labs(title = "This is the real covariance matrix\nWe want to estimate it based on n = 4 observations", x = "", y = "")
```

Suppose we do not know the actual covariance matrix $\Sigma$ and we want to estimate it. We cannot use the standard MLE because it does not exist ($4 < 6$, $n < p$).

We will assume the data was generated from a normal distribution with mean $0$.
```{r example_mean_known2}
library(gips)
dim(Z)
number_of_observations <- nrow(Z) # 4
p <- ncol(Z) # 6

# Calculate the covariance matrix from the data:
S <- (t(Z) %*% Z) / number_of_observations
```

Make the gips object out of data:
```{r example_mean_known3}
g <- gips(S, number_of_observations, was_mean_estimated = FALSE)
```

We can see the standard estimator of the covariance matrix:
$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n Z^{(i)} {Z^{(i)}}^\top$$
It is not the MLE (again, because the MLE does not exist for $n < p$):
```{r example_mean_known3_1}
plot(g, type = "heatmap") + ggplot2::ggtitle("Covariance estimated in standard way")
```
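
As a quick sanity check of the formula for the standard estimator, the base-R sketch below confirms that the matrix product `(t(Z) %*% Z) / n` equals the average of the outer products of the observations. It also shows how this known-mean estimator differs from `cov()`. The data `Z_toy` is made up for illustration and is not the `Z` used in this example.

```r
set.seed(2022)
n <- 4
p <- 6
# Made-up data; rows are observations
Z_toy <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Known-mean (zero-mean) estimator, written as a matrix product
S_matrix_form <- crossprod(Z_toy) / n # same as (t(Z_toy) %*% Z_toy) / n

# The same estimator, written as the average of outer products of the rows
outer_products <- lapply(seq_len(n), function(i) tcrossprod(Z_toy[i, ]))
S_sum_form <- Reduce(`+`, outer_products) / n

all.equal(S_matrix_form, S_sum_form)

# For comparison, cov() centers the columns and divides by n - 1
Z_centered <- scale(Z_toy, center = TRUE, scale = FALSE)
all.equal(cov(Z_toy), crossprod(Z_centered) / (n - 1), check.attributes = FALSE)
```

The second comparison is the reason `was_mean_estimated = FALSE` matters: when the mean is known, no centering is done and the divisor is `n` rather than `n - 1`.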

Let's find the Maximum A Posteriori Estimator for the permutation. The space is small ($6! = 720$), so it is reasonable to search all of it:
```{r example_mean_known4}
g_map <- find_MAP(g, optimizer = "brute_force")

g_map
```

We see that the found permutation is over a hundred times more likely than making no additional assumption. That means the additional assumptions are justified.
```{r example_mean_known5}
summary(g_map)$n0
summary(g_map)$n0 <= number_of_observations # 1 <= 4
```

What is more, we see the number of observations ($4$) is greater than or equal to $n_0 = 1$, so we can estimate the covariance matrix with the Maximum Likelihood estimator:
```{r example_mean_known6}
S_projected <- project_matrix(S, g_map)
S_projected

# Plot the found matrix:
plot(g_map, type = "heatmap") + ggplot2::ggtitle("Covariance estimated with `gips`")
```

We see `gips` found the data's structure, and we could estimate the covariance matrix with high accuracy from only a small amount of data and additional reasonable assumptions.

Note that the rank of the `S` matrix was 4, while the rank of the `S_projected` matrix was 6 (full rank).
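
The rank remark can be illustrated without `gips`. In the sketch below, a zero-mean sample covariance built from $n = 4$ observations of $p = 6$ variables has rank at most 4, while averaging it over the powers of the cyclic permutation (1,2,3,4,5,6) typically yields a full-rank matrix. The helper `project_by_averaging()` and the data `Z_toy` are assumptions made for this illustration; they are not the package's implementation or the data above.

```r
# Hypothetical helper (not part of gips): average S over the powers of perm
project_by_averaging <- function(S, perm) {
  p <- nrow(S)
  idx <- seq_len(p)
  acc <- matrix(0, p, p)
  n_terms <- 0
  repeat {
    acc <- acc + S[idx, idx]
    n_terms <- n_terms + 1
    idx <- perm[idx]
    if (all(idx == seq_len(p))) break
  }
  acc / n_terms
}

set.seed(2022)
n <- 4
p <- 6
Z_toy <- matrix(rnorm(n * p), nrow = n) # made-up zero-mean data
S_toy <- crossprod(Z_toy) / n

cyclic_shift <- c(2, 3, 4, 5, 6, 1) # the permutation (1,2,3,4,5,6)
S_toy_projected <- project_by_averaging(S_toy, cyclic_shift)

qr(S_toy)$rank           # at most n = 4, so S_toy is singular
qr(S_toy_projected)$rank # averaging can restore full rank
```

A singular matrix cannot serve as a covariance MLE because it has no inverse, so restoring full rank is what makes estimation possible here.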


# Further reading

For more examples and introduction, see `vignette("gips", package="gips")` or its [pkgdown page](https://przechoj.github.io/gips/articles/gips.html).

For an in-depth analysis of the package performance, capabilities, and comparison with similar packages, see the article "Learning permutation symmetries with gips in R" by `gips` developers Adam Chojecki, Paweł Morgen, and Bartosz Kołodziejek, [Journal of Statistical Software]().


# Acknowledgment

The project was funded by Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) programme.

The development was carried out with the support of the Laboratory of Bioinformatics and Computational Genomics and the High Performance Computing Center of the Faculty of Mathematics and Information Science, Warsaw University of Technology.


```{r options_back, include = FALSE}
options(old_options) # back to the original options
```

Owner

Name: PrzeChoj
Login: PrzeChoj
Kind: user

Repositories: 30
Profile: https://github.com/PrzeChoj

GitHub Events

Total

Issues event: 1
Watch event: 2
Push event: 4
Create event: 1

Last Year

Issues event: 1
Watch event: 2
Push event: 4
Create event: 1

Committers

Last synced: about 1 year ago

All Time

Total Commits: 533
Total Committers: 6
Avg Commits per committer: 88.833
Development Distribution Score (DDS): 0.21

Past Year

Commits: 19
Committers: 3
Avg Commits per committer: 6.333
Development Distribution Score (DDS): 0.158

Top Committers

Name	Email	Commits
PrzeChoj	p**j@g**m	421
MrDomani	s**n@p**m	80
Pawel Morgen	p**n@p**m	21
kolodziejek	1****k	8
ZetrextJG	j**i@g**m	2
chojecki przemyslaw	c**p@p**I	1

Committer Domains (Top 20 + Academic)

p31119.mini: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 20
Total pull requests: 77
Average time to close issues: 2 months
Average time to close pull requests: 2 days
Total issue authors: 2
Total pull request authors: 4
Average comments per issue: 2.1
Average comments per pull request: 0.38
Merged pull requests: 76
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 6
Average time to close issues: N/A
Average time to close pull requests: about 5 hours
Issue authors: 1
Pull request authors: 3
Average comments per issue: 0.0
Average comments per pull request: 0.67
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

PrzeChoj (17)
MrDomani (3)

Pull Request Authors

PrzeChoj (72)
MrDomani (7)
kolodziejek (2)
ZetrextJG (2)

Top Labels

Issue Labels

enhancement (9) bug (4) good first issue (3) help wanted (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 707 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 4
Total maintainers: 1

cran.r-project.org: gips

Gaussian Model Invariant by Permutation Symmetry

Homepage: https://github.com/PrzeChoj/gips
Documentation: http://cran.r-project.org/web/packages/gips/gips.pdf
License: GPL (≥ 3)
Latest release: 1.2.3
published over 1 year ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 707 Last month

Rankings

Forks count: 21.9%

Stargazers count: 24.2%

Dependent packages count: 29.8%

Dependent repos count: 35.5%

Average: 37.2%

Downloads: 74.6%

Maintainers (1)

adam.prze.choj@gmail.com

Last synced: 11 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions

actions/checkout v2 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/pkgdown.yaml actions

JamesIves/github-pages-deploy-action 4.1.4 composite
actions/checkout v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/test-coverage.yaml actions

actions/checkout v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

DESCRIPTION cran

R >= 3.5.0 depends
numbers * imports
permutations * imports
rlang * imports
utils * imports
DAAG * suggests
HSAUR * suggests
MASS >= 7.3 suggests
dplyr * suggests
ggplot2 * suggests
graphics * suggests
knitr * suggests
rmarkdown * suggests
spelling * suggests
stringi * suggests
testthat >= 3.0.0 suggests
tibble * suggests
tidyr * suggests
withr * suggests
