Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 3 of 14 committers (21.4%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: rgcca-factory
- Language: R
- Default Branch: main
- Homepage: https://rgcca-factory.github.io/RGCCA/
- Size: 30.2 MB
Statistics
- Stars: 12
- Watchers: 5
- Forks: 12
- Open Issues: 15
- Releases: 0
Metadata Files
README.md
R/SGCCA
Version: 3.0.2
Authors:
Fabien GIRKA, Etienne CAMENEN, Caroline PELTIER, Vincent GUILLEMOT, Arnaud GLOAGUEN, Laurent LE BRUSQUET, Arthur TENENHAUS
Key-words:
Regularized Generalized Canonical Correlation Analysis, multi-block data analysis
Contact:
arthur.tenenhaus@centralesupelec.fr
Short description
Performs multiblock component methods (PCA, CCA, PLS, MCOA, GCCA, CPCA, MAXVAR, R/SGCCA, etc.) and produces graphical outputs (e.g. variables and individuals plots) and statistics to assess the robustness/significance of the analysis.
Contents
- Description
- Algorithm
- Installation
- Installation of a development branch from the git repository
- References
Description
A package for multiblock data analysis (RGCCA - Regularized Generalized Canonical Correlation Analysis) as described in [1-4]. The software produces graphical outputs and statistics to assess the robustness/significance of the analysis.
Algorithm
We consider $J$ data matrices $\mathbf X_1, \dots, \mathbf X_J$. Each $n \times p_j$ data matrix $\mathbf X_j = \left[ x_{j1}, \dots, x_{jp_j} \right]$ is called a block and represents a set of $p_j$ variables observed on $n$ individuals. The number and the nature of the variables may differ from one block to another, but the individuals must be the same across blocks. We assume that all variables are centered. The objective of RGCCA is to find, for each block, a weighted composite of variables (called block component) $\mathbf y_j = \mathbf X_j \mathbf a_j, ~ j = 1, \dots, J$ (where $\mathbf a_j$ is a column vector with $p_j$ elements) summarizing the relevant information between and within the blocks. The block components are obtained such that (i) block components explain their own block well and/or (ii) block components that are assumed to be connected are highly correlated. In addition, RGCCA integrates a variable selection procedure, called SGCCA, allowing the identification of the most relevant features.
RGCCA subsumes fifty years of multiblock component methods and is defined as the following optimization problem: $$\underset{\mathbf a_1, \dots, \mathbf a_J}{\text{maximize}} \sum_{j, k = 1}^J c_{jk} g(\text{cov}(\mathbf X_j \mathbf a_j, \mathbf X_k \mathbf a_k)) \text{ s.t. } (1 - \tau_j)\text{var}(\mathbf X_j \mathbf a_j) + \tau_j \Vert \mathbf a_j \Vert^2 = 1, ~ j = 1, \dots, J.$$
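To make the objective concrete, the following base R sketch evaluates the criterion for fixed weight vectors. It does not use the RGCCA package and does not optimize anything. The helper name `rgcca_criterion`, the toy data, and the use of R's `cov()` (denominator $n - 1$) are assumptions made for illustration.

```r
# Evaluate sum_{j,k} c_jk * g(cov(X_j a_j, X_k a_k)) for given weights.
# `rgcca_criterion` is a hypothetical helper, not part of the RGCCA package.
rgcca_criterion <- function(blocks, weights, C, g = function(x) x^2) {
  # Block components y_j = X_j a_j, stored as the columns of an n x J matrix
  Y <- mapply(function(X, a) drop(X %*% a), blocks, weights)
  sum(C * g(cov(Y)))
}

set.seed(42)
n <- 30
# Two centered toy blocks observed on the same n individuals
X1 <- scale(matrix(rnorm(n * 3), nrow = n), center = TRUE, scale = FALSE)
X2 <- scale(matrix(rnorm(n * 4), nrow = n), center = TRUE, scale = FALSE)

# Unit-norm weight vectors, so the constraint holds for tau_j = 1
a1 <- c(1, 0, 0)
a2 <- rep(0.5, 4)

# Both blocks are connected to each other
C <- matrix(c(0, 1, 1, 0), nrow = 2)

crit <- rgcca_criterion(list(X1, X2), list(a1, a2), C)
print(crit)
```

With two connected blocks and the square function for $g$, the value equals twice the squared covariance between the two block components, because the pair $(j, k)$ and the pair $(k, j)$ both enter the sum.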
The scheme function $g$ is any continuous convex function and allows different optimization criteria to be considered. Typical choices of $g$ are the identity (horst scheme, leading to maximizing the sum of covariances between block components), the absolute value (centroid scheme, yielding maximization of the sum of the absolute values of the covariances), the square function (factorial scheme, thereby maximizing the sum of squared covariances), or, more generally, for any even integer $m$, $g(x) = x^m$ ($m$-scheme, maximizing the sum of the $m$-th powers of the covariances). The horst scheme penalizes structural negative correlation between block components, while both the centroid scheme and the $m$-scheme enable two components to be negatively correlated. According to [5], a fair model is a model where all blocks contribute equally to the solution, as opposed to a model dominated by only a few of the $J$ sets. If fairness is a major objective, the user must choose $m = 1$. $m > 1$ is preferable if the user wants to discriminate between blocks. In practice, $m$ is equal to 1, 2 or 4. The higher the value of $m$, the more the method acts as a block selector [5].
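The effect of the scheme on negative covariances can be seen in a small base R sketch. The covariance values below are invented for illustration, and the list name `schemes` is an assumption, not an object from the package.

```r
# The three classical scheme functions g
schemes <- list(
  horst     = function(x) x,       # identity
  centroid  = function(x) abs(x),  # absolute value
  factorial = function(x) x^2      # square
)

# Hypothetical covariances between two pairs of connected block components
covs <- c(-0.8, 0.3)

# Contribution of these covariances to the criterion under each scheme
scores <- sapply(schemes, function(g) sum(g(covs)))
print(scores)
```

Under the horst scheme the strong negative covariance lowers the criterion (the sum is about -0.5), whereas the centroid and factorial schemes reward it (about 1.1 and 0.73).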
The design matrix $\mathbf C$ is a symmetric $J \times J$ matrix of nonnegative elements describing the network of connections between blocks the user wants to take into account. Usually, $c_{jk} = 1$ for two connected blocks and 0 otherwise.
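As an illustration, a design matrix for three blocks can be built in base R as follows. The block names and the chosen connections are hypothetical.

```r
# Hypothetical block names, used only to label the design matrix
block_names <- c("agriculture", "industry", "politics")

# Start from a network with no connections
C <- matrix(0, nrow = 3, ncol = 3,
            dimnames = list(block_names, block_names))

# Connect "agriculture" and "industry" to "politics" (upper triangle only)
C["agriculture", "politics"] <- 1
C["industry", "politics"] <- 1

# Mirror the upper triangle so that C is symmetric
C <- C + t(C)
print(C)
```

A fully connected design, where every block is linked to every other block, can be written as `1 - diag(3)`.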
The $\tau_j$ are called shrinkage parameters or regularization parameters and range from 0 to 1. $\tau_j$ enables a smooth interpolation between maximizing the covariance and maximizing the correlation. Setting $\tau_j$ to 0 forces the block components to unit variance ($\text{var}(\mathbf X_j \mathbf a_j) = 1$). In this case, the covariance criterion boils down to the correlation. The correlation criterion is better at explaining the correlated structure across datasets, but it discards the variance within each individual dataset. Setting $\tau_j$ to 1 normalizes the block weight vectors ($\Vert \mathbf a_j \Vert = 1$), which applies the covariance criterion. A value between 0 and 1 leads to a compromise between the first two options and corresponds to the constraint $(1 - \tau_j) \text{var}(\mathbf X_j \mathbf a_j) + \tau_j \Vert \mathbf a_j \Vert^2 = 1$. In the RGCCA package, for each block, the determination of the shrinkage parameter can be made fully automatic by using the analytical formula proposed by Schäfer and Strimmer (2005) [6], by permutation, or by K-fold cross-validation. Moreover, the choice of the shrinkage parameters can be guided by the properties they give to the resulting block components:
- $\tau_j = 1$ yields the maximization of a covariance-based criterion. It is recommended when the user wants a stable component (large variance) while simultaneously taking into account the correlations between blocks. The user must, however, be aware that variance dominates over correlation.
- $\tau_j = 0$ yields the maximization of a correlation-based criterion. It is recommended when the user wants to maximize correlations between connected components. This option can yield unstable solutions in case of multi-collinearity and cannot be used when a data block is rank deficient (e.g. $n < p_j$).
- $0 < \tau_j < 1$ is a good compromise between variance and correlation: the block components are simultaneously stable and as well correlated as possible with their connected block components. This setting can be used when the data block is rank deficient.
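The three settings of $\tau_j$ can be compared numerically. The base R sketch below rescales an arbitrary weight vector so that it satisfies the constraint for a given $\tau_j$; this works because the left-hand side of the constraint is quadratic in $\mathbf a_j$. The helper names, the toy data, and the use of R's `var()` (denominator $n - 1$) are assumptions, and the package may estimate the variance differently.

```r
# Left-hand side of the RGCCA constraint for one block
# (hypothetical helper, not part of the RGCCA package)
constraint_value <- function(X, a, tau) {
  (1 - tau) * var(drop(X %*% a)) + tau * sum(a^2)
}

# Rescale a so that the constraint equals 1
rescale_weights <- function(X, a, tau) {
  a / sqrt(constraint_value(X, a, tau))
}

set.seed(1)
X <- scale(matrix(rnorm(50 * 4), nrow = 50), center = TRUE, scale = FALSE)
a <- c(2, -1, 0.5, 3)  # arbitrary starting weights

for (tau in c(0, 0.5, 1)) {
  a_tau <- rescale_weights(X, a, tau)
  cat(sprintf(
    "tau = %.1f: var(Xa) = %.3f, ||a||^2 = %.3f, constraint = %.3f\n",
    tau, var(drop(X %*% a_tau)), sum(a_tau^2), constraint_value(X, a_tau, tau)
  ))
}
```

For $\tau_j = 0$ the rescaled component has unit variance, for $\tau_j = 1$ the weight vector has unit norm, and for $\tau_j = 0.5$ neither quantity equals 1 on its own but their weighted combination does.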
The quality and interpretability of the RGCCA block components $\mathbf y_j = \mathbf X_j \mathbf a_j, ~ j = 1, \dots, J$ are likely affected by the usefulness and relevance of the variables of each block. Accordingly, it is an important issue to identify within each block a subset of significant variables which are active in the relationships between blocks. SGCCA extends RGCCA to address this issue of variable selection. Specifically, RGCCA with all $\tau_j$ equal to 1 is combined with an L1-penalty that gives rise to SGCCA [3]. The SGCCA optimization problem is defined with $s_j$, a user-defined positive constant that determines the amount of sparsity through the additional constraint $\Vert \mathbf a_j \Vert_1 \leq s_j, ~ j = 1, \dots, J$. The smaller the $s_j$, the larger the degree of sparsity for $\mathbf a_j$. The sparsity parameter $s_j$ is usually set by cross-validation or permutation. Alternatively, values of $s_j$ can simply be chosen to result in desired amounts of sparsity.
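The link between $s_j$ and the number of zero weights can be illustrated with a generic base R sketch. It soft-thresholds a vector and searches by bisection for the threshold at which the unit-norm result meets the L1 bound. This is a common way to handle such a constraint and is shown only as an assumption; it is not claimed to be the package's implementation. The function names and the input vector are hypothetical.

```r
soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)
l2_normalize <- function(x) x / sqrt(sum(x^2))

# Return a unit-norm vector proportional to a soft-thresholded v
# whose L1 norm is at most s (sketch; assumes s > 1 and no ties in |v|).
sparsify <- function(v, s, tol = 1e-8) {
  a <- l2_normalize(v)
  if (sum(abs(a)) <= s) return(a)  # already sparse enough
  lo <- 0
  hi <- max(abs(v))
  while (hi - lo > tol) {
    mid <- (lo + hi) / 2
    a <- l2_normalize(soft_threshold(v, mid))
    if (sum(abs(a)) > s) lo <- mid else hi <- mid
  }
  # hi is the smallest tested threshold that satisfies the bound
  l2_normalize(soft_threshold(v, hi))
}

v <- c(3, -2, 1, 0.5, -0.1)     # hypothetical unconstrained weights
a_loose <- sparsify(v, s = 1.5)
a_tight <- sparsify(v, s = 1.2)

cat("zeros with s = 1.5:", sum(a_loose == 0), "\n")
cat("zeros with s = 1.2:", sum(a_tight == 0), "\n")
```

For a unit-norm vector with $p_j$ elements the L1 norm lies between 1 and $\sqrt{p_j}$, so meaningful values of $s_j$ fall in that range. In this toy example the tighter bound sets three of the five weights to zero, against two for the looser bound.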
Installation
Required:
Software: R (≥ 3.2.0)
R libraries: see the DESCRIPTION file.
install.packages("RGCCA")
See the vignette for an introduction to the package.
Installation of a development branch from the git repository
Required:
Software: R (≥ 3.2.0)
R libraries: see the DESCRIPTION file.
The R library devtools.
remove.packages("RGCCA")
devtools::install_github(repo="https://github.com/rgcca-factory/RGCCA.git", ref = "main")
References
1. Tenenhaus, M., Tenenhaus, A., & Groenen, P. J. (2017). Regularized generalized canonical correlation analysis: a framework for sequential multiblock component methods. Psychometrika, 82(3), 737-777.
2. Tenenhaus, A., Philippe, C., & Frouin, V. (2015). Kernel generalized canonical correlation analysis. Computational Statistics & Data Analysis, 90, 114-131.
3. Tenenhaus, A., Philippe, C., Guillemot, V., Le Cao, K. A., Grill, J., & Frouin, V. (2014). Variable selection for generalized canonical correlation analysis. Biostatistics, 15(3), 569-583.
4. Tenenhaus, A., & Tenenhaus, M. (2011). Regularized generalized canonical correlation analysis. Psychometrika, 76(2), 257.
5. Van de Geer, J. P. (1984). Linear relations among K sets of variables. Psychometrika, 49(1), 79-94.
6. Schäfer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1).
7. Tenenhaus, A., & Tenenhaus, M. (2014). Regularized generalized canonical correlation analysis for multiblock or multigroup data analysis. European Journal of Operational Research, 238(2), 391-403.
Owner
- Name: rgcca-factory
- Login: rgcca-factory
- Kind: organization
- Repositories: 4
- Profile: https://github.com/rgcca-factory
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"identifier": "RGCCA",
"description": "Multi-block data analysis concerns the analysis of several sets of variables (blocks) observed on the same group of individuals. The main aims of the RGCCA package are: to study the relationships between blocks and to identify subsets of variables of each block which are active in their relationships with the other blocks. This package allows to (i) run R/SGCCA and related methods, (ii) help the user to find out the optimal parameters for R/SGCCA such as regularization parameters (tau or sparsity), (iii) evaluate the stability of the RGCCA results and their significance, (iv) build predictive models from the R/SGCCA. (v) Generic print() and plot() functions apply to all these functionalities.",
"name": "RGCCA: Regularized and Sparse Generalized Canonical Correlation\n Analysis for Multiblock Data",
"relatedLink": [
"https://rgcca-factory.github.io/RGCCA/",
"https://CRAN.R-project.org/package=RGCCA"
],
"codeRepository": "https://github.com/rgcca-factory/RGCCA",
"issueTracker": "https://github.com/rgcca-factory/RGCCA/issues",
"license": "https://spdx.org/licenses/GPL-3.0",
"version": "3.0.3",
"programmingLanguage": {
"@type": "ComputerLanguage",
"name": "R",
"url": "https://r-project.org"
},
"runtimePlatform": "R version 4.3.0 (2023-04-21)",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"author": [
{
"@type": "Person",
"givenName": "Fabien",
"familyName": "Girka"
},
{
"@type": "Person",
"givenName": "Etienne",
"familyName": "Camenen"
},
{
"@type": "Person",
"givenName": "Caroline",
"familyName": "Peltier"
},
{
"@type": "Person",
"givenName": "Arnaud",
"familyName": "Gloaguen"
},
{
"@type": "Person",
"givenName": "Vincent",
"familyName": "Guillemot"
},
{
"@type": "Person",
"givenName": "Arthur",
"familyName": "Tenenhaus",
"email": "arthur.tenenhaus@centralesupelec.fr"
}
],
"contributor": [
{
"@type": "Person",
"givenName": "Laurent",
"familyName": "Le Brusquet"
},
{
"@type": "Person",
"givenName": "Arthur",
"familyName": "Tenenhaus",
"email": "arthur.tenenhaus@centralesupelec.fr"
}
],
"maintainer": [
{
"@type": "Person",
"givenName": "Arthur",
"familyName": "Tenenhaus",
"email": "arthur.tenenhaus@centralesupelec.fr"
}
],
"softwareSuggestions": [
{
"@type": "SoftwareApplication",
"identifier": "devtools",
"name": "devtools",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=devtools"
},
{
"@type": "SoftwareApplication",
"identifier": "FactoMineR",
"name": "FactoMineR",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=FactoMineR"
},
{
"@type": "SoftwareApplication",
"identifier": "knitr",
"name": "knitr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=knitr"
},
{
"@type": "SoftwareApplication",
"identifier": "pander",
"name": "pander",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=pander"
},
{
"@type": "SoftwareApplication",
"identifier": "rmarkdown",
"name": "rmarkdown",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rmarkdown"
},
{
"@type": "SoftwareApplication",
"identifier": "rticles",
"name": "rticles",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rticles"
},
{
"@type": "SoftwareApplication",
"identifier": "testthat",
"name": "testthat",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=testthat"
},
{
"@type": "SoftwareApplication",
"identifier": "vdiffr",
"name": "vdiffr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=vdiffr"
}
],
"softwareRequirements": {
"1": {
"@type": "SoftwareApplication",
"identifier": "R",
"name": "R",
"version": ">= 3.5"
},
"2": {
"@type": "SoftwareApplication",
"identifier": "caret",
"name": "caret",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=caret"
},
"3": {
"@type": "SoftwareApplication",
"identifier": "Deriv",
"name": "Deriv",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=Deriv"
},
"4": {
"@type": "SoftwareApplication",
"identifier": "ggplot2",
"name": "ggplot2",
"version": ">= 3.4.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=ggplot2"
},
"5": {
"@type": "SoftwareApplication",
"identifier": "ggrepel",
"name": "ggrepel",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=ggrepel"
},
"6": {
"@type": "SoftwareApplication",
"identifier": "graphics",
"name": "graphics"
},
"7": {
"@type": "SoftwareApplication",
"identifier": "gridExtra",
"name": "gridExtra",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=gridExtra"
},
"8": {
"@type": "SoftwareApplication",
"identifier": "MASS",
"name": "MASS",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=MASS"
},
"9": {
"@type": "SoftwareApplication",
"identifier": "matrixStats",
"name": "matrixStats",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=matrixStats"
},
"10": {
"@type": "SoftwareApplication",
"identifier": "methods",
"name": "methods"
},
"11": {
"@type": "SoftwareApplication",
"identifier": "parallel",
"name": "parallel"
},
"12": {
"@type": "SoftwareApplication",
"identifier": "pbapply",
"name": "pbapply",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=pbapply"
},
"13": {
"@type": "SoftwareApplication",
"identifier": "rlang",
"name": "rlang",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Comprehensive R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"sameAs": "https://CRAN.R-project.org/package=rlang"
},
"14": {
"@type": "SoftwareApplication",
"identifier": "stats",
"name": "stats"
},
"SystemRequirements": null
},
"fileSize": "1237.747KB",
"citation": [
{
"@type": "SoftwareSourceCode",
"author": [
{
"@type": "Person",
"givenName": "Fabien",
"familyName": "Girka"
},
{
"@type": "Person",
"givenName": "Etienne",
"familyName": "Camenen"
},
{
"@type": "Person",
"givenName": "Caroline",
"familyName": "Peltier"
},
{
"@type": "Person",
"givenName": "Arnaud",
"familyName": "Gloaguen"
},
{
"@type": "Person",
"givenName": "Vincent",
"familyName": "Guillemot"
},
{
"@type": "Person",
"givenName": "Laurent",
"familyName": "Le Brusquet"
},
{
"@type": "Person",
"givenName": "Arthur",
"familyName": "Tenenhaus"
}
],
"name": "{RGCCA}: Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data",
"url": "https://CRAN.R-project.org/package=RGCCA",
"description": "R package version 3.0.3"
}
],
"releaseNotes": "https://github.com/rgcca-factory/RGCCA/blob/main/NEWS.md",
"readme": "https://github.com/rgcca-factory/RGCCA/blob/main/README.md",
"developmentStatus": "https://lifecycle.r-lib.org/articles/stages.html#stable"
}
GitHub Events
Total
- Watch event: 2
- Issue comment event: 15
- Push event: 15
- Pull request event: 2
- Fork event: 1
- Create event: 4
Last Year
- Watch event: 2
- Issue comment event: 15
- Push event: 15
- Pull request event: 2
- Fork event: 1
- Create event: 4
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| GFabien | f****a@s****r | 316 |
| Etienne Camenen | e****n@g****m | 223 |
| cpeltier | c****r@i****g | 217 |
| Tenenhaus | a****s@c****r | 99 |
| Arnaud | a****n@s****r | 80 |
| Fabien | f****a@g****m | 44 |
| Caroline Peltier | c****r@g****m | 17 |
| llrs | l****a@g****m | 12 |
| ChemoSens | c****r@i****r | 4 |
| Arnaud | a****n@g****m | 2 |
| arthur | a****r@t****r | 2 |
| abourrelier | a****r@i****g | 1 |
| Tenenhaus | a****s@s****r | 1 |
| GFabien | 3****n | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 13
- Total pull requests: 76
- Average time to close issues: 5 months
- Average time to close pull requests: 20 days
- Total issue authors: 9
- Total pull request authors: 7
- Average comments per issue: 6.46
- Average comments per pull request: 0.33
- Merged pull requests: 59
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 4.0
- Average comments per pull request: 0.33
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- llrs (4)
- tonyliang19 (2)
- PFRoux (1)
- WUZHExl (1)
- Ombel88 (1)
- GFabien (1)
- fataltes (1)
- JChRoy (1)
- EGoujon (1)
Pull Request Authors
- GFabien (55)
- AGloaguen (13)
- Tenenhaus (5)
- llrs (2)
- bernt-matthias (1)
- vguillemot (1)
- aljabadi (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 2
- Total downloads: cran 482 last-month
- Total docker downloads: 18
- Total dependent packages: 4 (may contain duplicates)
- Total dependent repositories: 9 (may contain duplicates)
- Total versions: 10
- Total maintainers: 1
cran.r-project.org: RGCCA
Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data
- Homepage: https://github.com/rgcca-factory/RGCCA
- Documentation: http://cran.r-project.org/web/packages/RGCCA/RGCCA.pdf
- License: GPL-3
- Latest release: 3.0.3 (published over 2 years ago)
Rankings
Maintainers (1)
conda-forge.org: r-rgcca
- Homepage: https://rgcca-factory.github.io/RGCCA/
- License: GPL-2.0-or-later
- Latest release: 2.1.2 (published over 7 years ago)
Rankings
Dependencies
- MASS * depends
- R >= 3.2 depends
- Deriv * imports
- ggplot2 * imports
- grDevices * imports
- graphics * imports
- gridExtra * imports
- methods * imports
- parallel * imports
- pbapply * imports
- plotly * imports
- scales * imports
- stats * imports
- utils * imports
- DT * suggests
- bsplus * suggests
- devtools * suggests
- ggrepel * suggests
- igraph * suggests
- knitr * suggests
- magrittr * suggests
- markdown * suggests
- nnet * suggests
- openxlsx * suggests
- optparse * suggests
- pander * suggests
- rmarkdown * suggests
- shiny * suggests
- shinyjs * suggests
- testthat * suggests
- visNetwork * suggests
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- r-lib/actions/setup-tinytex v2 composite
- JamesIves/github-pages-deploy-action v4.4.1 composite
- actions/checkout v3 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- r-lib/actions/setup-tinytex v2 composite
- actions/checkout v3 composite
- r-lib/actions/pr-fetch v2 composite
- r-lib/actions/pr-push v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- actions/checkout v3 composite
- actions/upload-artifact v3 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite