gspcr

Generalized Supervised Principal Component Regression

https://github.com/edoardocostantini/gspcr

Repository

Basic Info
  • Host: GitHub
  • Owner: EdoardoCostantini
  • License: other
  • Language: R
  • Default Branch: master
  • Size: 26.9 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 2
  • Releases: 4
Created over 3 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

Generalized Supervised Principal Component Regression

An R package implementing a version of Supervised Principal Component Regression (SPCR; Bair et al., 2006) that allows the dependent and independent variables to be of any measurement level. The package builds on the method implemented in the R package superpc.

Details

SPCR regresses a dependent variable onto a few supervised principal components computed from a large set of predictors. SPCR proceeds in four steps:

  1. Regress the dependent variable onto each column of a data set of p possible predictors via p simple linear regressions.
  2. Define a subset of the original p variables by discarding all variables whose standardized univariate regression coefficient is smaller, in absolute value, than a chosen threshold.
  3. Use the retained subset of predictors to estimate q principal components (PCs).
  4. Regress the dependent variable onto the q PCs.
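The four steps above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the gspcr or superpc implementation: the helper spcr_fit(), the simulated data, and the threshold of 0.3 are assumptions made for the example. With both variables standardized, the slope of each simple regression equals their correlation, which serves here as the standardized univariate coefficient.

```r
# Minimal SPCR sketch in base R (hypothetical helper, not the gspcr code).
set.seed(2006)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("x", 1:p)))
# Simulated outcome: only x1, x2, and x3 carry signal
y <- as.vector(X[, 1:3] %*% c(1, 1, 1) + rnorm(n))

spcr_fit <- function(X, y, threshold, q) {
  ys <- as.vector(scale(y))
  Xs <- scale(X)
  # Step 1: p simple linear regressions of the standardized outcome on each
  # standardized predictor; each slope is a standardized coefficient
  coefs <- apply(Xs, 2, function(x) coef(lm(ys ~ x))[[2]])
  # Step 2: keep the predictors whose |coefficient| reaches the threshold
  keep <- which(abs(coefs) >= threshold)
  stopifnot(length(keep) >= q)
  # Step 3: estimate q PCs from the retained predictors (centered and scaled)
  pca <- prcomp(X[, keep, drop = FALSE], center = TRUE, scale. = TRUE,
                rank. = q)
  # Step 4: regress the outcome on the q PC scores
  fit <- lm(y ~ pca$x)
  list(coefs = coefs, keep = keep, pca = pca, fit = fit)
}

res <- spcr_fit(X, y, threshold = 0.3, q = 1)
res$keep                    # indices of the retained predictors
summary(res$fit)$r.squared  # in-sample R-squared of the final regression
```

In practice, neither the threshold nor q would be fixed by hand as in this example; both are the tuning parameters that cross-validation is meant to select.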

A key aspect of the method is that both the number of PCs and the threshold value can be chosen by cross-validation. GSPCR extends SPCR by allowing the dependent variable to be of any measurement level (i.e., ratio, interval, ordinal, or nominal): it bases the threshold of step 2 on likelihood-based measures of the univariate associations estimated in step 1. Furthermore, GSPCR allows the predictors to be of any type by combining the PCAmix framework (Kiers, 1991; Chavent et al., 2014) with SPCR in step 3.
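To make the idea of a likelihood-based threshold concrete, the sketch below screens predictors of a binary outcome with univariate logistic regressions and uses the likelihood-ratio statistic 2 * (logLik of the univariate model - logLik of the intercept-only model) as the association measure. This measure is an assumption chosen for illustration, not necessarily the one gspcr implements, and the helper lr_screen() is hypothetical. In GSPCR the threshold would be tuned by cross-validation rather than fixed at a chi-squared quantile as done here. For ordinal or nominal outcomes, the glm() calls could be replaced by models such as MASS::polr() or nnet::multinom(), whose fitted log-likelihoods can be compared in the same way.

```r
# Likelihood-based screening sketch (hypothetical helper, not the gspcr code).
set.seed(1991)
n <- 200
p <- 10
X <- as.data.frame(matrix(rnorm(n * p), nrow = n))  # columns V1, ..., V10
# Simulated binary outcome driven by V1 and V2 only
y <- rbinom(n, size = 1, prob = plogis(X$V1 - X$V2))

lr_screen <- function(X, y, family = binomial()) {
  # Log-likelihood of the intercept-only model
  ll0 <- as.numeric(logLik(glm(y ~ 1, family = family)))
  # One univariate GLM per predictor; LR statistic against the null model
  vapply(X, function(x) {
    ll1 <- as.numeric(logLik(glm(y ~ x, family = family)))
    2 * (ll1 - ll0)
  }, numeric(1))
}

lr <- lr_screen(X, y)
# Illustrative fixed threshold: the 95% quantile of a chi-squared with 1 df
keep <- names(lr)[lr > qchisq(0.95, df = 1)]
keep
```

Because the measure is a log-likelihood comparison rather than a regression slope, the same screening logic applies whatever model the outcome's measurement level calls for.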

Features

The R package gspcr allows you to:

  • Estimate the GSPCR model on a training data set;
  • Plot the cross-validation trends used to tune the threshold value and the number of PCs to compute;
  • Predict observations for both the training data and new, previously unseen data.
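The tuning and prediction features listed above rest on the same mechanics: K-fold cross-validation over a grid of thresholds and numbers of PCs, where each held-out fold is predicted by projecting it onto the PCs estimated on the training folds. The base-R sketch below illustrates this for a continuous outcome. The helper spcr_cv(), the grid, K = 5, and the out-of-fold mean squared error are assumptions for illustration; they do not reproduce the interface or fit measures of the package's cv_gspcr().

```r
# K-fold CV sketch for SPCR tuning (hypothetical helper, not cv_gspcr()).
set.seed(123)
n <- 120
p <- 15
X <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("x", 1:p)))
y <- as.vector(X[, 1:3] %*% c(1, -1, 0.5) + rnorm(n))

spcr_cv <- function(X, y, thresholds, qs, K = 5) {
  folds <- sample(rep(seq_len(K), length.out = nrow(X)))
  err <- matrix(NA_real_, nrow = length(thresholds), ncol = length(qs),
                dimnames = list(as.character(thresholds), as.character(qs)))
  for (i in seq_along(thresholds)) {
    for (j in seq_along(qs)) {
      q <- qs[j]
      sse <- 0
      for (k in seq_len(K)) {
        tr <- folds != k
        # Screen on the training folds only (correlation = standardized slope)
        r <- apply(X[tr, , drop = FALSE], 2, cor, y = y[tr])
        keep <- which(abs(r) >= thresholds[i])
        if (length(keep) < q) {
          sse <- NA_real_  # too few predictors survive to compute q PCs
          break
        }
        pca <- prcomp(X[tr, keep, drop = FALSE], center = TRUE,
                      scale. = TRUE, rank. = q)
        fit <- lm(y[tr] ~ pca$x)
        # Project held-out rows with the training centering, scaling, loadings
        new_pcs <- predict(pca, newdata = X[!tr, keep, drop = FALSE])
        y_hat <- cbind(1, new_pcs) %*% coef(fit)
        sse <- sse + sum((y[!tr] - y_hat)^2)
      }
      err[i, j] <- sse / nrow(X)
    }
  }
  err
}

cv_err <- spcr_cv(X, y, thresholds = c(0, 0.1, 0.2, 0.3), qs = 1:3)
cv_err
# Row and column of the lowest out-of-fold error
best <- which(cv_err == min(cv_err, na.rm = TRUE), arr.ind = TRUE)[1, ]
c(threshold = rownames(cv_err)[best[1]], npcs = colnames(cv_err)[best[2]])
```

Screening inside each training fold matters: selecting predictors on the full data would leak information from the held-out fold and make the cross-validation error optimistic. The cv_err matrix is the kind of quantity a plot of cross-validation trends displays, with one curve per number of PCs across thresholds.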

Installation

To install the latest version, run the following command in your R console:

devtools::install_github("EdoardoCostantini/gspcr")

Usage

To load the gspcr package in an R session and start using it:

library("gspcr")

To see an example of how to use the package, open the help file for the main function:

help("cv_gspcr")

You can also read the first draft of the package vignette by opening the file ./vignettes/main-features.html.

To see how to cite the package, run:

citation("gspcr")

Development

Tests

This package uses unit tests to check that its functions behave as expected. The tests follow the workflow established by the testthat R package and are run with the test() function from the devtools R package. To run them:

  1. Install the devtools R package, if you do not have it already:

    install.packages("devtools")

  2. Then run the tests from the R console:

    devtools::test()

    To run the full R CMD check, which also runs the unit tests, use devtools::check() instead.
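For readers new to the testthat workflow: tests are ordinary R files named test-*.R under tests/testthat/, each grouping expectations inside test_that() blocks. The file below is a hypothetical example; its name and content are not taken from gspcr's test suite. It checks a property that step 3 of SPCR relies on: prcomp() returns one score column per requested component, and the component scores are uncorrelated.

```r
# Hypothetical file: tests/testthat/test-pca-scores.R
# Illustrative only; not taken from the gspcr test suite.
library(testthat)

test_that("prcomp() returns one score column per requested PC", {
  set.seed(1)
  X <- matrix(rnorm(50 * 5), nrow = 50)
  q <- 2L
  pca <- prcomp(X, center = TRUE, scale. = TRUE, rank. = q)
  expect_equal(dim(pca$x), c(50L, q))
  # Component scores are uncorrelated (up to floating-point error)
  expect_equal(cor(pca$x)[1, 2], 0, tolerance = 1e-8)
})
```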

Vignettes

The vignettes for this package take a long time to compile. For this reason, I adopted a two-step workflow:

  1. Edit the desired vignette in its .Rmd.orig source file.
  2. Run the rebuild-vignettes.R script to update the corresponding .Rmd vignette files.

When testing, building, or uploading the package, only the .Rmd versions of the vignettes are compiled. These versions already contain the evaluated R code, its output, and the plots, which keeps compilation time to a minimum. This workflow is inspired by:
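The rebuild step can be sketched as a loop that knits each .Rmd.orig file into a plain .Rmd with knitr::knit(), so that the resulting vignette already contains the evaluated output. This is a hypothetical reconstruction for illustration, not the content of the repository's rebuild-vignettes.R; the function name rebuild_vignettes() and the default vignettes/ directory are assumptions.

```r
# Hypothetical sketch of a vignette-precompilation script; the repository's
# rebuild-vignettes.R may work differently.
rebuild_vignettes <- function(dir = "vignettes") {
  if (!dir.exists(dir)) stop("Directory not found: ", dir)
  origs <- list.files(dir, pattern = "\\.Rmd\\.orig$")
  # Knit from inside the vignette directory so that figure paths written
  # into the output .Rmd files are relative to it
  old_wd <- setwd(dir)
  on.exit(setwd(old_wd), add = TRUE)
  outputs <- sub("\\.orig$", "", origs)
  for (i in seq_along(origs)) {
    # Evaluates every chunk once and writes the code and its output to the
    # .Rmd; plots are saved as image files under the vignette directory
    knitr::knit(input = origs[i], output = outputs[i], quiet = TRUE,
                envir = new.env(parent = globalenv()))
  }
  invisible(outputs)
}
```

Run from the package root, rebuild_vignettes() regenerates every .Rmd next to its .Rmd.orig source. Building the package afterwards only has to render the already-evaluated .Rmd files.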

References

Bair, E., Hastie, T., Paul, D., & Tibshirani, R. (2006). Prediction by supervised principal components. Journal of the American Statistical Association, 101(473), 119-137.

Chavent, M., Kuentz-Simonet, V., Labenne, A., & Saracco, J. (2014). Multivariate analysis of mixed data: The R package PCAmixdata. arXiv preprint arXiv:1411.4911.

Kiers, H. A. (1991). Simple structure in component analysis techniques for mixtures of qualitative and quantitative variables. Psychometrika, 56(2), 197-212.

Owner

  • Name: Edo
  • Login: EdoardoCostantini
  • Kind: user
  • Location: Tilburg, Netherlands
  • Company: Tilburg University

Sociologist turned statistician, missed developer, born interior designer but never got there

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Costantini
    given-names: Edoardo
    orcid: https://orcid.org/0000-0001-9581-9913
title: "gspcr: Generalized Supervised Principal Component Regression"
version: 0.9.4.1
date-released: 2023-09-06
repository-code: "https://github.com/EdoardoCostantini/gspcr"

Dependencies

DESCRIPTION (CRAN)
  • Depends: R (>= 2.10)
  • Imports: FactoMineR, MASS, MLmetrics, PCAmixdata, dplyr, ggplot2, nnet, reshape2, rlang
  • Suggests: klippy, knitr, lmtest, patchwork, rmarkdown, superpc, testthat (>= 3.0.0)