sharp

R package sharp (Stability-enHanced Approaches using Resampling Procedures).

https://github.com/barbarabodinier/sharp

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.0%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: barbarabodinier
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 17.9 MB
Statistics
  • Stars: 14
  • Watchers: 3
  • Forks: 1
  • Open Issues: 6
  • Releases: 14
Created about 5 years ago · Last pushed 9 months ago
Metadata Files
  • Readme
  • Changelog
  • License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# sharp: Stability-enHanced Approaches using Resampling Procedures 


[![CRAN status](https://www.r-pkg.org/badges/version/sharp)](https://CRAN.R-project.org/package=sharp)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/sharp?color=blue)](https://r-pkg.org/pkg/sharp)
![GitHub last commit](https://img.shields.io/github/last-commit/barbarabodinier/sharp?logo=GitHub&style=flat-square)
[![R-CMD-check](https://github.com/barbarabodinier/sharp/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/barbarabodinier/sharp/actions/workflows/R-CMD-check.yaml)


## Description

> In stability selection and consensus clustering, resampling techniques are used to enhance the reliability of the results. In this package, hyper-parameters are calibrated by maximising model stability, which is measured under the null hypothesis that all selection (or co-membership) probabilities are identical. Functions are readily implemented for the use of LASSO regression, sparse PCA, sparse (group) PLS or graphical LASSO in stability selection, and hierarchical clustering, partitioning around medoids, K means or Gaussian mixture models in consensus clustering.

## Installation

The released version of the package can be installed from [CRAN](https://CRAN.R-project.org) with:

```{r installation cran, eval=FALSE}
install.packages("sharp")
```

The development version can be installed from [GitHub](https://github.com/):

```{r installation github, eval=FALSE}
remotes::install_github("barbarabodinier/sharp")
```

## Example datasets

To illustrate the use of the main functions implemented in [**sharp**](https://CRAN.R-project.org/package=sharp), four artificial datasets are created:

```{r simulation, eval=FALSE}
library(sharp)

# Dataset for regression
set.seed(1)
data_reg <- SimulateRegression(n = 200, pk = 10)
x_reg <- data_reg$xdata
y_reg <- data_reg$ydata

# Dataset for structural equation modelling
set.seed(1)
data_sem <- SimulateStructural(n = 200, pk = c(5, 2, 3))
x_sem <- data_sem$data

# Dataset for graphical modelling
set.seed(1)
data_ggm <- SimulateGraphical(n = 200, pk = 20)
x_ggm <- data_ggm$data

# Dataset for clustering
set.seed(1)
data_clust <- SimulateClustering(n = c(10, 10, 10))
x_clust <- data_clust$data
```

Check out the R package [**fake**](https://github.com/barbarabodinier/fake) for more details on these data simulation models.

## Main functions

### Variable selection

In a regression context, stability selection is done using LASSO regression as implemented in the R package [**glmnet**](https://CRAN.R-project.org/package=glmnet). 

```{r variable selection, eval=FALSE}
stab_reg <- VariableSelection(xdata = x_reg, ydata = y_reg)
SelectedVariables(stab_reg)
```
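
For intuition about what stability selection aggregates, selection proportions can be sketched in base R. This is a simplified illustration and not the implementation in **sharp**: marginal correlation thresholding stands in for the LASSO, the hyper-parameters are fixed by hand rather than calibrated, and the names `select_vars`, `selprop` and both thresholds are hypothetical.

```r
# Toy data: 5 signal variables out of 20 (hypothetical example, base R only)
set.seed(1)
n <- 200
p <- 20
x <- matrix(rnorm(n * p), nrow = n)
beta <- c(rep(1, 5), rep(0, p - 5))
y <- as.vector(x %*% beta + rnorm(n))

# Stand-in selection algorithm (assumption, not the LASSO): keep variables
# whose absolute correlation with the outcome exceeds a threshold
select_vars <- function(x, y, threshold) {
  abs(cor(x, y))[, 1] > threshold
}

# Apply the selection algorithm to K subsamples of half the observations
K <- 100
selected <- replicate(K, {
  ids <- sample(n, size = n / 2)
  select_vars(x[ids, , drop = FALSE], y[ids], threshold = 0.25)
})

# Selection proportion of each variable across the K subsamples
selprop <- rowMeans(selected)

# Stably selected variables: selection proportion above a second threshold
stable <- which(selprop >= 0.9)
```

In this sketch both thresholds are arbitrary. The package description above states that such hyper-parameters are instead calibrated by maximising model stability.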

### Structural equation modelling

In a structural equation modelling context, stability selection is done using a series of LASSO regressions as implemented in the R package [**glmnet**](https://CRAN.R-project.org/package=glmnet).

```{r structural equation modelling, eval=FALSE}
dag <- LayeredDAG(layers = c(5, 2, 3))
stab_sem <- StructuralEquations(xdata = x_sem, adjacency = dag)
LinearSystemMatrix(vect = Stable(stab_sem), adjacency = dag)
```

### Graphical modelling

In a graphical modelling context, stability selection is done using the graphical LASSO as implemented in the R package [**glassoFast**](https://CRAN.R-project.org/package=glassoFast). 

```{r graphical modelling, eval=FALSE}
stab_ggm <- GraphicalModel(xdata = x_ggm)
Adjacency(stab_ggm)
```

### Clustering

Consensus clustering is done using hierarchical clustering as implemented in the R package [**stats**](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/00Index.html). 

```{r clustering, eval=FALSE}
stab_clust <- Clustering(xdata = x_clust)
Clusters(stab_clust)
```
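
The co-membership proportions that consensus clustering relies on can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used by `Clustering()`: the number of clusters is fixed at 3 rather than calibrated, the subsample size is set to half the observations, and the object names (`comembership`, `cosampled`, `coprop`) are hypothetical.

```r
set.seed(1)
# Toy data: three groups of 10 observations in 2 dimensions (hypothetical)
x <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 5), ncol = 2),
  matrix(rnorm(20, mean = 10), ncol = 2)
)
n <- nrow(x)

K <- 100
comembership <- matrix(0, n, n) # times a pair is clustered together
cosampled <- matrix(0, n, n)    # times a pair is sampled together

for (k in seq_len(K)) {
  ids <- sample(n, size = round(0.5 * n))
  # Hierarchical clustering of the subsample, cut into 3 clusters
  groups <- cutree(hclust(dist(x[ids, ])), k = 3)
  cosampled[ids, ids] <- cosampled[ids, ids] + 1
  comembership[ids, ids] <- comembership[ids, ids] +
    outer(groups, groups, "==")
}

# Co-membership proportion for each pair of observations
# (NaN for any pair that was never sampled together)
coprop <- comembership / cosampled
```

Pairs of observations from the same simulated group are expected to have higher co-membership proportions than pairs from different groups, which is what a final clustering of `coprop` would exploit.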

## Extraction and visualisation of the results

It is strongly recommended to check the calibration of the hyper-parameters using the function `CalibrationPlot()` on the output from any of the main functions listed above. The functions `print()`, `summary()` and `plot()` can also be used on the outputs from the main functions.
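
As a short usage sketch, the functions named above can be applied to the output of `VariableSelection()`. The block re-creates the regression example so that it is self-contained, and it is guarded with `requireNamespace()` so that it does nothing when the package is not installed; the variable `has_sharp` is introduced here for illustration only.

```r
# Only run the example if sharp is installed
has_sharp <- requireNamespace("sharp", quietly = TRUE)

if (has_sharp) {
  library(sharp)

  # Same simulated regression data as in the example datasets
  set.seed(1)
  data_reg <- SimulateRegression(n = 200, pk = 10)
  stab_reg <- VariableSelection(xdata = data_reg$xdata, ydata = data_reg$ydata)

  # Check the calibration of the hyper-parameters
  CalibrationPlot(stab_reg)

  # Generic methods available for the outputs of the main functions
  print(stab_reg)
  summary(stab_reg)
  plot(stab_reg)
}
```

The same calls should apply to the outputs of `StructuralEquations()`, `GraphicalModel()` and `Clustering()`, since the text above describes them as working on any of the main functions.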

## Parametrisation

Stability selection and consensus clustering can in principle be done by aggregating the results from any selection (or clustering) algorithm applied to subsamples of the data. The underlying algorithm is specified in the argument `implementation` of the main functions. Consensus clustering using partitioning around medoids, K-means or Gaussian mixture models is also supported in [**sharp**](https://CRAN.R-project.org/package=sharp):

```{r implementation, eval=FALSE}
stab_clust <- Clustering(xdata = x_clust, implementation = PAMClustering)
stab_clust <- Clustering(xdata = x_clust, implementation = KMeansClustering)
stab_clust <- Clustering(xdata = x_clust, implementation = GMMClustering)
```

Other algorithms can be used by defining a wrapper function to be called in `implementation`. Check out the documentation of `GraphicalModel()` for an example using a shrunk estimate of the partial correlation instead of the graphical LASSO. 

## References

- Barbara Bodinier, Sabrina Rodrigues, Maryam Karimi, Sarah Filippi, Julien Chiquet and Marc Chadeau-Hyam. Stability Selection and Consensus Clustering in R: The R Package sharp. (2025) Journal of Statistical Software. [link](https://doi.org/10.18637/jss.v112.i05)

- Barbara Bodinier, Dragana Vuckovic, Sabrina Rodrigues, Sarah Filippi, Julien Chiquet and Marc Chadeau-Hyam. Automated calibration of consensus weighted distance-based clustering approaches using sharp. (2023) Bioinformatics. [link](https://doi.org/10.1093/bioinformatics/btad635)

- Barbara Bodinier, Sarah Filippi, Therese Haugdahl Nost, Julien Chiquet and Marc Chadeau-Hyam. Automated calibration for stability selection in penalised regression and graphical models. (2023) Journal of the Royal Statistical Society: Series C (Applied Statistics). [link](https://doi.org/10.1093/jrsssc/qlad058)

- Nicolai Meinshausen and Peter Bühlmann. Stability selection. (2010) Journal of the Royal Statistical Society: Series B (Statistical Methodology). [link](https://doi.org/10.1111/j.1467-9868.2010.00740.x)

- Stefano Monti, Pablo Tamayo, Jill Mesirov and Todd Golub. Consensus clustering. (2003) Machine Learning. [link](https://doi.org/10.1023/A:1023949509487)

Owner

  • Name: Barbara Bodinier
  • Login: barbarabodinier
  • Kind: user
  • Company: Imperial College London

Postdoctoral researcher in Biostatistics

GitHub Events

Total
  • Create event: 1
  • Release event: 1
  • Issues event: 5
  • Push event: 7
Last Year
  • Create event: 1
  • Release event: 1
  • Issues event: 5
  • Push event: 7

Committers

Last synced: about 3 years ago

All Time
  • Total Commits: 363
  • Total Committers: 2
  • Avg Commits per committer: 181.5
  • Development Distribution Score (DDS): 0.052
Top Committers
Name Email Commits
Barbara Bodinier b****r@i****k 344
Barbara Bodinier b****a@b****e 19

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 8
  • Total pull requests: 3
  • Average time to close issues: 3 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 7
  • Total pull request authors: 3
  • Average comments per issue: 0.75
  • Average comments per pull request: 0.33
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • moosterwegel (2)
  • adawnir (2)
  • llauriez (1)
  • dcstang (1)
  • KheziaNtomo (1)
  • ruweng (1)
  • andreyurch (1)
  • fguntoro (1)
Pull Request Authors
  • barbarabodinier (2)
  • jchiquet (1)
Top Labels
Issue Labels
enhancement (3) question (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 356 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 13
  • Total maintainers: 1
cran.r-project.org: sharp

Stability-enHanced Approaches using Resampling Procedures

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 356 Last month
Rankings
Stargazers count: 19.8%
Forks count: 28.8%
Dependent packages count: 29.8%
Average: 31.2%
Dependent repos count: 35.5%
Downloads: 42.0%
Maintainers (1)
Last synced: 7 months ago

Dependencies

DESCRIPTION cran
  • fake * depends
  • MASS * imports
  • Rdpack * imports
  • glassoFast >= 1.0.0 imports
  • glmnet * imports
  • grDevices * imports
  • huge * imports
  • igraph * imports
  • mclust * imports
  • parallel * imports
  • withr >= 2.4.0 imports
  • RCy3 * suggests
  • cluster * suggests
  • corpcor * suggests
  • dbscan * suggests
  • elasticnet * suggests
  • gglasso * suggests
  • mixOmics * suggests
  • nnet * suggests
  • plotrix * suggests
  • rmarkdown * suggests
  • sgPLS * suggests
  • survival >= 3.2.13 suggests
  • testthat >= 3.0.0 suggests
  • visNetwork * suggests