ClusterR

Gaussian mixture models, k-means, mini-batch-kmeans and k-medoids clustering

https://github.com/mlampros/clusterr

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.1%) to scientific vocabulary

Keywords

affinity-propagation cpp11 gmm kmeans kmedoids-clustering mini-batch-kmeans r rcpparmadillo
Last synced: 6 months ago

Repository

Statistics
  • Stars: 85
  • Watchers: 5
  • Forks: 29
  • Open Issues: 2
  • Releases: 0
Created over 9 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog

README.md


ClusterR


The ClusterR package consists of Gaussian mixture models, k-means, mini-batch-kmeans, k-medoids and affinity propagation clustering algorithms with the option to plot, validate, predict (new data) and find the optimal number of clusters. The package takes advantage of 'RcppArmadillo' to speed up the computationally intensive parts of the functions. More details on the functionality of ClusterR can be found in the blog-posts (first and second), the Vignette and the package Documentation (scroll down for information on how to use the docker image).
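
As a rough illustration of the fit / predict workflow described above, the following sketch runs k-means on synthetic data. The data and the parameter values are assumptions made for the example; `center_scale`, `KMeans_rcpp` and `predict_KMeans` are functions exported by ClusterR, and the package Documentation remains the reference for their full argument lists.

```r
library(ClusterR)

# Synthetic data (an assumption for illustration): two well-separated groups,
# 100 observations each, 2 columns
set.seed(1)
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 5), ncol = 2))

# Centre and scale the columns before clustering
x_scaled <- center_scale(x, mean_center = TRUE, sd_scale = TRUE)

# Fit k-means with two clusters and several random initialisations
km <- KMeans_rcpp(x_scaled, clusters = 2, num_init = 5, max_iters = 100,
                  initializer = "kmeans++", seed = 1)

# Assign observations (here the training data itself) to the fitted centroids
pr <- predict_KMeans(x_scaled, km$centroids)

# Cluster sizes found during fitting
table(km$clusters)
```

The other algorithms of the package follow a similar fit-then-predict pattern, so this is only a starting point rather than a recommended configuration.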

UPDATE 16-08-2018

As of version 1.1.4 the ClusterR package allows R package maintainers to perform linking between packages at a C++ code (Rcpp) level. This means that the Rcpp functions of the ClusterR package can be called in the C++ files of another package. In the next lines I'll give detailed explanations on how this can be done:


Assume that an R package ('PackageA') calls one of the ClusterR Rcpp functions. The maintainer of 'PackageA' then has to:


  • 1st. install the ClusterR package to take advantage of the new functionality either from CRAN using,


```R

install.packages("ClusterR")

```


or download the latest version from Github using the remotes package,


```R

remotes::install_github('mlampros/ClusterR', upgrade = 'always', dependencies = TRUE, repos = 'https://cloud.r-project.org/')

```


  • 2nd. update the DESCRIPTION file of 'PackageA' and especially the LinkingTo field by adding the ClusterR package (besides any other packages),


```R

LinkingTo: ClusterR

```


  • 3rd. open a new C++ file (for instance in Rstudio) and at the top of the file add the following 'headers', 'depends' and 'plugins',


```R
#include <RcppArmadillo.h>
#include <ClusterRHeader.h>
#include <affinity_propagation.h>
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::depends(ClusterR)]]
// [[Rcpp::plugins(cpp11)]]
```

The available functions can be found in the following files: inst/include/ClusterRHeader.h and inst/include/affinity_propagation.h
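
Because the headers above also pull in RcppArmadillo, the LinkingTo field of 'PackageA' typically needs Rcpp and RcppArmadillo next to ClusterR. The following base-R sketch writes a hypothetical DESCRIPTION to a temporary file (the package name, version and Imports entry are placeholders, not taken from a real package) and checks that nothing is missing from LinkingTo:

```r
# Hypothetical DESCRIPTION for 'PackageA', written to a temporary file
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: PackageA",
  "Version: 0.0.1",
  "Imports: Rcpp",
  "LinkingTo: Rcpp, RcppArmadillo, ClusterR"
), desc_path)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_path)
linking_to <- trimws(strsplit(desc[1, "LinkingTo"], ",")[[1]])

# Packages expected in LinkingTo but not listed there (empty when all is well)
missing_pkgs <- setdiff(c("Rcpp", "RcppArmadillo", "ClusterR"), linking_to)
print(missing_pkgs)
```

For a real package, `desc_path` would point at the DESCRIPTION file in the package root instead of a temporary file.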


A complete minimal example would be:


```R
#include <RcppArmadillo.h>
#include <ClusterRHeader.h>
#include <affinity_propagation.h>
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::depends(ClusterR)]]
// [[Rcpp::plugins(cpp11)]]

using namespace clustR;

// Thin wrapper that exposes the ClusterR C++ mini-batch k-means to R
// [[Rcpp::export]]
Rcpp::List mini_batch_kmeans(arma::mat& data, int clusters, int batch_size, int max_iters, int num_init = 1,
                             double init_fraction = 1.0, std::string initializer = "kmeans++",
                             int early_stop_iter = 10, bool verbose = false,
                             Rcpp::Nullable<Rcpp::NumericMatrix> CENTROIDS = R_NilValue,
                             double tol = 1e-4, double tol_optimal_init = 0.5, int seed = 1) {

  ClustHeader clust_header;

  return clust_header.mini_batch_kmeans(data, clusters, batch_size, max_iters, num_init, init_fraction,
                                        initializer, early_stop_iter, verbose, CENTROIDS, tol,
                                        tol_optimal_init, seed);
}
```


Then, from an R file, a user can call the mini_batch_kmeans function using,


```R
Rcpp::sourceCpp('example.cpp')        # assuming that the previous Rcpp code is included in 'example.cpp'

set.seed(1)
dat = matrix(runif(100000), nrow = 1000, ncol = 100)

mbkm = mini_batch_kmeans(dat, clusters = 3, batch_size = 50, max_iters = 100, num_init = 2,
                         init_fraction = 1.0, initializer = "kmeans++", early_stop_iter = 10,
                         verbose = TRUE, CENTROIDS = NULL, tol = 1e-4, tol_optimal_init = 0.5, seed = 1)

str(mbkm)
```
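
For comparison, the same algorithm is also reachable without any C++ linking through the R-level functions of ClusterR. The sketch below reuses the data and settings of the example above with `MiniBatchKmeans` and `predict_MBatchKMeans`; treating the two routes as equivalent is an assumption of this sketch rather than something stated in this README.

```r
library(ClusterR)

# Same synthetic data as in the linking example
set.seed(1)
dat <- matrix(runif(100000), nrow = 1000, ncol = 100)

# R-level mini-batch k-means with the same settings
mbkm_r <- MiniBatchKmeans(dat, clusters = 3, batch_size = 50, num_init = 2,
                          max_iters = 100, init_fraction = 1.0,
                          initializer = "kmeans++", early_stop_iter = 10,
                          verbose = FALSE, CENTROIDS = NULL, tol = 1e-4,
                          tol_optimal_init = 0.5, seed = 1)

# One centroid per cluster, one column per feature
dim(mbkm_r$centroids)

# Hard cluster assignments for the observations
cl <- predict_MBatchKMeans(dat, mbkm_r$centroids, fuzzy = FALSE)
```

Linking at the C++ level is mainly of interest when the clustering call has to sit inside other compiled code; for interactive use the R functions are usually the simpler route.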


Use the following link to report bugs/issues,

https://github.com/mlampros/ClusterR/issues


UPDATE 28-11-2019


Docker images of the ClusterR package are available to download from my dockerhub account. The images come with Rstudio and the R-development version (latest) installed. The whole process was tested on Ubuntu 18.04. To pull & run the image do the following,


```R

docker pull mlampros/clusterr:rstudiodev

docker run -d --name rstudiodev -e USER=rstudio -e PASSWORD=givehereyourpassword --rm -p 8787:8787 mlampros/clusterr:rstudiodev

```


The user can also bind a home directory / folder to the image to use its files by specifying the -v option,


```R

docker run -d --name rstudiodev -e USER=rstudio -e PASSWORD=givehereyourpassword --rm -p 8787:8787 -v /home/YOUR_DIR:/home/rstudio/YOUR_DIR mlampros/clusterr:rstudiodev

```


In the latter case you might first have to give write permission to the YOUR_DIR directory (not always necessary) using,


```R

chmod -R 777 /home/YOUR_DIR

```


The USER defaults to rstudio but you have to give your PASSWORD of preference (see https://rocker-project.org/ for more information).


Open your web-browser and, depending on where the docker image was built / run, give,


1st. Option on your personal computer,


```R
http://0.0.0.0:8787
```


2nd. Option on a cloud instance,


```R
http://Public DNS:8787
```


to access the Rstudio console in order to give your username and password.


Citation:

If you use the code of this repository in your paper or research, please cite both ClusterR (https://CRAN.R-project.org/package=ClusterR) and the original articles / software:


    @Manual{,
      title = {{ClusterR}: Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering},
      author = {Lampros Mouselimis},
      year = {2024},
      note = {R package version 1.3.3},
      url = {https://CRAN.R-project.org/package=ClusterR},
    }


Owner

  • Name: Lampros Mouselimis
  • Login: mlampros
  • Kind: user

Search (a little bit) and you'll find...

GitHub Events

Total
  • Issues event: 2
  • Watch event: 2
  • Issue comment event: 4
Last Year
  • Issues event: 2
  • Watch event: 2
  • Issue comment event: 4

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 130
  • Total Committers: 4
  • Avg Commits per committer: 32.5
  • Development Distribution Score (DDS): 0.4
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Lampros Mouselimis | `m****s@g****m` | 78 |
| Lampros Mouselimis | `m****s@h****m` | 40 |
| Vitalie Spinu | `s****t@g****m` | 11 |
| Nick | `n****l@h****m` | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 52
  • Total pull requests: 5
  • Average time to close issues: 9 days
  • Average time to close pull requests: about 1 month
  • Total issue authors: 33
  • Total pull request authors: 4
  • Average comments per issue: 4.52
  • Average comments per pull request: 9.2
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 5
  • Pull requests: 1
  • Average time to close issues: 3 days
  • Average time to close pull requests: 5 days
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 3.8
  • Average comments per pull request: 8.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • A-Pai (15)
  • FMKerckhof (3)
  • alazarolop (2)
  • hsbadr (2)
  • daniepi (2)
  • heavywatal (1)
  • NiharikaSinghal (1)
  • ghost (1)
  • davidgarridoreyes (1)
  • maronro (1)
  • mccurcio (1)
  • kadyb (1)
  • Melkiades (1)
  • ericsc7 (1)
  • elbamos (1)
Pull Request Authors
  • heavywatal (2)
  • vspinu (2)
  • nredell (1)
  • renovate[bot] (1)
Top Labels
Issue Labels
triage (24) help wanted (8) stale (7) bug (5) question (2) invalid (1)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • cran 3,669 last-month
  • Total docker downloads: 44,625
  • Total dependent packages: 24
    (may contain duplicates)
  • Total dependent repositories: 42
    (may contain duplicates)
  • Total versions: 43
  • Total maintainers: 1
cran.r-project.org: ClusterR

Gaussian Mixture Models, K-Means, Mini-Batch-Kmeans, K-Medoids and Affinity Propagation Clustering

  • Versions: 33
  • Dependent Packages: 22
  • Dependent Repositories: 41
  • Downloads: 3,669 Last month
  • Docker Downloads: 44,625
Rankings
Forks count: 2.8%
Dependent packages count: 3.1%
Dependent repos count: 4.0%
Average: 4.6%
Stargazers count: 4.7%
Downloads: 5.7%
Docker downloads count: 7.4%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: r-clusterr
  • Versions: 10
  • Dependent Packages: 2
  • Dependent Repositories: 1
Rankings
Dependent packages count: 19.6%
Dependent repos count: 24.4%
Average: 28.4%
Forks count: 33.7%
Stargazers count: 35.9%
Last synced: 6 months ago