densityClust

Clustering by fast search and find of density peaks

https://github.com/thomasp85/densityclust

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    3 of 5 committers (60.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.1%) to scientific vocabulary
Last synced: 10 months ago

Repository

Clustering by fast search and find of density peaks

Basic Info
  • Host: GitHub
  • Owner: thomasp85
  • Language: R
  • Default Branch: main
  • Size: 3.06 MB
Statistics
  • Stars: 160
  • Watchers: 19
  • Forks: 66
  • Open Issues: 3
  • Releases: 2
Created almost 12 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog Code of conduct

README.Rmd

---
output: github_document
---

# Clustering by fast search and find of density peaks


[![R-CMD-check](https://github.com/thomasp85/densityClust/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/thomasp85/densityClust/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/thomasp85/densityClust/branch/main/graph/badge.svg)](https://app.codecov.io/gh/thomasp85/densityClust?branch=main)
[![CRAN\_Release\_Badge](http://www.r-pkg.org/badges/version-ago/densityClust)](https://CRAN.R-project.org/package=densityClust)
[![CRAN\_Download\_Badge](http://cranlogs.r-pkg.org/badges/densityClust)](https://CRAN.R-project.org/package=densityClust)
 

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

This package implements the clustering algorithm described by Alex Rodriguez and Alessandro Laio (2014). It provides the user with tools for generating the initial rho (local density) and delta (distance to the nearest observation of higher density) values for each observation, as well as for using these to assign observations to clusters. This is done in two passes, so the user is free to reassign observations to clusters using a new set of rho and delta thresholds without needing to recalculate everything.
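
To make the two quantities concrete, here is a minimal base-R sketch of how rho and delta can be computed from a distance matrix with a hard distance cutoff. It is an illustration only and not the package's implementation; the helper names `local_density()` and `distance_to_higher_density()` and the toy data are hypothetical.

```r
# Illustrative sketch only: helper names and toy data are hypothetical,
# and this is not how densityClust computes its values internally.

# rho: number of other observations closer than the cutoff dc.
local_density <- function(d, dc) {
  m <- as.matrix(d)
  # The diagonal (distance 0) always passes the test, so subtract 1.
  rowSums(m < dc) - 1
}

# delta: distance to the nearest observation with a higher rho.
# Observations with no denser neighbour get their largest distance instead.
distance_to_higher_density <- function(d, rho) {
  m <- as.matrix(d)
  vapply(seq_along(rho), function(i) {
    higher <- rho > rho[i]
    if (any(higher)) min(m[i, higher]) else max(m[i, ])
  }, numeric(1))
}

# Two well-separated groups of three points on a line.
toy <- dist(c(0, 0.1, 0.2, 5, 5.1, 5.2))
rho <- local_density(toy, dc = 0.15)
delta <- distance_to_higher_density(toy, rho)

data.frame(rho = rho, delta = delta)
```

The middle point of each group combines the highest rho with a large delta, which is the signature of a cluster peak.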

## Plotting
Two types of plots are supported by this package, and both mimic the types of plots used in the publication for the algorithm. The standard plot function produces a decision plot, with optional colouring of cluster peaks if these are assigned. In addition, `plotMDS()` performs a multidimensional scaling of the distance matrix and plots the result as a scatterplot. If clusters are assigned, observations are coloured according to their assignment.
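
As a rough illustration of what the two plots represent, the sketch below draws them by hand with base R. It assumes the object returned by `densityClust()` exposes `rho` and `delta` as list elements and that classical scaling via `stats::cmdscale()` is an acceptable stand-in for the scaling step; the package's own plot methods may differ in detail.

```r
library(densityClust)

irisDist <- dist(iris[, 1:4])
irisClust <- densityClust(irisDist, gaussian = TRUE)

# Decision plot by hand: delta against rho.
# Candidate cluster peaks are the points that are high on both axes.
plot(irisClust$rho, irisClust$delta, xlab = "rho", ylab = "delta")

# Classical multidimensional scaling of the same distances (an assumption,
# used here as a stand-in), reduced to two dimensions for a scatterplot.
mds <- cmdscale(irisDist, k = 2)
plot(mds, xlab = "Dimension 1", ylab = "Dimension 2")
```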

## Cluster detection
The two main functions for this package are `densityClust()` and `findClusters()`. The former takes a distance matrix and optionally a distance cutoff and calculates rho and delta for each observation. The latter takes the output of `densityClust()` and makes a cluster assignment for each observation based on user-defined rho and delta thresholds. If the thresholds are not specified, the user can supply them interactively by clicking on a decision plot.
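
The two-pass design can be sketched as follows: rho and delta are computed once, and `findClusters()` is then called repeatedly with different thresholds. The threshold values and the variable names `strict` and `loose` are arbitrary choices for illustration, and the sketch assumes the result stores the detected peaks in a `peaks` element.

```r
library(densityClust)

# Pass 1: compute rho and delta once (the comparatively expensive step).
irisClust <- densityClust(dist(iris[, 1:4]), gaussian = TRUE)

# Pass 2: assign clusters under two different threshold pairs,
# reusing the same rho and delta values.
strict <- findClusters(irisClust, rho = 2, delta = 2)
loose <- findClusters(irisClust, rho = 1, delta = 1)

# Compare how many peaks each threshold pair selects.
length(strict$peaks)
length(loose$peaks)
```

Because only the second call changes, trying several thresholds does not repeat the density calculation.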

## Usage
```{r}
library(densityClust)
irisDist <- dist(iris[,1:4])
irisClust <- densityClust(irisDist, gaussian=TRUE)
plot(irisClust) # Inspect clustering attributes to define thresholds

irisClust <- findClusters(irisClust, rho=2, delta=2)
plotMDS(irisClust)
split(iris[,5], irisClust$clusters)
```

Note that while the iris dataset contains information on three different species of iris, only two clusters are detected by the algorithm. This is because two of the species (versicolor and virginica) are not clearly separated in the data.
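
One way to inspect this overlap is to cross-tabulate the known species against the assigned clusters. This is a small sketch that reuses the thresholds from the usage example; the variable name `speciesByCluster` is only illustrative.

```r
library(densityClust)

irisClust <- densityClust(dist(iris[, 1:4]), gaussian = TRUE)
irisClust <- findClusters(irisClust, rho = 2, delta = 2)

# Rows are the known species, columns are the detected clusters.
speciesByCluster <- table(species = iris$Species, cluster = irisClust$clusters)
speciesByCluster
```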

## References
Rodriguez, A., & Laio, A. (2014). Clustering by fast search and find of density peaks. Science, 344(6191), 1492-1496. https://doi.org/10.1126/science.1242072

Owner

  • Name: Thomas Lin Pedersen
  • Login: thomasp85
  • Kind: user
  • Location: Copenhagen
  • Company: @posit-pbc, part of @tidyverse team

Maker of tools focusing on data science and data visualisation

GitHub Events

Total
  • Watch event: 8
  • Fork event: 1
Last Year
  • Watch event: 8
  • Fork event: 1

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 55
  • Total Committers: 5
  • Avg Commits per committer: 11.0
  • Development Distribution Score (DDS): 0.236
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name | Email | Commits
Thomas Lin Pedersen | t****5@g****m | 42
Sean Hughes | s****s@u****u | 10
Xiaojie Qiu | x****u@u****u | 1
Eric Archer | e****r@n****v | 1
Devon Ryan | d****9 | 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 10
  • Total pull requests: 9
  • Average time to close issues: 8 months
  • Average time to close pull requests: 6 months
  • Total issue authors: 9
  • Total pull request authors: 6
  • Average comments per issue: 0.9
  • Average comments per pull request: 7.56
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jk86754 (2)
  • Liang-Wu-01 (1)
  • vnijs (1)
  • EricArcher (1)
  • abcosta (1)
  • tyluckyma (1)
  • bastistician (1)
  • thomasp85 (1)
  • anish-mm (1)
Pull Request Authors
  • EricArcher (3)
  • Xiaojieqiu (2)
  • seaaan (1)
  • thomasp85 (1)
  • dpryan79 (1)
  • jdmanton (1)
Top Labels
Issue Labels
enhancement (1)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • cran 815 last-month
  • Total docker downloads: 13,666
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 3
    (may contain duplicates)
  • Total versions: 9
  • Total maintainers: 1
cran.r-project.org: densityClust

Clustering by Fast Search and Find of Density Peaks

  • Versions: 6
  • Dependent Packages: 1
  • Dependent Repositories: 3
  • Downloads: 815 Last month
  • Docker Downloads: 13,666
Rankings
Forks count: 1.0%
Stargazers count: 2.8%
Average: 12.4%
Downloads: 14.9%
Dependent repos count: 16.5%
Dependent packages count: 18.1%
Docker downloads count: 21.0%
Maintainers (1)
Last synced: 10 months ago
conda-forge.org: r-densityclust
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Average: 42.6%
Dependent packages count: 51.2%
Last synced: 10 months ago

Dependencies

DESCRIPTION cran
  • FNN * imports
  • RColorBrewer * imports
  • Rcpp * imports
  • Rtsne * imports
  • ggplot2 * imports
  • ggrepel * imports
  • grDevices * imports
  • gridExtra * imports
  • covr * suggests
  • testthat * suggests