aricode

R package for computation of (adjusted) rand-index and other such scores

https://github.com/jchiquet/aricode

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 7 committers (14.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

bucket-sort clustering clustering-comparison-measures
Last synced: 9 months ago

Repository

Basic Info
Statistics
  • Stars: 26
  • Watchers: 4
  • Forks: 3
  • Open Issues: 0
  • Releases: 4
Created over 9 years ago · Last pushed over 2 years ago

README.Rmd

---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  fig.path = "man/figures/"
)
```

# aricode

 
[![R-CMD-check](https://github.com/jchiquet/aricode/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jchiquet/aricode/actions/workflows/R-CMD-check.yaml)
[![CRAN Status](https://www.r-pkg.org/badges/version/aricode)](https://CRAN.R-project.org/package=aricode)
[![Coverage status](https://codecov.io/gh/jchiquet/aricode/branch/master/graph/badge.svg)](https://codecov.io/gh/jchiquet/aricode)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-blue.svg)](https://www.tidyverse.org/lifecycle/#stable)
[![](https://img.shields.io/github/last-commit/jchiquet/aricode.svg)](https://github.com/jchiquet/aricode/commits/master)
  

A package for efficient computation of standard clustering comparison measures.

## Installation

The stable version is available on [CRAN](https://cran.rstudio.com/web/packages/aricode/):

```{r install_cran, eval = FALSE}
install.packages("aricode")
```

The development version is available via:

```{r install_github, eval = FALSE}
devtools::install_github("jchiquet/aricode")
```

## Description

The computation of clustering comparison measures (ARI, AMI, NID and even the $\chi^2$ distance) is usually based on the contingency table. Traditional implementations (e.g., the function `adjustedRandIndex` of the package **mclust**) run in $\Omega(n + u v)$, where

- $n$ is the length of the two classification vectors to be compared,
- $u$ and $v$ are the numbers of classes in the first and second vector, respectively.

In **aricode** we propose an implementation, based on radix sort, that is $\Theta(n)$ in time and space.  
Importantly, the complexity does not depend on $u$ and $v$.
Our implementation of the ARI, for instance, is one or two orders of magnitude faster than some standard implementations in `R`.
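
To illustrate why the full $u \times v$ table is not needed, here is a minimal base-R sketch that computes the ARI from the non-empty cells only (there are at most $n$ of them). The helper `ari_sparse` is hypothetical and is not the package's implementation: it relies on `factor()` and `match()` rather than on a radix sort, so it only shows the idea, not the $\Theta(n)$ guarantee.

```r
# Hypothetical helper for illustration; not part of aricode.
# Computes the adjusted Rand index from the non-empty contingency cells only.
ari_sparse <- function(c1, c2) {
  stopifnot(length(c1) == length(c2))
  n <- length(c1)

  # Total number of pairs inside groups whose sizes are given in x
  choose2 <- function(x) sum(x * (x - 1) / 2)

  # Relabel both classifications as integers 1..u and 1..v
  f1 <- as.integer(factor(c1))
  f2 <- as.integer(factor(c2))

  # One numeric key per observed (class in c1, class in c2) pair;
  # only the pairs that actually occur are ever counted
  key  <- (f1 - 1) * max(f2) + f2
  n_ij <- tabulate(match(key, unique(key)))

  # Marginal class sizes
  a_i <- tabulate(f1)
  b_j <- tabulate(f2)

  s_ij <- choose2(n_ij)
  s_a  <- choose2(a_i)
  s_b  <- choose2(b_j)
  expected <- s_a * s_b / choose2(n)

  # Returns NaN in the degenerate case where the denominator is zero
  (s_ij - expected) / ((s_a + s_b) / 2 - expected)
}

c1 <- c(1, 1, 1, 2, 2, 2)
c2 <- c(1, 1, 2, 2, 3, 3)
ari_sparse(c1, c2)  # 0.8 / 3.3, approximately 0.242
```

The cells are stored as a plain vector of counts, so memory use grows with the number of observed pairs rather than with $u v$.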

## Available measures and functions

The functions included in aricode are:

- `ARI`: computes the adjusted Rand index
- `Chi2`: computes the Chi-square statistic
- `MARI/MARIraw`: computes the modified adjusted Rand index (Sundqvist et al., in preparation)
- `NVI`: computes the normalized variation of information
- `NID`: computes the normalized information distance
- `NMI`: computes the normalized mutual information
- `AMI`: computes the adjusted mutual information
- `expected_MI`: computes the expected mutual information
- `entropy`: computes the conditional and joint entropies
- `clustComp`: computes all clustering comparison measures at once
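
As a rough guide to what the information-theoretic measures above quantify, the following base-R sketch derives the mutual information and one normalized variant from entropies of the contingency table. The helpers `entropy_of`, `mutual_info` and `nmi_max` are hypothetical names introduced here, not functions of **aricode**, and normalizing by the larger of the two marginal entropies is an assumption: it is only one of several conventions in use.

```r
# Hypothetical helpers for illustration; not part of aricode.

# Shannon entropy (natural log) of a vector or table of counts
entropy_of <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Mutual information: H(c1) + H(c2) - H(c1, c2)
mutual_info <- function(c1, c2) {
  tab <- table(c1, c2)
  entropy_of(rowSums(tab)) + entropy_of(colSums(tab)) - entropy_of(tab)
}

# Mutual information normalized by the larger marginal entropy
# (assumed convention; other normalizations exist)
nmi_max <- function(c1, c2) {
  tab <- table(c1, c2)
  h1 <- entropy_of(rowSums(tab))
  h2 <- entropy_of(colSums(tab))
  (h1 + h2 - entropy_of(tab)) / max(h1, h2)
}

nmi_max(c(1, 1, 2, 2), c("a", "a", "b", "b"))  # identical partitions: 1
nmi_max(c(1, 1, 2, 2), c(1, 2, 1, 2))          # independent partitions: 0 up to rounding
```

This sketch builds the dense table for brevity, which is exactly the cost the package is designed to avoid on large inputs.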

## Timings

Here are some timings comparing the cost of computing the adjusted Rand index with **aricode** and with the commonly used function `adjustedRandIndex` of the **mclust** package. The cost of the latter can be prohibitive for large vectors:

```{r timings_function, echo=FALSE, message=FALSE, warning=FALSE}
library(aricode)
library(mclust)
library(ggplot2)

time.aricode <- function(times, c1, c2){
  replicate(times, system.time(ARI(c1, c2))[3])
}

time.mclust <- function(times, c1, c2){
  replicate(times, system.time(mclust::adjustedRandIndex(c1, c2))[3])
}

time.method <- function(times, c1, c2, n){
  rbind(
    data.frame(time = time.aricode(times, c1, c2), expr = "aricode", n = n),
    data.frame(time = time.mclust(times, c1, c2), expr = "mclust", n = n)
  )
}

# with similar classif, number of classes grows with n
sim.timings <- function(n, times = 10) {
  # c1 has about n/200 classes; c2 is a copy of c1 with n/50 labels shuffled
  c1 <- sample(1:(n / 200), n, replace = TRUE)
  c2 <- c1
  i_change <- sample(1:n, n / 50, replace = FALSE)
  c2[i_change] <- c2[rev(i_change)]
  out <- time.method(times, c1, c2, n)
  data.frame(time = out$time, method = out$expr, n = n)
}
```

```{r timings_run, echo=FALSE, message=FALSE, warning=FALSE, cache=TRUE}
# with similar classif, number of classes grows with n
ns <- sort(c(200 * 2^(3:14), 150 * 2^(3:15)))
timings <- do.call("rbind", lapply(ns, sim.timings))
```

```{r timings_plot, echo=FALSE, message=FALSE, warning=FALSE}
p.timings <- ggplot(timings, aes(x = n, y = time, colour = method)) +
  geom_smooth(data = dplyr::filter(timings, n > 1e4), method = "lm") +
  geom_point(size = 0.25, alpha = 0.9) +
  labs(y = "time (sec.)") +
  scale_x_log10(
    breaks = scales::trans_breaks("log10", function(x) 10^x),
    labels = scales::trans_format("log10", scales::math_format(10^.x))
  ) +
  scale_y_log10(
    breaks = scales::trans_breaks("log10", function(x) 10^x),
    labels = scales::trans_format("log10", scales::math_format(10^.x))
  ) +
  annotation_logticks()

p.timings + ggtitle("number of classes grows with n") + theme_bw()
```

Owner

  • Name: Julien Chiquet
  • Login: jchiquet
  • Kind: user
  • Location: Paris, France
  • Company: French National Institute of Agronomy (INRA)

Researcher in Statistics

GitHub Events

Total
  • Watch event: 2
  • Issue comment event: 1
  • Pull request event: 1
Last Year
  • Watch event: 2
  • Issue comment event: 1
  • Pull request event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 102
  • Total Committers: 7
  • Avg Commits per committer: 14.571
  • Development Distribution Score (DDS): 0.412
Past Year
  • Commits: 9
  • Committers: 1
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Julien Chiquet j****t@g****m 60
Julien Chiquet j****t@i****r 19
guillem.rigaill g****l@A****r 12
Julien Chiquet j****t@i****r 5
Julien Chiquet j****t@j****0 3
guillemr r****l@e****r 2
Darío Hereñú m****a@g****m 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 6 months
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • olivroy (2)
  • kant (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 1,793 last-month
  • Total docker downloads: 42,041
  • Total dependent packages: 9
  • Total dependent repositories: 19
  • Total versions: 6
  • Total maintainers: 1
cran.r-project.org: aricode

Efficient Computations of Standard Clustering Comparison Measures

  • Versions: 6
  • Dependent Packages: 9
  • Dependent Repositories: 19
  • Downloads: 1,793 Last month
  • Docker Downloads: 42,041
Rankings
Dependent packages count: 5.9%
Dependent repos count: 6.5%
Docker downloads count: 7.5%
Average: 7.8%
Downloads: 11.5%
Maintainers (1)
Last synced: 9 months ago

Dependencies

DESCRIPTION cran
  • Matrix * imports
  • Rcpp * imports
  • spelling * suggests
  • testthat * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action 4.1.4 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite