ClusterValidityIndices.jl

ClusterValidityIndices.jl: Batch and Incremental Metrics for Unsupervised Learning - Published in JOSS (2022)

https://github.com/ap6yc/clustervalidityindices.jl

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

clustering julia julia-language machine-learning

Keywords from Contributors

pde interpretability meshing fluxes standardization
Last synced: 6 months ago

Repository

A Julia package for Cluster Validity Indices (CVIs).

Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 1
  • Open Issues: 3
  • Releases: 21
Created almost 5 years ago · Last pushed almost 3 years ago
Metadata Files
Readme Contributing Funding License Code of conduct Citation

README.md

A Julia package for Cluster Validity Indices (CVI) algorithms.

| Documentation | Build Status | Coverage | Reference |
|:------------------:|:----------------:|:------------:|:-------------:|
| Stable | Build Status | Codecov | DOI |
| Dev | Build Status | Coveralls | DOI |
| Documentation Build | JuliaHub Status | Dependents | Release |
| Documentation | pkgeval | deps | version |

Please read the documentation for detailed usage and tutorials.

Overview

Cluster Validity Indices (CVIs) are designed to be metrics of performance for unsupervised clustering algorithms. In the absence of supervisory labels (i.e., ground truth), clustering algorithms, like any truly unsupervised learning algorithms, have no way to definitively know the stability of their learning or the accuracy of their performance. CVIs therefore provide metrics of partitioning stability and validity using only the original data samples and the cluster labels prescribed by the clustering algorithm.
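To make the idea concrete, the following is a minimal sketch of one textbook CVI, the Davies-Bouldin index, written in plain Julia with only the standard library. It illustrates how a criterion value can be computed from nothing but samples and labels. It is not the package's own implementation: the function name `davies_bouldin_sketch` is hypothetical, and the package's `DB` may differ in details such as the distance measure used.

```julia
using LinearAlgebra: norm
using Statistics: mean

# Textbook Davies-Bouldin index (lower values indicate better-separated clusters).
# `data` holds one sample per column; `labels` holds one integer label per sample.
# NOTE: hypothetical helper for illustration, not the package's implementation.
function davies_bouldin_sketch(data::AbstractMatrix{<:Real}, labels::AbstractVector{<:Integer})
    size(data, 2) == length(labels) || throw(ArgumentError("one label per sample is required"))
    clusters = sort(unique(labels))
    k = length(clusters)
    k >= 2 || throw(ArgumentError("at least two clusters are required"))

    # Group the samples (columns) belonging to each cluster
    members = [data[:, labels .== c] for c in clusters]
    # Centroid of each cluster
    centroids = [vec(mean(m, dims=2)) for m in members]
    # Scatter: mean Euclidean distance from each member to its centroid
    scatter = [mean(norm(x - centroids[i]) for x in eachcol(members[i])) for i in 1:k]

    # For each cluster, the worst-case similarity ratio against any other cluster
    worst = [
        maximum((scatter[i] + scatter[j]) / norm(centroids[i] - centroids[j]) for j in 1:k if j != i)
        for i in 1:k
    ]
    return mean(worst)
end

# Two tight, well-separated clusters in two dimensions
data = [0.0 0.0 4.0 4.0;
        0.0 2.0 0.0 2.0]
labels = [1, 1, 2, 2]
db_value = davies_bouldin_sketch(data, labels)
```

In this toy case each cluster has a scatter of 1 and the centroids are 4 apart, so the index evaluates to (1 + 1) / 4 = 0.5 without any ground-truth labels being involved.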

This Julia package contains an outline of the conceptual usage of CVIs along with many example scripts in the documentation. This outline contains a Quickstart that provides an overview of how to use this project along with a list of CVIs that are implemented in the latest version of the project.

Installation

This project is distributed as a Julia package and hosted on JuliaHub, Julia's package manager repository. As such, this package's usage follows the usual Julia package installation procedure, interactively:

```julia-repl
julia> ]
(@v1.9) pkg> add ClusterValidityIndices
```

or programmatically:

```julia-repl
julia> using Pkg
julia> Pkg.add("ClusterValidityIndices")
```

You may also add the package directly from a GitHub branch to get the latest changes between releases:

```julia-repl
julia> ]
(@v1.9) pkg> add https://github.com/AP6YC/ClusterValidityIndices.jl#develop
```

Quickstart

This section provides a quick overview of how to use the project. For more detailed code usage, please see the Detailed Usage guide in the documentation.

First, import the package with:

```julia
# Import the package
using ClusterValidityIndices
```

CVI objects are instantiated with empty constructors:

```julia
# Create a Davies-Bouldin (DB) CVI object
my_cvi = DB()
```

All CVIs are implemented with acronyms of their literature names. A list of all of these is found in the Implemented CVI/ICVIs section.

Next, get data from a clustering process. This consists of a set of feature samples that have been clustered, together with the cluster labels prescribed to them.

Note

The ClusterValidityIndices.jl package assumes data to be in the form of Float matrices where columns are samples and rows are features. An individual sample is a single vector of features. Labels are vectors of integers where each number corresponds to its own cluster.

```julia
# Random data as an example; 10 samples with feature dimension 3
dim = 3
n_samples = 10
data = rand(dim, n_samples)
# One label per sample: the first half in cluster 1, the second half in cluster 2
labels = repeat(1:2, inner=div(n_samples, 2))
```
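Data exported from other tools often stores one sample per row and uses non-integer cluster names. The following sketch, using only base Julia, shows one way to bring such data into the layout described in the note above. The variable names and the example values are hypothetical.

```julia
# Hypothetical input: one sample per ROW, with string cluster names
row_major = [1 2 3;
             4 5 6;
             7 8 9;
             10 11 12]                  # 4 samples, 3 features
cluster_names = ["a", "b", "a", "c"]

# Transpose to features-by-samples and convert to floating point
data = Float64.(permutedims(row_major))

# Map each distinct name to an integer label, in order of first appearance
lookup = Dict(name => i for (i, name) in enumerate(unique(cluster_names)))
labels = [lookup[name] for name in cluster_names]
```

After this conversion, `data[:, i]` is the feature vector of the `i`-th sample and `labels[i]` is its integer cluster label.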

The outputs of CVIs are called criterion values, and they can be computed both incrementally and in batch with get_cvi!. Compute in batch by providing a matrix of samples and a vector of labels:

```julia
criterion_value = get_cvi!(my_cvi, data, labels)
```

or incrementally with the same function by passing one sample and label at a time:

```julia
# Create a fresh CVI object for incremental evaluation
my_icvi = DB()

# Create a container for the values and iterate
criterion_values = zeros(n_samples)
for i = 1:n_samples
    criterion_values[i] = get_cvi!(my_icvi, data[:, i], labels[i])
end
```
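As general background on what "incremental" means, the sketch below shows a running mean that is updated one sample at a time and ends at the same value as the batch mean. This is only an illustration of incremental computation in general, written in plain Julia. It is not a description of the package's internals, and `update_mean!` is a hypothetical helper.

```julia
using Statistics: mean

# Update a running mean in place with one new sample.
# `n` is the number of samples seen so far, including `x`.
# NOTE: hypothetical helper for illustration only.
function update_mean!(mu::Vector{Float64}, x::AbstractVector{<:Real}, n::Integer)
    mu .+= (x .- mu) ./ n
    return mu
end

# Four samples (columns) with two features (rows)
data = [1.0 2.0 3.0 4.0;
        2.0 4.0 6.0 8.0]

# Incremental pass: one sample at a time
mu = zeros(size(data, 1))
for i in axes(data, 2)
    update_mean!(mu, data[:, i], i)
end

# Batch computation over all samples at once, for comparison
batch_mu = vec(mean(data, dims=2))
```

The incremental pass never needs the full matrix in memory, which is the practical motivation for incremental variants in streaming settings.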

Note

Each module has a batch and incremental implementation, but ClusterValidityIndices.jl does not yet support switching between batch and incremental modes with the same CVI object.

Implemented CVI/ICVIs

This project has implementations of the following CVIs in both batch and incremental variants:

  • CH: Calinski-Harabasz.
  • cSIL: Centroid-based Silhouette.
  • DB: Davies-Bouldin.
  • GD43: Generalized Dunn's Index 43.
  • GD53: Generalized Dunn's Index 53.
  • PS: Partition Separation.
  • rCIP: (Renyi's) representative Cross Information Potential.
  • WB: WB-index.
  • XB: Xie-Beni.

The exported constant CVI_MODULES also contains a list of these CVIs for convenient iteration.
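The iteration over `CVI_MODULES` mentioned above might look like the following sketch. It assumes the package is installed and that each entry of `CVI_MODULES` can be called with no arguments, like `DB()` in the Quickstart. The result container and variable names are hypothetical.

```julia
using ClusterValidityIndices

# Small random dataset: 3 features, 10 samples, two clusters of 5 samples each
dim = 3
n_samples = 10
data = rand(dim, n_samples)
labels = repeat(1:2, inner=div(n_samples, 2))

# Assumption: every entry of CVI_MODULES is constructible with no arguments.
# A fresh object is created per CVI and evaluated in batch mode.
results = [string(cvi_type) => get_cvi!(cvi_type(), data, labels) for cvi_type in CVI_MODULES]

# Print each CVI name alongside its criterion value
for (name, value) in results
    println(name, ": ", value)
end
```

Because the data here is random, the printed criterion values carry no meaning beyond showing that each CVI can be evaluated through the same call.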

Examples

A basic example in the documentation illustrates top-down usage of the package.

Furthermore, the Examples section of the documentation contains examples covering a variety of use cases of the project. Each of these is made using the DemoCards.jl package and can be opened, saved, and run as a Julia notebook.

Contributing

If you have a question or concern, please raise an issue. For more details on how to work with the project, propose changes, or even contribute code, please see the Developer Notes in the project's documentation.

In summary:

  1. Questions and requested changes should all be made on the issues page. These are preferred because they are publicly viewable and could assist or educate others with similar issues or questions.
  2. For changes, this project accepts pull requests (PRs) from feature/<my-feature> branches onto the develop branch using the GitFlow methodology. If unit tests pass and the changes are beneficial, these PRs are merged into develop and eventually folded into versioned releases.
  3. The project follows the Semantic Versioning convention of major.minor.patch incremental versioning numbers. Patch versions are for bug fixes, minor versions are for backward-compatible changes, and major versions are for new and incompatible usage changes.
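The major.minor.patch convention in the last item can be checked with Julia's built-in `VersionNumber` type. The sketch below classifies the kind of release between two versions. The helper `bump_kind` and the version numbers are hypothetical and are not taken from the project's release history.

```julia
# Classify the kind of release that takes `old` to `new` under Semantic Versioning.
# NOTE: `bump_kind` is a hypothetical helper for illustration only.
function bump_kind(old::VersionNumber, new::VersionNumber)
    new > old || throw(ArgumentError("new version must be greater than old version"))
    new.major != old.major && return :major   # new and incompatible usage changes
    new.minor != old.minor && return :minor   # backward-compatible changes
    return :patch                             # bug fixes
end

patch_release = bump_kind(v"1.2.3", v"1.2.4")
minor_release = bump_kind(v"1.2.3", v"1.3.0")
major_release = bump_kind(v"1.2.3", v"2.0.0")
```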

Acknowledgements

Authors

This package is developed and maintained by Sasha Petrenko with sponsorship by the Applied Computational Intelligence Laboratory (ACIL). The users @rMassimiliano and @malmaud have graciously contributed their time with reviews and feedback that have greatly improved the project.

Support

This project is supported by grants from the Night Vision Electronic Sensors Directorate, the DARPA Lifelong Learning Machines (L2M) program, Teledyne Technologies, and the National Science Foundation. The material, findings, and conclusions here do not necessarily reflect the views of these entities.

Research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-22-2-0209. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

License

This software is openly maintained by the ACIL of the Missouri University of Science and Technology under the MIT License.

Citation

This project has a citation file that generates citation information for the package and corresponding JOSS paper, which can be accessed via the "Cite this repository" button under the "About" section of the GitHub page.

You may also cite this repository with the following BibTeX entry:

```bibtex
@article{Petrenko2022,
  doi = {10.21105/joss.03527},
  url = {https://doi.org/10.21105/joss.03527},
  year = {2022},
  publisher = {The Open Journal},
  volume = {7},
  number = {79},
  pages = {3527},
  author = {Sasha Petrenko and Donald C. Wunsch},
  title = {ClusterValidityIndices.jl: Batch and Incremental Metrics for Unsupervised Learning},
  journal = {Journal of Open Source Software}
}
```

Owner

  • Name: Sasha Petrenko
  • Login: AP6YC
  • Kind: user

Graduate researcher of applied computational intelligence at the Missouri University of Science and Technology.

JOSS Publication

ClusterValidityIndices.jl: Batch and Incremental Metrics for Unsupervised Learning
Published
November 25, 2022
Volume 7, Issue 79, Page 3527
Authors
Sasha Petrenko
Missouri University of Science and Technology
Donald C. Wunsch
Missouri University of Science and Technology
Editor
Adi Sinn
Tags
CVI ICVI Cluster Validity Indices Cluster Validity Index Incremental Cluster Validity Indices Incremental Cluster Validity Index Machine Learning Clustering Metrics Streaming Time Series

Citation (CITATION.cff)

# CFF version for the document
cff-version: 1.2.0

# Authors list
authors:
  - family-names: "Petrenko"
    given-names: "Sasha"
    orcid: "https://orcid.org/0000-0003-2442-8901"
    website: "https://ap6yc.github.io/"
    email: "sap625@mst.edu"
    alias: "AP6YC"
    affiliation: "Missouri University of Science and Technology"

# Repository title and descriptors
title: "AP6YC/ClusterValidityIndices.jl"
abstract: "This software is a Julia package for incremental and batch Cluster Validity Indices (CVI)."
keywords:
  - "CVI"
  - "ICVI"
  - "Cluster Validity Indices"
  - "Incremental Cluster Validity Indices"
identifiers:
  - description: "The DOI of the latest ClusterValidityIndices.jl Zenodo archive."
    type: "doi"
    value: "10.5281/zenodo.5765807"
url: "https://doi.org/10.5281/zenodo.5765807"
repository-code: "https://github.com/AP6YC/ClusterValidityIndices.jl"
license: "MIT"
institution:
  name: "Missouri University of Science and Technology"

# Preferred citation of the JOSS paper
message: "Please cite this software using the metadata from 'preferred-citation'."
preferred-citation:

  # Authors list for the JOSS paper
  authors:
    - family-names: "Petrenko"
      given-names: "Sasha"
      orcid: "https://orcid.org/0000-0003-2442-8901"
      website: "https://ap6yc.github.io/"
      email: "sap625@mst.edu"
      alias: "AP6YC"
      affiliation: "Missouri University of Science and Technology"
    - family-names: "Wunsch"
      given-names: "Donald"
      name-suffix: "II"
      orcid: "https://orcid.org/0000-0002-9726-9051"
      website: "https://people.mst.edu/faculty/dwunsch/"
      email: "dwunsch@mst.edu"
      alias: "dwunsch"
      affiliation: "Missouri University of Science and Technology"

  # Title, DOI, and journal details for the JOSS paper
  title: "ClusterValidityIndices.jl: Batch and Incremental Metrics for Unsupervised Learning"
  publisher: "The Open Journal"
  journal: "Journal of Open Source Software"
  year: 2022
  month: 11
  volume: 7
  number: 79
  pages: 3527
  type: "article"
  identifiers:
    - description: "The DOI of the ClusterValidityIndices.jl JOSS paper."
      type: "doi"
      value: "10.21105/joss.03527"
  url: "https://doi.org/10.21105/joss.03527"
  institution:
    name: "Missouri University of Science and Technology"

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 334
  • Total Committers: 3
  • Avg Commits per committer: 111.333
  • Development Distribution Score (DDS): 0.021
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Sasha Petrenko s****5@u****u 327
CompatHelper Julia c****y@j****g 6
github-actions[bot] 4****] 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 40
  • Total pull requests: 48
  • Average time to close issues: 26 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 0.9
  • Average comments per pull request: 0.96
  • Merged pull requests: 47
  • Bot issues: 0
  • Bot pull requests: 8
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • AP6YC (37)
  • urlicht (1)
  • JuliaTagBot (1)
Pull Request Authors
  • AP6YC (40)
  • github-actions[bot] (8)
Top Labels
Issue Labels
enhancement (18) documentation (16) bug (13) paper (3) duplicate (1)
Pull Request Labels
enhancement (18) documentation (16) bug (9) release (5) paper (4)

Packages

  • Total packages: 1
  • Total downloads:
    • julia 1 total
  • Total dependent packages: 3
  • Total dependent repositories: 0
  • Total versions: 21
juliahub.com: ClusterValidityIndices

A Julia package for Cluster Validity Indices (CVIs).

  • Versions: 21
  • Dependent Packages: 3
  • Dependent Repositories: 0
  • Downloads: 1 Total
Rankings
Dependent repos count: 9.9%
Dependent packages count: 16.6%
Average: 30.6%
Stargazers count: 42.3%
Forks count: 53.5%
Last synced: 6 months ago

Dependencies

.github/workflows/CI.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • codecov/codecov-action v3 composite
  • coverallsapp/github-action master composite
  • julia-actions/julia-buildpkg latest composite
  • julia-actions/julia-processcoverage v1 composite
  • julia-actions/julia-runtest latest composite
  • julia-actions/setup-julia v1 composite
  • styfle/cancel-workflow-action 0.11.0 composite
.github/workflows/Documentation.yml actions
  • actions/checkout v2 composite
  • julia-actions/setup-julia latest composite
  • styfle/cancel-workflow-action 0.9.1 composite
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite
.github/workflows/draft-pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
  • styfle/cancel-workflow-action 0.6.0 composite
.github/workflows/CompatHelper.yml actions