rust-kmedoids
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 10 DOI reference(s) in README -
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (10.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: won-bae
- License: gpl-3.0
- Language: Rust
- Default Branch: main
- Size: 145 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
k-Medoids Clustering in Rust with FasterPAM
This Rust crate implements k-medoids clustering with PAM and variants of clustering by direct optimization of the (Medoid) Silhouette. It can be used with arbitrary dissimilarites, as it requires a dissimilarity matrix as input.
This software package has been introduced in JOSS:
Erich Schubert and Lars Lenssen
Fast k-medoids Clustering in Rust and Python
Journal of Open Source Software 7(75), 4183
https://doi.org/10.21105/joss.04183 (open access)
For further details on the implemented algorithm FasterPAM, see:
Erich Schubert, Peter J. Rousseeuw
Fast and Eager k-Medoids Clustering:
O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms
Information Systems (101), 2021, 101804
https://doi.org/10.1016/j.is.2021.101804 (open access)
an earlier (slower, and now obsolete) version was published as:
Erich Schubert, Peter J. Rousseeuw:
Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
In: 12th International Conference on Similarity Search and Applications (SISAP 2019), 171-187.
https://doi.org/10.1007/978-3-030-32047-8_16
Preprint: https://arxiv.org/abs/1810.05691
This is a port of the original Java code from ELKI to Rust.
For further details on medoid Silhouette clustering with automatic cluster number selection (FasterMSC, DynMSC), see:
Lars Lenssen, Erich Schubert:
Medoid silhouette clustering with automatic cluster number selection
Information Systems (120), 2024, 102290
https://doi.org/10.1016/j.is.2023.102290
Preprint: https://arxiv.org/abs/2309.03751
the basic FasterMSC method was first published as:
Lars Lenssen, Erich Schubert:
Clustering by Direct Optimization of the Medoid Silhouette
In: 15th International Conference on Similarity Search and Applications (SISAP 2022)
https://doi.org/10.1007/978-3-031-17849-8_15
If you use this code in scientific work, please cite above papers. Thank you.
Example
let dissim = ndarray::arr2(&[[0,1,2,3],[1,0,4,5],[2,4,0,6],[3,5,6,0]]);
let mut meds = kmedoids::random_initialization(4, 2, &mut rand::thread_rng());
let (loss, assingment, n_iter, n_swap): (f64, _, _, _) = kmedoids::fasterpam(&dissim, &mut meds, 100);
println!("Loss is: {}", loss);
Note that:
- you need to specify the "output" data type of
loss-- chose a signed type with sufficient precision. For example for unsigned distances usingu32, it may be better to usei64to compute the loss. - the input distance type needs to be convertible into the output data type via
Into
Implemented Algorithms
- FasterPAM (Schubert and Rousseeuw, 2020, 2021)
- FasterPAM with an integrated additional shuffling step
- Parallelized FasterPAM with an integrated additional shuffling step
- FastPAM1 (Schubert and Rousseeuw, 2019, 2021)
- PAM (Kaufman and Rousseeuw, 1987) with BUILD and SWAP
- Alternating optimization (k-means-style algorithm)
- Silhouette index for evaluation (Rousseeuw, 1987)
- FasterMSC (Lenssen and Schubert, 2022)
- FastMSC (Lenssen and Schubert, 2022)
- DynMSC (Lenssen and Schubert, 2023)
- PAMSIL (Van der Laan and Pollard, 2003)
- PAMMEDSIL (Van der Laan and Pollard, 2003)
Note that the k-means-like algorithm for k-medoids tends to find much worse solutions.
The additional shuffling step for FasterPAM is beneficial if you intend to restart k-medoids multiple times on the same data (to find better solutions). The parallel implementation is typically faster when you have more than 5000 instances.
Rust Dependencies
- num-traits for supporting different numeric types
- ndarray for arrays (optional)
- rand for random initialization (optional)
- rayon for parallelization (optional)
Contributing to rust-kmedoids
Third-party contributions are welcome. Please use pull requests to submit patches.
Reporting issues
Please report errors as an issue within the repository's issue tracker.
Support requests
If you need help, please submit an issue within the repository's issue tracker.
License: GPL-3 or later
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
Owner
- Name: Won Bae
- Login: won-bae
- Kind: user
- Location: Vancouver
- Company: University of British Columbia
- Website: won-bae.github.io
- Repositories: 5
- Profile: https://github.com/won-bae
PhD student
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Schubert
given-names: Erich
orcid: "https://orcid.org/0000-0001-9143-4880"
- family-names: Lenssen
given-names: Lars
orcid: "https://orcid.org/0000-0003-0037-0418"
title: "Fast k-medoids Clustering in Rust and Python"
journal: "J. Open Source Softw."
doi: 10.21105/joss.04183
version: 0.5.0
date-released: 2023-12-10
license: GPL-3.0
preferred-citation:
title: "Fast k-medoids Clustering in Rust and Python"
year: "2022"
type: article
journal: "J. Open Source Softw."
doi: 10.21105/joss.04183
authors:
- family-names: Schubert
given-names: Erich
- family-names: Lenssen
given-names: Lars
references:
- title: "Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms"
doi: "10.1007/978-3-030-32047-8_16"
year: "2019"
type: conference-paper
conference: "Similarity Search and Applications - 12th International Conference, SISAP 2019, Newark, NJ, USA, October 2-4, 2019, Proceedings"
authors:
- family-names: Schubert
given-names: Erich
- family-names: Rousseeuw
given-names: Peter J.
- title: "Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms"
doi: "10.1016/j.is.2021.101804"
year: "2021"
type: article
journal: "Inf. Syst."
volume: 101
start: 101804
authors:
- family-names: Schubert
given-names: Erich
- family-names: Rousseeuw
given-names: Peter J.
- title: "Clustering by Direct Optimization of the Medoid Silhouette"
doi: "10.1007/978-3-031-17849-8_15"
year: "2022"
type: conference-paper
conference: "Similarity Search and Applications - 15th International Conference, SISAP 2022, Bologna, Italy, October 5-7, 2022, Proceedings"
authors:
- family-names: Lenssen
given-names: Lars
- family-names: Schubert
given-names: Erich
- title: "Medoid silhouette clustering with automatic cluster number selection"
doi: "10.1016/j.is.2023.102290"
year: "2024"
type: article
journal: "Inf. Syst."
volume: 120
start: 102290
authors:
- family-names: Lenssen
given-names: Lars
- family-names: Schubert
given-names: Erich
GitHub Events
Total
Last Year
Dependencies
- byteorder 1.5 development
- ndarray 0.15 development
- rand 0.8 development
- ndarray 0.15
- num-traits 0.2
- rand 0.8
- rayon 1.8