https://github.com/compnet/extmeaseval
Characterization of Partition Comparison Measures
Keywords
cluster-analysis
community-detection
partition-comparison
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 4 years ago
· Last pushed about 4 years ago
ExtMeasEval
===================

*Characterizing and Comparing External Measures for the Assessment of Cluster Analysis and Community Detection*

ExtMeasEval is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information, see `licence.txt`.

* **Lab site:** http://lia.univ-avignon.fr
* **GitHub repo:** https://github.com/CompNet/ExtMeasEval
* **Contact:** Nejat Arınık

-------------------------------------------------------------------------

## Description

In the context of cluster analysis and graph partitioning, many external evaluation measures have been proposed in the literature to compare two partitions of the same set. This makes it challenging for end users to select the most appropriate measure for a given situation. To address this issue, we propose a new empirical evaluation framework. Given a collection of candidate measures, the framework first describes their behavior by computing them on a generated dataset of partitions, obtained by applying a set of predefined parametric partition transformations. Second, it performs a regression analysis to characterize the measures in terms of how they are affected by these parameters and transformations. This allows both describing and comparing the measures.

If you use this set of `R` scripts, please cite [[Arınık'21](#references)]:

```bibtex
@Article{Arinik2021,
  author  = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent},
  title   = {Characterizing and Comparing External Measures for the Assessment of Cluster Analysis and Community Detection},
  journal = {IEEE Access},
  year    = {2021},
  volume  = {9},
  pages   = {20255-20276},
  doi     = {10.1109/access.2021.3054621},
}
```

## Data

We generate our data in a fully parametric way, in order to have greater control over it. See [[Arınık'21](#references)] for more details.
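The generate-transform-evaluate loop described above can be illustrated with a minimal Python sketch (the project's own scripts are written in R). It assumes partitions are stored as membership vectors. The names `move_elements` and `sweep` are hypothetical: the transformation simply reassigns a given fraction of elements to another existing cluster and is not claimed to be one of the transformations defined in the accompanying IEEE Access paper. ARI is computed with the standard contingency-table formula, without the project's adjustment to the range [0,1].

```python
import math
import random
from collections import Counter


def adjusted_rand_index(x, y):
    """Standard ARI of two membership vectors (no rescaling to [0,1])."""
    if len(x) != len(y):
        raise ValueError("partitions must cover the same set of elements")
    total_pairs = math.comb(len(x), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two elements: nothing to disagree on
    # Pairs of elements grouped together in both partitions (contingency cells).
    together_both = sum(math.comb(c, 2) for c in Counter(zip(x, y)).values())
    # Pairs grouped together in x, and in y (row and column marginals).
    together_x = sum(math.comb(c, 2) for c in Counter(x).values())
    together_y = sum(math.comb(c, 2) for c in Counter(y).values())
    expected = together_x * together_y / total_pairs
    maximum = (together_x + together_y) / 2
    if maximum == expected:
        # Degenerate case (e.g. both partitions are all singletons): perfect agreement.
        return 1.0
    return (together_both - expected) / (maximum - expected)


def move_elements(membership, fraction, rng):
    """Hypothetical transformation: move a fraction of elements to another existing cluster."""
    labels = sorted(set(membership))
    transformed = list(membership)
    n_moved = round(fraction * len(membership))
    for i in rng.sample(range(len(membership)), n_moved):
        other_labels = [c for c in labels if c != transformed[i]]
        if other_labels:  # a single-cluster partition has nowhere to move elements
            transformed[i] = rng.choice(other_labels)
    return transformed


def sweep(reference, fractions, repetitions, seed=0):
    """Average ARI between the reference and its transformed versions, per parameter value."""
    rng = random.Random(seed)
    averages = {}
    for fraction in fractions:
        scores = [
            adjusted_rand_index(reference, move_elements(reference, fraction, rng))
            for _ in range(repetitions)
        ]
        averages[fraction] = sum(scores) / len(scores)
    return averages


if __name__ == "__main__":
    # Reference partition: 4 clusters of 25 elements each.
    reference = [cluster for cluster in range(4) for _ in range(25)]
    for fraction, ari in sweep(reference, [0.0, 0.1, 0.2, 0.3, 0.4, 0.5], repetitions=20).items():
        print(f"fraction moved = {fraction:.1f}  mean ARI = {ari:.3f}")
```

Each parameter value yields one averaged score per measure. Scores of this kind, collected for many measures, transformations and parameter values, are the sort of input that a regression analysis like the one described above would relate to the transformation parameters.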
Our data, as well as our results and figures, are publicly available on [Zenodo](https://doi.org/10.5281/zenodo.6816128).

## Available evaluation measures

Note that unnormalized measures, such as MI and VI, should not be used within our evaluation framework, even though they are listed below. We also slightly adjust the computation of ARI so that it yields a non-negative value in the range [0,1]. In practice, negative ARI values are very rare: in our experiments, ARI was always positive, so this adjustment never actually changed any value.

* Information-theoretic measures (see [[Vinh'09](#references)] and [[Meilă'15](#references)] for more details)
  * Mutual Information (*MI*)
  * Variation of Information (*VI*) and its normalized version (*NVI*)
  * Several variants of Normalized Mutual Information (*NMIsum*, *NMIsqrt*, *NMIjoint*, *NMImax*, *NMImin*, *rNMI*)
  * Several variants of Adjusted Mutual Information (*AMIsqrt*, *AMIsum*, *AMImin*, *AMImax*)
* Pair-counting measures (see [[Meilă'15](#references)] for more details)
  * Rand Index (*RI*)
  * Adjusted Rand Index (*ARI*)
  * Jaccard Index (*JI*)
  * Mirkin Metric (*MM*)
  * Fowlkes-Mallows Index (*FMI*)
* Set-matching measures (see [[Artiles'07](#references)], [[Meilă'15](#references)] and [[Rezaei'16](#references)] for more details)
  * Split-Join, also called Van Dongen Index (*SJ*)
  * F-measure (*Fm*)
  * Pair Sets Index (*PSI*)

## Organization

Here are the folders composing the project:

* Folder `src`: contains the source code (R scripts).
* Folder `out`: contains the folders and files produced by our scripts. See the *Use* section for more details.

## Installation

1. Install the [`R` language](https://www.r-project.org/).
2. Install the following R packages:
   * [`igraph`](http://igraph.org/r/): tested with version 1.2.6.
   * [`latex2exp`](https://cran.r-project.org/web/packages/latex2exp/index.html): tested with version 0.5.0.
   * [`ade4`](https://cran.r-project.org/web/packages/ade4/): tested with version 1.7.
   * [`genieclust`](https://cran.r-project.org/web/packages/genieclust/): tested with version 1.0.0.
   * [`plot.matrix`](https://cran.r-project.org/web/packages/plot.matrix/): tested with version 1.6.
   * [`mclustcomp`](https://cran.r-project.org/web/packages/mclustcomp/): tested with version 0.3.3.
   * [`clues`](https://cran.r-project.org/web/packages/clues/): tested with version 0.6.2.2.
   * [`NMF`](https://cran.r-project.org/web/packages/NMF/): tested with version 0.23.0.
   * [`entropy`](https://cran.r-project.org/web/packages/entropy/): tested with version 1.3.0.

## Use

1. Open the `R` console.
2. Set the project directory as the working directory, using `setwd()` with the path to that directory.
3. Run the main script `src/main.R`.

The script produces the following subfolders in the folder `out`:

* `partitions`: folder containing all the transformed partitions.
* `evaluations`: folder containing all the evaluation results.
* `data-frames`: folder containing the CSV files, one per evaluation measure, with the corresponding evaluation results.
* `regression-results`: folder containing all the results of the regression and relative importance analyses.
* `plots`: folder containing all the plots.

## References

* **[Arınık'21]** N. Arınık, V. Labatut and R. Figueiredo, "*Characterizing and Comparing External Measures for the Assessment of Cluster Analysis and Community Detection*", *IEEE Access*, vol. 9, pp. 20255-20276, 2021. [hal-03124118](https://hal.archives-ouvertes.fr/hal-03124118), doi: [10.1109/ACCESS.2021.3054621](https://doi.org/10.1109/ACCESS.2021.3054621).
* **[Artiles'07]** J. Artiles, J. Gonzalo, and S. Sekine, "*The SemEval-2007 WePS evaluation: Establishing a benchmark for the Web people search task*", in *Proc. 4th Int. Workshop Semantic Eval. (SemEval)*, Stroudsburg, PA, USA: Association for Computational Linguistics, 2007, pp. 64-69.
* **[Vinh'09]** N. X. Vinh, J. Epps, and J. Bailey, "*Information theoretic measures for clusterings comparison: Is a correction for chance necessary?*", in *Proc. 26th Annu. Int. Conf. Mach. Learn. (ICML)*, New York, NY, USA: ACM, 2009, pp. 1073-1080.
* **[Meilă'15]** M. Meilă, "*Criteria for comparing clusterings*", in *Handbook of Cluster Analysis*, 1st ed., C. Hennig, M. Meila, F. Murtagh, and R. Rocci, Eds. Boca Raton, FL, USA: CRC Press, 2015, ch. 27, pp. 619-635.
* **[Rezaei'16]** M. Rezaei and P. Fränti, "*Set matching measures for external cluster validity*", *IEEE Trans. Knowl. Data Eng.*, vol. 28, no. 8, pp. 2173-2186, 2016.
Owner
- Name: Complex Networks
- Login: CompNet
- Kind: organization
- Location: Avignon, France
- Website: http://lia.univ-avignon.fr
- Repositories: 44
- Profile: https://github.com/CompNet