https://github.com/compnet/extmeaseval

Characterization of Partition Comparison Measures

Keywords

cluster-analysis community-detection partition-comparison

Basic Info

Host: GitHub
Owner: CompNet
License: gpl-3.0
Language: R
Default Branch: main
Homepage:
Size: 50.8 KB

Statistics

Stars: 0
Watchers: 4
Forks: 0
Open Issues: 0
Releases: 0

Created over 4 years ago · Last pushed about 4 years ago

https://github.com/CompNet/ExtMeasEval/blob/main/

ExtMeasEval
===================
*Characterizing and Comparing External Measures for the Assessment of Cluster Analysis and Community Detection*

ExtMeasEval is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information, see `licence.txt`.

* **Lab site:** http://lia.univ-avignon.fr
* **GitHub repo:** https://github.com/CompNet/ExtMeasEval
* **Contact:** Nejat Arınık

-------------------------------------------------------------------------

## Description
In cluster analysis and graph partitioning, many external evaluation measures have been proposed in the literature to compare two partitions of the same set. Selecting the most appropriate measure for a given situation is therefore a challenge for the end user. We address this issue with a new empirical evaluation framework. For a collection of candidate measures, the framework first describes their behavior by computing them on a generated dataset of partitions, obtained by applying a set of predefined parametric partition transformations. Second, it performs a regression analysis to characterize the measures in terms of how they are affected by these parameters and transformations. This makes it possible to both describe and compare the measures.
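
The two steps described above can be illustrated with a minimal base-R sketch. It is not the code of this repository: the transformation `move_elements()` (which moves a proportion `p` of the elements to another cluster), the helper `rand_index()`, and the linear model `RI ~ p` are hypothetical stand-ins chosen only to show the general idea of generating transformed partitions and then regressing a measure on the transformation parameter.

```r
set.seed(1)  # make the random moves reproducible

# Rand Index: proportion of element pairs on which both partitions agree
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  upper <- upper.tri(same_a)          # each unordered pair counted once
  mean(same_a[upper] == same_b[upper])
}

# Hypothetical transformation (assumption, not the paper's):
# move a proportion p of the elements to a different, randomly chosen cluster
move_elements <- function(labels, p) {
  k <- max(labels)
  idx <- sample(length(labels), size = round(p * length(labels)))
  for (i in idx) {
    others <- setdiff(seq_len(k), labels[i])
    labels[i] <- others[sample.int(length(others), 1)]
  }
  labels
}

# Step 1: generate transformed partitions and evaluate the measure
reference <- rep(1:4, each = 25)      # 100 elements in 4 equal clusters
params <- expand.grid(p = seq(0, 0.5, by = 0.1), rep = 1:5)
params$RI <- vapply(
  params$p,
  function(p) rand_index(reference, move_elements(reference, p)),
  numeric(1)
)

# Step 2: regress the measure on the transformation parameter
fit <- lm(RI ~ p, data = params)
print(coef(fit))
```

In this toy setting, the sign and magnitude of the coefficient of `p` summarize how sensitive the measure is to the transformation; the actual framework considers several transformations, parameters and measures.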

If you use this set of `R` scripts, please cite the paper [[Arınık'21](#references)]:
```bibtex
@Article{Arinik2021,
  author    = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent},
  title     = {Characterizing and Comparing External Measures for the Assessment of Cluster Analysis and Community Detection},
  journal   = {IEEE Access},
  year      = {2021},
  volume    = {9},
  pages     = {20255-20276},
  doi       = {10.1109/access.2021.3054621},
}
```


## Data
We generate our data in a fully parametric way in order to gain greater control over it. See [[Arınık'21](#references)] for more details.

Our data, as well as our results and figures, are publicly available on [Zenodo](https://doi.org/10.5281/zenodo.6816128).


## Available evaluation measures
Note that unnormalized measures, such as MI and VI, should not be considered in our evaluation framework. Moreover, we slightly adjust the calculation of ARI so that it yields a non-negative value in the range [0,1]. In our experiments, ARI was always positive, so no additional change was required; in practice, negative ARI values are very rare.
* Information theoretical measures (See [[Vinh'09](#references)] and [[Meilă'15](#references)] for more details)
  * Mutual Information (*MI*)
  * Variation of Information (*VI*) and its normalized version (NVI)
  * Several variants of Normalized Mutual Information (*NMIsum*, *NMIsqrt*, *NMIjoint*, *NMImax*, *NMImin*, *rNMI*)
  * Several variants of Adjusted Mutual Information (*AMIsqrt*, *AMIsum*, *AMImin*, *AMImax*)
* Pair-counting measures (See [[Meilă'15](#references)] for more details)
  * Rand Index (RI)
  * Adjusted Rand Index (ARI)
  * Jaccard Index (JI)
  * Mirkin Metric (MM)
  * Fowlkes-Mallows Index (FMI)
* Set-matching measures (See [[Artiles'07](#references)], [[Meilă'15](#references)] and [[Rezaei'16](#references)] for more details)
  * Split Join, also called Van Dongen Index (SJ)
  * F-measure (Fm)
  * Pair Sets Index (PSI)
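
As an illustration of the pair-counting family listed above, here is a minimal base-R sketch that derives RI, ARI, JI and FMI from the contingency table of two partitions, using their standard textbook definitions. The function name `pair_counting()` is hypothetical and the code is independent of the scripts in `src`. The entry `ARI_clipped` truncates negative values at zero; this is only one possible way to obtain a non-negative ARI, shown as an assumption, since the exact adjustment is described in the paper rather than here.

```r
# Pair-counting measures computed from two label vectors of equal length
pair_counting <- function(a, b) {
  tab <- table(a, b)                            # contingency table
  n <- length(a)
  pairs <- function(x) sum(x * (x - 1) / 2)     # pairs inside groups of size x

  n11 <- pairs(tab)                  # pairs grouped together in both partitions
  in_a <- pairs(rowSums(tab))        # pairs grouped together in a
  in_b <- pairs(colSums(tab))        # pairs grouped together in b
  total <- n * (n - 1) / 2           # all pairs of elements
  n10 <- in_a - n11                  # together in a only
  n01 <- in_b - n11                  # together in b only
  n00 <- total - n11 - n10 - n01     # separated in both

  expected <- in_a * in_b / total    # expected n11 under the permutation model
  # Note: the ARI denominator is zero when both partitions are trivial
  ari <- (n11 - expected) / ((in_a + in_b) / 2 - expected)

  c(RI = (n11 + n00) / total,
    ARI = ari,
    ARI_clipped = max(0, ari),       # assumption: truncate negative values at 0
    JI = n11 / (n11 + n10 + n01),
    FMI = n11 / sqrt(in_a * in_b))
}

# Two maximally "crossed" partitions of four elements
print(round(pair_counting(c(1, 1, 2, 2), c(1, 2, 1, 2)), 3))
```

For these two partitions no pair is grouped together in both, so the ARI is negative (-0.5), which is the kind of case the non-negative adjustment is meant to handle.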


## Organization
Here are the folders composing the project:
* Folder `src`: contains the source code (R scripts).
* Folder `out`: contains the folders and files produced by our scripts. See the *Use* section for more details.


## Installation
1. Install the [`R` language](https://www.r-project.org/)
2. Install the following R packages:
   * [`igraph`](http://igraph.org/r/) Tested with the version 1.2.6.
   * [`latex2exp`](https://cran.r-project.org/web/packages/latex2exp/index.html) Tested with the version 0.5.0.
   * [`ade4`](https://cran.r-project.org/web/packages/ade4/) Tested with the version 1.7.
   * [`genieclust`](https://cran.r-project.org/web/packages/genieclust/) Tested with the version 1.0.0.
   * [`plot.matrix`](https://cran.r-project.org/web/packages/plot.matrix/) Tested with the version 1.6.
   * [`mclustcomp`](https://cran.r-project.org/web/packages/mclustcomp/) Tested with the version 0.3.3.
   * [`clues`](https://cran.r-project.org/web/packages/clues/) Tested with the version 0.6.2.2.
   * [`NMF`](https://cran.r-project.org/web/packages/NMF/) Tested with the version 0.23.0.
   * [`entropy`](https://cran.r-project.org/web/packages/entropy/) Tested with the version 1.3.0.
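
A small helper such as the following can be used to check which of the packages listed above are still missing before running the scripts. It is a sketch, not part of the repository: the function name `missing_packages()` is hypothetical, and the installation call is left commented out because it would install the current versions from the configured repository, which may differ from the tested versions.

```r
# Packages listed in the installation instructions above
required <- c("igraph", "latex2exp", "ade4", "genieclust", "plot.matrix",
              "mclustcomp", "clues", "NMF", "entropy")

# Return the subset of pkgs that cannot be loaded in the current library
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1),
                      quietly = TRUE, USE.NAMES = FALSE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from the configured repository:
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```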


## Use
1. Open the `R` console.
2. Set the project's root directory (the one containing `src` and `out`) as the working directory, using `setwd("")`.
3. Run the main script `src/main.R`.
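
The steps above can also be wrapped in a small function. This is a sketch under assumptions: `run_main()` and its argument `project_dir` are hypothetical names, and the function assumes that `src/main.R` is meant to be run with the project root as the working directory.

```r
# Run src/main.R with the project root as working directory,
# then restore the previous working directory
run_main <- function(project_dir) {
  main_script <- file.path(project_dir, "src", "main.R")
  if (!file.exists(main_script)) {
    stop("Cannot find ", main_script)
  }
  old_wd <- setwd(project_dir)           # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE)     # restore it even if the script fails
  source(file.path("src", "main.R"))
  invisible(TRUE)
}

# Usage (replace the placeholder with the folder where the repository was cloned):
# run_main("path/to/ExtMeasEval")
```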

The script will produce the following subfolders in the folder `out`:
* `partitions`: Folder containing all obtained transformed partitions.
* `evaluations`: Folder containing all obtained evaluation results.
* `data-frames`: Folder containing all CSV files, one per evaluation measure. These files contain the evaluation results.
* `regression-results`: Folder containing all the results regarding regression and relative importance analysis.
* `plots`: Folder containing all plots.
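
Once the main script has finished, the CSV files in `out/data-frames` can be loaded for further inspection. The sketch below is not part of the repository: `read_measure_frames()` is a hypothetical helper, and it assumes comma-separated files with a header row, whose names and columns are not documented here.

```r
# Read every CSV file found in <out_dir>/data-frames into a named list
read_measure_frames <- function(out_dir = "out") {
  csv_dir <- file.path(out_dir, "data-frames")
  files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
  frames <- lapply(files, read.csv)      # assumption: comma-separated, with header
  # Name each data frame after its file (without the .csv extension)
  names(frames) <- tools::file_path_sans_ext(basename(files))
  frames
}

# Usage, from the project root after running src/main.R:
# frames <- read_measure_frames("out")
# str(frames, max.level = 1)
```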


## References
* **[Arınık'21]** N. Arınık, V. Labatut and R. Figueiredo, "*Characterizing and Comparing External Measures for the Assessment of Cluster Analysis and Community Detection*", in *IEEE Access*, vol. 9, pp. 20255-20276, 2021. [hal-03124118](https://hal.archives-ouvertes.fr/hal-03124118) doi: [10.1109/ACCESS.2021.3054621](https://doi.org/10.1109/ACCESS.2021.3054621).
* **[Artiles'07]** J. Artiles, J. Gonzalo, and S. Sekine, "*The SemEval-2007 WePS evaluation: Establishing a benchmark for the Web people search task*", in Proc. 4th Int. Workshop Semantic Eval. (SemEval). Stroudsburg, PA, USA: Association for Computational Linguistics, 2007, pp. 64-69.
* **[Vinh'09]** N. X. Vinh, J. Epps, and J. Bailey, "*Information theoretic measures for clusterings comparison: Is a correction for chance necessary?*", in Proc. 26th Annu. Int. Conf. Mach. Learn. (ICML). New York, NY, USA: ACM, 2009, pp. 1073-1080.
* **[Meilă'15]** M. Meilă, "*Criteria for comparing clusterings*", in Handbook of Cluster Analysis, 1st ed., C. Hennig, M. Meilă, F. Murtagh, and R. Rocci, Eds. Boca Raton, FL, USA: CRC Press, 2015, ch. 27, pp. 619-635.
* **[Rezaei'16]** M. Rezaei and P. Fränti, "*Set matching measures for external cluster validity*", IEEE Trans. Knowl. Data Eng., vol. 28, no. 8, pp. 2173-2186, 2016.
