https://github.com/compnet/enumcc
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
○codemeta.json file
-
○.zenodo.json file
-
✓DOI references
Found 4 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (13.5%) to scientific vocabulary
Keywords
correlation-clustering
graph-partitioning
operations-research
signed-graphs
Last synced: 5 months ago
·
JSON representation
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
correlation-clustering
graph-partitioning
operations-research
signed-graphs
Created over 4 years ago
· Last pushed over 4 years ago
https://github.com/CompNet/EnumCC/blob/main/
EnumCC =================== Efficient Enumeration of Correlation Clustering Optimal Solution Space * Copyright 2020-21 Nejat Arnk *EnumCC* is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For source availability and license information see the file `LICENCE` * **Lab site:** http://lia.univ-avignon.fr/ * **GitHub repo:** https://github.com/CompNet/EnumCC * **Contact:** Nejat Arnk, Vincent Labatut ------------------------------------------------------------------------- ## Description `EnumCC` is an optimal solution space enumeration method for the *Correlation Clustering (CC)* problem. It relies on two essential tasks: Recurrent neighborhood search (*RNS*) and *jumping* onto an undiscovered solution. The former is performed by the component `RNSCC`, whereas the latter is done by the commercial solver `Cplex`. In the first step, instead of directly *jumping* onto undiscovered optimal solutions one by one through `Cplex`, as in a traditional sequential approach, its component `RNSCC` discovers the recurrent neighborhood of the current optimal solution *P* with the hope of discovering new optimal solutions. The recurrent neighborhood of an optimal solution *P*, represents the set of optimal solutions, reached directly or indirectly from *P* depending on the maximum distance parameter *maxNbEdit*. Whether a new solution is found or not through `RNSCC`, the jumping process into a new solution *P* is performed. If *P* is not empty, the workflow of `RNS` and *jumping* is repeated again. Otherwise, the enumeration process stops. See our article [[Arnk'23](#references)] for more details. If you use this software, please cite article [[Arnk'23](#references)]: ```bibtex @Article{Arinik2023, author = {Arnk, Nejat and Figueiredo, Rosa and Labatut, Vincent}, title = {Efficient Enumeration of the Optimal Solutions to the Correlation Clustering problem}, journal = {Journal of Global Optimization}, year = {2023}, volume = {86}, pages = {355-391}, doi = {10.1007/s10898-023-01270-3}, } ``` ## Input parameters * `inFile`: Input file path. See `in/exemple.G` for the input graph format. * `outDir`: Output directory path. Default `.` (i.e. the current directory). * `initMembershipFilePath`: The membership file path, from which the `RNSCC` starts. It must be an optimal solution of the given signed graph. Moreover, It must be named as `membership0.txt` or something different than `membership .txt`. See `out/exemple/membership0.txt` for its format. This file can be obtained through [ExCC](https://github.com/CompNet/ExCC) by running the script `run-cp-bb.sh`. * `java.library.path`: The `Cplex` Java library path. It is usually found in ` /cplex/lib/cplex.jar`. * `maxNbEdit`: The maximum value edit distance value to be considered in edit operations. We show in our experiments that `maxNbEdit=3` is usually more appropriate. * `tilim`: Time limit in seconds for the whole program. Default `-1`, which means no time limit. * `solLim`: Maximum number of optimal solutions to be discovered. This can be useful when there is a huge number of optimal solutions, e.g. 50,000. Default `-1`. * `JAR_filepath_RNSCC`: The jar file path for `RNSCC`. * `LPFilePath`: It allows to import a `Cplex` LP file, corresponding to a ILP formulation of a signed graph for the CC problem. *Remark:* Such a file is obtained through Cplex by doing `exportModel()`. This file can be obtained through [ExCC](https://github.com/CompNet/ExCC) by running the script `run-cp-bb.sh`. In `ExCC`, the name of this file is `strengthedModelAfterRootRelaxation.lp`. ## Instructions & Use ### Use 1 Install [`IBM CPlex`](https://www.ibm.com/docs/en/icos/20.1.0?topic=2010-installing-cplex-optimization-studio). The default installation location is: `/opt/ibm/ILOG/CPLEX_Studio `. Tested with `Cplex` 12.8 and 20.1. Put `/opt/ibm/ILOG/CPLEX_Studio /cplex/lib/cplex.jar` into the `lib` folder in this repository. Compile and get the jar file for *RNSCC*: `ant -v -buildfile build-rns.xml compile jar`. Compile and get the jar file for *EnumCC* `ant -v -buildfile build.xml compile jar`. We need a starting optimal solution and the ILP model of the given signed graph. We can obtain them by running the script `run-cp-bb.sh` in the [ExCC](https://github.com/CompNet/ExCC) repository. Run the script `run.sh`. ### Use 2 Put `/opt/ibm/ILOG/CPLEX_Studio /cplex/lib/cplex.jar` into the `lib` folder in this repository. Compile and get the jar file for `RNSCC`: `ant -v -buildfile build-rns.xml compile jar`. Compile and get the jar file for `EnumCC`: `ant -v -buildfile build.xml compile jar`. Download the [Sosocc](https://github.com/CompNet/Sosocc) repository and put these jar files into the `lib` folder. Then, run first `ExCC` and then `EnumCC(3)`. See the instructions of the `Sosocc` repository for more details. Example command: ``` ant -v -buildfile build.xml compile jar ant -v -buildfile build.xml -DinFile="in/example.G" -DoutDir="out/example" -DmaxNbEdit=3 -DinitMembershipFilePath="out/example/membership0.txt" -DLPFilePath="strengthedModelAfterRootRelaxation.lp" -DJAR_filepath_RNSCC="RNSCC.jar" -DnbThread=4 -Dtilim=-1 -DsolLim=5000 run ``` ## Output * ` `: Folder ` `, where `x is a numerical value starting from 1. Each folder contains the result of a `RNSCC` execution and possesses one or multiple optimal solutions. * `allResults.txt`: File storing all absolute paths of the discovered optimal solutions. * `exec-time.txt`: Execution time for the whole enumeration process. * `jump-exec-time .txt`: Execution time for the `x`.th jumping process through `Cplex`. * `jump-log .txt`: `Cplex` log file regarding the the `x`.th jumping process. * `jump-status .txt`: The `Cplex` status result in the end of the jumoing process. Three values are possible: `Optimal`, `SolLim`, `Infeasible`. * `membership .txt`: The starting membership file for the (`x+1`).th `RNSCC` process. ## References * **[Arnk'23]** N. Arnk & R. Figueiredo & V. Labatut. *Efficient Enumeration of the Optimal Solutions to the Correlation Clustering problem*, Journal of Global Optimization, 86:355-391, 2023. DOI: [10.1007/s10898-023-01270-3](http://doi.org/10.1007/s10898-023-01270-3) [hal-03935831](https://hal.archives-ouvertes.fr/hal-03935831)
Owner
- Name: Complex Networks
- Login: CompNet
- Kind: organization
- Location: Avignon, France
- Website: http://lia.univ-avignon.fr
- Repositories: 44
- Profile: https://github.com/CompNet