metacluster

MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems

https://github.com/thieu1995/metacluster

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.2%) to scientific vocabulary

Keywords

classification clustering-methods genetic-algorithm global-search k-center-problem kmeans kmeans-clustering mealpy metaheuristic-based-clustering particle-swarm-optimization unsupervised-learning whale-optimization-algorithm
Last synced: 4 months ago

Repository

MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems

Basic Info
Statistics
  • Stars: 14
  • Watchers: 1
  • Forks: 4
  • Open Issues: 0
  • Releases: 7
Topics
classification clustering-methods genetic-algorithm global-search k-center-problem kmeans kmeans-clustering mealpy metaheuristic-based-clustering particle-swarm-optimization unsupervised-learning whale-optimization-algorithm
Created over 2 years ago · Last pushed 5 months ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md

MetaCluster



MetaCluster is the largest open-source library of nature-inspired optimization (metaheuristic) algorithms for clustering problems in Python.

  • Free software: GNU General Public License (GPL) V3 license
  • Provides 3 classes: MetaCluster, MhaKCentersClustering, and MhaKMeansTuner
  • Nature-inspired metaheuristic optimizers (metaheuristic algorithms): > 200
  • Objective functions (used as fitness): > 40
  • Supported datasets: 48, from scikit-learn, UCI, ELKI, KEEL, ...
  • Performance metrics: > 40
  • Methods for detecting the K value: >= 10
  • Documentation: https://metacluster.readthedocs.io/en/latest/
  • Python versions: >= 3.7.x
  • Dependencies: numpy, scipy, scikit-learn, pandas, mealpy, permetrics, plotly, kaleido

Citation Request

Please include these citations if you plan to use this library:

```bibtex
@article{VanThieu2023,
  author = {Van Thieu, Nguyen and Oliva, Diego and Pérez-Cisneros, Marco},
  title = {MetaCluster: An open-source Python library for metaheuristic-based clustering problems},
  journal = {SoftwareX},
  year = {2023},
  pages = {101597},
  volume = {24},
  DOI = {10.1016/j.softx.2023.101597},
}

@article{van2023mealpy,
  title = {MEALPY: An open-source library for latest meta-heuristic algorithms in Python},
  author = {Van Thieu, Nguyen and Mirjalili, Seyedali},
  journal = {Journal of Systems Architecture},
  year = {2023},
  publisher = {Elsevier},
  doi = {10.1016/j.sysarc.2023.102871}
}
```

Installation

Install the current release from PyPI:

```bash
$ pip install metacluster
```

After installation, check the version:

```bash
$ python
>>> import metacluster
>>> metacluster.__version__
```

Examples

Examples are collected in a dedicated GitHub repository, MetaCluster_examples.

Let's go through some basic examples:

1. First, load a dataset. You can use one of the datasets bundled with MetaCluster:

```python
# Load an available dataset from MetaCluster
from metacluster import get_dataset

# Try an unknown dataset name
get_dataset("unknown")
# Enter: 1 -> This will list all available datasets

data = get_dataset("Arrhythmia")
```

  • Or you can load your own dataset:

```python
import pandas as pd
from metacluster import Data

# Load X and y
# NOTE: MetaCluster accepts numpy arrays only, hence the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y, name="my-dataset")
```

2. Next, scale your features.

Make sure your dataset is scaled or normalized before clustering. Choose one of the following methods:

```python
# MinMaxScaler
data.X, scaler = data.scale(data.X, method="MinMaxScaler", feature_range=(0, 1))

# StandardScaler
data.X, scaler = data.scale(data.X, method="StandardScaler")

# MaxAbsScaler
data.X, scaler = data.scale(data.X, method="MaxAbsScaler")

# RobustScaler
data.X, scaler = data.scale(data.X, method="RobustScaler")

# Normalizer
data.X, scaler = data.scale(data.X, method="Normalizer", norm="l2")  # "l1", "l2" or "max"
```
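For intuition about the `MinMaxScaler` option, here is a minimal pure-Python sketch of column-wise min-max scaling, x' = (x - min) / (max - min) * (b - a) + a for a `feature_range` of (a, b). It is not MetaCluster's implementation; the helper name `min_max_scale` is hypothetical, and mapping constant columns to the lower bound mirrors how scikit-learn treats zero-range features.

```python
from typing import List, Tuple


def min_max_scale(X: List[List[float]],
                  feature_range: Tuple[float, float] = (0.0, 1.0)) -> List[List[float]]:
    """Hypothetical helper: rescale each column of X into feature_range."""
    lo, hi = feature_range
    n_features = len(X[0])
    col_min = [min(row[j] for row in X) for j in range(n_features)]
    col_max = [max(row[j] for row in X) for j in range(n_features)]
    scaled = []
    for row in X:
        new_row = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # A constant column has no spread: map it to the lower bound instead of dividing by zero
            unit = (value - col_min[j]) / span if span != 0 else 0.0
            new_row.append(unit * (hi - lo) + lo)
        scaled.append(new_row)
    return scaled


X = [[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]]
print(min_max_scale(X))  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

Each feature is scaled independently, so a feature measured in hundreds no longer dominates distance computations over one measured in units.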

3. Next, select the metaheuristic algorithms, their parameters, the list of objectives, and the list of performance metrics:

```python
list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
    {"name": "FBIO", "epoch": 10, "pop_size": 30},
    {"name": "GWO", "epoch": 10, "pop_size": 30},
    {"name": "SMA", "epoch": 10, "pop_size": 30}
]
list_obj = ["SI", "RSI"]
list_metric = ["BHI", "DBI", "DI", "CHI", "SSEI", "NMIS", "HS", "CS", "VMS", "HGS"]
```
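The optimizers selected above search a continuous space, so a clustering has to be encoded as a vector. A common formulation, assumed here rather than taken from MetaCluster's source, concatenates K cluster centers into one flat vector and scores it by the sum of squared errors (SSE) of assigning each point to its nearest center. In the sketch below, `decode_centers`, `sse_fitness` and `random_search` are hypothetical helpers, and plain random search stands in for GWO, SMA or FBIO; MetaCluster's actual objectives are the ones named in `list_obj`.

```python
import random
from typing import List, Tuple

Point = List[float]


def decode_centers(solution: List[float], n_clusters: int, n_features: int) -> List[Point]:
    """Assumed encoding: split a flat vector of length n_clusters * n_features into centers."""
    if len(solution) != n_clusters * n_features:
        raise ValueError("solution length must equal n_clusters * n_features")
    return [solution[k * n_features:(k + 1) * n_features] for k in range(n_clusters)]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse_fitness(solution: List[float], X: List[Point], n_clusters: int) -> float:
    """SSE of assigning every point to its nearest decoded center (lower is better)."""
    centers = decode_centers(solution, n_clusters, len(X[0]))
    return sum(min(squared_distance(p, c) for c in centers) for p in X)


def random_search(X: List[Point], n_clusters: int, epoch: int = 200,
                  seed: int = 10) -> Tuple[List[Point], float]:
    """Stand-in optimizer: sample candidates inside the data's bounding box, keep the best."""
    rng = random.Random(seed)
    n_features = len(X[0])
    lows = [min(p[j] for p in X) for j in range(n_features)]
    highs = [max(p[j] for p in X) for j in range(n_features)]
    best, best_fit = None, float("inf")
    for _ in range(epoch):
        # Cluster-major order: all features of center 0, then center 1, ...
        candidate = [rng.uniform(lows[j], highs[j])
                     for _k in range(n_clusters) for j in range(n_features)]
        fit = sse_fitness(candidate, X, n_clusters)
        if fit < best_fit:
            best, best_fit = candidate, fit
    return decode_centers(best, n_clusters, n_features), best_fit


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, fitness = random_search(X, n_clusters=2)
print(centers, fitness)
```

Replacing `random_search` with a real population-based optimizer changes how candidates are proposed, not how they are decoded and scored.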

All supported metaheuristic algorithms are listed at https://github.com/thieu1995/mealpy, and all supported clustering objectives and metrics at https://github.com/thieu1995/permetrics.
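As an illustration of one entry in `list_metric`, assuming "DBI" denotes the Davies-Bouldin index (the abbreviation permetrics uses for it), the index is DBI = (1/K) * sum_i max_{j != i} (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of cluster i's points to its centroid c_i. Lower values mean more compact, better separated clusters. The function below is a hypothetical pure-Python sketch, not the permetrics implementation.

```python
import math
from typing import Dict, List

Point = List[float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(X: List[Point], labels: List[int]) -> float:
    """Hypothetical helper: mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio."""
    clusters: Dict[int, List[Point]] = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin index needs at least two clusters")
    # Centroid of each cluster (column-wise mean)
    centroids = {k: [sum(col) / len(pts) for col in zip(*pts)] for k, pts in clusters.items()}
    # S_k: average distance of the cluster's points to its centroid
    scatter = {k: sum(euclidean(p, centroids[k]) for p in pts) / len(pts)
               for k, pts in clusters.items()}
    total = 0.0
    for i in clusters:
        # Identical centroids would make this division fail
        total += max((scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)


X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(davies_bouldin(X, [0, 0, 1, 1]))  # 0.2
```

Labeling the same four points as `[0, 1, 0, 1]` pairs far-apart points and yields a much larger (worse) index of 5.0.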

If you don't want to read the documentation, you can print all supported options with:

```python
from metacluster import MetaCluster

# Get all supported methods and print them out
MetaCluster.get_support(name="all")
```

4. Next, create an instance of the MetaCluster class and run it:

```python
model = MetaCluster(list_optimizer=list_optimizer, list_paras=list_paras, list_obj=list_obj, n_trials=3, seed=10)

model.execute(data=data, cluster_finder="elbow", list_metric=list_metric, save_path="history", verbose=False)

model.save_boxplots()
model.save_convergences()
```

As you can see, you can define different datasets and run each of them with the same model. Remember to give your dataset a name, because the folder that holds its results is named after the dataset. More examples can be found in the MetaCluster_examples repository.
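The `cluster_finder="elbow"` argument selects the method used to detect K, here the elbow method. One common geometric way to locate the elbow, shown as an assumption rather than MetaCluster's exact rule, is to pick the K whose (K, SSE) point lies farthest from the straight line joining the first and last points of the SSE curve. `elbow_k` is a hypothetical helper and the SSE values are toy numbers.

```python
import math
from typing import Dict


def elbow_k(sse_by_k: Dict[int, float]) -> int:
    """Hypothetical helper: K whose (K, SSE) point is farthest from the first-to-last chord."""
    ks = sorted(sse_by_k)
    if len(ks) < 3:
        raise ValueError("need SSE values for at least three K values")
    x1, y1 = ks[0], sse_by_k[ks[0]]
    x2, y2 = ks[-1], sse_by_k[ks[-1]]
    chord_length = math.hypot(x2 - x1, y2 - y1)

    def distance_to_chord(k: int) -> float:
        # Point-to-line distance from (k, SSE_k) to the line through (x1, y1) and (x2, y2)
        return abs((y2 - y1) * k - (x2 - x1) * sse_by_k[k] + x2 * y1 - y2 * x1) / chord_length

    return max(ks, key=distance_to_chord)


# Toy SSE curve (illustrative numbers, not MetaCluster output)
toy_sse = {1: 100.0, 2: 40.0, 3: 15.0, 4: 12.0, 5: 10.0, 6: 9.0}
print(elbow_k(toy_sse))  # 3
```

Rescaling either axis multiplies every point's distance to the chord by the same factor, so this rule does not need K and SSE to be normalized first.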

Support

Official links (questions, problems)

  • Official source code repo: https://github.com/thieu1995/metacluster
  • Official document: https://metacluster.readthedocs.io/
  • Download releases: https://pypi.org/project/metacluster/
  • Issue tracker: https://github.com/thieu1995/metacluster/issues
  • Notable changes log: https://github.com/thieu1995/metacluster/blob/master/ChangeLog.md
  • Official chat group: https://t.me/+fRVCJGuGJg1mNDg1

  • This project is also related to our other projects in optimization and machine learning. Check them out here:

    • https://github.com/thieu1995/metaheuristics
    • https://github.com/thieu1995/mealpy
    • https://github.com/thieu1995/mafese
    • https://github.com/thieu1995/pfevaluator
    • https://github.com/thieu1995/opfunu
    • https://github.com/thieu1995/enoppy
    • https://github.com/thieu1995/permetrics
    • https://github.com/thieu1995/IntelELM
    • https://github.com/thieu1995/MetaPerceptron
    • https://github.com/thieu1995/GrafoRVFL
    • https://github.com/aiir-team

Supported links

1. https://jtemporal.com/kmeans-and-elbow-method/
2. https://medium.com/@masarudheena/4-best-ways-to-find-optimal-number-of-clusters-for-clustering-with-python-code-706199fa957c
3. https://github.com/minddrummer/gap/blob/master/gap/gap.py
4. https://www.tandfonline.com/doi/pdf/10.1080/03610927408827101
5. https://doi.org/10.1016/j.engappai.2018.03.013
6. https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/Clustering_metrics.ipynb
7. https://elki-project.github.io/
8. https://sci2s.ugr.es/keel/index.php
9. https://archive.ics.uci.edu/datasets
10. https://python-charts.com/distribution/box-plot-plotly/
11. https://plotly.com/python/box-plots/?_ga=2.50659434.2126348639.1688086416-114197406.1688086416#box-plot-styling-mean--standard-deviation

Owner

  • Name: Nguyen Van Thieu
  • Login: thieu1995
  • Kind: user
  • Location: Earth
  • Company: AIIR Group

Knowledge is power, sharing it is the premise of progress in life. It seems like a burden to someone, but it is the only way to achieve immortality.

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Van Thieu"
    given-names: "Nguyen"
    orcid: "https://orcid.org/0000-0001-9994-8747"
title: "MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems"
version: 1.3.0
doi: 10.5281/zenodo.8214539
date-released: 2023-08-26
url: "https://github.com/thieu1995/metacluster"

GitHub Events

Total
  • Watch event: 3
  • Push event: 5
  • Fork event: 1
Last Year
  • Watch event: 3
  • Push event: 5
  • Fork event: 1

Committers

Last synced: almost 2 years ago

All Time
  • Total Commits: 69
  • Total Committers: 1
  • Avg Commits per committer: 69.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 69
  • Committers: 1
  • Avg Commits per committer: 69.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Thieu Nguyen n****2@g****m 69

Issues and Pull Requests

Last synced: 5 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 46 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 8
  • Total maintainers: 1
pypi.org: metacluster

MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 46 Last month
Rankings
Dependent packages count: 10.1%
Dependent repos count: 21.5%
Average: 22.6%
Downloads: 23.9%
Stargazers count: 27.8%
Forks count: 29.8%
Maintainers (1)
Last synced: 5 months ago

Dependencies

requirements.txt pypi
  • flake8 >=4.0.1
  • kaleido >=0.2.1
  • mealpy >=2.5.3
  • numpy >=1.17.1
  • pandas >=1.3.5
  • permetrics >=1.3.3
  • plotly >=5.10.0
  • pytest ==7.1.2
  • pytest-cov ==4.0.0
  • scikit-learn >=1.0.2
  • scipy >=1.7.1
.github/workflows/publish-package.yaml actions
  • actions/cache v1 composite
  • actions/checkout v1 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v1 composite
  • actions/upload-artifact master composite
  • pypa/gh-action-pypi-publish master composite
docs/requirements.txt pypi
  • kaleido >=0.2.1
  • mealpy >=3.0.1
  • numpy >=1.17.1
  • pandas >=1.3.5
  • permetrics >=1.5.0
  • plotly >=5.10.0
  • readthedocs-sphinx-search ==0.1.1
  • scikit-learn >=1.0.2
  • scipy >=1.7.1
  • sphinx ==4.4.0
  • sphinx_rtd_theme ==1.0.0
setup.py pypi
  • numpy >=1.17.1
  • pandas >=1.3.5
  • plotly >=5.10.0