metacluster
MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.2%) to scientific vocabulary
Repository
MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems
Basic Info
- Host: GitHub
- Owner: thieu1995
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Homepage: https://metacluster.readthedocs.org
- Size: 3.15 MB
Statistics
- Stars: 14
- Watchers: 1
- Forks: 4
- Open Issues: 0
- Releases: 7
Metadata Files
README.md
MetaCluster is the largest open-source library of nature-inspired optimization (metaheuristic) algorithms for clustering problems in Python.
- Free software: GNU General Public License (GPL) V3 license
- Provided 3 classes: MetaCluster, MhaKCentersClustering, and MhaKMeansTuner
- Total nature-inspired metaheuristic optimizers (Metaheuristic Algorithms): > 200 optimizers
- Total objective functions (as fitness): > 40 objectives
- Total supported datasets: 48 datasets from Scikit learn, UCI, ELKI, KEEL...
- Total performance metrics: > 40 metrics
- Total methods for detecting the K value: >= 10 methods
- Documentation: https://metacluster.readthedocs.io/en/latest/
- Python versions: >= 3.7.x
- Dependencies: numpy, scipy, scikit-learn, pandas, mealpy, permetrics, plotly, kaleido
Citation Request
Please include these citations if you plan to use this library:
```code
@article{VanThieu2023,
  author = {Van Thieu, Nguyen and Oliva, Diego and Pérez-Cisneros, Marco},
  title = {MetaCluster: An open-source Python library for metaheuristic-based clustering problems},
  journal = {SoftwareX},
  year = {2023},
  pages = {101597},
  volume = {24},
  DOI = {10.1016/j.softx.2023.101597},
}

@article{van2023mealpy,
  title={MEALPY: An open-source library for latest meta-heuristic algorithms in Python},
  author={Van Thieu, Nguyen and Mirjalili, Seyedali},
  journal={Journal of Systems Architecture},
  year={2023},
  publisher={Elsevier},
  doi={10.1016/j.sysarc.2023.102871}
}
```
Installation
- Install the current PyPI release:
```bash
$ pip install metacluster
```
After installation, check the version:
```bash
$ python
>>> import metacluster
>>> metacluster.__version__
```
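If you prefer to check the installed version from a script without importing the package, a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or newer) could look like this. The helper name `installed_version` is hypothetical and not part of MetaCluster.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment
        return None


# Prints the version string if metacluster is installed, otherwise None
print(installed_version("metacluster"))
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to decide whether to run `pip install`.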
Examples
We maintain a dedicated GitHub repository of examples at MetaCluster_examples.
Let's go through some basic examples from here:
1. First, load a dataset. You can use the available datasets from MetaCluster:
```python
# Load an available dataset from MetaCluster
from metacluster import get_dataset

# Try an unknown dataset name
get_dataset("unknown")
# Enter: 1 -> This will list all of the available datasets

data = get_dataset("Arrhythmia")
```
- Or you can load your own dataset:
```python
import pandas as pd
from metacluster import Data

# Load X and y
# NOTE: MetaCluster accepts numpy arrays only, hence use the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]   # last column holds the labels
data = Data(X, y, name="my-dataset")
```
2. Next, scale your features
You should confirm that your dataset is scaled and normalized
```python
# MinMaxScaler
data.X, scaler = data.scale(data.X, method="MinMaxScaler", feature_range=(0, 1))

# StandardScaler
data.X, scaler = data.scale(data.X, method="StandardScaler")

# MaxAbsScaler
data.X, scaler = data.scale(data.X, method="MaxAbsScaler")

# RobustScaler
data.X, scaler = data.scale(data.X, method="RobustScaler")

# Normalizer
data.X, scaler = data.scale(data.X, method="Normalizer", norm="l2")   # "l1" or "l2" or "max"
```
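The method names above mirror the scikit-learn scaler classes. Assuming `data.scale` follows the same definitions, the following dependency-free sketch shows what min-max scaling and standardization do to a single feature column. The helper names `min_max_scale` and `standard_scale` are hypothetical and only illustrate the formulas; they are not MetaCluster functions.

```python
import math


def min_max_scale(column, feature_range=(0.0, 1.0)):
    """Linearly map the values of one feature onto feature_range."""
    low, high = feature_range
    c_min, c_max = min(column), max(column)
    span = c_max - c_min
    if span == 0:
        # A constant feature carries no information; map it to the lower bound
        return [low for _ in column]
    return [low + (x - c_min) / span * (high - low) for x in column]


def standard_scale(column):
    """Shift one feature to zero mean and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


feature = [2.0, 4.0, 6.0, 10.0]
print(min_max_scale(feature))   # [0.0, 0.25, 0.5, 1.0]
print(standard_scale(feature))
```

Distance-based clustering objectives are sensitive to feature magnitudes, which is why scaling comes before the optimization step.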
3. Next, select the metaheuristic algorithms, their parameters, the list of objectives, and the list of performance metrics
```python
list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
    {"name": "FBIO", "epoch": 10, "pop_size": 30},
    {"name": "GWO", "epoch": 10, "pop_size": 30},
    {"name": "SMA", "epoch": 10, "pop_size": 30}
]
list_obj = ["SI", "RSI"]
list_metric = ["BHI", "DBI", "DI", "CHI", "SSEI", "NMIS", "HS", "CS", "VMS", "HGS"]
```
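Because the optimizer names and their parameter dictionaries are given as two separate lists, it is easy to let them drift out of sync. The sketch below is a small pre-flight check; it assumes that entries are paired by position and that `epoch` and `pop_size` should be positive integers. The helper `check_optimizer_config` is hypothetical and not part of MetaCluster.

```python
def check_optimizer_config(list_optimizer, list_paras):
    """Return a list of problems found; an empty list means the config looks consistent."""
    problems = []
    if len(list_optimizer) != len(list_paras):
        problems.append(
            f"{len(list_optimizer)} optimizers but {len(list_paras)} parameter dicts"
        )
    for idx, paras in enumerate(list_paras):
        for key in ("epoch", "pop_size"):
            value = paras.get(key)
            if not isinstance(value, int) or value <= 0:
                problems.append(
                    f"list_paras[{idx}]: '{key}' should be a positive integer, got {value!r}"
                )
    return problems


list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
    {"name": "FBIO", "epoch": 10, "pop_size": 30},
    {"name": "GWO", "epoch": 10, "pop_size": 30},
    {"name": "SMA", "epoch": 10, "pop_size": 30},
]
print(check_optimizer_config(list_optimizer, list_paras))  # [] when everything lines up
```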
You can check all supported metaheuristic algorithms from: https://github.com/thieu1995/mealpy. All supported clustering objectives and metrics from: https://github.com/thieu1995/permetrics.
If you don't want to read the documentation, you can print out all supported information with:
```python
from metacluster import MetaCluster

# Get all supported methods and print them out
MetaCluster.get_support(name="all")
```
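To give a sense of what it means to use a clustering objective as a fitness function, the sketch below scores a candidate solution with the sum of squared errors (SSE). This is an illustration only: SSE stands in for the `SI` and `RSI` objectives, and the flat encoding of cluster centers is an assumption, not necessarily how MetaCluster represents solutions. Both helper functions are hypothetical.

```python
def decode_centers(flat_solution, n_features):
    """Split a flat candidate vector into a list of cluster centers (assumed encoding)."""
    return [flat_solution[i:i + n_features] for i in range(0, len(flat_solution), n_features)]


def sum_squared_error(points, centers):
    """Assign each point to its nearest center and return (sse, labels)."""
    total, labels = 0.0, []
    for point in points:
        # Squared Euclidean distance from this point to every center
        dists = [sum((a - b) ** 2 for a, b in zip(point, center)) for center in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        labels.append(best)
        total += dists[best]
    return total, labels


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
candidate = [0.0, 0.5, 5.5, 5.0]           # two centers, two features each
centers = decode_centers(candidate, n_features=2)
sse, labels = sum_squared_error(points, centers)
print(sse, labels)                          # 1.0 [0, 0, 1, 1]
```

A metaheuristic optimizer would propose many such candidate vectors and keep the ones with the best objective value.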
4. Next, create an instance of the MetaCluster class and run it.
```python
model = MetaCluster(list_optimizer=list_optimizer, list_paras=list_paras, list_obj=list_obj, n_trials=3, seed=10)

model.execute(data=data, cluster_finder="elbow", list_metric=list_metric, save_path="history", verbose=False)

model.save_boxplots()
model.save_convergences()
```
As you can see, you can define different datasets and use the same model to run them. Remember to set a name for each dataset, because the folder that holds your results is named after the dataset. More examples can be found here
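The elbow option passed to `execute` above refers to the elbow method for estimating the number of clusters K. The sketch below shows one common textbook variant of that idea: pick the K where the inertia curve bends most sharply, measured by the largest second difference. It is not MetaCluster's implementation, it assumes equally spaced K values, and the inertia numbers are made up for illustration.

```python
def elbow_k(k_values, inertias):
    """Return the k at the sharpest bend (largest second difference) of the inertia curve."""
    if len(k_values) != len(inertias) or len(k_values) < 3:
        raise ValueError("need at least three (k, inertia) pairs of equal length")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        # Discrete second derivative: large when the drop in inertia suddenly flattens
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = k_values[i], bend
    return best_k


ks = [1, 2, 3, 4, 5, 6]
inertias = [1000.0, 700.0, 200.0, 150.0, 120.0, 100.0]   # illustrative values only
print(elbow_k(ks, inertias))                              # 3
```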
Support
Official links (questions, problems)
- Official source code repo: https://github.com/thieu1995/metacluster
- Official document: https://metacluster.readthedocs.io/
- Download releases: https://pypi.org/project/metacluster/
- Issue tracker: https://github.com/thieu1995/metacluster/issues
- Notable changes log: https://github.com/thieu1995/metacluster/blob/master/ChangeLog.md
Official chat group: https://t.me/+fRVCJGuGJg1mNDg1
This project is also related to our other projects on optimization and machine learning. Check them out here:
- https://github.com/thieu1995/metaheuristics
- https://github.com/thieu1995/mealpy
- https://github.com/thieu1995/mafese
- https://github.com/thieu1995/pfevaluator
- https://github.com/thieu1995/opfunu
- https://github.com/thieu1995/enoppy
- https://github.com/thieu1995/permetrics
- https://github.com/thieu1995/IntelELM
- https://github.com/thieu1995/MetaPerceptron
- https://github.com/thieu1995/GrafoRVFL
- https://github.com/aiir-team
Supported links
1. https://jtemporal.com/kmeans-and-elbow-method/
2. https://medium.com/@masarudheena/4-best-ways-to-find-optimal-number-of-clusters-for-clustering-with-python-code-706199fa957c
3. https://github.com/minddrummer/gap/blob/master/gap/gap.py
4. https://www.tandfonline.com/doi/pdf/10.1080/03610927408827101
5. https://doi.org/10.1016/j.engappai.2018.03.013
6. https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Clustering-Dimensionality-Reduction/Clustering_metrics.ipynb
7. https://elki-project.github.io/
8. https://sci2s.ugr.es/keel/index.php
9. https://archive.ics.uci.edu/datasets
10. https://python-charts.com/distribution/box-plot-plotly/
11. https://plotly.com/python/box-plots/?_ga=2.50659434.2126348639.1688086416-114197406.1688086416#box-plot-styling-mean--standard-deviation
Owner
- Name: Nguyen Van Thieu
- Login: thieu1995
- Kind: user
- Location: Earth
- Company: AIIR Group
- Website: https://thieu1995.github.io/
- Repositories: 13
- Profile: https://github.com/thieu1995
Knowledge is power, sharing it is the premise of progress in life. It seems like a burden to someone, but it is the only way to achieve immortality.
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Van Thieu"
    given-names: "Nguyen"
    orcid: "https://orcid.org/0000-0001-9994-8747"
title: "MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems"
version: 1.3.0
doi: 10.5281/zenodo.8214539
date-released: 2023-08-26
url: "https://github.com/thieu1995/metacluster"
GitHub Events
Total
- Watch event: 3
- Push event: 5
- Fork event: 1
Last Year
- Watch event: 3
- Push event: 5
- Fork event: 1
Committers
Last synced: almost 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Thieu Nguyen | n****2@g****m | 69 |
Issues and Pull Requests
Last synced: 5 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
- Total downloads:
  - pypi: 46 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 8
- Total maintainers: 1
pypi.org: metacluster
MetaCluster: An Open-Source Python Library for Metaheuristic-based Clustering Problems
- Homepage: https://github.com/thieu1995/metacluster
- Documentation: https://metacluster.readthedocs.io/
- License: GPLv3
- Latest release: 1.3.0 (published 5 months ago)
Rankings
Maintainers (1)
Dependencies
- flake8 >=4.0.1
- kaleido >=0.2.1
- mealpy >=2.5.3
- numpy >=1.17.1
- pandas >=1.3.5
- permetrics >=1.3.3
- plotly >=5.10.0
- pytest ==7.1.2
- pytest-cov ==4.0.0
- scikit-learn >=1.0.2
- scipy >=1.7.1
- actions/cache v1 composite
- actions/checkout v1 composite
- actions/download-artifact v2 composite
- actions/setup-python v1 composite
- actions/upload-artifact master composite
- pypa/gh-action-pypi-publish master composite
- kaleido >=0.2.1
- mealpy >=3.0.1
- numpy >=1.17.1
- pandas >=1.3.5
- permetrics >=1.5.0
- plotly >=5.10.0
- readthedocs-sphinx-search ==0.1.1
- scikit-learn >=1.0.2
- scipy >=1.7.1
- sphinx ==4.4.0
- sphinx_rtd_theme ==1.0.0
- numpy >=1.17.1
- pandas >=1.3.5
- plotly >=5.10.0