mafese

Feature Selection using Metaheuristics Made Easy: Open Source MAFESE Library in Python

https://github.com/thieu1995/mafese

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.2%) to scientific vocabulary

Keywords

decision-tree-classifier dimensionality-reduction feature-extraction feature-selection genetic-algorithm harris-hawks-optimization knn-classifier machine-learning mutual-information optimization pearson-correlation-coefficient relief-f subset-selection svm-classifier wrapper-methods
Last synced: 6 months ago

Repository

Feature Selection using Metaheuristics Made Easy: Open Source MAFESE Library in Python

Basic Info
Statistics
  • Stars: 84
  • Watchers: 1
  • Forks: 25
  • Open Issues: 0
  • Releases: 12
Created over 3 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md

MAFESE




MAFESE (Metaheuristic Algorithms for FEature SElection) is the largest open-source Python library dedicated to the feature selection (FS) problem using metaheuristic algorithms. It contains filter, wrapper, embedded, and unsupervised-based methods with modern optimization techniques. Whether you're tackling classification or regression tasks, MAFESE helps automate and enhance feature selection to improve model performance.


🔥 Key Features

  • 🆓 Free software: GNU General Public License (GPL) V3 license
  • 🔄 Total Wrapper-based (Metaheuristic Algorithms): > 200 methods
  • 📊 Total Filter-based (Statistical-based): > 15 methods
  • 🌳 Total Embedded-based (Tree and Lasso): > 10 methods
  • 🔍 Total Unsupervised-based: ≥ 4 methods
  • 📂 Built-in Datasets: ≥ 30 datasets (47 classifications, 7 regressions)
  • 📈 Total performance metrics: ≥ 61 (45 regressions and 16 classifications)
  • ⚙️ Total objective functions (as fitness functions): ≥ 61 (45 regressions and 16 classifications)
  • 📖 Documentation: https://mafese.readthedocs.io/en/latest/
  • 🐍 Python versions: ≥ 3.8.x
  • 📦 Dependencies: numpy, scipy, scikit-learn, pandas, mealpy, permetrics, plotly, kaleido

🎯 Goals

MAFESE aims to cover the main families of feature selection (FS) methods:

  • 🧠 Unsupervised-based FS

  • 🔎 Filter-based FS

  • 🌲 Embedded-based FS

    • Regularization (Lasso-based)
    • Tree-based methods
  • ⚙️ Wrapper-based FS

    • Sequential-based: forward and backward
    • Recursive-based
    • MHA-based: Metaheuristic Algorithms
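To make the wrapper-based, MHA-driven idea concrete, here is a minimal sketch in pure Python. It is not MAFESE's implementation: a simple bit-flip hill climber stands in for a real metaheuristic, a 1-nearest-neighbour classifier stands in for the estimator, and the toy data, the `penalty` value, and all function names are assumptions made up for illustration.

```python
import random

def knn1_accuracy(X_train, y_train, X_test, y_test, mask):
    """Accuracy of a 1-nearest-neighbour classifier using only the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for x, label in zip(X_test, y_test):
        dists = [sum((x[j] - row[j]) ** 2 for j in cols) for row in X_train]
        nearest = dists.index(min(dists))  # ties resolve to the first training row
        correct += y_train[nearest] == label
    return correct / len(y_test)

def fitness(mask, data, penalty=0.01):
    """Wrapper objective: accuracy minus a small cost per selected feature."""
    return knn1_accuracy(*data, mask) - penalty * sum(mask)

def bit_flip_search(data, n_features, epochs=200, seed=0):
    """Hill climbing over binary masks; a stand-in for a real metaheuristic."""
    rng = random.Random(seed)
    best = [rng.random() < 0.5 for _ in range(n_features)]
    best_fit = fitness(best, data)
    for _ in range(epochs):
        cand = list(best)
        j = rng.randrange(n_features)
        cand[j] = not cand[j]          # flip one feature in or out
        cand_fit = fitness(cand, data)
        if cand_fit >= best_fit:       # keep the candidate if it is no worse
            best, best_fit = cand, cand_fit
    return best, best_fit

# Toy data (made up): feature 0 separates the classes; features 1-3 are
# identical in both classes and therefore carry no label information.
patterns = [(0.1 * i, 0.2 * (i % 3), 0.3 * (i % 2)) for i in range(10)]
rows = [([0.01 * i, *p], 0) for i, p in enumerate(patterns)]
rows += [([10.0 + 0.01 * i, *p], 1) for i, p in enumerate(patterns)]
train, test = rows[0::2], rows[1::2]
data = ([x for x, _ in train], [y for _, y in train],
        [x for x, _ in test], [y for _, y in test])

best, best_fit = bit_flip_search(data, n_features=4)
print(best, round(best_fit, 2))
```

The key point is that the search only ever sees a binary mask and a fitness value, which is why the optimizer and the estimator can be swapped independently.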

📝 Citation

Please cite the following papers if you use this library:

```bibtex
@article{van2024feature,
  title={Feature selection using metaheuristics made easy: Open source MAFESE library in Python},
  author={Van Thieu, Nguyen and Nguyen, Ngoc Hung and Heidari, Ali Asghar},
  journal={Future Generation Computer Systems},
  year={2024},
  publisher={Elsevier},
  doi={10.1016/j.future.2024.06.006},
  url={https://doi.org/10.1016/j.future.2024.06.006},
}

@article{van2023mealpy,
  title={MEALPY: An open-source library for latest meta-heuristic algorithms in Python},
  author={Van Thieu, Nguyen and Mirjalili, Seyedali},
  journal={Journal of Systems Architecture},
  year={2023},
  publisher={Elsevier},
  doi={10.1016/j.sysarc.2023.102871}
}
```

Installation

Install the latest release from PyPI:

```bash
$ pip install mafese
```

After installation, check the version:

```bash
$ python
>>> import mafese
>>> mafese.__version__
```

🚀 Quick Start

1. Load Dataset

Use a built-in dataset:

```python
from mafese import get_dataset

data = get_dataset("Arrhythmia")
```

Or load your own:

```python
import pandas as pd
from mafese import Data

# Load the CSV as a NumPy array; the last column is treated as the target.
df = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = df[:, :-1], df[:, -1]
data = Data(X, y)
```

2. Next, prepare your dataset

Split Train/Test

```python
data.split_train_test(test_size=0.2)
print(data.X_train[:2].shape)
print(data.y_train[:2].shape)
```

Scale Features and Labels

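```python
# Fit the scalers on the training set, then reuse them on the test set.
data.X_train, scaler_X = data.scale(data.X_train, scaling_methods=("standard", "minmax"))
data.X_test = scaler_X.transform(data.X_test)

# Encode labels (classification only), again fitting on the training set.
data.y_train, scaler_y = data.encode_label(data.y_train)
data.y_test = scaler_y.transform(data.y_test)
```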
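The pattern in this step is to fit the scaler on the training split only and then apply the same fitted parameters to the test split. The following pure-Python sketch illustrates that pattern for standard scaling; `StandardScalerSketch` is a hypothetical class written for this example and is not the object returned by `data.scale`.

```python
from statistics import mean, pstdev

class StandardScalerSketch:
    """Hypothetical stand-in for a fitted scaler (not MAFESE's or scikit-learn's class)."""

    def fit(self, X):
        cols = list(zip(*X))
        self.means = [mean(c) for c in cols]
        # Fall back to 1.0 for constant columns to avoid division by zero.
        self.stds = [pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.means, self.stds)]
                for row in X]

X_train = [[1.0, 10.0], [3.0, 30.0]]
X_test = [[5.0, 20.0]]

scaler = StandardScalerSketch().fit(X_train)   # statistics come from the training set only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)       # test data reuses the training statistics
print(X_train_scaled, X_test_scaled)
```

Fitting on the training split alone keeps information about the test split from leaking into preprocessing.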

3. Select Feature Selection Method

```python
# First way (recommended)
from mafese import UnsupervisedSelector, FilterSelector, LassoSelector, TreeSelector
from mafese import SequentialSelector, RecursiveSelector, MhaSelector, MultiMhaSelector

# Second way
from mafese.unsupervised import UnsupervisedSelector
from mafese.filter import FilterSelector
from mafese.embedded.lasso import LassoSelector
from mafese.embedded.tree import TreeSelector
from mafese.wrapper.sequential import SequentialSelector
from mafese.wrapper.recursive import RecursiveSelector
from mafese.wrapper.mha import MhaSelector, MultiMhaSelector
```

4. Create an instance of the selector class you want to use

```python
# Each line below is an alternative; keep only the selector you need.
feat_selector = UnsupervisedSelector(problem='classification', method='DR', n_features=5)

feat_selector = FilterSelector(problem='classification', method='SPEARMAN', n_features=5)

feat_selector = LassoSelector(problem="classification", estimator="lasso", estimator_paras={"alpha": 0.1})

feat_selector = TreeSelector(problem="classification", estimator="tree")

feat_selector = SequentialSelector(problem="classification", estimator="knn", n_features=3, direction="forward")

feat_selector = RecursiveSelector(problem="classification", estimator="rf", n_features=5)

feat_selector = MhaSelector(problem="classification", obj_name="AS",
                            estimator="knn", estimator_paras=None,
                            optimizer="BaseGA", optimizer_paras=None,
                            mode='single', n_workers=None, termination=None, seed=None, verbose=True)

feat_selector = MultiMhaSelector(problem="classification", obj_name="AS",
                                 estimator="knn", estimator_paras=None,
                                 list_optimizers=("OriginalWOA", "OriginalGWO", "OriginalTLO", "OriginalGSKA"),
                                 list_optimizer_paras=[{"epoch": 10, "pop_size": 30}, ]*4,
                                 mode='single', n_workers=None, termination=None, seed=None, verbose=True)
```
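For intuition about what a filter-based selector such as the `'SPEARMAN'` method does, here is a minimal sketch in pure Python. It assumes features are ranked by the absolute Spearman correlation with the target and the top `k` are kept; whether MAFESE uses exactly this ranking rule is an assumption, and the function names and data are made up for illustration.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0

def spearman_top_k(X, y, k):
    """Return the indexes of the k features with the largest |Spearman rho|."""
    ry = ranks(y)
    scores = [abs(pearson(ranks(list(col)), ry)) for col in zip(*X)]
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(top), scores

y = [1, 2, 3, 4, 5]
X = [[v ** 3, noise, -v] for v, noise in zip(y, [5, 1, 4, 2, 3])]

selected, scores = spearman_top_k(X, y, k=2)
print(selected, [round(s, 2) for s in scores])
```

Unlike the wrapper-based selectors, this kind of scoring never trains an estimator, which is why filter methods are usually much cheaper to run.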

5. Fit the model to X_train and y_train

```python
feat_selector.fit(data.X_train, data.y_train)
```

6. Get the information

```python
# Check the selected features: True (or 1) is selected, False (or 0) is not selected
print(feat_selector.selected_feature_masks)
print(feat_selector.selected_feature_solution)

# Check the indexes of the selected features
print(feat_selector.selected_feature_indexes)
```

7. Call transform() on the X you want to reduce to the selected features

```python
X_train_selected = feat_selector.transform(data.X_train)
X_test_selected = feat_selector.transform(data.X_test)
```
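The relationship between a feature mask, the selected indexes, and the transformed matrix can be sketched in a few lines of pure Python. The helper names below are hypothetical and only mirror the idea behind the attributes and `transform()` call shown above; they are not MAFESE functions.

```python
def mask_to_indexes(mask):
    """Positions of the features whose mask entry is truthy."""
    return [j for j, keep in enumerate(mask) if keep]

def apply_mask(X, mask):
    """Keep only the selected columns of every row."""
    indexes = mask_to_indexes(mask)
    return [[row[j] for j in indexes] for row in X]

mask = [True, False, True, False]          # made-up selection result
X = [[1, 2, 3, 4],
     [5, 6, 7, 8]]

indexes = mask_to_indexes(mask)
X_selected = apply_mask(X, mask)
print(indexes, X_selected)
```

The same mask must be applied to both the training and test matrices so that their columns stay aligned.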

8. Build your own evaluation method or use the built-in one

If you use the built-in method, do not transform the data beforehand.

8.1 You can use a different estimator from the one used in the feature selection process

```python
feat_selector.evaluate(estimator="svm", data=data, metrics=["AS", "PS", "RS"])

# Here, we pass the data that was loaded above, so it contains both the train and test sets.
# The results will look like this:
# {'AS_train': 0.77176, 'PS_train': 0.54177, 'RS_train': 0.6205, 'AS_test': 0.72636, 'PS_test': 0.34628, 'RS_test': 0.52747}
```

8.2 You can use the same estimator as in the feature selection process

```python
X_test, y_test = data.X_test, data.y_test
feat_selector.evaluate(estimator=None, data=data, metrics=["AS", "PS", "RS"])
```

For more usage examples, see the examples folder.

❓ Troubleshooting

  1. Where can I find the supported metrics, such as ["AS", "PS", "RS"] above, and what do they mean?

They are documented at https://github.com/thieu1995/permetrics, or you can print them:

```python
from mafese import MhaSelector

print(MhaSelector.SUPPORTED_REGRESSION_METRICS)
print(MhaSelector.SUPPORTED_CLASSIFICATION_METRICS)
```
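As a rough guide to what these abbreviations measure, the sketch below computes accuracy, precision, and recall in pure Python. It assumes that "AS", "PS", and "RS" stand for accuracy score, precision score, and recall score, and it uses macro averaging for the latter two; the averaging scheme permetrics applies by default may differ, so treat the numbers as illustrative only.

```python
def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision_recall(y_true, y_pred):
    """Per-class precision and recall, averaged with equal weight per class."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)

y_true = [0, 0, 1, 1, 1, 2]   # made-up labels
y_pred = [0, 1, 1, 1, 0, 2]   # made-up predictions

acc = accuracy_score(y_true, y_pred)
prec, rec = macro_precision_recall(y_true, y_pred)
print(round(acc, 4), round(prec, 4), round(rec, 4))
```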

  2. How do I know which estimators and methods my selector supports?

```python
print(feat_selector.SUPPORT)
```

Alternatively, consult the documentation at https://mafese.readthedocs.io/en/latest/

  3. I got this type of error. How do I solve it?

```python
raise ValueError("Existed at least one new label in y_pred.")
ValueError: Existed at least one new label in y_pred.
```

This typically occurs when you are working on a classification problem with a small dataset that has many classes. For instance, the "Zoo" dataset contains only 101 samples but has 7 classes. If you split it into training and testing sets with a ratio of around 80% - 20%, one or more classes may appear in the testing set but not in the training set, and calculating the performance metrics then raises this error. A model cannot assign data to a label it never saw during training. There are several solutions to this problem.

  • 1st: Use the SMOTE method to address imbalanced data and ensure that all classes have the same number of samples.

```python
from imblearn.over_sampling import SMOTE
import pandas as pd
from mafese import Data

dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]

# Oversample the minority classes before building the Data object.
X_new, y_new = SMOTE().fit_resample(X, y)
data = Data(X_new, y_new)
```

  • 2nd: Use a different random_state value in the split_train_test() function.

```python
import pandas as pd
from mafese import Data

dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y)
data.split_train_test(test_size=0.2, random_state=10)  # Try different random_state values
```
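The cause of the error and the "try another seed" remedy can be checked mechanically. The following pure-Python sketch splits by shuffled indexes, reports any label that occurs only in the test split, and searches for the first seed that avoids the problem. The split logic here is a simplified stand-in written for this example, not MAFESE's `split_train_test`, so a seed found this way will not carry over to the library.

```python
import random

def split_indices(n, test_size, seed):
    """Shuffle the indexes with a fixed seed and cut off the test portion."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_size))
    return idx[n_test:], idx[:n_test]

def unseen_test_labels(y, train_idx, test_idx):
    """Labels that appear in the test split but never in the training split."""
    return {y[i] for i in test_idx} - {y[i] for i in train_idx}

def first_safe_seed(y, test_size=0.2, max_seed=100):
    """Return the first seed whose split has no unseen test labels."""
    for seed in range(max_seed):
        train_idx, test_idx = split_indices(len(y), test_size, seed)
        if not unseen_test_labels(y, train_idx, test_idx):
            return seed, train_idx, test_idx
    raise ValueError("No safe split found; the rarest class may be too small.")

# Made-up labels: class "c" has a single sample, so it must land in the training split.
y = ["a"] * 6 + ["b"] * 3 + ["c"]

seed, train_idx, test_idx = first_safe_seed(y)
print(seed, sorted(y[i] for i in test_idx))
```

A check like `unseen_test_labels` is cheap enough to run after every split, before any metric is computed.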

📞 Community & Support


Developed by: Thieu @ 2023

Owner

  • Name: Nguyen Van Thieu
  • Login: thieu1995
  • Kind: user
  • Location: Earth
  • Company: AIIR Group

Knowledge is power, sharing it is the premise of progress in life. It seems like a burden to someone, but it is the only way to achieve immortality.

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Van Thieu"
    given-names: "Nguyen"
    orcid: "https://orcid.org/0000-0001-9994-8747"
  - family-names: "Nguyen"
    given-names: "Ngoc Hung"
    orcid: "https://orcid.org/0009-0007-7363-5014"
  - family-names: "Heidari"
    given-names: "Ali Asghar"
    orcid: "https://orcid.org/0000-0001-6938-9948"
title: "Feature Selection using Metaheuristics Made Easy: Open Source MAFESE Library in Python"
version: 1.0.0
doi: 10.5281/zenodo.7969042
date-released: 2025-05-30
url: "https://github.com/thieu1995/mafese"

GitHub Events

Total
  • Release event: 2
  • Watch event: 19
  • Issue comment event: 1
  • Member event: 1
  • Push event: 9
  • Fork event: 4
  • Create event: 2
Last Year
  • Release event: 2
  • Watch event: 19
  • Issue comment event: 1
  • Member event: 1
  • Push event: 9
  • Fork event: 4
  • Create event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 7
  • Total pull requests: 0
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Total issue authors: 7
  • Total pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: 8 months
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • thieu1995 (1)
  • renukasaravanan (1)
  • jishaaugustine (1)
  • jabeshnehemiah (1)
  • MH-Abid (1)
  • Target2target (1)
  • SafwanAlselwi (1)
Pull Request Authors
Top Labels
Issue Labels
enhancement (2) bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 268 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 12
  • Total maintainers: 1
pypi.org: mafese

Feature Selection using Metaheuristics Made Easy: Open Source MAFESE Library in Python

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 268 Last month
Rankings
Dependent packages count: 7.3%
Forks count: 17.2%
Average: 22.3%
Stargazers count: 23.4%
Dependent repos count: 41.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • joblib ==1.1.0
  • mealpy >=2.5.0
  • numpy >=1.15.1
  • opfunu >=1.0.0
  • permetrics >=1.3.0
  • scikit-learn ==1.0.1
.github/workflows/publish-package.yaml actions
  • actions/cache v1 composite
  • actions/checkout v1 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v1 composite
  • actions/upload-artifact master composite
  • pypa/gh-action-pypi-publish master composite
docs/requirements.txt pypi
  • kaleido >=0.2.1
  • mealpy >=2.5.4
  • numpy >=1.17.1
  • pandas >=1.3.5
  • permetrics >=1.4.2
  • plotly >=5.10.0
  • readthedocs-sphinx-search ==0.1.1
  • scikit-learn >=1.0.2
  • scipy >=1.7.1
  • sphinx ==4.4.0
  • sphinx_rtd_theme ==1.0.0
setup.py pypi
  • numpy >=1.17.1
  • pandas >=1.3.5
  • plotly >=5.10.0