vulpes

Vulpes: Test many classification, regression models and clustering algorithms to see which one is most suitable for your dataset

https://github.com/adrienc21/vulpes

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.6%) to scientific vocabulary

Keywords

automl data-analysis data-science machine-learning models package python scikit-learn statistics

Keywords from Contributors

score-based-generative-modeling

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: AdrienC21
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 2.88 MB

Statistics

Stars: 8
Watchers: 2
Forks: 0
Open Issues: 1
Releases: 0

Topics

automl data-analysis data-science machine-learning models package python scikit-learn statistics

Created over 3 years ago · Last pushed over 2 years ago

Metadata Files

Readme Changelog License Citation

Vulpes

Vulpes: Test many classification, regression models and clustering algorithms to see which one is most suitable for your dataset.

Vulpes is a Python package that allows you to test many models, whether you want to do classification, regression or clustering in your projects. It calculates many metrics for each model to compare them. It is highly customizable and it contains many features to save time building robust ML models.

If you like this project, please leave a star on GitHub!

Alpha version.

Author & Maintainer: Adrien Carrel.

Installation

Using pip:

```bash
pip install vulpes
```

Dependencies

vulpes requires:

Python (>= 3.7)
numpy (>= 1.22)
pandas (>= 1.3.5)
scikit-learn (>= 1.0.2)
tqdm (>= 4.64.0)
xgboost (>= 1.6.1)
lightgbm (>= 3.3.2)

Documentation

Link to the documentation: https://vulpes.readthedocs.io/en/latest/

Examples

In the general case, import one of the classes Classifiers, Regressions or Clustering from vulpes.automl, optionally pass parameters to the constructor, and fit it on your dataset:

```python
from vulpes.automl import Classifiers

classifiers = Classifiers()
classifiers.fit(X, y)
```

More examples are given below and in the notebooks in the examples folder.

Classification

Fit many classification algorithms on the iris dataset from scikit-learn:

```python
import pandas as pd
from sklearn.datasets import load_iris
from vulpes.automl import Classifiers

# Load iris into a DataFrame so the feature names are kept
dataset = load_iris()
X = pd.DataFrame(dataset["data"], columns=dataset["feature_names"])
y = dataset["target"]

# Fit every available classifier and collect the metrics in a DataFrame
classifiers = Classifiers(preprocessing="default")
df_models = classifiers.fit(X, y)
df_models
```

Each model is evaluated with several metrics using repeated K-fold cross-validation:

| Model | Balanced Accuracy | Accuracy | Precision | Recall | F1 Score | AUROC | AUPRC | Micro avg Precision | Running time |
|-------------------------------:|------------------:|---------:|----------:|---------:|---------:|---------:|---------:|--------------------:|-------------:|
| LinearDiscriminantAnalysis | 0.977625 | 0.977333 | 0.978024 | 0.977625 | 0.976933 | 0.998161 | 0.996891 | 0.996940 | 4.372556 |
| QuadraticDiscriminantAnalysis | 0.973219 | 0.973333 | 0.975460 | 0.973219 | 0.973162 | 0.999063 | 0.997595 | 0.997634 | 4.470590 |
| LogisticRegressionCV | 0.961609 | 0.961333 | 0.964101 | 0.961609 | 0.960668 | 0.997218 | 0.993264 | 0.993375 | 12.895212 |
| SVC | 0.961287 | 0.960000 | 0.962045 | 0.961287 | 0.959960 | 0.996825 | 0.994421 | 0.994510 | 4.437862 |
| RandomForestClassifier | 0.957220 | 0.956000 | 0.959982 | 0.957220 | 0.955394 | 0.993473 | 0.990367 | 0.989958 | 10.645725 |
| GaussianNB | 0.957169 | 0.954667 | 0.956188 | 0.957169 | 0.954521 | 0.993825 | 0.990463 | 0.990619 | 4.345500 |
| ExtraTreesClassifier | 0.956438 | 0.956000 | 0.958665 | 0.956438 | 0.955157 | 0.995156 | 0.991795 | 0.991704 | 10.440453 |
| LogisticRegression | 0.956094 | 0.954667 | 0.957273 | 0.956094 | 0.954427 | 0.997726 | 0.994765 | 0.994848 | 5.691309 |
| GradientBoostingClassifier | 0.955871 | 0.953333 | 0.956984 | 0.955871 | 0.953364 | 0.983221 | 0.967145 | 0.971317 | 9.005045 |
| XGBClassifier | 0.952846 | 0.950667 | 0.952745 | 0.952846 | 0.950324 | 0.985892 | 0.969083 | 0.972853 | 4.802282 |
| BaggingClassifier | 0.952712 | 0.950667 | 0.955214 | 0.952712 | 0.950581 | 0.985295 | 0.982312 | 0.971742 | 8.354026 |
| KNeighborsClassifier | 0.952699 | 0.950667 | 0.951586 | 0.952699 | 0.950683 | 0.990842 | 0.986716 | 0.980262 | 6.960091 |
| AdaBoostClassifier | 0.950432 | 0.946667 | 0.949250 | 0.950432 | 0.947114 | 0.988202 | 0.981889 | 0.977999 | 8.127254 |
| LGBMClassifier | 0.950009 | 0.948000 | 0.950426 | 0.950009 | 0.947522 | 0.991721 | 0.985483 | 0.985704 | 5.063474 |
| LabelSpreading | 0.948757 | 0.945333 | 0.947960 | 0.948757 | 0.946091 | 0.988827 | 0.981177 | 0.981552 | 4.332253 |
| HistGradientBoostingClassifier | 0.948195 | 0.945333 | 0.949260 | 0.948195 | 0.945352 | 0.988212 | 0.976375 | 0.976866 | 7.706454 |
| LabelPropagation | 0.946091 | 0.944000 | 0.946373 | 0.946091 | 0.944250 | 0.990341 | 0.984098 | 0.984373 | 4.406253 |
| MLPClassifier | 0.944773 | 0.941333 | 0.945336 | 0.944773 | 0.942314 | 0.992075 | 0.985516 | 0.985762 | 7.662322 |
| DecisionTreeClassifier | 0.942681 | 0.941333 | 0.944493 | 0.942681 | 0.940183 | 0.957011 | 0.951111 | 0.908000 | 4.367503 |
| LinearSVC | 0.936713 | 0.936000 | 0.937548 | 0.936713 | 0.933929 | 0.989648 | 0.983251 | 0.983539 | 4.474272 |
| ExtraTreeClassifier | 0.933964 | 0.932000 | 0.934967 | 0.933964 | 0.931137 | 0.950473 | 0.943333 | 0.893289 | 4.336813 |
| SGDClassifier | 0.922581 | 0.918667 | 0.927593 | 0.922581 | 0.919651 | 0.981940 | 0.962839 | 0.963484 | 5.666082 |
| CalibratedClassifierCV | 0.894860 | 0.888000 | 0.896616 | 0.894860 | 0.887397 | 0.972231 | 0.957643 | 0.958332 | 5.699280 |
| Perceptron | 0.873581 | 0.865333 | 0.887799 | 0.873581 | 0.864172 | 0.976069 | 0.945789 | 0.946695 | 4.482433 |
| NearestCentroid | 0.854566 | 0.854667 | 0.854707 | 0.854566 | 0.849341 | 0.973214 | 0.963677 | 0.964257 | 5.783815 |
| RidgeClassifier | 0.843743 | 0.834667 | 0.848879 | 0.843743 | 0.831310 | 0.945148 | 0.920905 | 0.922219 | 4.415888 |
| RidgeClassifierCV | 0.841049 | 0.832000 | 0.846498 | 0.841049 | 0.828592 | 0.944421 | 0.919460 | 0.920816 | 4.484041 |
| BernoulliNB | 0.757425 | 0.758667 | 0.771867 | 0.757425 | 0.728847 | 0.883542 | 0.839397 | 0.823834 | 4.479535 |
| DummyClassifier | 0.333333 | 0.249333 | 0.083111 | 0.333333 | 0.132452 | 0.500000 | 0.379100 | 0.299444 | 4.396426 |
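To make the evaluation scheme concrete, here is a minimal sketch of how one row of such a table could be computed with plain scikit-learn. It is not Vulpes' internal code: the stratified splitter, the number of folds and repeats, the macro averaging and the `scoring` dictionary are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

X, y = load_iris(return_X_y=True)

# Assumed settings: 5 folds repeated 3 times; Vulpes' own defaults may differ.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)

# Built-in scikit-learn scorer names; macro averaging because iris has 3 classes.
scoring = {
    "balanced_accuracy": "balanced_accuracy",
    "accuracy": "accuracy",
    "precision": "precision_macro",
    "recall": "recall_macro",
    "f1": "f1_macro",
}

results = cross_validate(LinearDiscriminantAnalysis(), X, y, cv=cv, scoring=scoring)

# Average each metric over the 15 train/test splits.
summary = {name: results[f"test_{name}"].mean() for name in scoring}
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Repeating the K-fold split with different shuffles reduces the variance of each averaged metric, at the cost of fitting the model more times.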

The table above was produced with the "default" preprocessing pipeline: a SimpleImputer (median strategy) followed by a StandardScaler for the numerical features, and a OneHotEncoder for the categorical features.
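A preprocessing pipeline of this shape can be sketched with scikit-learn's `ColumnTransformer`. This is only an illustration of the description above, not the code Vulpes runs: the column names, the toy data and the `handle_unknown="ignore"` option are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numerical columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

preprocessing = ColumnTransformer([
    ("numeric", numeric_steps, ["length"]),
    # handle_unknown="ignore" is an assumption so unseen categories do not raise.
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

# Toy frame (hypothetical data) with one missing numerical value.
df = pd.DataFrame({
    "length": [1.0, 2.0, np.nan, 4.0],
    "color": ["red", "blue", "red", "green"],
})

X_t = preprocessing.fit_transform(df)
print(X_t.shape)  # one scaled numerical column plus three one-hot columns
```

Listing the columns explicitly keeps the sketch independent of how a given pandas version infers string dtypes.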

Regressions

Fit many regression algorithms:

```python
from sklearn.datasets import make_regression
from vulpes.automl import Regressions

# Synthetic regression problem with 4 features
X, y = make_regression(
    n_samples=100, n_features=4, random_state=42, noise=4.0, bias=100.0)

regressions = Regressions()
df_models = regressions.fit(X, y)
df_models
```

Clustering

Fit many clustering algorithms on the iris dataset from scikit-learn:

```python
import pandas as pd
from sklearn.datasets import load_iris
from vulpes.automl import Clustering

dataset = load_iris()
X = pd.DataFrame(dataset["data"], columns=dataset["feature_names"])

# Clustering is unsupervised, so only X is passed to fit
clustering = Clustering()
df_models = clustering.fit(X)
df_models
```
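The README does not say which metrics are reported for clustering. As an illustration of how clustering algorithms can be compared without labels, the sketch below scores two scikit-learn models with the silhouette coefficient; the choice of metric, of models and of three clusters is an assumption, not a description of Vulpes.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Standardize so that no single feature dominates the distances.
X = StandardScaler().fit_transform(load_iris().data)

models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=42),
    "AgglomerativeClustering": AgglomerativeClustering(n_clusters=3),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # Silhouette lies in [-1, 1]; higher means tighter, better separated clusters.
    scores[name] = silhouette_score(X, labels)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```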

Fit a "best model"

Once the models are fitted, the build_best_models method automatically builds a VotingClassifier or a VotingRegressor from the best ones.

```python
df_best = classifiers.build_best_models(X, y, nb_models=3)
df_best
```

| Model | Balanced Accuracy | Accuracy | Precision | Recall | F1 Score | Running time |
|----------------:|------------------:|---------:|----------:|--------:|---------:|-------------:|
| Voting (3-best) | 0.97508 | 0.974667 | 0.976034 | 0.97508 | 0.974447 | 11.82946 |
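For readers unfamiliar with voting ensembles, the following sketch builds one by hand with scikit-learn. It does not reproduce what build_best_models does internally: the three estimators are hand-picked rather than selected from fitted results, and soft voting and 5-fold cross-validation are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Soft voting averages the predicted class probabilities of the three models.
voting = VotingClassifier(
    estimators=[
        ("lda", LinearDiscriminantAnalysis()),
        ("qda", QuadraticDiscriminantAnalysis()),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)

scores = cross_val_score(voting, X, y, cv=5, scoring="balanced_accuracy")
mean_score = scores.mean()
print(f"Balanced accuracy: {mean_score:.4f}")
```

Soft voting requires every estimator to expose `predict_proba`; with models that lack it, `voting="hard"` (majority vote on predicted labels) is the usual fallback.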

Check missing data

```python
import pandas as pd
import numpy as np

df = pd.DataFrame([["a", "x"], [np.nan, "y"], ["a", np.nan], ["b", np.nan]],
                  dtype="category",
                  columns=["feature1", "feature2"])
classifiers.missing_data(df)
```

| | Total Missing | Percentage (%) |
|---------:|--------------:|---------------:|
| feature2 | 2 | 50.0 |
| feature1 | 1 | 25.0 |
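The same kind of summary can be computed directly with pandas, which helps when checking the numbers by hand. The helper below, `missing_summary`, is a hypothetical name introduced for this sketch and is not part of Vulpes.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and express them as a percentage."""
    total = df.isna().sum()
    percent = 100 * total / len(df)
    summary = pd.DataFrame({"Total Missing": total, "Percentage (%)": percent})
    # Keep only columns with gaps, most affected first.
    summary = summary[summary["Total Missing"] > 0]
    return summary.sort_values("Total Missing", ascending=False)


# Same values as the example above, built column by column.
df = pd.DataFrame({
    "feature1": ["a", np.nan, "a", "b"],
    "feature2": ["x", "y", np.nan, np.nan],
}).astype("category")

result = missing_summary(df)
print(result)
```

With four rows, two gaps in feature2 give 50.0% and one gap in feature1 gives 25.0%, matching the table.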

Testing

If you want to submit a pull request or test the package locally, you can run the test suite with pytest using the following command:

```bash
pytest vulpes/tests/
```

Why Vulpes?

Vulpes stands for: Vector (Un)supervised Learning Program Estimation System.

Nah, I'm kidding, I just love foxes, they are cute! The most common and widespread species of fox is the red fox (Vulpes vulpes).

Acknowledgment

Shankar Rao Pandala (and some contributors). Their package (Lazy Predict) has been an inspiration.

License

MIT

Owner

Name: Adrien Carrel
Login: AdrienC21
Kind: user
Location: London

Website: https://adriencarrel.com/
Twitter: adriencarrel_
Repositories: 3
Profile: https://github.com/AdrienC21

Quantitative Researcher. MSc Imperial College London (Advanced Computing), MEng CentraleSupélec (Applied Mathematics, Diplôme d'ingénieur).

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Carrel"
  given-names: "Adrien"
  orcid: "https://orcid.org/0000-0002-0051-2247"
title: "Vulpes: Test many classification, regression models and clustering algorithms to see which one is most suitable for your dataset."
version: 1.0.0
date-released: 2022-07-01
url: "https://github.com/AdrienC21/vulpes"

Committers

Last synced: over 1 year ago

All Time

Total Commits: 25
Total Committers: 2
Avg Commits per committer: 12.5
Development Distribution Score (DDS): 0.2

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Adrien Carrel	a**l@h**r	20
Adrien Carrel	2****1	5

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

thsgr (1)

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 9 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2
Total maintainers: 1

pypi.org: vulpes

Test many classification, regression models and clustering algorithms to see which one is most suitable for your dataset.

Homepage: https://vulpes.readthedocs.io/en/latest/
Documentation: https://vulpes.readthedocs.io/en/latest/
License: MIT
Latest release: 0.2.0
published over 3 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 9 Last month

Rankings

Dependent packages count: 6.6%

Stargazers count: 18.6%

Average: 24.4%

Forks count: 30.5%

Dependent repos count: 30.6%

Downloads: 35.7%

Maintainers (1)

adrien.carrel

Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi

m2r2 *

requirements.txt pypi

lightgbm >=3.3.2
numpy >=1.21.3
pandas >=1.3.5
scikit-learn >=1.0.2
tqdm >=4.64.0
xgboost >=1.6.1

setup.py pypi

vulpes
