tinyautoml

TinyAutoML is a comprehensive Pipeline Classifier Project designed as a scikit-learn plugin

https://github.com/g0bel1n/tinyautoml

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (15.2%) to scientific vocabulary

Keywords

automl-pipeline machine-learning scikit-learn

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: g0bel1n
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 2.73 MB

Statistics

Stars: 4
Watchers: 1
Forks: 0
Open Issues: 2
Releases: 3

Topics

automl-pipeline machine-learning scikit-learn

Created about 4 years ago · Last pushed almost 3 years ago

Metadata Files

Readme License Citation

README.md

TinyAutoML is a Python 3.9 machine learning library designed as an extension of scikit-learn.
It builds an adaptable and auto-tuned pipeline to handle binary classification tasks.


In short, your data goes through two main preprocessing steps.
The first is scaling and non-stationarity correction; the second is Lasso feature selection.
Finally, one of the three MetaModels is fitted on the transformed data.
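The flow above can be approximated with plain scikit-learn components. This is a minimal sketch, not TinyAutoML's implementation: it assumes `StandardScaler` for scaling, uses an L1-penalised logistic regression inside `SelectFromModel` as a Lasso-style selector for the binary target, stands in a `RandomForestClassifier` for the fitted MetaModel, and leaves out the non-stationarity correction entirely.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

pipeline = Pipeline([
    # Step 1: scaling (the non-stationarity correction is not reproduced here)
    ("scale", StandardScaler()),
    # Step 2: Lasso-style selection; the L1 penalty drives weak coefficients to zero
    ("select", SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    # Step 3: a single classifier standing in for a MetaModel
    ("model", RandomForestClassifier(n_estimators=200, random_state=0)),
])

pipeline.fit(X_train, y_train)
n_kept = pipeline.named_steps["select"].get_support().sum()
print(f"kept {n_kept} of {X.shape[1]} features, "
      f"test accuracy = {pipeline.score(X_test, y_test):.3f}")
```

Putting the selector inside the `Pipeline` means it is refit on the training data only, so the test set never influences which features are kept.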

Latest news:

- Logging format changed from the default to a [TinyAutoML] prefix
- Added a GitHub Actions workflow for CI and for updating the README.md
- Added parallel computation of LassoFeatureSelector -> LassoFeatureSelectionParallel
- New example notebook based on directional forecasting of the VIX index
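The `[TinyAutoML]` prefix that appears in the log output further down can be produced with the standard `logging` module. A minimal sketch, assuming a dedicated logger; the logger name `"TinyAutoML"` and this handler setup are illustrative, not the library's actual code.

```python
import logging
import sys

# Hypothetical logger name; any name works
logger = logging.getLogger("TinyAutoML")
logger.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)
# Prefix every message with [TinyAutoML] instead of the default "LEVEL:name:message"
handler.setFormatter(logging.Formatter("[TinyAutoML] %(message)s"))
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate lines if the root logger is also configured

logger.info("Training models...")  # prints: [TinyAutoML] Training models...
```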

⚡️ Quick start

First, let's install and import the library.

Install the latest release using pip:

```python
%pip install TinyAutoML
```

```python
import os
os.chdir('..')  # Only needed for the GitHub CI; you don't have to run this
```

```python
from TinyAutoML.Models import *
from TinyAutoML import MetaPipeline
```

`MetaModels`

MetaModels inherit from the abstract MetaModel class. They all implement ensemble methods and are therefore built on EstimatorPools.

When training EstimatorPools, you face a choice: either tune parameters (parameterTuning) on entire pipelines, each with its own estimator on top, or fit one shared preprocessing pipeline and train only the estimators on top. We call the first option comprehensiveSearch.
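The trade-off can be illustrated with plain scikit-learn. This is a sketch under assumptions, not TinyAutoML's code: the preprocessing (scaling plus L1-based selection) and the two estimators are stand-ins, `preprocessing_steps` is a hypothetical helper, and the hyperparameter search itself is omitted (option 1 is what a `GridSearchCV` over the whole pipeline would wrap).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
estimators = {"logistic": LogisticRegression(max_iter=1000), "gnb": GaussianNB()}


def preprocessing_steps():
    """Hypothetical helper: fresh, unfitted preprocessing steps."""
    return [
        ("scale", StandardScaler()),
        ("select", SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    ]


# Option 1 ("comprehensiveSearch"): one full pipeline per estimator, so the
# preprocessing is refit in every CV fold for every estimator (slower).
comprehensive = {
    name: cross_val_score(Pipeline(preprocessing_steps() + [("clf", est)]), X, y, cv=5).mean()
    for name, est in estimators.items()
}

# Option 2: fit the preprocessing once, then train only the estimators on top
# (faster). The shared preprocessing sees all of X before cross-validation,
# so these scores can be slightly optimistic.
X_shared = Pipeline(preprocessing_steps()).fit_transform(X, y)
top_only = {
    name: cross_val_score(est, X_shared, y, cv=5).mean()
    for name, est in estimators.items()
}

print("comprehensive:", comprehensive)
print("shared preprocessing:", top_only)
```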

Moreover, as we will see in detail later, these EstimatorPools can be shared across MetaModels.

They are all initialised with at least the following arguments:

```python
MetaModel(comprehensiveSearch: bool = True, parameterTuning: bool = True, metrics: str = 'accuracy', nSplits: int = 10)
```

- nSplits is the number of cross-validation splits
- The other parameters are self-explanatory

They must be wrapped in a MetaPipeline to work.

There are three MetaModels:

1- BestModel: selects the best-performing model in the pool

```python
best_model = MetaPipeline(BestModel(comprehensiveSearch=False, parameterTuning=False))
```

2- OneRulerForAll: implements stacking, with a RandomForestClassifier as the top-level classifier by default. You can use another classifier through the ruler argument.

```python
orfa_model = MetaPipeline(OneRulerForAll(comprehensiveSearch=False, parameterTuning=False))
```

3- DemocraticModel: implements soft and hard voting, selected through the voting argument

```python
democratic_model = MetaPipeline(DemocraticModel(comprehensiveSearch=False, parameterTuning=False, voting='soft'))
```
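For readers who know scikit-learn, the stacking and voting strategies above correspond roughly to its `StackingClassifier` and `VotingClassifier`. This sketch uses those scikit-learn classes directly; it is an analogy, not how OneRulerForAll or DemocraticModel are implemented, and the three base estimators are an assumption chosen to keep the example light.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

base = [
    ("logistic", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("gnb", GaussianNB()),
    ("lda", LinearDiscriminantAnalysis()),
]

# Stacking: a "ruler" (a random forest, mirroring OneRulerForAll's default) is
# trained on the cross-validated predictions of the base estimators.
stacking = StackingClassifier(
    estimators=base, final_estimator=RandomForestClassifier(random_state=0)
)

# Voting: "soft" averages predicted probabilities, "hard" takes a majority vote on labels.
soft_vote = VotingClassifier(estimators=base, voting="soft")
hard_vote = VotingClassifier(estimators=base, voting="hard")

for name, model in [("stacking", stacking), ("soft vote", soft_vote), ("hard vote", hard_vote)]:
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```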

As of release v0.2.3.2 (13/04/2022), the EstimatorPool on which these MetaModels rely contains five models:
- Random Forest Classifier
- Logistic Regression
- Gaussian Naive Bayes
- Linear Discriminant Analysis
- XGBoost
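A BestModel-style selection over a pool like this one can be sketched with cross-validation, mirroring the `nSplits=10` and `metrics='accuracy'` defaults of the MetaModel signature. This is a minimal sketch under assumptions: XGBoost is left out so the example needs only scikit-learn, the estimators keep their default hyperparameters (no parameterTuning), and `select_best` is a hypothetical helper, not part of TinyAutoML.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical pool mirroring four of the five listed models (XGBoost omitted)
pool = {
    "random forest classifier": RandomForestClassifier(random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Gaussian Naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
}


def select_best(pool, X, y, n_splits=10, metric="accuracy"):
    """Hypothetical helper: return (name, score) of the best mean CV score."""
    scores = {
        name: cross_val_score(est, X, y, cv=n_splits, scoring=metric).mean()
        for name, est in pool.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


X, y = load_breast_cancer(return_X_y=True)
best_name, best_score = select_best(pool, X, y)
print(f"The best estimator is {best_name} with a cross-validation accuracy of {best_score:.3f}")
```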

We'll use the breast_cancer dataset from sklearn as an example:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

X = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
y = cancer.target

# First 80% of the rows for training, last 20% for testing (no shuffling)
cut = int(len(y) * 0.8)

X_train, X_test = X[:cut], X[cut:]
y_train, y_test = y[:cut], y[cut:]
```

Let's train a BestModel first and reuse its pool for the other MetaModels:

```python
best_model.fit(X_train, y_train)
```

```
[TinyAutoML] Training models...
[TinyAutoML] The best estimator is random forest classifier with a cross-validation accuracy (in Sample) of 1.0

MetaPipeline(model=BestModel(comprehensiveSearch=False, parameterTuning=False))
```

We can now extract the pool:

```python
pool = best_model.get_pool()
```

And use it when fitting the other MetaModels to skip the fitting of the underlying models:

```python
orfa_model.fit(X_train, y_train, pool=pool)
democratic_model.fit(X_train, y_train, pool=pool)
```

```
[TinyAutoML] Training models...
[TinyAutoML] Training models...

MetaPipeline(('model', Democratic Model))
```
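Reusing a pool pays off because the expensive part, fitting the base estimators, happens only once. The sketch below illustrates the idea in plain scikit-learn under stated assumptions: the dictionary `fitted_pool` is a hypothetical stand-in for what `get_pool()` returns (its real structure is not described here), and soft and hard votes are derived from the same fitted estimators without any refitting.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit every estimator of the hypothetical pool exactly once
fitted_pool = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_train, y_train),
    "Gaussian Naive Bayes": GaussianNB().fit(X_train, y_train),
    "LDA": LinearDiscriminantAnalysis().fit(X_train, y_train),
}

# Cache each estimator's probability of class 1 on the test set
proba = np.column_stack([est.predict_proba(X_test)[:, 1] for est in fitted_pool.values()])

# Two ensembles built from the same cached outputs, with no refitting
soft_vote = (proba.mean(axis=1) >= 0.5).astype(int)  # average the probabilities
n_votes_for_1 = (proba >= 0.5).sum(axis=1)            # each estimator's label vote
hard_vote = (2 * n_votes_for_1 > proba.shape[1]).astype(int)  # strict majority

print("soft vote accuracy:", (soft_vote == y_test).mean())
print("hard vote accuracy:", (hard_vote == y_test).mean())
```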

Great! Let's look at the results with sklearn's `classification_report`:

```python
orfa_model.classification_report(X_test, y_test)
```

              precision    recall  f1-score   support

           0       0.89      0.92      0.91        26
           1       0.98      0.97      0.97        88

    accuracy                           0.96       114
   macro avg       0.93      0.94      0.94       114
weighted avg       0.96      0.96      0.96       114
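The report above has the same layout as scikit-learn's `sklearn.metrics.classification_report`, which can be called on any fitted classifier. In this sketch the scaled logistic regression is only a placeholder (not a MetaModel), so its numbers will differ from the ones above; the head/tail split is the same one used earlier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Same head/tail 80/20 split as in the example above
cut = int(len(y) * 0.8)
X_train, X_test, y_train, y_test = X[:cut], X[cut:], y[:cut], y[cut:]

# Placeholder classifier; any fitted scikit-learn classifier works here
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall, F1 and support, plus accuracy and averages
print(classification_report(y_test, y_pred))

# The same numbers as a nested dict, handy for programmatic checks
report = classification_report(y_test, y_pred, output_dict=True)
```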

Looking good! What about the ROC curve?

```python
democratic_model.roc_curve(X_test, y_test)
```

[Figure: ROC curve of democratic_model on the test set]
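A ROC curve can be drawn for any classifier that exposes `predict_proba`. This sketch uses scikit-learn's `roc_curve` and `roc_auc_score` with matplotlib (already a TinyAutoML dependency) and a placeholder logistic-regression model; it does not show how `MetaPipeline.roc_curve` is implemented.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
cut = int(len(y) * 0.8)
X_train, X_test, y_train, y_test = X[:cut], X[cut:], y[:cut], y[cut:]

# Placeholder model standing in for a MetaPipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# False and true positive rates at every decision threshold
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"ROC curve (AUC = {auc:.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()
# Call plt.show() or fig.savefig("roc_curve.png") to display or save the figure
```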

Let's see how the estimators of the pool are doing individually:

```python
best_model.get_scores(X_test, y_test)
```

[('random forest classifier', 1.0),
 ('Logistic Regression', 0.9473684210526315),
 ('Gaussian Naive Bayes', 0.956140350877193),
 ('LDA', 0.9473684210526315),
 ('xgb', 0.956140350877193)]

What's next?

You can repeat these steps with comprehensiveSearch set to True if you have the time and want to improve your results. You can also try other rulers, and so on.

Owner

Name: Lucas Saban
Login: g0bel1n
Kind: user
Location: Paris
Company: Ensae Paris | MVA

Twitter: g0bel1n
Repositories: 9
Profile: https://github.com/g0bel1n

ML, Deep Learning and Optimization. Student.

Committers

Last synced: 7 months ago

All Time

Total Commits: 178
Total Committers: 3
Avg Commits per committer: 59.333
Development Distribution Score (DDS): 0.107

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
g0bel1n	l**n@i**m	159
Thomas Kientz	t**s@k**t	15
readme update bot	l****n	4

Committer Domains (Top 20 + Academic)

kientz.net: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 2
Total pull requests: 8
Average time to close issues: N/A
Average time to close pull requests: 3 days
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.5
Merged pull requests: 7
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

g0bel1n (1)

Pull Request Authors

thomktz (8)

Packages

Total packages: 1
Total downloads:
- pypi 6 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 30
Total maintainers: 1

pypi.org: tinyautoml

Combination of ML models for binary classification. Academic Project.

Homepage: https://github.com/g0bel1n/TinyAutoML
Documentation: https://tinyautoml.readthedocs.io/
License: MIT
Latest release: 0.2.4
published almost 4 years ago

Versions: 30
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 6 Last month

Rankings

Dependent packages count: 10.1%

Dependent repos count: 21.6%

Stargazers count: 25.0%

Average: 27.5%

Forks count: 29.8%

Downloads: 50.8%

Maintainers (1)

isab01

Last synced: 6 months ago

Dependencies

requirements.txt pypi

matplotlib *
numpy *
pandas ==1.3.4
pytest *
scikit-learn ==1.0.2
statsmodels *
tqdm *
xgboost *

setup.py pypi

matplotlib *
numpy *
pandas *
scikit-learn *
statsmodels *
tqdm *
xgboost *

.github/workflows/python-app.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/udpate-readme.yml actions

EndBug/add-and-commit v7 composite
actions/checkout v2 composite
actions/setup-python v2 composite