https://github.com/mews-labs/palma

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (16.7%) to scientific vocabulary

Keywords

automl automl-api automl-pipeline data-leakage drift-detection machine-learning machine-learning-algorithms python sklearn

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: mews-labs
License: apache-2.0
Language: Python
Default Branch: main
Homepage: https://eurobios-mews-labs.github.io/palma/
Size: 18.2 MB

Statistics

Stars: 2
Watchers: 1
Forks: 5
Open Issues: 10
Releases: 8

Topics

automl automl-api automl-pipeline data-leakage drift-detection machine-learning machine-learning-algorithms python sklearn

Created about 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme, License

Project for Automated Learning MAchine

This library provides tools for an automated machine learning (AutoML) approach. Because many tools already implement one component or another of an AutoML approach, the idea of this library is to provide a structure rather than a complete service. The library uses a broad definition of AutoML: it covers hyperparameter optimisation, the historization (history tracking) of models, performance analysis, and so on. In short, it covers any element that can be replicated and that, in most cases, must be included in the analysis of a model's results. Finally, because it is built from components, the library is modular and lets users add their own analyses.
It therefore contains the following elements:

A vanilla approach, described below (in the Basic usage section) and in the classification and regression notebooks. In this approach, the user defines a Project, which can then be passed either to a ModelSelector, to find the best model for this project, or to a ModelEvaluation, to study in more depth the behavior of a given model on this project.
A collection of components that can be added to enrich the analysis.

Install it with:

```powershell
python -m pip install palma
```

Documentation

Access the full documentation on the project homepage: https://eurobios-mews-labs.github.io/palma/

Basic usage

Start your project

To start using the library, create a Project:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from palma import Project

# Synthetic binary classification data with 100 features
X, y = make_classification(n_informative=2, n_features=100)
X, y = pd.DataFrame(X), pd.Series(y).astype(bool)

project = Project(problem="classification", project_name="default")

# 10 random train/test splits (ShuffleSplit defaults to a 10% test set)
project.start(
    X, y,
    splitter=ShuffleSplit(n_splits=10, random_state=42),
)
```

The instantiation defines the type of problem, and the start method sets up what is needed to carry out an ML project:

A testing strategy (argument splitter), which defines the train and test instances. A scikit-learn cross-validator is used for this. During hyperparameter optimisation, a single train/test split is performed, using the first split of the splitter. For instance, if you want a shuffled 80/20 split, you should use

```python
from sklearn import model_selection

# test_size must be set explicitly: ShuffleSplit defaults to a 10% test set
splitter = model_selection.ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
```

Training data X and target y
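To check what a splitter produces before passing it to start, the sketch below uses only scikit-learn (palma is not called). It builds a shuffled 80/20 ShuffleSplit and takes its first split, which is the one described above as being used during hyperparameter optimisation. The dataset size (1000 rows, 20 features) is an arbitrary assumption chosen so the split sizes are easy to read.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit

# Arbitrary toy data: 1000 rows, 20 features (sizes chosen for illustration)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X, y = pd.DataFrame(X), pd.Series(y).astype(bool)

# 5 random 80/20 train/test splits
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# The first (train, test) pair of row indices yielded by the splitter
train_idx, test_idx = next(splitter.split(X, y))
print(len(train_idx), len(test_idx))  # 800 200
print(splitter.get_n_splits())  # 5
```

Without test_size=0.2, the same call would give a 900/100 split, which is why the argument matters for an 80/20 strategy.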

This initialization is done in two steps so that the user can add optional components to the project before it starts.
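The two-step pattern (register components, then start) can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: ToyProject, add and class_balance are illustrative names, not palma's Project class or component API. The sketch only shows why components have to be attached before start, which is the moment they receive the data.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

import pandas as pd


@dataclass
class ToyProject:
    """Hypothetical stand-in for a project; NOT palma's Project class."""

    problem: str
    components: List[Callable[["ToyProject"], None]] = field(default_factory=list)
    X: Optional[pd.DataFrame] = None
    y: Optional[pd.Series] = None
    reports: dict = field(default_factory=dict)

    def add(self, component: Callable[["ToyProject"], None]) -> None:
        # Step 1: register an optional analysis before any data is attached
        self.components.append(component)

    def start(self, X: pd.DataFrame, y: pd.Series) -> None:
        # Step 2: attach the data, then run every registered component on it
        self.X, self.y = X, y
        for component in self.components:
            component(self)


def class_balance(project: ToyProject) -> None:
    """Hypothetical component: records the share of each target class."""
    project.reports["class_balance"] = project.y.value_counts(normalize=True).to_dict()


toy = ToyProject(problem="classification")
toy.add(class_balance)
toy.start(pd.DataFrame({"a": [1, 2, 3, 4]}), pd.Series([True, False, True, True]))
print(toy.reports)  # {'class_balance': {True: 0.75, False: 0.25}}
```

Keeping the data out of the constructor means any number of analyses can be plugged in first and all of them see the same X and y once start is called.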

Run hyperparameter optimisation

The hyperparameter optimisation process looks for the best model in a pool of models that tend to perform well on a variety of problems. For this task, the library relies on the FLAML package. After optimisation, the metrics to track can be computed.

```python
from palma import ModelSelector

# FLAML search with a 30-second time budget (FLAML's time_budget is in seconds)
ms = ModelSelector(engine="FlamlOptimizer", engine_parameters=dict(time_budget=30))
ms.start(project)
print(ms.best_model_)
```
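The idea of picking the best model from a pool can be sketched with scikit-learn alone. This is a deliberately simplified stand-in, not palma's or FLAML's implementation: it fits a fixed, hypothetical pool of three classifiers on the first split of the splitter and keeps the one with the highest ROC AUC on the held-out rows. FLAML additionally tunes each learner's hyperparameters within the time budget, which this sketch does not do; the pool contents and the choice of ROC AUC as the selection metric are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ShuffleSplit
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_informative=2, n_features=20, random_state=0)
X, y = pd.DataFrame(X), pd.Series(y)

# Only the first split is used for selection
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))

# Hypothetical candidate pool with fixed (untuned) settings
pool = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, estimator in pool.items():
    estimator.fit(X.iloc[train_idx], y.iloc[train_idx])
    # Probability of the positive class on the held-out rows
    proba = estimator.predict_proba(X.iloc[test_idx])[:, 1]
    scores[name] = roc_auc_score(y.iloc[test_idx], proba)

best_name = max(scores, key=scores.get)
best_model = pool[best_name]
print(best_name, round(scores[best_name], 3))
```

Selecting on a single held-out split is cheap but noisy; that trade-off is one reason the selected model is then studied across all splits.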

Tailoring and analysing your estimator

```python
from palma import ModelEvaluation
from sklearn.ensemble import RandomForestClassifier

# Use your own estimator
model = ModelEvaluation(estimator=RandomForestClassifier())
model.fit(project)

# Or use the optimized estimator found by the ModelSelector
model = ModelEvaluation(estimator=ms.best_model_)
model.fit(project)
```
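A ModelEvaluation studies a given model's behavior on the project. As a rough scikit-learn-only approximation (not palma's ModelEvaluation, whose outputs are not described here), the sketch below scores one estimator on every split of a 10-split splitter and summarises how the scores vary. The data, the estimator settings and the two metrics are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit, cross_validate

X, y = make_classification(n_samples=500, n_informative=2, n_features=20, random_state=0)
X, y = pd.DataFrame(X), pd.Series(y)
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)

# Fit a fresh clone of the estimator on each train split, score on its test split
results = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y,
    cv=splitter,
    scoring=["roc_auc", "accuracy"],
)

# One row per split, one column per metric
per_split = pd.DataFrame(
    {"roc_auc": results["test_roc_auc"], "accuracy": results["test_accuracy"]}
)
print(per_split.describe().loc[["mean", "std"]])
```

Looking at the spread across splits, not just the mean, indicates how much a single train/test split could mislead.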

Contributing

You are very welcome to contribute to the project by requesting features, pointing out new tools that could be added as components, reporting issues, and creating new features. Development guidelines will be detailed in the near future.

Fork the repository
Clone your forked repository: git clone https://github.com/$USER/palma.git
Test using pytest: pip install pytest; pytest tests/
Submit your work with a pull request.

Authors

Eurobios Mews Labs

Dependencies

.github/workflows/pytest.yml actions

actions/checkout v3 composite
actions/setup-python v3 composite

pyproject.toml pypi

PyQt5 * develop
coverage ^6.4 develop
pre-commit ^2 develop
pylint * develop
pytest ^7.1.3 develop
pytest-cov ^3.0.0 develop
FLAML >1.0.12, <2
boto3 *
deepchecks ^0.8
explainerdashboard >=0.3
frozendict ^2.3.4
llvmlite ^0.39
matplotlib ^3.4
memory-profiler ^0.60.0
mlflow *
numpy ^1
pandas ^1
pandas-profiling ^3.2
plot-metric ^0
pyaml >12
python >=3.9,<3.11
scikit-learn ^1
seaborn ^0.12.0
shap *
tabulate ^0.8.10
xgboost >1 <2
